Your fitness tracker is a brilliant pedometer wearing the costume of a lab. Some of the numbers on its screen are genuinely accurate; others are confident guesses dressed up as measurements. The single rule that predicts which is which: trackers are good at counting things that make a clean physical signal, and bad at modelling things they have to infer. A stride, a pulse beat and a satellite fix are countable. Your calorie burn, your sleep stage and how stressed you feel are not, so the device estimates them, and that is where the numbers wobble.
Here is the trust ranking, sorted by how strongly the evidence backs each metric, plus what to actually do with each one.
Tier 1: trust the number
These are the metrics the device measures more or less directly. Act on them.
Steps. A 2025 living meta-analysis of Apple Watch studies put step error at 8.17% on average, comfortably inside the 10% threshold researchers treat as valid. Steps are a clean mechanical signal, so the watch counts them well. The catch is real and worth knowing: wrist trackers undercount badly when your arm is not swinging. Pushing a stroller, a shopping trolley or a wheelchair can hide 35% to 95% of your steps, and very slow walking gets undercounted too, with error climbing sharply at a crawl. Your steps are not missing because you are unfit. They are missing because your wrist sat still.
Steady-state heart rate. This is the quiet success story of wearables. In the seminal 2017 Stanford evaluation of seven wrist devices, six measured heart rate to within 5% error. The 2025 Apple Watch meta-analysis landed at 4.43%. At a steady effort, optical heart rate is close enough to trust. The asterisk: it degrades during intervals and sprints, when arm motion and poor skin contact smear the signal, and the error climbs with intensity. So the number is trustworthy when you are jogging at a constant pace and shakier when you are doing hill repeats.
GPS pace and distance. A validation study of eight positioning-enabled sport watches found distance error of 3.2% to 6.1%, with only Polar's receivers landing under 5% overall. Distance tends to be underestimated, and accuracy drops in cities and forests, where buildings and tree cover scatter the satellite signal. Running produces more error than walking or cycling. For an easy park run on open ground, the distance is solid; for a route threading between HDB blocks in the CBD, knock off a little faith.
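If you are wondering what an "error of 8.17%" actually means, validation studies commonly report mean absolute percentage error: the average size of the miss, as a share of the true value. Here is a minimal Python sketch of that calculation, assuming that is how the figures above were computed. The function name and the step counts are invented for illustration, not taken from any study.

```python
def mean_absolute_percentage_error(reference, measured):
    """Average of |measured - reference| / reference, in percent."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for ref, got in zip(reference, measured):
        total += abs(got - ref) / ref
    return 100.0 * total / len(reference)


# Hypothetical walks: hand-counted steps versus what the watch reported.
hand_counted = [1000, 2000, 500]
watch_reported = [950, 1900, 450]

error = mean_absolute_percentage_error(hand_counted, watch_reported)
is_valid = error < 10.0  # the 10% validity threshold mentioned above

print(f"step error: {error:.2f}%  valid: {is_valid}")
```

Because the misses are taken as absolute values, an undercount on one walk cannot cancel an overcount on another. That is why a metric can look fine on total steps and still score badly here.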
Tier 2: trust the trend, not the digit
Resting heart rate and sleep duration sit one rung down. The device measures them reliably enough to flag your own week-to-week changes, but not precisely enough to compare against a friend or a textbook figure.
For sleep, modern trackers nail the basic question. Across Oura, Apple Watch and Fitbit, sleep-versus-wake sensitivity runs at 95% or higher. The watch genuinely knows roughly how long you slept. Your resting heart rate creeping up over a stressful fortnight, or your sleep duration sliding after a month of late nights, is a signal worth reading. Just read the direction of travel, not the third decimal place.
Tier 3: treat with a fistful of salt
Now the guesses. These metrics are inferred from the raw signals, and the inference is where accuracy goes to die.
Calorie burn. This is the single least accurate number on the device, full stop. In the 2017 Stanford study, not one of the seven wrist wearables measured energy expenditure accurately; the best was off by 27% and the worst by 93%. Eight years later, the 2025 meta-analysis found the Apple Watch's calorie error sitting at 27.96%, with cycling worse (around 52%) than walking or running (around 31%). The error has barely moved across hardware generations, which tells you this is a limit of guessing calories from a wrist signal, not a software bug a firmware update will fix.
The practical fallout: do not eat back the calories your watch says you burned. If the device tells you that spin class torched 600 calories and the truth is closer to 300, treating the screen as gospel is a reliable way to stall fat loss. The calorie figure is a vibe, not an accounting entry.
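To see how quickly that adds up, here is a small Python sketch using the spin-class numbers above. The function names are made up for this example, and the three classes a week is an assumed schedule, not a figure from any study.

```python
def eat_back_surplus(displayed_kcal, actual_kcal, sessions=1):
    """Extra calories eaten if you 'eat back' everything the watch displayed."""
    return (displayed_kcal - actual_kcal) * sessions


def overstatement_percent(displayed_kcal, actual_kcal):
    """How far the display overshoots the real burn, as a share of the real burn."""
    return 100.0 * (displayed_kcal - actual_kcal) / actual_kcal


per_class = eat_back_surplus(600, 300)
per_week = eat_back_surplus(600, 300, sessions=3)  # assumed: three classes a week
overstated_by = overstatement_percent(600, 300)

print(per_class, per_week, overstated_by)
```

In this scenario the watch overstates the burn by 100%, and eating it all back adds 900 calories a week that the workout never earned.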
Sleep-stage breakdowns. Knowing how long you slept is easy. Sorting that sleep into light, deep and REM is hard. A 2024 study against polysomnography (the clinical gold standard) found deep-sleep sensitivity of 79.5% for Oura, 61.7% for Fitbit and 50.5% for Apple Watch. An 11-device multicentre study covering 349,114 epochs of sleep found the best device managed a staging score (macro F1) of just 0.69, and the worst 0.26. So when your app announces you got 47 minutes of deep sleep, treat that as a loose estimate of a quantity even sleep labs find fiddly. The hypnogram is decorative.
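Both scores quoted above come from comparing the tracker's label for each epoch (a short slice of the night) against the lab's label for the same slice. Here is a minimal Python sketch of sensitivity and macro F1, written from their standard definitions. The eight-epoch night is invented for illustration and is far shorter than anything a real study would use.

```python
def stage_counts(reference, predicted, stage):
    """True positives, false positives and false negatives for one stage."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == stage and p == stage)
    fp = sum(1 for r, p in pairs if r != stage and p == stage)
    fn = sum(1 for r, p in pairs if r == stage and p != stage)
    return tp, fp, fn


def sensitivity(reference, predicted, stage):
    """Share of the lab's epochs of this stage that the tracker also caught."""
    tp, _, fn = stage_counts(reference, predicted, stage)
    return tp / (tp + fn) if tp + fn else 0.0


def macro_f1(reference, predicted):
    """Unweighted mean of per-stage F1, so rare stages count as much as common ones."""
    stages = sorted(set(reference) | set(predicted))
    scores = []
    for stage in stages:
        tp, fp, fn = stage_counts(reference, predicted, stage)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical night: lab labels versus tracker labels, epoch by epoch.
lab = ["wake", "light", "light", "deep", "deep", "deep", "rem", "rem"]
watch = ["wake", "light", "deep", "deep", "light", "light", "rem", "light"]

deep_sensitivity = sensitivity(lab, watch, "deep")
staging_score = macro_f1(lab, watch)

print(f"deep sensitivity: {deep_sensitivity:.2f}  macro F1: {staging_score:.2f}")
```

In this toy night the tracker catches one of three deep-sleep epochs and mislabels a light one as deep, so the deep-sleep total on screen can be wrong in both directions at once.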
Stress scores and HRV "readiness". These are the boldest guesses of all, because they take an already-noisy input (heart-rate variability) and slap an emotional label on it. In a large real-world study of information workers, tracker HRV explained about 2.2% of the variance in how stressed people actually felt, which is roughly a correlation of 0.15. The researchers explicitly cautioned against calling HRV "stress" without proper validity data. A low score does not mean you are objectively stressed or that you must skip training. Treat it as a vague mood ring, not a verdict.
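The jump from "2.2% of the variance" to "a correlation of 0.15" is a single square root. This Python sketch assumes a simple linear relationship, where the variance explained equals the correlation squared. Only the 2.2% figure comes from the study above; the variable names are ours.

```python
import math

variance_explained = 0.022  # 2.2%, the figure quoted above

# For a simple linear relationship, variance explained is r squared,
# so the size of the correlation is its square root. The sign is not recoverable.
correlation = math.sqrt(variance_explained)
unexplained = 1.0 - variance_explained

print(f"r is about {correlation:.2f}; {unexplained:.1%} of felt stress is unaccounted for")
```

Put the other way round, nearly 98% of the variation in how stressed people felt had nothing to do with the HRV number.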
Readiness, recovery and "body battery" scores deserve the same scepticism, because they are proprietary models built largely on the same shaky HRV and sleep-stage inputs. They can be a rough nudge to take it easier. They are not a number to obey.
Why better hardware can't fully fix it
Two problems are baked in, and the first is physics. Wrist heart rate uses optical sensors (photoplethysmography, or PPG) that shine green light into your skin and read the reflection. Motion blurs that signal, and so does anything that absorbs the light: a 2025 study found error grows with exercise intensity and is larger for darker skin tones, because melanin absorbs green light. Tattoos under the sensor and higher BMI degrade it too. Steady-state heart rate is the safe zone; intervals, sprints and inked wrists are not.
The deeper issue is the gap between measuring a signal and inferring a quantity from it. The watch can read your pulse. It cannot read your metabolism, your brainwaves or your mood, so it models them from proxies, and a model is only ever as good as its weakest assumption. No price tag closes that gap.
The cheat sheet
| Metric | Trust level | What to do with it |
|---|---|---|
| Steps | High | Act on it (mind the still-arm undercount) |
| Steady-state heart rate | High | Act on it (not for intervals) |
| GPS pace and distance | High | Act on it (worse in cities and forests) |
| Resting heart rate | Trend only | Watch your own week-to-week shift |
| Sleep duration | Trend only | Reliable enough to spot patterns |
| Calorie burn | Low | Do not eat it back |
| Sleep stages | Low | Loose estimate at best |
| Stress / HRV / readiness | Low | A nudge, never a verdict |
The Singapore angle is almost reassuring here. The National Steps Challenge, run through the Health Promotion Board's Healthy 365 app, rewards your daily step count and syncs with HPB-issued trackers plus Fitbit, Garmin, Huawei, Polar and Samsung. It runs on steps, the one metric that is genuinely trustworthy, with the still-arm caveat noted. Swing your arms on the way to the MRT and the count holds up.
Bottom line
Sources
- Stanford Medicine: fitness trackers accurately measure heart rate but not calories burned (Shcherbina et al., 2017)
- The accuracy of Apple Watch measurements: a living systematic review and meta-analysis, npj Digital Medicine (2025)
- Validity of energy expenditure in smartwatches (PMC9549133)
- Accuracy of distance recordings in eight positioning-enabled sport watches (PMC7381051)
- Wrist- vs hip-worn activity monitors when meeting step guidelines, CDC Preventing Chronic Disease (2022)
- Accuracy of three commercial wearable devices for sleep tracking (PMC11511193)
- Accuracy of 11 consumer sleep trackers, JMIR mHealth (PMC10654909)
- Alignment between HRV from fitness trackers and perceived stress, JMIR Human Factors (Hernandez et al., 2022, PMC9389384)
- Validity of heart-rate measurements in wrist monitors across skin tones during exercise, PLOS One (2025)
- HealthHub / Health Promotion Board: National Steps Challenge


