VO2max estimation: non-exercise data, resting heart rate, HRV, sub-maximal heart rate, what's the difference?
Blog post by Marco Altini
Last week we released an update for HRV4Training including VO2max estimation for runners connecting HRV4Training to Strava.
I've been working on this topic for a few years, using wearable sensor data to estimate VO2max from sub-maximal HR data, for example HR while walking at a certain speed (see here for two recent publications). My work mainly targeted the general population, hence sub-maximal HR was captured during low intensity activities such as walking in free-living conditions. Machine learning and pattern recognition models were then used to recognize such activities and map physiological data to fitness level or VO2max. As VO2max is a good marker of cardiovascular health and a strong predictor of mortality risk, I believe there is much to gain in bringing these metrics to the general population, unobtrusively and without requiring any specific laboratory tests. Fitbit seems to agree, as they have just released this feature in their latest tracker. By learning more about how cardiorespiratory fitness changes in response to exercise, we can move towards quantifying an important health marker, instead of simply quantifying behavior (e.g. steps or other commonly used activity metrics). Basically, we would close the loop and potentially help individuals take up a more active lifestyle, or simply maintain it, by letting them monitor how changes in activity behavior influence an important health marker such as cardiorespiratory fitness.
This being said, as more products on the market start to provide VO2max estimation, it's important to understand how these models work, what the advantages of models using certain predictors are, and what the general limitations are. Following our release, we received a few questions on the accuracy of our models and on how they compare to other models that, for example, use resting heart rate or HRV data to estimate VO2max.
Thus, the aim of this post is to explain a bit better how these models work, what is the impact of different predictors (parameters we use to estimate VO2max, for example BMI or resting heart rate), and what are the advantages (and limitations) of the models implemented in HRV4Training.
For an intro to VO2max estimation, check out our previous post, as I will not go over the basics again in this post.
In this post, we will analyze three sets of parameters:
Anthropometrics data only / non-exercise models
Models estimating VO2max using anthropometrics data only have been proposed for many years in research (see Jackson et al., published in 1990, or Baynard et al., just published). The goal is to get to a decent estimate without having to perform any measurement or test. Some of the most recent models actually do include resting heart rate measurements, as anyone can easily collect their resting heart rate, and some other models also include a person's activity level, quantified in different ways (e.g. a number indicating how active you are). For this comparison we will look at anthropometrics data only, as resting physiological data is included in the next section.
Our dataset includes about 50 individuals, far fewer than in the studies mentioned above, yet we see very similar results. For example, Baynard reports R2 = 0.22 when including only BMI as a predictor, and R2 = 0.57 when including BMI, age and gender. On our dataset, when replicating the authors' work, we get R2 = 0.18 for BMI only and R2 = 0.54 when including BMI, age and gender. Considering that R2 (like any other metric) is highly dependent on the dataset (for example on how much variability we have in the data, for both predictors and predicted variables), these numbers are extremely close. A good starting point for our modeling.
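To make the R2 comparison concrete, here is a minimal sketch of how such non-exercise models can be fit and scored with ordinary least squares. The data is synthetic and the relation used to generate it is made up for illustration; nothing here reproduces our dataset or the published models, and the function names are just placeholders.

```python
import numpy as np

def fit_ols(X, y):
    """Fit a linear model with an intercept by ordinary least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Apply the fitted coefficients (intercept first) to new predictors."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    return A @ coef

def r_squared(y, y_hat):
    """Share of the variance in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic sample of 50 people (hypothetical values, NOT the study data).
rng = np.random.default_rng(0)
n = 50
bmi = rng.uniform(19, 32, n)
age = rng.uniform(20, 60, n)
male = rng.integers(0, 2, n).astype(float)
# Made-up generating relation: lower VO2max with higher BMI and age, higher in men.
vo2max = 70 - 0.8 * bmi - 0.3 * age + 6 * male + rng.normal(0, 4, n)

X_bmi = bmi.reshape(-1, 1)
X_all = np.column_stack([bmi, age, male])

r2_bmi = r_squared(vo2max, predict_ols(fit_ols(X_bmi, vo2max), X_bmi))
r2_all = r_squared(vo2max, predict_ols(fit_ols(X_all, vo2max), X_all))
print(f"R2, BMI only: {r2_bmi:.2f}")
print(f"R2, BMI + age + gender: {r2_all:.2f}")
```

Because the second model contains the first one, its in-sample R2 can never be lower; how much it gains depends entirely on the dataset.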
Why these variables? VO2max is known to decrease with age, is lower in women, and also in individuals with higher body fat. As the aim of these models is to be as simple as possible, BMI is typically used as a way to capture body type / fat, even though there are obvious limitations, as BMI does not capture anything related to actual muscle mass / body fat.
Below you can see reference and predicted VO2max when building subject-independent models using anthropometrics data only as predictors. This is how we cross-validate models to make sure they work outside of our sample: we use part of the data to train a model, and part of the data for validation. The data used for validation has never been seen by the model, so we get realistic estimates of how our model would perform when deployed to new users for whom we have never collected any reference data.
On the left side we have the linear relation between reference and predicted VO2max, while on the right side we have the Bland-Altman plot, showing residual errors for this model.
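The validation scheme and the Bland-Altman summary described above can be sketched as follows. This is a simplified illustration, assuming one row per participant, a plain linear model and synthetic data; the function names and the generating relation are hypothetical and this is not the exact pipeline used for the figures.

```python
import numpy as np

def leave_one_subject_out(X, y):
    """Predict each subject with a linear model trained on all the others."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # intercept + predictors
    predicted = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # hold out subject i
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        predicted[i] = A[i] @ coef
    return predicted

def bland_altman(reference, predicted):
    """Return the bias and the 95% limits of agreement of the errors."""
    diff = np.asarray(predicted, float) - np.asarray(reference, float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Synthetic participants (hypothetical values, NOT the study data).
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([rng.uniform(19, 32, n), rng.uniform(20, 60, n)])  # BMI, age
reference = 70 - 0.8 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 4, n)

predicted = leave_one_subject_out(X, reference)
r = np.corrcoef(reference, predicted)[0, 1]
bias, lower, upper = bland_altman(reference, predicted)
print(f"correlation between reference and predicted: {r:.2f}")
print(f"bias: {bias:.2f}, limits of agreement: [{lower:.2f}, {upper:.2f}] ml/kg/min")
```

Every prediction comes from a model that never saw that participant, which is what makes the resulting error estimates representative of new users.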
Resting physiological data (heart rate and HRV)
Things get more interesting when we start including physiological data. What is the rationale behind including resting heart rate? Physiologically speaking, with a more active lifestyle or more specifically with aerobic training, we have changes in the heart (muscle), resulting in increased stroke volume and reduced heart rate. As heart rate reduces with increased aerobic training, and VO2max / fitness also increases, it makes a lot of sense to use resting heart rate to predict VO2max.
Now the more interesting question is: how much better can we estimate VO2max when including heart rate? If we go back to our previous dataset and include resting heart rate together with BMI, age and gender, we obtain R2 = 0.59, a small but significant improvement compared to the previous R2 = 0.54. The standard error of the estimate goes from 4.8 to 4.6 ml/kg/min. Models combining non-exercise parameters with physiological data have also been validated in the past, and sometimes showed poor results (see for example Esco et al.). However, they do perform better than the previous models including only anthropometrics data.
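For reference, the two metrics reported here can be written as follows, assuming the common textbook definitions apply (the exact formulas used are not spelled out in this post). Here $y_i$ is the reference VO2max of participant $i$, $\hat{y}_i$ the predicted one, $\bar{y}$ the sample mean, $n$ the number of participants and $p$ the number of predictors.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{SEE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p - 1}}
```

Both are driven by the same residual sum of squares, which is why on a given dataset a higher R2 tends to go together with a lower standard error of the estimate.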
What about HRV? Adding HRV brings no improvement (same R2 and standard error that we had before including resting HR). As a matter of fact, adding HRV and removing resting heart rate also brings no improvement with respect to the original model using only anthropometrics data. This is something I've been arguing for some time: HRV reflects very well training load and the impact of different stressors, but not necessarily fitness or aerobic capacity. It is true that some studies showed improvements in baseline HRV for individuals starting an aerobic training plan, however these findings often failed to be replicated (also, typically everything changes when taking inactive people and getting them active, whereas things get more challenging with a group of already active people). Additionally, there is so much variability in day to day HRV scores (easily 50% of your baseline or more) that I am personally a bit skeptical of any HRV data reported as a single snapshot before / after a study. In my opinion, a baseline of at least a week should be collected pre / post study in order to be more confident about an individual's HRV level without being too sensitive to acute variations, otherwise we might just be trying to interpret noise.
Below you can see results for subject-independent models using as predictors anthropometrics data and resting HR:
Sub-maximal heart rate data (e.g. heart rate while running)
The rationale behind including sub-maximal HR data is the same as for resting HR data. As we train aerobically and get more fit, sub-maximal HR reduces, meaning that we can for example run at the same speed but with a lower heart rate. The reason why we prefer sub-maximal HR over resting HR is that individual differences due to fitness get exacerbated during exercise. Two individuals of quite different fitness levels might have a very similar resting HR, say 50 and 55 bpm. However, during the same intense exercise, say running at 12 km/h, the HR of the unfit individual will be much higher (all other things being equal, so similar body size, age, etc.). This is the principle we exploit with our VO2max estimation in HRV4Training, as we capture workout data from Strava and can analyze HR at different speeds for a broad set of individuals. Intuitively, the ones that can run faster and keep their HR lower are most likely the fittest.
Let's include sub-maximal HR in our models. What we get for running HR, even at a speed as low as 8 km/h, so definitely an easy effort for most people, is R2 = 0.67 and a standard error of 4.1 ml/kg/min. Much better than before.
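As a rough illustration of this principle, the sketch below fits a straight line to (speed, HR) samples from a runner's workouts and reads off the HR at a reference speed. The function name, the linear fit and the sample values are assumptions made for this example, not the actual HRV4Training implementation.

```python
import numpy as np

def hr_at_speed(speeds_kmh, heart_rates_bpm, target_kmh):
    """Estimate HR at target_kmh from a straight-line fit of HR against speed."""
    slope, intercept = np.polyfit(speeds_kmh, heart_rates_bpm, 1)
    return slope * target_kmh + intercept

# Hypothetical workout summaries (average speed in km/h, average HR in bpm).
fitter_runner = hr_at_speed([8, 10, 12, 14], [130, 145, 160, 175], 12)
less_fit_runner = hr_at_speed([8, 9, 10, 11], [150, 159, 168, 177], 12)

print(f"fitter runner at 12 km/h: {fitter_runner:.0f} bpm")
print(f"less fit runner at 12 km/h: {less_fit_runner:.0f} bpm")
```

Comparing everyone at the same reference speed is what turns raw workout data into a predictor that can be fed to the model alongside age, gender and BMI.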
Here are the results for the subject-independent analysis, similarly to what we've seen for the other two models:
Highlighting the importance of sub-maximal HR data
After reading the above and looking at the plots, you might be asking yourself if it is really worth including all the additional physiological data and context for relatively small improvements. The correlation between reference and estimated VO2max for subject-independent models goes from 0.72 to 0.79. This is a change good enough to publish a paper, but is it really useful in your individual case? Still, much of the variance is not explained by these models (more on this later in the limitations section).
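A quick way to see how much is left unexplained: for a simple linear relation between reference and estimated VO2max, the squared correlation gives the share of variance accounted for. Applying that to the two correlations above:

```latex
r = 0.72 \;\Rightarrow\; r^2 \approx 0.52,
\qquad
r = 0.79 \;\Rightarrow\; r^2 \approx 0.62
```

So even the better model leaves a bit less than 40% of the variance unexplained.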
Here I'd like to highlight how including sub-maximal HR is extremely important, and is actually the only way to discriminate between individuals that are similar, which is probably your case if you are an HRV4Training user or simply are into training (hence in the more homogeneous and fit part of the population).
It's always easy to show high correlation or R2 on a dataset with much variability. Say we take thousands of individuals covering a very broad range of BMI and VO2max, from sedentary, obese, unfit individuals to Ironman participants: obviously BMI will be a great predictor of the differences in fitness between these individuals.
But what if we look at similar individuals? People can have similar body size (and age), and yet be extremely different from a cardiorespiratory fitness point of view. Without physiological data, we cannot tell the difference. To highlight this point, I isolated a subset of participants with similar characteristics: males aged 21-25 years with BMI between 22 and 24 kg/m^2. This is a rather homogeneous sample in terms of our predictors. What happens when we try to predict their VO2max using anthropometrics data only?
As highlighted in the figure above, without physiological data we cannot discriminate individuals with different fitness levels but similar anthropometrics data. All individuals are predicted at more or less the same VO2max, as they are similar according to the model. We need physiological data to be able to discriminate them, as sub-maximal HR will reflect their cardiorespiratory fitness level much better, due to the known relationships explained above. The correlation between reference and predicted VO2max for this subset of similar individuals is only 0.28, much lower than when we looked at the entire sample.
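The effect is easy to reproduce with a toy example. The coefficients and the five participants below are entirely made up (they are not our model or our data); the point is only that a linear model on age and BMI cannot spread out its predictions for people whose age and BMI are nearly identical.

```python
import numpy as np

def anthropometric_estimate(age_years, bmi):
    """Toy linear non-exercise model for males; coefficients are hypothetical."""
    return 80.0 - 0.3 * age_years - 0.8 * bmi

# Five hypothetical males, aged 21-25 with BMI between 22 and 24 kg/m^2.
age = np.array([21.0, 22.0, 23.0, 24.0, 25.0])
bmi = np.array([22.0, 24.0, 23.0, 22.5, 23.5])
reference = np.array([62.0, 45.0, 55.0, 50.0, 58.0])  # made-up lab values

predicted = anthropometric_estimate(age, bmi)
predicted_spread = predicted.max() - predicted.min()
reference_spread = reference.max() - reference.min()

print(f"spread of predicted VO2max: {predicted_spread:.1f} ml/kg/min")
print(f"spread of reference VO2max: {reference_spread:.1f} ml/kg/min")
```

Whatever the coefficients, the predicted spread is bounded by the narrow age and BMI ranges, while the reference values are free to differ widely.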
Let's now look at the same subset of individuals but for our latest model, the one used in HRV4Training, which combines anthropometrics data and HR while running:
We can now see how the same group is predicted much more accurately, and we can clearly discriminate between the different fitness levels, with one individual clearly being less fit despite the low age and BMI.
This is the most accurate model we can develop using anthropometrics data and physiological data during exercise, and a very similar model is currently implemented in HRV4Training.
Limitations
We can see that much of the variance is still not explained by these models. Possible reasons can be genetics, lack of motivation during the test for some participants, and even how the VO2max test was performed. I will report here part of what I wrote earlier.
Additional limitations are the dependency of VO2max on the type of test performed, and body-weight normalizations. While VO2max is the gold standard, and by definition is the only way to determine fitness level, the exercise protocol performed highly influences results. If you do a bike test and a treadmill test, you’ll get two different results. And differences can be big, with running VO2max typically being higher. One of the reasons is that biking tests are often limited by muscle fatigue. However, such tests are the most commonly performed in research, since they are considered more practical and easier for participants who are not used to doing sports (e.g. in more medically oriented studies).
One of the major issues with VO2max is the total lack of agreement on body weight normalizations. VO2max is reported most of the time normalized by body weight, however the relation between body weight and oxygen uptake is activity dependent. Again, literature on different normalizations (and allometric coefficients) for activity-specific body weight normalization is inconsistent. Biking in particular is non-weight bearing, which means the impact of body weight on oxygen uptake is very different compared to weight bearing activities such as running. VO2max categories are based on normalized units (i.e. VO2max/kg), however they don't take into account the type of test performed to obtain VO2max, often over-correcting results. Not normalizing, while correct in principle, hinders interpretability since tables for different weight ranges don't exist.
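To make the normalization issue concrete, here is a small sketch converting absolute oxygen uptake (L/min) into body-weight-normalized units. The function name and the `exponent` parameter are assumptions for illustration: 1.0 gives the usual ml/kg/min, while a value below 1 stands in for allometric scaling (0.67 is used here only as an example, not as a recommended coefficient).

```python
def normalized_vo2max(absolute_l_min, body_mass_kg, exponent=1.0):
    """Convert absolute VO2max (L/min) to ml per kg^exponent per minute."""
    return absolute_l_min * 1000.0 / body_mass_kg ** exponent

# Two hypothetical athletes with the same absolute VO2max of 4.0 L/min.
light_per_kg = normalized_vo2max(4.0, 60.0)
heavy_per_kg = normalized_vo2max(4.0, 80.0)

# How much higher the lighter athlete scores under each normalization.
ratio_linear = light_per_kg / heavy_per_kg
ratio_allometric = (
    normalized_vo2max(4.0, 60.0, exponent=0.67)
    / normalized_vo2max(4.0, 80.0, exponent=0.67)
)

print(f"per-kg normalization: {light_per_kg:.1f} vs {heavy_per_kg:.1f} ml/kg/min")
print(f"lighter/heavier ratio, exponent 1.0: {ratio_linear:.2f}")
print(f"lighter/heavier ratio, exponent 0.67: {ratio_allometric:.2f}")
```

With the same absolute oxygen uptake, the lighter athlete always scores higher after normalization, but the size of the gap depends on the exponent, which is why the choice of normalization changes how individuals compare.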
This being said, I've eventually decided to stick to VO2max. My main motivations for using VO2max instead of other custom made markers are the following: