Validation of Pattern-Recognition Monitors in Children Using Doubly Labeled Water : Medicine & Science in Sports & Exercise


EPIDEMIOLOGY

Validation of Pattern-Recognition Monitors in Children Using Doubly Labeled Water

CALABRÓ, MIGUEL ANDRÉS1; STEWART, JEANNE M.2; WELK, GREGORY J.1,2

Medicine & Science in Sports & Exercise 45(7):p 1313-1322, July 2013. | DOI: 10.1249/MSS.0b013e31828579c3

Abstract

Purpose 

Accurate assessments of physical activity and energy expenditure (EE) are needed to advance research on childhood obesity prevention. The objective of this study is to evaluate the validity of two SenseWear Armband monitors (the Pro3 (SWA) and the recently released Mini) (BodyMedia Inc., Pittsburgh, PA) under free-living conditions in a youth population.

Methods 

Twenty-eight healthy children age 10–16 yr wore both monitors for 14 consecutive days, including sleeping time. Estimates of total EE from the monitors were computed using two different algorithms (version 2.2, available in the SenseWear software 6.1 and 7.0, and the newly developed 5.0 algorithms). The EE estimates were compared with estimates derived from doubly labeled water (DLW) methodology using a three-way mixed model ANOVA (sex × monitor × algorithm), correlation analyses, and Bland–Altman plots.

Results 

The mixed-model ANOVA revealed nonsignificant gender and monitor main effects but a significant main effect for algorithm (P < 0.001). The mean absolute percentage error values were considerably lower with the 5.0 (SWA: 10.9%; Mini: 11.7%) than for the 2.2 algorithm (SWA: 20.7%; Mini: 18.3%). Correlations were high for all comparisons (>0.90), but the Bland–Altman plots revealed consistent bias (greater overestimation at higher EE values). The variance in the differences between methods that was attributable to the mean level of EE ranged from R² = 0.17 to R² = 0.44. The magnitude of random error (estimated as the SD of the residuals) ranged from 227 to 299 kcal, but values tended to be lower with the 2.2 algorithm and with the Mini monitor.

Conclusions 

The newly developed SenseWear Armband 5.0 algorithms outperformed the version 2.2 algorithms for group comparisons, but additional work is needed to understand factors contributing to large individual variability.

There is considerable public health interest in assessing and promoting physical activity (PA) in children. Accurate measures are needed to evaluate population patterns and trends, for understanding correlates of activity behavior, and for evaluating health outcomes and interventions. An accurate measurement of daily energy expenditure (EE) under free-living conditions is especially important for understanding how PA contributes to overweight and obesity. Because self-report instruments have acknowledged limitations in youth (29), emphasis has been placed on objective assessment techniques such as the use of accelerometry-based activity monitors. Considerable research has been conducted with a variety of monitors, but research has demonstrated clear limitations associated with the use of standard, uniaxial accelerometers (10,31). Standard accelerometry-based devices work reasonably well for locomotor-based activities but are not well suited to capturing the diverse range of lower intensity lifestyle activities that comprise the bulk of the day. Challenges associated with assessing compliance and in calibrating monitors for different ages have also proven particularly difficult to resolve. The assessment of PA and EE in children is further confounded by variable activity patterns and variability in energy cost of activity due to growth and maturation (17).

Multisensor pattern-recognition monitors offer considerable promise for improving estimates of PA and EE. The combination of multiple sensors allows for the detection of PA patterns and the application of movement-specific algorithms to estimate EE. Pattern-recognition monitors have been shown in some studies to provide more accurate estimates of PA than commonly used accelerometers (11,32).

The SenseWear Pro3 Armband (SWA; BodyMedia Inc., Pittsburgh, PA) is a wireless, multisensor monitor that integrates information from a two-axis accelerometer with a variety of heat-related sensors (i.e., heat flux, skin temperature, near-body ambient temperature, and galvanic skin response). The integration of heat-related sensors allows the SWA to estimate the energy cost of complex movements and upper-body activities that have proven to be difficult to assess with hip-worn accelerometers.

Previous studies have evaluated the validity of the SWA for estimation of EE in children under laboratory conditions (2,3,9,12). However, to date, only one study has evaluated the validity of the SWA in children under free-living conditions (4). In that study, researchers compared total EE (TEE) estimates from two versions of the software of the SWA (Software v.5.1 and v.6.1) with the doubly labeled water (DLW) method (the gold standard for TEE assessment under free-living conditions). Researchers reported that the current software (v.6.1, algorithm version 2.2) improved the accuracy for group level estimation, but individual error was considerable and dependent on PA level (PAL). A newer set of children algorithms (5.0) has recently been released by the manufacturer, but these have not yet been evaluated under free-living conditions. The recently developed SWA Mini (Mini, Software v.7.0, algorithms version 2.2) has also not been evaluated for use in children. A recent DLW study in adults (19) demonstrated that the Mini yielded more accurate estimates of EE than the SWA (possibly because of the use of a three-dimensional accelerometer in the unit). The present study evaluates the validity of the SWA and the Mini for measuring total daily EE in children using a similar approach. The study also directly compares the relative accuracy of the previous algorithms (version 2.2) with the newly developed children algorithms (version 5.0) to determine whether the algorithms have also improved.

METHODS

A sample of 30 healthy youth (age range: 10–16 yr) were recruited to participate in the study. A targeted recruitment process was used to ensure that participants were able to comply with the measurement protocol. The majority of the participants were Caucasian (76.0%) with 14.0% Hispanic and 10.0% Asian. There was a range of body types with approximately 16.7% characterized as “at risk for overweight” (between 85th and 95th percentile), 3.3% characterized as “overweight” (>95th percentile), and 6.7% characterized as “underweight” (<5th percentile). The targeted sampling reduced the generalizability of the sample, but it yielded compliant participants (i.e., internal validity was prioritized over external validity). Approval from the institutional review board was obtained before the beginning of the study to ensure that the procedures were appropriate for the target population. All participants and their parents were informed about the procedures and purposes of the study before parental consent and participant assent were obtained.

Instruments

The SenseWear Pro3 Armband (SWA, Model 908901 PROD2) is a wireless multisensor activity monitor that integrates motion data from two orthogonal accelerometers along with several heat-related sensors (heat flux, body temperature, and galvanic skin response). The monitor, worn on the upper arm (right side) over the triceps muscle, is lightweight (79 g) and comfortable to wear. The SWA has been previously validated in adults (19,30) and children (4) under free-living conditions. The data in the present study were processed using the latest proprietary algorithms available in the software (Research Software 6.1, algorithms 2.2).

The SenseWear Mini (Mini, Model MS-SW) is a newer and smaller version of the SWA that is worn on the left arm. The Mini operates in a similar manner to the more established SWA but includes a triaxial accelerometer instead of a two-axis accelerometer. Data obtained from the Mini were processed using the same proprietary algorithms, but from a different software package (Research Software 7.0, algorithms 2.2).

Data Collection Procedures

On the first day of the study (day 0), participants reported to the campus research center after a 10-h overnight fast (no food or drink other than water) and after collecting a baseline urine sample (baseline A). Participants then provided a second baseline urine sample upon arrival (baseline B), and standard anthropometric data were collected. Standing and sitting height were measured to the nearest 0.1 cm with the use of a wall-mounted Ayrton stadiometer (Prior Lake, MN) and with the participants barefoot. Body mass was measured with participants in light clothes and barefoot on a Cardinal Detecto electronic scale (Webb City, MO) to the nearest 0.1 kg. The body mass index was calculated as weight (kg)/height² (m²). Sitting height data (27) were used to predict age at peak height velocity using equations developed by Mirwald et al. (23).

The two monitors were initialized using the participant’s personal information (age, sex, height, and weight) and adjusted to fit on the participants’ arms. The SWA monitor was placed on the right arm while the Mini monitor was placed on the left arm, following manufacturer recommendations. After fitting both activity monitors, the DLW dose was administered to the participant. The dosage was determined on the basis of body weight (1.5 g per kilogram of body weight) in accordance with the protocol. Regular measured water was provided to rinse the drinking container and ensure that all the “heavy water” was consumed by the participant.

The DLW procedure involved collection and processing of urine samples on days 0, 7, and 14. On day 0, participants provided urine samples at 1.5, 3.0, 4.5, and 6 h and returned samples to the research center later in the day. Participants were provided with a cooler bag containing prelabeled 60-mL sterile cups and were asked to provide at least 40 mL in each sample. Participants were given a liter of fresh drinking water immediately upon administration of the dose and were encouraged to drink the water over the course of the morning to ensure adequate urine volume.

On days 7 and 14, participants reported to the laboratory after a 10-h fast and were asked to provide additional urine samples at two time points (90 min apart). Body weight was measured on both days to check for changes in weight over the course of the study. Resting metabolic rate (RMR) data were obtained on day 7 between the collection of the two urine samples. These measurements were obtained in an isolated dark room, with participants awake, in a reclined position, avoiding speaking and minimizing their movement, while watching a movie, using a metabolic measurement system (True One 2400®; Parvo-Medics Inc., Sandy, UT). Pediatric size masks (Hans Rudolph Inc., Kansas City, MO) were fitted to the participants and properly adjusted before data collection. The RMR values were computed on a per minute basis and expressed per day to facilitate analyses. The first 10 min of data collected were discarded and the last 15 min averaged to obtain an estimate of 24-h RMR. The temperature of the room was maintained at 22°C, and the metabolic analyzer was calibrated before every measurement for pressure and gas concentrations.

All urine samples were processed by the same research technician using standardized procedures. The specimens were labeled and coded by time to ensure accurate processing of the data. Duplicate urine samples of approximately 12 mL were stored frozen and later sent for processing at the Pennington Biomedical Research Center (Baton Rouge, LA).

TEE was determined from DLW over a 14-d period by tracking the relative loss of the labeled isotopes (²H [deuterium] and ¹⁸O) in the water. The difference between the rates of disappearance of the isotopes reflects the total carbon dioxide (CO₂) production over the measured period and is calculated from the slope of the elimination curve. The rates of disappearance were determined from the multiple urine samples obtained throughout the protocol. A fixed respiratory quotient of 0.86 was used to establish oxygen consumption and to obtain a value for TEE over the 14 d. Values were expressed per day to facilitate interpretation. Additional information about the DLW protocol and processing is published elsewhere (19).
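As a rough illustration of the final conversion step described above, the sketch below turns a daily CO2 production rate into TEE using the fixed respiratory quotient of 0.86. The abbreviated Weir equation and the example CO2 value are assumptions made for illustration; the text does not state which energy-equivalence equation was applied.

```python
# Sketch only: convert CO2 production to TEE with a fixed respiratory quotient.
# Assumption: abbreviated Weir equation, kcal = 3.941 * VO2 + 1.106 * VCO2
# (gas volumes in litres); the source does not name the equation it used.

RQ = 0.86  # fixed respiratory quotient stated in the protocol


def tee_from_co2(vco2_l_per_day: float, rq: float = RQ) -> float:
    """Return TEE in kcal/d from CO2 production in L/d."""
    vo2_l_per_day = vco2_l_per_day / rq  # RQ = VCO2 / VO2
    return 3.941 * vo2_l_per_day + 1.106 * vco2_l_per_day


example_vco2 = 400.0  # hypothetical CO2 production, L/d
example_tee = tee_from_co2(example_vco2)
print(f"TEE = {example_tee:.0f} kcal/d")
```

Because the respiratory quotient is fixed, TEE in this sketch scales linearly with CO2 production, so any error in the isotope elimination rates carries straight through to the TEE estimate.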

Processing of SWA and Mini data.

Participants in the study wore both devices simultaneously for the entire 14-d period (with the exception of showering time). During visits to the laboratory on days 7 and 14 of the protocol, both monitors were downloaded and the batteries were recharged. The SWA and Mini files were processed with the latest version of the algorithms available for the latest software package (algorithms 2.2). The raw SWA and Mini files were also sent to the manufacturer (BodyMedia Inc.) to be processed with newly developed children algorithms (v.5.0).

During their visits on days 7 and 14, participants reported nonwearing periods (i.e., showering or dressing). Individual attention was given to each data file to control for possible gaps in the data during the monitoring period. Active nonwearing periods were compared with the reported nonwearing periods to account for possible gaps in the data. Those gaps were manually filled with corresponding MET values on the basis of established (compendium) values of EE for youth developed by Ridley et al. (26). For example, data gaps attributable to showering were manually filled with a corresponding MET equivalent for “showering and toweling off” (2.0 METs) on the basis of the compendium. Other unexplained gaps shorter than 10 min (commonly occurring due to a loose monitor strap) were filled with average EE of the 10 min before the gap. A group of four participants were involved in a volleyball league that prevented them from using the monitors during tournament games. Those individuals recorded their playing time during volleyball practices with the activity monitors, and mean values of those periods were used to fill gaps produced during the tournament games.
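The gap-filling rules described above can be sketched roughly as follows. The per-minute list layout, the use of `None` for off-body minutes, and the function names are hypothetical; the original processing was done manually, and the sketch treats the series as MET values throughout for simplicity.

```python
# Sketch of the two gap-filling rules; data layout and names are hypothetical.
from typing import List, Optional

SHORT_GAP_MIN = 10  # unexplained gaps shorter than this are filled
LOOKBACK_MIN = 10   # minutes before the gap used for the average
SHOWER_METS = 2.0   # compendium value quoted in the text

Series = List[Optional[float]]


def fill_reported_gap(mets: Series, start: int, end: int, value: float) -> Series:
    """Fill a participant-reported non-wear period [start, end) with a fixed MET value."""
    filled = list(mets)
    for k in range(start, end):
        if filled[k] is None:
            filled[k] = value
    return filled


def fill_short_gaps(mets: Series) -> Series:
    """Fill unexplained gaps shorter than 10 min with the mean of the preceding 10 min."""
    filled = list(mets)
    i, n = 0, len(filled)
    while i < n:
        if filled[i] is None:
            j = i
            while j < n and filled[j] is None:
                j += 1  # j ends one past the last missing minute
            before = [v for v in mets[max(0, i - LOOKBACK_MIN):i] if v is not None]
            if (j - i) < SHORT_GAP_MIN and before:
                mean_before = sum(before) / len(before)
                for k in range(i, j):
                    filled[k] = mean_before
            i = j
        else:
            i += 1
    return filled


# Hypothetical record: 10 worn min, a 3-min loose-strap gap, 5 worn min, a 12-min shower.
series = [1.5] * 10 + [None] * 3 + [2.5] * 5 + [None] * 12
step1 = fill_reported_gap(series, 18, 30, SHOWER_METS)
step2 = fill_short_gaps(step1)
```

Longer gaps that were not reported are deliberately left untouched in this sketch, since the text only describes filling reported periods and unexplained gaps under 10 min.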

Statistical Analyses

Descriptive statistics (sample mean and SD) were computed for the primary PA categories to describe the characteristics of the participants and their activity profiles. Data were checked for normality to ensure that the distribution of the data would not influence the results. Activity EE (AEE) and PAL were calculated using the following equations: AEE = (0.9 TEE) − RMR (assuming thermic effect of food to be 10% of TEE) and PAL = TEE/RMR, respectively.
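A minimal sketch of the two equations above, with hypothetical TEE and RMR values chosen only for illustration:

```python
# Sketch of the AEE and PAL equations given in the text.
# The example TEE and RMR values are hypothetical, not study data.

def activity_energy_expenditure(tee: float, rmr: float, tef_fraction: float = 0.10) -> float:
    """AEE = (0.9 x TEE) - RMR, assuming the thermic effect of food is 10% of TEE."""
    return (1.0 - tef_fraction) * tee - rmr


def physical_activity_level(tee: float, rmr: float) -> float:
    """PAL = TEE / RMR."""
    return tee / rmr


tee, rmr = 2600.0, 1450.0  # kcal/d, hypothetical participant
aee = activity_energy_expenditure(tee, rmr)
pal = physical_activity_level(tee, rmr)
print(f"AEE = {aee:.0f} kcal/d, PAL = {pal:.2f}")
```

Because AEE is obtained by subtraction, a given absolute error in TEE becomes a much larger percentage error in AEE, which is worth keeping in mind when comparing the two outcomes.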

The study evaluated the differences between estimates of TEE and AEE from the SWA and the Mini compared with criterion estimates from the DLW. Primary statistical analyses were performed using SAS 9.2. Mixed-model ANOVA was used to account for the possible correlation across repeated observations taken on the same individuals in the study. The models used participant within sex as the person-level random-effect term and the residual variance as a second random-effect term. These analyses assume a common variance for between-person and within-person random effects. The fixed effects included in the models for both TEE and AEE were sex, monitor (SWA and Mini), and algorithm (v.2.2 and v.5.0), as well as the corresponding two- and three-way interactions. F tests were used to determine whether factors were statistically significant. Tukey–Kramer paired comparisons tests were used to test for differences among levels of fixed effects. Least squares means (LSM) and SE were estimated in the model and included in the tables.

Pearson product–moment correlations were computed to evaluate the strength of the linear relationship between the monitors and the DLW method in measures of TEE and AEE. Bland–Altman graphical procedures (6) were also used to examine agreement across the range of TEE and AEE values and evaluate the presence of proportional systematic bias. The average of the monitor estimate and the DLW value was plotted on the x-axis, and the difference between the estimates (i.e., DLW minus the monitor estimate) was plotted on the y-axis. A regression approach for nonuniform differences was applied as suggested by Bland and Altman (7), to provide 95% limits of agreement (LOA) and estimates of the magnitude of the systematic and random error. Differences were considered significant at P < 0.05.
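The regression form of the Bland–Altman analysis described above can be sketched as follows. This is a simplified version that assumes a constant residual SD across the range of EE and uses n − 2 degrees of freedom for that SD; both are assumptions, and the input values are hypothetical rather than study data.

```python
# Sketch of a regression-based Bland-Altman analysis (simplified; see assumptions above).
from math import sqrt
from typing import Dict, List, Tuple


def bland_altman_regression(reference: List[float], monitor: List[float]) -> Dict[str, float]:
    """Regress the differences (reference minus monitor) on the pairwise means."""
    means = [(r + m) / 2 for r, m in zip(reference, monitor)]
    diffs = [r - m for r, m in zip(reference, monitor)]
    n = len(means)
    mean_x, mean_y = sum(means) / n, sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in means)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(means, diffs))
    syy = sum((y - mean_y) ** 2 for y in diffs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(means, diffs))
    return {
        "intercept": intercept,           # systematic bias at zero EE
        "slope": slope,                   # proportional bias
        "residual_sd": sqrt(sse / (n - 2)),  # random error
        "r2": 1 - sse / syy,              # share of difference variance tied to mean EE
    }


def limits_of_agreement(fit: Dict[str, float], average: float, z: float = 1.96) -> Tuple[float, float]:
    """95% LOA at a given average EE: (intercept + slope * average) +/- z * residual SD."""
    centre = fit["intercept"] + fit["slope"] * average
    return centre - z * fit["residual_sd"], centre + z * fit["residual_sd"]


# Hypothetical TEE values in kcal/d (not study data).
dlw = [2100.0, 2450.0, 2600.0, 2900.0, 3300.0]
device = [1900.0, 2300.0, 2650.0, 2950.0, 3500.0]
fit = bland_altman_regression(dlw, device)
low, high = limits_of_agreement(fit, 2500.0)
```

A negative slope under this sign convention means the monitor reads progressively higher relative to DLW as EE increases.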

RESULTS

The main objective of the study was to compare measures of TEE between two activity monitors (Mini and SWA) and the DLW method, in a diverse sample of youth (age range, 10–16 yr). Thirty participants completed the study, but data from two participants had to be discarded from the analyses because of unusable DLW values (urine collection problems). Three trials with the SWA monitor yielded data abnormalities during downloading, so those SWA trials were not included in the analyses. An extreme outlier was also noted in the examination of data from a single trial with the Mini5.0 algorithm. This was attributed to a data entry error in the processing because the estimated EE was considerably higher than any other value processed with this algorithm. The final analyses include data from 28 participants based on DLW and the Mini monitor (27 with v.5.0) and data from 25 participants using the SWA monitor.

On average, participants wore the monitors for 96.7% (±3.0) of the time during the 14 d of monitoring. Individual wear percentages ranged from 85.6% to 99.9% of protocol time. The total off-body time per day was 48.0 min (±40.2) and ranged from 2 to 153 min. Measured RMR for the sample was 1460.9 (±367.3) kcal·d−1, whereas the PAL was 1.81 (±0.33). Descriptive statistics for the sample population are provided in Table 1. Pearson product–moment correlations between the DLW method TEE and the monitors were consistently high for both monitors, regardless of which algorithms were being used (correlation coefficients (r) with DLW were 0.93, 0.92, 0.90, and 0.90 for the SWA2.2, Mini2.2, SWA5.0, and Mini5.0, respectively).

TABLE 1:
Descriptive statistics—Means ± SD.

TEE comparisons.

Table 2 includes LSM and SE values for estimates of TEE (daily values) from DLW and the two armband monitors for both the currently available monitor algorithms (panel A) and the newly developed algorithms (panel B). The main effect for monitor was not significant (F = 0.07, P = 0.79). This result collapses the estimates from the two algorithms and suggests both monitors provide similar estimates independent of the algorithm version. The main effect for algorithm was highly significant (F = 37.9, P < 0.0001) with considerable differences reported between algorithms (512.7 kcal·d−1). The difference was amplified because the version 2.2 algorithms tended to underestimate EE relative to the DLW method (average difference = 430 kcal·d−1), whereas the version 5.0 algorithms tended to overestimate EE relative to the DLW method (average difference = 83 kcal·d−1). The estimates of TEE from the version 2.2 algorithms were significantly different (P < 0.001) from the DLW values for both the SWA (−490.6 kcal·d−1 (19.6%)) and the Mini (−369.8 kcal·d−1 (14.8%)). The mean absolute percentage error (MAPE) values were 20.7% and 18.3% for the SWA and Mini, respectively. The corresponding TEE estimates produced from the version 5.0 algorithms were considerably more accurate (relative to the DLW values). The differences in estimates of TEE (compared with the DLW values) were not statistically significant for either the SWA (120.7 kcal·d−1 (4.8%), P = 0.20) or the Mini (44.4 kcal·d−1 (1.7%), P = 0.6). The MAPE values were similar for the SWA and Mini (10.9% and 11.7%, respectively).

TABLE 2:
TEE LSM and SE values from the activity monitors (by algorithm) and DLW method.

The inclusion of sex in the mixed-model analyses made it possible to directly compare the differences between genders. Nonsignificant interactions were observed with gender, but the estimates with the version 5.0 algorithm tended to be more accurate in girls than in boys. The differences in estimated TEE for girls were smaller for the SWA (8.0 kcal·d−1 (0.3%)) than the Mini (−96.9 kcal·d−1 (3.8%)), but neither was statistically significant. The differences for boys were larger for the SWA (−233.4 kcal·d−1 (8.4%)) than the Mini (8.1 kcal·d−1 (0.3%)) but nonsignificant in both cases.

Bland–Altman plots for TEE were used to assess proportional systematic bias between the DLW and the monitor estimates (Fig. 1). The mean TEE between methods was plotted on the x-axis, whereas the differences between the monitors/algorithms and the DLW method were plotted on the y-axis. Overall, the 95% LOA were narrower for the version 2.2 algorithms (equations: SWA: (926.5 − 0.19 × average TEE) ± (1.96 × 248.9); Mini: (1156.6 − 0.33 × average TEE) ± (1.96 × 226.2)) compared with the version 5.0 algorithms (equations: SWA: (884.3 − 0.38 × average TEE) ± (1.96 × 299.7); Mini: (680.4 − 0.32 × average TEE) ± (1.96 × 241.4)).

FIGURE 1:
Bland–Altman plots of TEE estimated with the DLW method and estimates from SenseWear Armband monitors (kcal·d−1).

The plots showed a consistent underestimation of TEE for the version 2.2 algorithms, with the SWA underestimating TEE for 24 of the 25 participants (96.0%) and the Mini underestimating TEE for 25 of the 28 participants (89.3%). The plots for the version 5.0 algorithms were more balanced but showed a general tendency for overestimation of TEE. The SWA overestimated TEE for 15 of 25 participants (60%), whereas the Mini overestimated TEE for 20 of the 28 participants (71.4%). The magnitude of random error (estimated as the SD of the residuals) was similar for all of the plots but tended to be smaller for the v.2.2 algorithm (compared with v.5.0) and smaller for the Mini (compared with the SWA): Mini2.2 (227 kcal), SWA2.2 (249 kcal), Mini5.0 (241 kcal), and SWA5.0 (299 kcal). All TEE plots showed bias with overestimation larger for individuals with greater TEE values (Fig. 1). The proportion of variability in the differences between methods that was attributable to the mean level of EE (estimated from the R² value) ranged from 0.17 (SWA2.2) to 0.44 (Mini2.2). The slope of the regression model was significantly different from zero in all cases (SWA2.2: slope = −0.17, P = 0.026; SWA5.0: slope = −0.39, P = 0.0003; Mini2.2: slope = −0.33, P = 0.0002; Mini5.0: slope = −0.32, P = 0.00006). This indicates that the differences between the monitors and the DLW method depend on the average estimated EE. Furthermore, the intercepts were also significantly different from zero, which would indicate the presence of systematic bias independent of the level of EE even if the slopes were not significantly different from zero.

AEE.

Table 3 includes LSM and SE values for AEE (daily values) for DLW, as well as Armband monitor data for the currently available monitor algorithms (A) and the newly developed algorithms (B). The results for AEE were very similar to those reported for the TEE because AEE is a variable component of EE that is more likely to be influenced by differences in monitors or algorithms. Pearson product–moment correlations between the DLW-derived AEE values and the monitor estimates were somewhat lower than for TEE, but still high for both monitors, and with both algorithms (correlation coefficients (r) with DLW values were 0.79, 0.83, 0.73, and 0.75 for the SWA2.2, Mini2.2, SWA5.0, and Mini5.0, respectively).

TABLE 3:
AEE LSM and SE values from the activity monitors and DLW method.

Differences in AEE were significant (P < 0.001) when comparing the DLW method with the version 2.2 algorithms (−458.7 kcal·d−1 (53.3%) and −332.8 kcal·d−1 (38.7%) for the SWA and Mini, respectively). Mean absolute error values were 61.9% and 53.3% for the SWA and Mini, respectively. The differences in AEE estimates with the version 5.0 algorithms were considerably smaller for both the SWA (91.4 kcal·d−1 (10.6%)) and the Mini (131.1 kcal·d−1 (15.2%)). The differences between the AEE estimates from the SWA and Mini and the criterion (DLW) AEE values were nonsignificant. Mean absolute error values for the SWA and Mini were considerably lower with the 5.0 algorithms (30.0% and 28.9%, respectively). Bland–Altman plots for AEE estimates were used to assess proportional systematic bias (Fig. 2) and showed trends similar to those in the TEE plots, with a similar form of proportional systematic bias (tendency for greater overestimation for individuals with higher activity levels).

FIGURE 2:
Bland–Altman plots of AEE estimated with the DLW method and estimates from SenseWear Armband monitors (kcal·d−1).

DISCUSSION

The study evaluated the validity of both the SWA and the more recently released Mini monitor in youth, using DLW as the reference method. A unique aspect of the study is that we directly compared results from the older estimation algorithms (v.2.2) against values from the newer version (v.5.0). The direct comparison of two different monitors and two different algorithms against the same standard makes it possible to directly determine the relative effects of changes in technology and changes in algorithms. No significant differences were evident for the comparisons between monitors, which indicates that the monitors provide similar estimates when a given algorithm is used. This is important information for research groups that may want to make comparisons between the old and the new monitors. The results, however, reveal a large and significant main effect for algorithm. In this case, the newly developed 5.0 algorithms yielded more accurate group estimates of TEE than the version 2.2 algorithms, regardless of which monitor was used. The differences in TEE estimates relative to DLW were significant for the 2.2 algorithms but not for the version 5.0 algorithms.

Significance tests have limited value when interpreting the actual amount of error in estimates, so we focused on other indicators of individual agreement. The MAPE provides a good indicator of overall error because it represents the percentage error that can be expected (on average) for this type of assessment. The MAPE values were considerably smaller for the version 5.0 algorithms (10%–12%) compared with the values for the version 2.2 algorithms (18%–20%). The values observed with the 5.0 algorithm are actually very close to the values reported by our group in a similar study in adults (19). In that study, we observed MAPE of approximately 8% for both monitors.
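To illustrate why MAPE can tell a different story from the mean group difference, the sketch below computes both for a small set of hypothetical values (not study data). Over- and underestimates cancel in the signed mean but not in the MAPE.

```python
# Sketch: signed mean percentage error versus mean absolute percentage error (MAPE).
# The TEE values below are hypothetical and chosen only to show cancellation.
from typing import List


def mean_percentage_error(estimate: List[float], reference: List[float]) -> float:
    """Signed error, in percent of the reference, averaged over participants."""
    errors = [(e - r) / r for e, r in zip(estimate, reference)]
    return 100.0 * sum(errors) / len(errors)


def mean_absolute_percentage_error(estimate: List[float], reference: List[float]) -> float:
    """Absolute error, in percent of the reference, averaged over participants."""
    errors = [abs(e - r) / r for e, r in zip(estimate, reference)]
    return 100.0 * sum(errors) / len(errors)


reference_tee = [2000.0, 2500.0, 3000.0]  # e.g., criterion values, kcal/d
monitor_tee = [2300.0, 2500.0, 2700.0]    # one overestimate, one exact, one underestimate

bias_pct = mean_percentage_error(monitor_tee, reference_tee)
mape_pct = mean_absolute_percentage_error(monitor_tee, reference_tee)
print(f"mean bias = {bias_pct:.1f}%, MAPE = {mape_pct:.1f}%")
```

In this toy case the signed bias is under 2% while the MAPE is above 8%, which is the kind of gap between group-level and individual-level accuracy discussed here.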

The Bland–Altman plots provided a visual examination of the error across the range of EE values. These plots revealed a tendency for proportional systematic bias for both monitors and with both algorithms. In all cases, there was a tendency for larger overestimation for higher EE levels. The plots also revealed a tendency for systematic bias that was independent of the level of EE because the intercepts of the regression lines were also significant for all monitor/algorithm combinations.

The LOA also reveals a fairly large individual (random) error for the measurements. The 95% LOA for the Bland–Altman plots showed that any given value of the monitor could fall on average in a span ranging from 887 kcal·d−1 (Mini2.2) to 1175 kcal·d−1 (SWA5.0). For example, a participant’s reading for TEE of 2500 kcal·d−1 with the Mini2.2 could appear as 2065 to 2935 kcal·d−1. Although the MAPE values were lower for the version 5.0 algorithms, the variance attributable to systematic bias and the magnitude of random error were similar for the various monitors and algorithm comparisons. This suggests that the improvements in accuracy for group estimation for the new 5.0 algorithms were not accompanied by improvements in individual agreement. These somewhat conflicting findings point out the importance of evaluating the magnitude and source of error in EE estimates.
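The spans quoted above follow from the residual SDs in the reported LOA equations, since the width of a 95% interval is 2 × 1.96 × SD. The short sketch below reproduces that arithmetic; the dictionary layout is only an illustrative choice.

```python
# Sketch: width of the 95% limits of agreement from the residual SD of each plot.
# SD values are those in the reported LOA equations (kcal/d).

Z_95 = 1.96

residual_sd = {
    "SWA2.2": 248.9,
    "Mini2.2": 226.2,
    "SWA5.0": 299.7,
    "Mini5.0": 241.4,
}

# Full span between the lower and upper limit: 2 * z * SD.
loa_width = {name: 2 * Z_95 * sd for name, sd in residual_sd.items()}

for name, width in loa_width.items():
    print(f"{name}: 95% LOA span of about {width:.0f} kcal/d")
```

The narrowest and widest spans (Mini2.2 and SWA5.0) correspond to the 887 and 1175 kcal·d−1 figures cited in the paragraph above.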

Previous studies using the SWA in children have reported some similarly discrepant findings related to TEE estimation, but these studies used software versions 5.1 or 6.1, which use the earlier version of the prediction algorithms (version 2.2). Arvidsson et al. (2) assessed the validity of EE estimates from the SWA in 20 healthy children (ages 11–13 yr) for a wide range of activities using indirect calorimetry as the reference method. They reported a significant underestimation in AEE by the SWA (v.5.1) for most activities (approximately 22% on average). A later study by the same laboratory (3), using a comparable sample of children (mean age: 12.3 yr) and the same method (indirect calorimetry), reported underestimations in AEE of 18% for the SWA. The researchers in this study noted that the underestimation in AEE increased with the increasing intensity in the activities. Contrasting findings were reported for the SWA (v.5.1) by Dorminy et al. (12) in a sample of youth (ages 10–14 yr) monitored with indirect room calorimetry over a 24-h period. Researchers reported a consistent overestimation of EE for a variety of tasks (treadmill exercise, stationary biking, treadmill walking, sedentary activities) measured by whole-room indirect calorimetry. The overall error (overestimation) in TEE for the 24 h of monitoring was 22%. We reported a similar tendency for overestimation of EE (32%) with version 5.1 of the SWA software in a laboratory study in young children (ages 7–11 yr) (9). However, nonsignificant differences in EE were observed when we used the more recent version (6.1) of the software (average group level error of 1.7%).

The present study was designed to help resolve some of these discrepant findings. The recently developed algorithms (5.0) clearly outperformed the version 2.2 algorithms that were used in all of the previous studies mentioned above. This proved to be true for both the SWA and the Mini. A previous study in adults (19) demonstrated some improved accuracy for the Mini relative to the SWA, but in this study, we observed similar MAPE for both monitors as long as the same algorithm was used.

The results of the present study are similar to findings of another DLW study (4), which demonstrated clear improvements with the version 6.1 algorithms compared with the previously available software version (5.1). Researchers reported a significant mean overestimation (8.3%, P < 0.01) with the older version of the software and an improved nonsignificant estimation difference (6.0%) with the newer software version (v.6.1). Consistent with our findings, the researchers reported high Pearson product–moment correlations between the SWA and the DLW method for both software versions (r = 0.79 and r = 0.74 for the v.5.1 and v.6.1, respectively). This study also reported a similar type of proportional systematic bias, in which the error was dependent on PAL. In the current study, the larger differences in TEE estimation when comparing the same software version (6.1) used in Arvidsson’s study (4) can be attributed, in part, to a larger age range and more variability in body size and maturation. In concordance with Arvidsson’s study, we observed significant improvements in TEE estimation with the more recently developed algorithms, similar magnitude of correlation coefficient for both monitors and algorithms, and a similar form of proportional systematic bias observed with increased EE. The proportional systematic bias reported by Arvidsson et al. (4) was also observed in the current study—suggesting a possible limitation of the SWA monitor for assessing EE at higher intensities. This limitation was also reported in two studies of highly trained athletes (13,20).

The continued release of newly developed SWA software makes it difficult to compare results across studies; however, it seems reasonable to assume that the latest software version maintains the positive characteristics observed in preceding studies, while incorporating additional capabilities to the device. As described, the results in our study show that the newly developed algorithms (v.5.0) provide more accurate estimates of TEE than the previous algorithms (v.2.2). However, a similar DLW study on a slightly younger (ages 8–11 yr) sample of overweight and obese children (5) found slightly worse results when comparing two different releases. They reported nonsignificant differences (<1% difference) in group level TEE with the SWA when using the older software version (5.1, algorithms v.2.1) but a significant difference (18% underestimation) when using the newer software version (6.1, algorithms v.2.2). However, this is likely due to the unique nature of the sample population (i.e., overweight participants). Correcting the underestimation in EE for overweight individuals was one of the major changes in the most recent version 5 algorithms. Although differences in algorithms complicate the evaluation of data with the BodyMedia products, the systematic and progressive effort to improve the precision of the estimates is a unique and desirable feature of this particular monitor.

Direct comparisons with DLW-based validation studies of other monitors are complicated because most have not reported mean absolute error rates (1,14,16,18,24) or the magnitude of systematic bias. Studies with other monitors have generally shown significant correlations between accelerometer counts, TEE, and AEE, but validity cannot be readily determined without a direct assessment of the differences in estimates. A recent study by Krishnaveni et al. (21) evaluated the validity of accelerometry-based activity monitors in a sample of Indian children using the DLW method. The results revealed low correlations between the monitor and the DLW technique for both TEE (r = 0.33) and PAL (r = 0.17), using the Ekelund et al. (15) prediction method. For comparison, the correlations in the present study were considerably higher for TEE (range: r = 0.90–0.93) and AEE (range: r = 0.73–0.83). The LOA from the Bland–Altman plots in the other study (21) was 0.81 MJ·d−1 for girls and 1.33 MJ·d−1 for boys. These values are about 50% larger than those in the present study. It is interesting to note, however, that the Krishnaveni study reported similar patterns of bias across the range of EE values. In a study by Ekelund et al. (14), researchers used a sample of adolescents (age range: 14–19 yr, n = 13) to develop a child-specific predictive equation using accelerometry-based monitors, with the DLW method as the criterion method. The researchers reported low group TEE differences (approximately 3%) but did not report MAPE rates. This limitation must be considered when interpreting that study because group-level error can be low even when individual errors are large. The LOA was similar to that in the present study, but the correlations between measures were lower (r = 0.60). It is important to note that the sample was considerably older than that of the present study, which further complicates the comparisons.
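The distinction between group-level error and MAPE can be made concrete with a small example. The Python sketch below uses hypothetical values chosen so that over- and underestimates cancel; the function names and data are illustrative assumptions and are not taken from any of the cited studies.

```python
from statistics import mean

def mean_percentage_error(estimates, criterion):
    """Signed group-level error (%): positive and negative errors can cancel."""
    return 100 * mean([(e - c) / c for e, c in zip(estimates, criterion)])

def mean_absolute_percentage_error(estimates, criterion):
    """MAPE (%): the average size of individual errors, regardless of sign."""
    return 100 * mean([abs(e - c) / c for e, c in zip(estimates, criterion)])

# Hypothetical TEE values in kcal per day, for illustration only
criterion = [2000, 2000, 2000, 2000]
estimates = [1600, 2400, 1700, 2300]   # two underestimates, two overestimates

group_error = mean_percentage_error(estimates, criterion)
mape = mean_absolute_percentage_error(estimates, criterion)
print(f"group error: {group_error:.1f}%  MAPE: {mape:.1f}%")
```

In this constructed case the signed group error is zero while the MAPE is 17.5%, which is why a low group difference alone says little about accuracy for individuals.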

While estimates of TEE were reasonable in the present study, we observed considerable error in the estimates of AEE. AEE is the most variable component of total daily EE (22) and includes a wide variety of activities of daily living. The diverse and variable nature of lifestyles makes it very difficult to accurately assess AEE. A study by Nilsson et al. (25) showed large variation in estimation (ranging from 83% overestimation to 46% underestimation) depending on the type of equation used and how the equation was developed (i.e., sample used, laboratory conditions vs free living). A study by Ekelund et al. (15) showed that sex and physical characteristic variables (height, weight, and fat-free mass) contribute to the variability and error in estimates of AEE. The availability of multisensors and the use of complex pattern-recognition technology in the SWA and Mini armband monitors would seem to offer advantages for potentially improving the precision of AEE estimates.

Recent studies by Butte et al. (8) and Zakeri et al. (33) have demonstrated the advantages of using mathematical models and pattern-recognition strategies for improving assessments of PA. In these studies, the researchers used cross-sectional time series (CSTS) and multivariate adaptive regression spline (MARS) models to predict TEE from accelerometer and heart rate data. They reported low mean absolute errors (approximately 1%) for both CSTS and MARS models when compared with DLW. The root mean square error values were 305 and 251 kcal·d−1 for the CSTS and MARS models, respectively (8). The corresponding residual values in the present study were similar for both the 5.0 algorithms (241 and 299 kcal·d−1 for the Mini and SWA, respectively) and the 2.2 algorithms (227 and 249 kcal·d−1 for the Mini and SWA, respectively). It is not clear, however, whether the models developed by Butte et al. (8) and Zakeri et al. (33) can be applied more generally for assessing free-living populations. An advantage of the armband algorithms is that they are integrated into the software and easy to use for field-based research.
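For readers comparing error metrics across studies, the following Python sketch shows how a root mean square error is computed and how it combines systematic and random error. It assumes the usual definition of RMSE and uses the population variance of the errors in the decomposition; the data values are hypothetical and the code does not reproduce any cited analysis.

```python
from math import sqrt
from statistics import mean, pvariance

def rmse(estimates, criterion):
    """Root mean square error between estimates and the criterion."""
    errors = [e - c for e, c in zip(estimates, criterion)]
    return sqrt(mean([err ** 2 for err in errors]))

# Hypothetical TEE values in kcal per day, for illustration only
dlw = [2000, 2200, 2400, 2600]
monitor = [2100, 2150, 2600, 2650]

errors = [m - d for m, d in zip(monitor, dlw)]
rmse_value = rmse(monitor, dlw)
bias = mean(errors)          # systematic component (mean error)
spread = pvariance(errors)   # random component (population variance of errors)

# RMSE squared equals bias squared plus the population variance of the errors
print(rmse_value ** 2, bias ** 2 + spread)
```

Because RMSE includes any systematic bias, it will match an SD of residuals only when the bias has been removed, which should be kept in mind when values from different studies are placed side by side.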

The present study provides evidence that the SWA and Mini provide reasonably accurate estimates of TEE in young children. However, it is not possible to state unequivocally whether the accuracy is acceptable because precision may be more important for some studies than for others. Researchers are encouraged to consider the precision of any monitor relative to other competing technologies and to provide indicators of agreement (e.g., MAPE) that enable more direct comparisons. The present study supports the utility of the SWA for use in young children, but it is important to consider the relative strengths and limitations of the study. The use of the established DLW method is a strength, but it is important to point out that even this method has error. For example, the DLW method has been shown to have a reliability of approximately 8%, with physiologic variation accounting for approximately 6% and analytic variation contributing the remainder (28). The potential for error in the criterion method must be considered when interpreting the results of any monitor validation study, including this one. The use of a convenience sample is another acknowledged limitation of our design. The study protocol required that we recruit responsible and reliable participants; therefore, as previously stated, internal validity was weighed more heavily than external validity. Because these participants may differ from the general population, the findings may not be generalizable.

In conclusion, the newly developed SenseWear 5.0 algorithms outperformed the version 2.2 algorithms for group comparisons. However, there is still considerable random error associated with the estimate of free-living EE. This, of course, is not unique to the SWA and Mini monitors. Future research with activity monitoring devices such as the SWA and Mini should focus on understanding factors contributing to large individual variability.

The authors would like to gratefully acknowledge the enthusiastic support of the volunteers who participated in this study. In addition, the authors would like to acknowledge the important contributions of Jungmin Lee and Pedro St-Maurice (Kinesiology Department, Iowa State University) in the data collection stages of the project and Dr. Alicia Carriquiry and her team for their contribution to this study. The authors also acknowledge the helpful comments from the reviewers of the article.

This research was funded by a grant from BodyMedia Inc. awarded to Dr. Greg Welk.

No conflict of interest for any of the authors was declared.

The results of the present study do not constitute endorsement by the American College of Sports Medicine.

REFERENCES

1. Abbott RA, Davies PSW. Habitual physical activity and physical activity intensity: their relation to body composition in 5.0–10.5-y-old children. Eur J Clin Nutr. 2004; 58: 285–91.
2. Arvidsson D, Slinde F, Hulthén L. Free-living energy expenditure in children using multi-sensor activity monitors. Clin Nutr. 2009; 28: 305–12.
3. Arvidsson D, Slinde F, Larsson S, Hulthén L. Energy cost of physical activities in children: validation of SenseWear Armband. Med Sci Sports Exerc. 2007; 39 (11): 2076–84.
4. Arvidsson D, Slinde F, Larsson S, Hulthén L. Energy cost in children assessed by multisensor activity monitors. Med Sci Sports Exerc. 2009; 41 (3): 603–11.
5. Bäcklund C, Sundelin G, Larsson C. Validity of armband measuring energy expenditure in overweight and obese children. Med Sci Sports Exerc. 2010; 42 (6): 1154–61.
6. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986; 1: 307–10.
7. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999; 8 (2): 135–60.
8. Butte NF, Wong WW, Adolph AL, Puyau MR, Vohra FA, Zakeri IF. Validation of cross-sectional time series and multivariate adaptive regression splines models for the prediction of energy expenditure in children and adolescents using doubly labeled water. J Nutr. 2010; 140 (8): 1516–23.
9. Calabro MA, Welk GJ, Eisenmann JC. Validation of the SenseWear Pro Armband algorithms in children. Med Sci Sports Exerc. 2009; 41 (9): 1714–20.
10. Chen KY, Bassett DR Jr. The technology of accelerometry-based activity monitors: current and future. Med Sci Sports Exerc. 2005; 37 (11 Suppl): S490–500.
11. Corder K, Brage S, Mattocks C, et al. Comparison of two methods to assess PAEE during six activities in children. Med Sci Sports Exerc. 2007; 39 (12): 2180–8.
12. Dorminy CA, Choi L, Akohoue SA, Chen KY, Buchowski MS. Validity of a multisensor armband in estimating 24-h energy expenditure in children. Med Sci Sports Exerc. 2008; 40 (4): 699–706.
13. Drenowatz C, Eisenmann JC. Validation of the SenseWear Armband at high intensity exercise. Eur J Appl Physiol. 2011; 111 (5): 883–7.
14. Ekelund U, Aman J, Westerterp K. Is the ArteACC index a valid indicator of free-living physical activity in adolescents? Obes Res. 2003; 11: 793–801.
15. Ekelund U, Sjöström M, Yngve A, et al. Physical activity assessed by activity monitor and doubly labeled water in children. Med Sci Sports Exerc. 2001; 33 (2): 275–81.
16. Ekelund U, Yngve A, Brage S, Westerterp K, Sjöström M. Body movement and physical activity energy expenditure in children and adolescents: how to adjust for differences in body size and age. Am J Clin Nutr. 2004; 79: 851–6.
17. Harrell JS, McMurray RG, Baggett CD, Pennell ML, Pearce PF, Bangdiwala SI. Energy costs of physical activities in children and adolescents. Med Sci Sports Exerc. 2005; 37 (2): 329–36.
18. Hoos MB, Plasqui G, Gerver WJ, Westerterp KR. Physical activity level measured by doubly labeled water and accelerometry in children. Eur J Appl Physiol. 2003; 89: 624–6.
19. Johannsen DL, Calabro MA, Stewart J, Franke W, Rood JC, Welk GJ. Accuracy of armband monitors for measuring daily energy expenditure in healthy adults. Med Sci Sports Exerc. 2010; 42 (11): 2134–40.
20. Koehler K, Braun H, de Marées M, Fusch G, Fusch C, Schaenzer W. Assessing energy expenditure in male endurance athletes: validity of the SenseWear Armband. Med Sci Sports Exerc. 2011; 43 (7): 1328–33.
21. Krishnaveni GV, Veena SR, Kuriyan R, et al. Relationship between physical activity measured using accelerometers and energy expenditure measured using doubly labelled water in Indian children. Eur J Clin Nutr. 2009; 63 (11): 1313–9.
22. Levine JA. Nonexercise activity thermogenesis (NEAT): environment and biology. Am J Physiol Endocrinol Metab. 2004; 286: E675–85.
23. Mirwald RL, Baxter-Jones ADG, Bailey DA, Beunen GP. An assessment of maturity from anthropometric measurements. Med Sci Sports Exerc. 2002; 34 (4): 689–94.
24. Montgomery C, Reilly JJ, Jackson DM, et al. Relation between physical activity and energy expenditure in a representative sample of young children. Am J Clin Nutr. 2004; 80: 591–6.
25. Nilsson A, Brage S, Riddoch C, et al. Comparison of equations for predicting energy expenditure from accelerometer counts in children. Scand J Med Sci Sports. 2008; 18 (5): 643–50.
26. Ridley K, Ainsworth BE, Olds TS. Development of a compendium of energy expenditures for youth. Int J Behav Nutr Phys Act. 2008; 5: 45.
27. Ross WD, Marfell-Jones MJ. Kinanthropometry. In: MacDougall JD, Wenger HA, Green HJ, editors. Physiological Testing of the High-Performance Athlete. Champaign (IL): Human Kinetics Books; 1991. p. 223–308.
28. Schoeller DA, Hnilicka JM. Reliability of the doubly labeled water method for the measurement of total daily energy expenditure in free-living subjects. J Nutr. 1996; 126: 348S–54S.
29. Slootmaker SM, Schuit AJ, Chinapaw MJM, Seidell JC, van Mechelen W. Disagreement in physical activity assessed by accelerometer and self-report in subgroups of age, gender, education and weight status. Int J Behav Nutr Phys Act. 2009; 6: 17.
30. St-Onge M, Mignault D, Allison DB, Rabasa-Lhoret R. Evaluation of a portable device to measure daily energy expenditure in free-living adults. Am J Clin Nutr. 2007; 85 (3): 742–9.
31. Trost SG, McIver KL, Pate RR. Conducting accelerometer-based activity assessments in field-based research. Med Sci Sports Exerc. 2005; 37 (11 Suppl): S531–43.
32. Welk GJ, McClain JJ, Eisenmann JC, Wickel EE. Field validation of the MTI ActiGraph and BodyMedia armband monitor using the IDEEA monitor. Obesity (Silver Spring). 2007; 15 (4): 918.
33. Zakeri IF, Adolph AL, Puyau MR, Vohra FA, Butte NF. Multivariate adaptive regression splines models for the prediction of energy expenditure in children and adolescents. J Appl Physiol. 2010; 108 (1): 128–36.
Keywords:

ENERGY EXPENDITURE; PHYSICAL ACTIVITY; ACTIVITY MONITOR; FREE-LIVING CONDITIONS

© 2013 American College of Sports Medicine