Measurement error in two-stage analyses, with application to air pollution epidemiology
Corresponding Author
Adam A. Szpiro
Department of Biostatistics, University of Washington, Seattle, WA 98195, U.S.A.
Correspondence to: Adam A. Szpiro, Department of Biostatistics, University of Washington, Seattle, WA 98195, U.S.A. E-mail: [email protected]
Christopher J. Paciorek
Department of Statistics, University of California, Berkeley, CA 94720, U.S.A.
Abstract
Public health researchers often estimate health effects of exposures (e.g., pollution, diet, and lifestyle) that cannot be directly measured for study subjects. A common strategy in environmental epidemiology is to use a first-stage (exposure) model to estimate the exposure on the basis of covariates and/or spatiotemporal proximity and to use predictions from the exposure model as the covariate of interest in the second-stage (health) model. This induces a complex form of measurement error. We propose an analytical framework and methodology that are robust to misspecification of the first-stage model and provide valid inference for the second-stage model parameter of interest.
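The two-stage strategy described above can be sketched with a toy simulation. This is only an illustration under strong assumptions that are not taken from the paper: a single covariate, a correctly specified linear exposure model, and simple least squares in both stages. All names (`fit_line`, `plug_in_estimate`) and all parameter values are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def plug_in_estimate(n_monitors=200, n_subjects=2000, beta=0.5, seed=0):
    """Simulate a two-stage analysis and return the plug-in health effect estimate.

    Assumed (hypothetical) data-generating model:
      exposure x = 1 + 2 * r + noise, where r is a covariate known everywhere
      outcome  y = beta * x + noise
    """
    rng = random.Random(seed)

    def exposure(r):
        return 1.0 + 2.0 * r + rng.gauss(0.0, 1.0)

    # Stage 1: exposure is measured only at monitoring locations.
    r_mon = [rng.gauss(0.0, 1.0) for _ in range(n_monitors)]
    x_mon = [exposure(r) for r in r_mon]
    a0, a1 = fit_line(r_mon, x_mon)

    # Study subjects: true exposure drives the outcome but is never observed.
    r_sub = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    x_true = [exposure(r) for r in r_sub]
    y = [beta * x + rng.gauss(0.0, 1.0) for x in x_true]

    # Stage 2: plug the predicted exposure into the health model, uncorrected.
    x_hat = [a0 + a1 * r for r in r_sub]
    _, beta_hat = fit_line(x_hat, y)
    return beta_hat


if __name__ == "__main__":
    print("plug-in estimate of beta:", plug_in_estimate())
```

In this sketch the part of the exposure that the covariate cannot explain plays a Berkson-like role, while the sampling error in `a0` and `a1` plays a classical-like role. Because the toy exposure model is correctly specified, the plug-in estimate should land near the true `beta`; the sketch does not cover the misspecified case.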
We decompose the measurement error into components analogous to classical and Berkson errors and characterize the properties of the second-stage estimator when the first-stage model predictions are plugged in without correction. Specifically, we derive conditions for compatibility between the first-stage and second-stage models that guarantee consistency (and have direct and important real-world design implications), and we derive an asymptotic estimate of finite-sample bias when the compatibility conditions are satisfied. We propose a methodology that (i) corrects for finite-sample bias and (ii) correctly estimates standard errors. We demonstrate the utility of our methodology in simulations and an example from air pollution epidemiology. Copyright © 2013 John Wiley & Sons, Ltd.
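To illustrate why standard errors need attention in a two-stage analysis, the sketch below compares the naive second-stage standard error with one from a simple nonparametric bootstrap that refits both stages on each resample. This is a generic bootstrap chosen for illustration, not the procedure proposed in the paper, and it does not attempt bias correction. The data-generating model and all function names are hypothetical.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Least-squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predicted_exposure(monitors, subjects):
    """Stage 1: fit exposure on the covariate at monitors, predict for subjects."""
    a0, a1 = fit_line([r for r, _ in monitors], [x for _, x in monitors])
    return [a0 + a1 * r for r, _ in subjects]


def two_stage_estimate(monitors, subjects):
    """Stage 2: regress the outcome on the predicted exposure."""
    x_hat = predicted_exposure(monitors, subjects)
    return fit_line(x_hat, [y for _, y in subjects])[1]


def naive_se(monitors, subjects):
    """Usual OLS standard error that treats the predicted exposure as fixed."""
    x_hat = predicted_exposure(monitors, subjects)
    y = [yi for _, yi in subjects]
    b0, b1 = fit_line(x_hat, y)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x_hat, y))
    mean_x = sum(x_hat) / len(x_hat)
    sxx = sum((xi - mean_x) ** 2 for xi in x_hat)
    return math.sqrt(rss / (len(y) - 2) / sxx)


def bootstrap_se(monitors, subjects, n_boot=200, seed=1):
    """Resample monitors and subjects separately and refit both stages."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        m_star = rng.choices(monitors, k=len(monitors))
        s_star = rng.choices(subjects, k=len(subjects))
        estimates.append(two_stage_estimate(m_star, s_star))
    return statistics.stdev(estimates)


def simulate(n_monitors=100, n_subjects=1000, beta=0.5, seed=0):
    """Hypothetical data: monitors carry (r, x); subjects carry (r, y)."""
    rng = random.Random(seed)
    monitors = []
    for _ in range(n_monitors):
        r = rng.gauss(0.0, 1.0)
        monitors.append((r, 1.0 + 2.0 * r + rng.gauss(0.0, 1.0)))
    subjects = []
    for _ in range(n_subjects):
        r = rng.gauss(0.0, 1.0)
        x_true = 1.0 + 2.0 * r + rng.gauss(0.0, 1.0)
        subjects.append((r, beta * x_true + rng.gauss(0.0, 1.0)))
    return monitors, subjects


if __name__ == "__main__":
    mon, sub = simulate()
    print("naive SE:    ", naive_se(mon, sub))
    print("bootstrap SE:", bootstrap_se(mon, sub))
```

Under these assumptions the bootstrap standard error is expected to exceed the naive one, because only the bootstrap propagates the uncertainty in the first-stage coefficients into the health effect estimate.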