Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples

Peter C. Austin (Corresponding Author)

[email protected]

Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ontario, Canada M4N 3M5

Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada

Department of Health Policy, Management and Evaluation, University of Toronto, Canada

First published: 13 October 2009

https://doi.org/10.1002/sim.3697

Citations: 3,803

Abstract

The propensity score is a subject's probability of treatment, conditional on observed baseline covariates. Conditional on the true propensity score, treated and untreated subjects have similar distributions of observed baseline covariates. Propensity-score matching is a popular method of using the propensity score in the medical literature. Using this approach, matched sets of treated and untreated subjects with similar values of the propensity score are formed. Inferences about treatment effect made using propensity-score matching are valid only if, in the matched sample, treated and untreated subjects have similar distributions of measured baseline covariates. In this paper we discuss the following methods for assessing whether the propensity score model has been correctly specified: comparing means and prevalences of baseline characteristics using standardized differences; ratios comparing the variance of continuous covariates between treated and untreated subjects; comparison of higher order moments and interactions; five-number summaries; and graphical methods such as quantile–quantile plots, side-by-side boxplots, and non-parametric density plots for comparing the distribution of baseline covariates between treatment groups. We describe methods to determine the sampling distribution of the standardized difference when the true standardized difference is equal to zero, thereby allowing one to determine the range of standardized differences that are plausible with the propensity score model having been correctly specified. We highlight the limitations of some previously used methods for assessing the adequacy of the specification of the propensity-score model. In particular, methods based on comparing the distribution of the estimated propensity score between treated and untreated subjects are uninformative. Copyright © 2009 John Wiley & Sons, Ltd.
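The first two diagnostics named in the abstract, standardized differences and variance ratios, can be sketched in a few lines. The abstract does not state the formulas, so this sketch assumes the commonly used definitions: the difference in means (or prevalences) divided by the square root of the average of the two group variances, and the ratio of the sample variances. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def std_diff_continuous(treated, untreated):
    """Standardized difference for a continuous covariate.

    Assumed definition: difference in sample means divided by the square
    root of the average of the two sample variances. Raises
    ZeroDivisionError if both groups have zero variance.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(untreated)) / 2.0)
    return (mean(treated) - mean(untreated)) / pooled_sd


def std_diff_binary(treated, untreated):
    """Standardized difference for a 0/1 covariate, based on prevalences."""
    p_t = mean(treated)    # prevalence among treated subjects
    p_u = mean(untreated)  # prevalence among untreated subjects
    pooled_sd = math.sqrt((p_t * (1.0 - p_t) + p_u * (1.0 - p_u)) / 2.0)
    return (p_t - p_u) / pooled_sd


def variance_ratio(treated, untreated):
    """Ratio of sample variances (treated over untreated) for a continuous covariate."""
    return variance(treated) / variance(untreated)


# Invented toy data for a matched sample (illustration only).
age_treated = [62, 70, 58, 66, 74, 69]
age_untreated = [60, 71, 59, 65, 73, 68]
diabetes_treated = [1, 0, 0, 1, 0, 1]
diabetes_untreated = [1, 0, 0, 0, 0, 1]

print("age, standardized difference:", std_diff_continuous(age_treated, age_untreated))
print("age, variance ratio:", variance_ratio(age_treated, age_untreated))
print("diabetes, standardized difference:", std_diff_binary(diabetes_treated, diabetes_untreated))
```

Under these definitions, a standardized difference near zero and a variance ratio near one are consistent with good balance on that covariate. How far from those values is still plausible under a correctly specified model depends on the sampling distribution that the paper describes.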

Volume 28, Issue 25, 10 November 2009, Pages 3083-3107
