Research article
First published online June 20, 2013

Effectiveness of Combining Statistical Tests and Effect Sizes When Using Logistic Discriminant Function Regression to Detect Differential Item Functioning for Polytomous Items

Abstract

The objective of this article was to find an optimal decision rule for identifying polytomous items with large or moderate amounts of differential functioning. The effectiveness of combining statistical tests with effect size measures was assessed using logistic discriminant function analysis and two effect size measures: R² and the conditional log odds ratio in delta scale (ΔLR). Four independent variables were manipulated: (a) sample sizes for the reference and focal groups (1,000/500, 1,000/250, 500/250); (b) impact between the reference and focal groups (equal-ability distributions, i.e., no impact, or different-ability distributions, i.e., impact); (c) the percentage of differential item functioning (DIF) items in a test (0%; 12%, i.e., only the first three items of the test; 20%, i.e., the first five items; 32%, i.e., the first eight items); and (d) direction of DIF (one-sided and both-sided). The magnitudes of DIF were indirectly manipulated through the percentage of DIF items and DIF direction, and they were simulated to be moderate or large. The results show that false positive rates were low when an effect size decision rule was used in combination with a statistical test, and very low when the R² effect size criteria were applied. With respect to power, when a statistical test was used in conjunction with effect size criteria to determine whether an item exhibited a meaningful magnitude of DIF, the percentage of items classified as showing meaningful DIF under the ΔLR decision rule was higher with greater amounts of DIF. Examining DIF by means of blended statistical tests, that is, those incorporating both the p value and effect size measures, can be recommended as a procedure for classifying items displaying DIF.
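The blended decision rule described in the abstract can be illustrated with a minimal sketch: an item is flagged as showing meaningful DIF only when the statistical test is significant and the effect size reaches a cutoff. The abstract does not report the cutoffs the authors used, so the function name `classify_dif`, the `EffectSizeCutoffs` container, the 0.05 significance level, and the numeric cutoffs in the usage lines are all assumptions chosen for illustration, not the article's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffectSizeCutoffs:
    """Hypothetical cutoffs separating negligible, moderate, and large DIF."""
    moderate: float
    large: float


def classify_dif(p_value: float, effect_size: float,
                 cutoffs: EffectSizeCutoffs, alpha: float = 0.05) -> str:
    """Blended rule: an item is flagged only if the test is significant
    AND the absolute effect size reaches at least the moderate cutoff."""
    if p_value >= alpha:
        # Not statistically significant: never flagged, whatever the effect size.
        return "negligible"
    magnitude = abs(effect_size)
    if magnitude >= cutoffs.large:
        return "large"
    if magnitude >= cutoffs.moderate:
        return "moderate"
    # Significant but too small to matter in practice.
    return "negligible"


# Illustrative cutoffs only (assumed, not taken from the article).
R2_CUTOFFS = EffectSizeCutoffs(moderate=0.035, large=0.070)
DELTA_CUTOFFS = EffectSizeCutoffs(moderate=1.0, large=1.5)

print(classify_dif(0.001, 0.010, R2_CUTOFFS))     # significant, tiny effect
print(classify_dif(0.001, 0.050, R2_CUTOFFS))     # significant, mid-range effect
print(classify_dif(0.200, 0.090, R2_CUTOFFS))     # large effect, not significant
print(classify_dif(0.001, -1.80, DELTA_CUTOFFS))  # sign ignored via abs()
```

Requiring both conditions is what keeps the false positive rate low in large samples, where a significance test alone would flag items with trivially small DIF.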





Published In

Article first published online: June 20, 2013
Issue published: October 2013

Keywords

  1. differential item functioning
  2. logistic discriminant function analysis
  3. effect size measure
  4. R²
  5. conditional log odds ratio
  6. polytomous items

Rights and permissions

© The Author(s) 2013.

Authors

Affiliations

Juana Gómez-Benito
University of Barcelona, Barcelona, Spain
Mª Dolores Hidalgo
University of Murcia, Murcia, Spain
Bruno D. Zumbo
University of British Columbia, Vancouver, British Columbia, Canada

Notes

Juana Gómez-Benito, Departament de Metodologia de les Ciències del Comportament, Facultat de Psicologia, Universitat de Barcelona, Passeig de la Vall d’Hebron 171, 08035 Barcelona, Spain. Email: [email protected]


This article was published in Educational and Psychological Measurement.
