Research article
First published September 2005

Procedures for the Analysis of Differential Item Functioning (DIF) for Small Sample Sizes

Abstract

An item with differential item functioning (DIF) displays different statistical properties, conditional on a matching variable. The presence of DIF in measures can invalidate the conclusions of medical outcome studies. Numerous approaches have been developed to examine DIF in many areas, including education and health-related quality of life. There is little consensus in the research community regarding selection of one best method, and most methods require large sample sizes. This article describes some approaches to examine DIF with small samples (e.g., fewer than 200).
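To make the idea of "conditional on a matching variable" concrete, the following is a minimal sketch of one widely used DIF statistic, the Mantel-Haenszel common odds ratio for a dichotomous item. It is offered as an illustration only: the abstract does not say which procedures the article recommends, and the function name, the group labels `"ref"` and `"focal"`, and the toy data are assumptions made for this example. Respondents are stratified on the matching variable (for instance, total score), a 2x2 table of group by item response is formed in each stratum, and the tables are pooled into a single odds ratio. A value near 1 suggests no DIF after matching.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(responses, groups, matching):
    """Pooled (Mantel-Haenszel) odds ratio for one dichotomous item.

    responses: iterable of 0/1 item responses
    groups:    iterable of "ref" or "focal" labels (hypothetical labels)
    matching:  iterable of stratum labels, e.g. total test score
    """
    # One 2x2 table per stratum, stored as [a, b, c, d]:
    # a = ref endorsed, b = ref not endorsed,
    # c = focal endorsed, d = focal not endorsed.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, s in zip(responses, groups, matching):
        idx = (0 if g == "ref" else 2) + (0 if y == 1 else 1)
        tables[s][idx] += 1

    num = 0.0  # sum over strata of a*d/n
    den = 0.0  # sum over strata of b*c/n
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n

    if den == 0:
        # The estimate is undefined when no stratum is informative.
        return float("nan")
    return num / den


# Toy data (invented for illustration): two score strata, 14 respondents.
responses = [1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0]
groups = ["ref"] * 4 + ["focal"] * 2 + ["ref"] * 4 + ["focal"] * 4
matching = [0] * 6 + [1] * 8
print(mantel_haenszel_odds_ratio(responses, groups, matching))
```

With small samples, many strata contain few or no respondents from one group, so in practice adjacent score levels are often collapsed into a handful of wider strata before computing the statistic. That choice is a trade-off between matching precision and stable cell counts.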

Article first published: September 2005
Issue published: September 2005

Keywords

  1. differential item functioning
  2. DIF
  3. Rasch analysis
  4. Item Response Theory
  5. logistic regression
  6. health-related quality of life

PubMed: 16123258

Authors

Affiliations

Jin-Shei Lai
Feinberg School of Medicine at Northwestern University, Center on Outcomes, Research and Education (CORE), Evanston Northwestern Healthcare
Jeanne Teresi
Columbia University Stroud Center and Faculty of Medicine, New York State Psychiatric Institute, Hebrew Home for the Aged at Riverdale
Richard Gershon
Northwestern University, Center on Outcomes, Research and Education (CORE), Evanston Northwestern Healthcare

This article was published in Evaluation & the Health Professions.
