Research article
First published online April 28, 2016

How many words do children know? A corpus-based estimation of children’s total vocabulary size

Abstract

In this article, we present a new method for estimating children’s total vocabulary size based on a German language corpus. We drew virtual lexicons of different sizes from the corpus and let this virtual sample “take” a vocabulary test by checking whether each test item was included in each virtual lexicon. This enabled us to identify the relation between test performance and total lexicon size. We then applied this relation to the test results of a real sample of children (grades 1–8, aged 6 to 14) and young adults (aged 18 to 25) to estimate their total vocabulary sizes. Average absolute vocabulary sizes ranged from 5,900 lemmas in first grade to 73,000 lemmas in adults, with significant increases between adjacent grade levels except from first to second grade. Our analyses also allowed us to examine development across parts of speech and in morphology. The results thus shed light on the course of vocabulary development during primary school.
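The core idea of the method, simulating test performance for lexicons of known size and then inverting that relation, can be sketched in a few lines of Python. This is a minimal illustration under clearly hypothetical assumptions, not the authors’ implementation: a virtual lexicon of size N is assumed to be the N most frequent lemmas of a frequency-ranked corpus list (the abstract does not say how the virtual samples were drawn; frequency-weighted random sampling would be another option), a test score is the proportion of test items found in the lexicon, and a real score is mapped back to a lexicon size by linear interpolation. The function names, the toy lemma list `w1`…`w1000`, and the toy test items are all invented for illustration.

```python
from bisect import bisect_left


def virtual_score(lexicon_size, ranked_lemmas, test_items):
    """Proportion of test items contained in one virtual lexicon.

    Hypothetical assumption: a virtual lexicon of size N is the N most
    frequent lemmas, i.e. ranked_lemmas is sorted by descending frequency.
    """
    lexicon = set(ranked_lemmas[:lexicon_size])
    hits = sum(1 for item in test_items if item in lexicon)
    return hits / len(test_items)


def score_curve(sizes, ranked_lemmas, test_items):
    """Let virtual lexicons of increasing size 'take' the test."""
    return [(n, virtual_score(n, ranked_lemmas, test_items)) for n in sorted(sizes)]


def estimate_size(observed_score, curve):
    """Map an observed test score back to a lexicon size.

    Under the top-N assumption the scores are non-decreasing in size, so
    bisect finds the first simulated size whose score reaches the observed
    one; the estimate is interpolated linearly from the preceding point.
    Scores outside the simulated range are clamped to its ends.
    """
    sizes = [n for n, _ in curve]
    scores = [s for _, s in curve]
    if observed_score <= scores[0]:
        return float(sizes[0])
    if observed_score > scores[-1]:
        return float(sizes[-1])
    i = bisect_left(scores, observed_score)  # scores[i-1] < observed <= scores[i]
    n0, s0 = sizes[i - 1], scores[i - 1]
    n1, s1 = sizes[i], scores[i]
    return n0 + (observed_score - s0) / (s1 - s0) * (n1 - n0)


# Toy usage: 1,000 frequency-ranked lemmas and a five-item test.
ranked = [f"w{i}" for i in range(1, 1001)]
items = ["w10", "w100", "w300", "w600", "w900"]
curve = score_curve(range(100, 1001, 100), ranked, items)
print(round(estimate_size(0.7, curve)))  # 550
```

In this toy run, a score of 0.7 falls between the simulated lexicons of 500 lemmas (score 0.6) and 600 lemmas (score 0.8), so the sketch returns about 550 lemmas. A real application would use a large corpus, the actual test items, and likely several random draws per lexicon size to smooth the curve.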

Information, rights and permissions

Information

Published In

Article first published online: April 28, 2016
Issue published: July 2017

Keywords

  1. language corpus
  2. language development
  3. lexicon size estimation
  4. vocabulary development
  5. vocabulary test

Rights and permissions

© The Author(s) 2016.

Authors

Affiliations

Jutta Segbers
Max Planck Institute for Human Development, Germany
Sascha Schroeder
Max Planck Institute for Human Development, Germany

Notes

Jutta Segbers, Max Planck Institute for Human Development, MPRG Reading Education and Development (REaD), Lentzeallee 94, 14195 Berlin, Germany.

Metrics and citations

Metrics

Journals metrics

This article was published in Language Testing.

Article usage*

Total views and downloads: 2798

*Article usage tracking started in December 2016

Articles citing this one

Web of Science: 32

Crossref: 26
