Using large language models in psychology

Demszky, Dorottya; Yang, Diyi; Yeager, David S.; Bryan, Christopher J.; Clapper, Margarett; Chandhok, Susannah; Eichstaedt, Johannes C.; Hecht, Cameron; Jamieson, Jeremy; Johnson, Meghann; Jones, Michaela; Krettek-Cobb, Danielle; Lai, Leslie; JonesMitchell, Nirel; Ong, Desmond C.; Dweck, Carol S.; Gross, James J.; Pennebaker, James W.

doi:10.1038/s44159-023-00241-5

Perspective
Published: 13 October 2023

Nature Reviews Psychology volume 2, pages 688–701 (2023)

Abstract

Large language models (LLMs), such as OpenAI’s GPT-4, Google’s Bard or Meta’s LLaMA, have created unprecedented opportunities for analysing and generating language data on a massive scale. Because language data have a central role in all areas of psychology, this new technology has the potential to transform the field. In this Perspective, we review the foundations of LLMs. We then explain how the way that LLMs are constructed enables them to generate human-like linguistic output effectively without the ability to think or feel like a human. We argue that although LLMs have the potential to advance psychological measurement, experimentation and practice, they are not yet ready for many of the most transformative psychological applications; further research and development may, however, enable such use. Next, we examine four major concerns about the application of LLMs to psychology and how each might be overcome. Finally, we conclude with recommendations for investments that could help to address these concerns: field-initiated ‘keystone’ datasets; increased standardization of performance benchmarks; and shared computing and analysis infrastructure to ensure that the future of LLM-powered research is equitable.

**Fig. 1: Examples of LLM functionality.**

**Fig. 2: Pre-training, fine-tuning and prompt-tuning of LLMs.**

Acknowledgements

This work was supported by the National Science Foundation under award numbers 1761179 and 2201928 (PI: D.S.Y.), by the National Institutes of Health under award numbers R01HD084772 (PI: D.S.Y.) and P2CHD042849 (Population Research Center), and by the William and Melinda Gates Foundation under awards INV-047751 and INV-004519 (PI: D.S.Y.). This work was also supported by an Advanced Research Fellowship from the Jacobs Foundation to D.S.Y., and by the Institute for Human-Centered AI at Stanford University to J.C.E. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies. The authors also thank C. Smith for creating the original versions of the figures included in the original submission. The glossary definitions were generated by GPT-4 in May 2023 and edited by the authors.

Author information

These authors contributed equally: Dorottya Demszky, Diyi Yang, David S. Yeager.

Authors and Affiliations

Graduate School of Education, Stanford University, Stanford, CA, USA
Dorottya Demszky
Department of Computer Science, Stanford University, Stanford, CA, USA
Diyi Yang
Texas Behavioral Science and Policy Institute, University of Texas at Austin, Austin, TX, USA
David S. Yeager, Christopher J. Bryan, Margarett Clapper, Cameron Hecht, Meghann Johnson, Michaela Jones, Nirel JonesMitchell & Desmond C. Ong
Department of Psychology, University of Texas at Austin, Austin, TX, USA
David S. Yeager, Margarett Clapper, Cameron Hecht, Desmond C. Ong & James W. Pennebaker
Department of Business, Government, and Society, University of Texas at Austin, Austin, TX, USA
Christopher J. Bryan
Google, LLC, Mountain View, CA, USA
Susannah Chandhok, Danielle Krettek-Cobb & Leslie Lai
Department of Psychology, Stanford University, Stanford, CA, USA
Johannes C. Eichstaedt, Carol S. Dweck & James J. Gross
Institute for Human-Centered AI, Stanford University, Stanford, CA, USA
Johannes C. Eichstaedt
Department of Psychology, University of Rochester, Rochester, NY, USA
Jeremy Jamieson

Contributions

Lead authors D.D., D.Y. and D.S.Y. (equal contributions, listed alphabetically) conceived the paper, outlined and wrote the first draft, guided the co-authoring process, provided critical edits, conceived and supervised the creation of the figures, boxes and tables, and finalized the submitted version of the manuscript. Senior authors C.S.D., J.J.G. and J.W.P. (listed alphabetically) assisted in the outlining, organization, and conceptualization of the manuscript, boxes and tables and provided multiple rounds of critical edits. J.C.E. assisted with outlining the paper, wrote first drafts of key sections and edited the draft. All other authors assisted with the empirical examples, the conceptualization of the key arguments and conclusions in the paper and provided critical edits.

Corresponding authors

Correspondence to Dorottya Demszky, Diyi Yang or David S. Yeager.

Ethics declarations

Competing interests

J.W.P. is the CEO of Pennebaker Conglomerates, a company that sells natural language processing software and services. S.C. and L.L. are employees of Google LLC, which owns LLM technology. D.K.-C. was formerly an employee at Google LLC.

Peer review

Peer review information

Nature Reviews Psychology thanks William Brady and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Backpropagation: An algorithmic technique that passes the prediction error backwards through the network to calculate how much each parameter contributed to that error, so that the parameters can be adjusted accordingly to improve performance.
Bag-of-words methods: Text representation techniques, such as the Linguistic Inquiry and Word Count (LIWC) algorithm, that count the frequency of words in a document while disregarding grammar and word order.
Generative pre-trained transformer: A family of large language models developed by OpenAI and usually trained on massive datasets to generate contextually coherent text.
Machine learning: A subset of artificial intelligence that involves teaching computers to learn patterns and make decisions from data without explicit programming.
Neural network: A computational model inspired by the structure and function of biological neural networks used for tasks such as pattern recognition, classification and prediction.
Training data: The dataset used to train a machine learning model, often consisting of input–output pairs from which the model learns the underlying patterns and relationships.
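The bag-of-words and backpropagation entries above each describe a concrete procedure. The Python sketch below illustrates both in miniature, under stated assumptions: the three-category dictionary is hypothetical and is not the LIWC dictionary, and the network is a toy with a single hidden tanh unit and a squared-error loss, chosen only to make the reverse pass visible. The names `bag_of_words`, `category_percentages`, `forward_backward` and `train` are illustrative, not part of any library. Note that backpropagation itself only computes the gradients; the parameter adjustment in `train` is plain gradient descent, one common choice among several optimizers.

```python
import math
import re
from collections import Counter

# --- Bag-of-words counting -------------------------------------------------
# Hypothetical mini-dictionary for illustration only; these are NOT the
# actual LIWC categories or word lists.
CATEGORIES = {
    "positive_emotion": {"happy", "love", "good", "calm"},
    "negative_emotion": {"sad", "angry", "hate", "worried"},
    "first_person_singular": {"i", "me", "my", "mine"},
}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into word tokens and count them.

    Grammar and word order are discarded; only token frequencies remain.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def category_percentages(text: str) -> dict:
    """Return the percentage of tokens that fall into each category."""
    counts = bag_of_words(text)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(counts[word] for word in words) / total
        for name, words in CATEGORIES.items()
    }


# --- Backpropagation in a toy network --------------------------------------
# Assumed toy model: h = tanh(w1 * x), y = w2 * h, loss = 0.5 * (y - t)**2.
def forward_backward(w1: float, w2: float, x: float, t: float):
    """Run a forward pass, then a reverse pass that applies the chain rule
    to get the gradient of the loss with respect to each parameter."""
    # Forward pass
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - t) ** 2
    # Backward pass, from the loss back towards the input
    dloss_dy = y - t
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2
    dloss_dw1 = dloss_dh * (1.0 - h ** 2) * x  # d tanh(z)/dz = 1 - tanh(z)**2
    return loss, dloss_dw1, dloss_dw2


def train(w1=0.5, w2=0.5, x=1.0, t=0.8, lr=0.5, steps=50):
    """Plain gradient descent using the gradients from forward_backward."""
    losses = []
    for _ in range(steps):
        loss, g1, g2 = forward_backward(w1, w2, x, t)
        losses.append(loss)
        w1 -= lr * g1
        w2 -= lr * g2
    return losses
```

For example, `category_percentages("I love my dog")` returns `{'positive_emotion': 25.0, 'negative_emotion': 0.0, 'first_person_singular': 50.0}`. Because word order is discarded, "dog love my I" yields exactly the same result, which is both the strength (simplicity, transparency) and the main limitation of bag-of-words methods.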

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Demszky, D., Yang, D., Yeager, D.S. et al. Using large language models in psychology. Nat Rev Psychol 2, 688–701 (2023). https://doi.org/10.1038/s44159-023-00241-5

Accepted: 12 September 2023
Published: 13 October 2023
Issue Date: November 2023
DOI: https://doi.org/10.1038/s44159-023-00241-5