Data on adolescents’ mental health (COPSY and BELLA)
We make use of a unique German survey on adolescents’ mental health, the BELLA cohort study. BELLA forms part of the nationwide, longitudinal, representative German National Health Interview and Examination Survey for Children and Adolescents, also referred to as KiGGS [see also (20, 33, 45)]. Specifically, we rely on (i) the last pre-pandemic wave of BELLA, conducted with n = 1480 11- to 17-year-old adolescents between August 2015 and November 2017 in the form of computer adaptive tests, and (ii) the COVID-19 online wave of the survey, also referred to as COPSY [see also (19)], conducted with n = 1040 11- to 17-year-olds between 26 May and 10 June 2020. In addition, we draw upon data from surveys with n = 1040 parents of 11- to 17-year-old adolescents, which were also conducted online between 26 May and 10 June 2020. All participants gave informed consent, and the study was approved by the Local Psychological Ethics Committee of the University of Hamburg (LPEK-0151).
To measure well-being and mental health, we rely on internationally accepted, validated, and comparable measures that are in accordance with the guidelines of the International Consortium for Health Outcomes Measurement (34, 46). Specifically, we use the KIDSCREEN-10 Index and the HBSC-SCL to measure adolescents’ well-being and psychosomatic complaints, respectively. The KIDSCREEN-10 Index is derived from the KIDSCREEN-52 and constitutes a global measure of HRQoL. It is computed from the responses to 10 questions, e.g., “Have you felt fit and well during the previous week?”, each answered on a five-point Likert scale (from “never” to “always” or from “not at all” to “extremely”). The KIDSCREEN-10 Index was developed using item response theory (international T values based on Rasch modeling) (35, 47). The second scale, the HBSC-SCL, contains eight questions assessing the frequency of psychosomatic complaints (e.g., headache and nervousness) within the past week on a five-point response scale (from “not at all” to “daily”) (36). We further draw upon three clinical scales. The SDQ provides information about the emotions, behaviors, and relationships of children and adolescents during the previous week. It contains 20 items in total, divided into four subscales on emotional problems (e.g., “Many worries, often seems worried”), conduct problems (e.g., “Often lies or cheats”), hyperactivity (e.g., “Constantly fidgeting or squirming”), and peer problems (e.g., “Often fights with other children or bullies them”), each item offering three response options from “not true” to “certainly true” (37). We further use the CES-DC to examine depressive symptoms. This scale is based on seven items (e.g., “I felt sad”), whose frequency during the previous week is scored from 0 (= “not at all”) to 3 (= “a lot”) (38, 48). Last, we rely on the nine-item generalized anxiety subscale of the German SCARED. Here, adolescents are asked to rate statements such as “I am nervous” using three response options (from 0 = “not true or hardly ever true” to 2 = “very true or often true”) (39, 49). All scales stem from the youth survey except the SDQ, which is taken from the parental questionnaire.
To identify the effects on parental mental health and FC, we use two scales from the parental questionnaire. First, we rely on the PHQ-8, which measures parental depressive symptoms. This scale summarizes eight statements on personal health (e.g., “Feeling down, depressed, or hopeless,” “Trouble falling asleep or staying asleep, or sleeping too much,” and “Feeling tired or having little energy”), each rated on a four-point response scale from “not at all” to “nearly every day” (40). Second, we draw upon the parent-reported FC scale, which assesses family climate with four statements such as “In our family, everybody cares about each other’s worries” on a four-point response scale from “not true” to “exactly true” (41).
Merged dataset (COPSY, pre-pandemic wave of BELLA, and school closure data)
We merge the self-compiled dataset on school closures (22) with the COPSY and BELLA data via adolescents’ state, grade level, and school track. The resulting COPSY sample contains n = 907 11- to 17-year-old adolescents [mean age, 14.2; SD, 1.8; see table S4; representative for German youth (132 observations are dropped because of missing information on the state and/or school track)]. Some adolescents had not yet returned to school when they were interviewed. Because we lack the exact survey date, we use the start date of COPSY, 26 May 2020, to impute a conservative measure of the individual duration of school closure for these cases. Using this imputation method, adolescents in our sample experienced school closures lasting at least 4.7 and at most 10.1 weeks. Using the end date of COPSY, 10 June 2020, as an alternative imputation date results in a maximum duration of 12.3 weeks. The resulting BELLA sample contains n = 1334 11- to 17-year-old adolescents [mean age, 13.8; SD, 1.7; see table S5; representative for German youth (155 observations are dropped because of missing information)]. The KIDSCREEN-10 and the CES-DC are standardized to a mean of 0 and an SD of 1 in BELLA. As the remaining three measures are elicited only in COPSY (but not in BELLA), we standardize them to a mean of 0 and an SD of 1 in COPSY.
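The duration imputation and the standardization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The common closure start of 16 March 2020, the function names, and the example scores are all hypothetical, since the study takes closure and reopening dates from its own school closure dataset (22). With this assumed start date, the two imputation cutoffs yield 10.1 and 12.3 weeks, which is consistent with the maxima reported above.

```python
from datetime import date
from statistics import mean, stdev

# Assumption (hypothetical): a common closure start of 16 March 2020; in the
# study, closure dates come from the school closure dataset for each state.
CLOSURE_START = date(2020, 3, 16)
COPSY_START = date(2020, 5, 26)  # survey start date (baseline imputation)
COPSY_END = date(2020, 6, 10)    # survey end date (alternative imputation)


def closure_weeks(reopening, cutoff=COPSY_START, closure_start=CLOSURE_START):
    """Weeks of school closure for one state-grade-track cell.

    If the cell had not reopened by the cutoff (reopening is None or later
    than the cutoff), the closure is imputed to end at the cutoff date.
    """
    if reopening is None or reopening > cutoff:
        end = cutoff
    else:
        end = reopening
    return (end - closure_start).days / 7


def standardize(values, reference):
    """Z-scores of `values` using the mean and sample SD of `reference`."""
    mu, sd = mean(reference), stdev(reference)
    return [(v - mu) / sd for v in values]


# Usage with hypothetical inputs
print(round(closure_weeks(date(2020, 5, 18)), 1))        # 9.0 (reopened before COPSY)
print(round(closure_weeks(None), 1))                     # 10.1 (imputed with 26 May)
print(round(closure_weeks(None, cutoff=COPSY_END), 1))   # 12.3 (imputed with 10 June)

bella_scores = [48.0, 52.5, 50.0, 45.5]  # hypothetical KIDSCREEN-10 T values
copsy_scores = [44.0, 47.5]              # hypothetical pandemic-wave T values
print(standardize(copsy_scores, bella_scores))  # expressed in BELLA SD units
```

Standardizing pandemic-wave scores against the pre-pandemic reference, as in the last line, keeps both waves on the same scale. A shift in the mean is then directly readable in pre-pandemic SD units.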
Crisis helpline call data
We also rely on data from the “Kinder- und Jugendtelefon,” a dedicated phone helpline for children and adolescents operated by the nonprofit organization Nummer gegen Kummer e.V. The service is supported by Deutsche Telekom AG, with additional funding from the German Federal Ministry for Family Affairs, Senior Citizens, Women and Youth, the European Union, and the Stiftung Deutsche Kinder-, Jugend- und Elterntelefone. The helpline is free of charge; calls are answered from Monday to Saturday between 2 p.m. and 8 p.m. by around 3200 trained volunteer counselors. We have access to all calls received between January 2019 and December 2020 that developed into deeper conversations and counseling (50). The helpline guarantees anonymity to its callers, and it is impossible to identify callers from the conversation-level data available to us. However, callers are informed that anonymous call data are collected for reporting and statistical purposes, explicitly in the terms and conditions and implicitly in annual reports and online publications. Further information is available online at www.nummergegenkummer.de.
Information on detailed, nonexclusive conversation topics (a single call can cover several topics) allows us to track the relative importance of different problems among the vulnerable population of callers. Counselors record the age of callers if it is stated during the conversation and otherwise provide an estimate, which allows us to approximate the most likely grade level of each caller. Together with information on the location of the receiving helpline center, we link the call data with our data on school closures by federal state and approximated grade level. Because we do not have information on the school track, we use the average school closure for the respective grade level. We focus on callers aged 11 to 17 to increase comparability with the previous analyses based on survey data. The overall sample amounts to n = 126,006 calls from 11- to 17-year-old adolescents, of which 51,833 are informative about the reasons why adolescents are calling (the remaining calls relate mainly to unspecified psychosocial and health issues; for more details, see table S6).
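The linkage of calls to school closure data can be sketched as follows. This is a hypothetical illustration, not the study's code. Three elements are assumptions: the rule grade = age - 6 (age 11 maps to grade 5), the unweighted average across school tracks, and the closure values and state codes in the table.

```python
from statistics import mean

# Hypothetical weeks of school closure by (state, grade, school track); illustrative only.
CLOSURE_WEEKS = {
    ("BY", 5, "Gymnasium"): 9.0,
    ("BY", 5, "Realschule"): 9.0,
    ("BY", 6, "Gymnasium"): 13.0,
    ("BY", 6, "Realschule"): 12.0,
}


def approximate_grade(age):
    """Assumed mapping from age to the most likely grade (age 11 -> grade 5)."""
    if not 11 <= age <= 17:
        raise ValueError("sample restricted to callers aged 11 to 17")
    return age - 6


def closure_for_call(state, age, closure_weeks=CLOSURE_WEEKS):
    """Unweighted average closure across school tracks for the caller's state and grade."""
    grade = approximate_grade(age)
    weeks = [w for (s, g, _track), w in closure_weeks.items() if s == state and g == grade]
    if not weeks:
        raise KeyError(f"no closure data for state {state}, grade {grade}")
    return mean(weeks)


print(closure_for_call("BY", 11))  # grade 5 -> 9.0
print(closure_for_call("BY", 12))  # grade 6 -> 12.5
```

A population- or enrollment-weighted average across tracks would be an equally plausible choice. The sketch uses the simple mean only for illustration.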
Data on further pandemic measures
We consider the following pandemic measures (30). These are all measures that varied meaningfully across states up to 25 May 2020. We disregard pandemic measures with negligible variation across states up to 25 May 2020, such as travel restrictions, mask or test mandates, work-from-home recommendations or other workplace restrictions, curfews, and distancing rules.
• Private spaces: The mildest restriction on contacts and gatherings in private spaces is a recommendation to avoid contacts. More stringent versions of this measure impose a maximum number of people gathering in private spaces.
• Public spaces: The mildest restriction on contacts and gatherings in public spaces is a recommendation to avoid contacts in public spaces. More stringent versions of this measure impose a maximum number of people gathering in public spaces.
• Indoor events: The mildest level of restriction limits public indoor events to a maximum of 1000 people. Increasing levels reduce the maximum number of people until the highest level of restriction bans all public indoor events.
• Outdoor events: The mildest level of restriction limits public outdoor events to a maximum of 5000 people. Increasing levels reduce the maximum number of people until the highest level of restriction bans all public outdoor events.
• Institutions: The mildest level of restriction imposed on educational and cultural institutions involves explicit hygiene rules. The intermediate levels restrict the maximum number of people, allow only outdoor institutions to open, restrict the sale of drinks and food, or ban all institutions except museums from opening. The highest level of restriction bans all institutions from opening.
• Retail and wholesale: The mildest level of restriction imposes hygiene rules. The intermediate levels restrict opening hours or ban large shops from opening. The highest level of restriction bans all noncritical retail and wholesale from opening.
• Gastronomy: The mildest level of restriction imposes hygiene rules. The intermediate levels restrict opening hours, ban indoor consumption, allow only takeaway, require an appointment, or a combination thereof. The highest level of restriction bans all gastronomy.
• Services and crafts: The mildest level of restriction imposes hygiene rules. The intermediate levels restrict all services with unavoidable customer contact, with exceptions for hair salons or health and care services. The highest level of restriction bans all services and crafts.
• Nightlife: The mildest level of restriction imposes hygiene rules. The intermediate levels close clubs but not bars. The highest level of restriction closes all nightlife venues.
• Accommodations: The mildest level of restriction imposes hygiene rules. The intermediate levels ban the accommodation of tourists. The highest level bans all accommodation.
• Indoor sports: The mildest level of restriction bans tournaments with spectators. The intermediate levels restrict the maximum number of people or ban sports with physical contact. The highest level bans all indoor sports.
• Outdoor sports: The mildest level of restriction restricts the maximum number of people for outdoor sports. The intermediate levels ban outdoor sports with physical contact. The highest level bans any use of outdoor sport facilities.
Each pandemic measure has different levels. The definitions of the measures and their levels are taken from (30). The stringency of each measure is calculated as follows. For each state and each measure separately, the daily restriction levels are summed over all days from 1 March to 25 May. Then, the population-weighted median of this cumulative restriction level is determined, and states are split accordingly. Similarly, we calculate the number of infections per capita up to 25 May and its population-weighted median. We do the same for the number of COVID-19 deaths per capita. For both infections and deaths, the population-weighted median split assigns the same states to the high and low groups.
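The stringency classification described above can be sketched as follows, under stated assumptions. The daily levels, state labels, and populations are hypothetical. The weighted-median convention is also an assumption: it takes the smallest value at which the cumulative population share reaches one half, and other conventions exist. States at or above the cutoff are coded as high.

```python
from datetime import date, timedelta


def cumulative_stringency(daily_levels, start=date(2020, 3, 1), end=date(2020, 5, 25)):
    """Sum one measure's daily restriction levels in one state over [start, end]."""
    return sum(level for day, level in daily_levels.items() if start <= day <= end)


def population_weighted_median(values, weights):
    """Smallest value at which the cumulative population share reaches one half."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values must not be empty")


def high_stringency_dummies(scores, population):
    """1 if a state's score is at or above the population-weighted median, else 0."""
    states = sorted(scores)
    cutoff = population_weighted_median(
        [scores[s] for s in states], [population[s] for s in states]
    )
    return {s: int(scores[s] >= cutoff) for s in states}


# Hypothetical daily levels: level 2 every day from 1 March to 8 June 2020
daily = {date(2020, 3, 1) + timedelta(days=k): 2 for k in range(100)}
print(cumulative_stringency(daily))  # only the 86 days up to 25 May count: 172

# Hypothetical cumulative scores and populations (in millions) for three states
scores = {"A": 120, "B": 150, "C": 90}
population = {"A": 13, "B": 11, "C": 4}
print(high_stringency_dummies(scores, population))  # {'A': 1, 'B': 1, 'C': 0}
```

The same split applies unchanged to infections or deaths per capita. Pass those per-capita figures as `scores` instead of the cumulative restriction levels.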
Short-run impact of school closures and their contribution to the overall deterioration of mental health
We use a linear regression model with two levels of fixed effects (28, 29, 51) to identify the effect of school closures on adolescents’ mental health. The variable of interest, the individual duration of school closures, is fully determined by the combination of the individual’s state and school track–specific grade level. This two-way fixed-effects method includes two sets of fixed effects, state fixed effects and school track–specific grade-level fixed effects. It thus absorbs any level differences in adolescents’ mental health between states and between school track–specific grade levels. Identification is therefore based on the remaining variation in the duration of school closures within states (across school track–specific grade levels) and within school track–specific grade levels (across states). This variation is arguably exogenous and thus provides the basis for a quasi-experiment. The underlying identifying assumption is that, apart from the duration of school closure, no systematic confounding factors drive the deviation of adolescents’ mental health from the level predicted for an adolescent residing in state s and attending school track–specific grade level cx. This requires that neither pandemic severity (e.g., infection, hospitalization, and death rates) nor further pandemic measures (e.g., contact restrictions and home office) vary within states across school track–specific grade levels (30). This assumption is plausible for two reasons. (i) Case rates among adolescents were negligible (at least in the first wave of the pandemic), and case and death rates among parents or grandparents should be comparable across the age range of the children in our sample. (ii) No further pandemic measures explicitly targeted specific age groups, grade levels, or schools (30).
More formally, we model youth mental health using the following equation:

$$y_{iscx} = \alpha + \beta\, d_{scx} + \gamma_s + \gamma_{cx} + X_i'\delta + \varepsilon_{iscx} \qquad (1)$$

where $y_{iscx}$ is the dependent variable, comprising the different measures of the mental health of individual $i$ who lives in state $s$ and attends school track $x$ in grade $c$. We standardize all outcome variables to a mean of 0 and an SD of 1, which facilitates comparison across mental health dimensions and the interpretation of effect sizes. The independent variable $d_{scx}$ denotes the weeks of school closure applying to all individuals residing in state $s$ and attending grade level $c$ in school track $x$. We include a constant $\alpha$ as well as state ($\gamma_s$) and school track–specific grade level ($\gamma_{cx}$) fixed effects. We further control for adolescents’ age (in years) and sex (using a dummy equal to 1 if female), summarized in the matrix $X_i$ with coefficient vector $\delta$. $\varepsilon_{iscx}$ is an idiosyncratic error term. All estimates shown in Fig. 2A and tables S7A and S8 result from estimating Eq. 1 by ordinary least squares. SEs are clustered at the state × school track × grade level and thus at the treatment level (52).
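The two-way fixed-effects estimator in Eq. 1 can be sketched without external libraries. The sketch sweeps out both sets of fixed effects by alternating demeaning (alternating projections) and then regresses the demeaned outcome on demeaned weeks of closure. It is a simplified, hypothetical illustration. Individual controls are omitted, the cluster-robust SE has no finite-sample correction, and the toy data are noise free, so the known slope of -0.1 is recovered exactly.

```python
from collections import defaultdict


def demean_by(values, groups):
    """Subtract group means from values (sweeps out one set of fixed effects)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def two_way_demean(values, groups_a, groups_b, tol=1e-12, max_iter=10_000):
    """Alternate the two demeaning steps until the values stop changing."""
    current = list(values)
    for _ in range(max_iter):
        updated = demean_by(demean_by(current, groups_a), groups_b)
        if max(abs(u - c) for u, c in zip(updated, current)) < tol:
            return updated
        current = updated
    raise RuntimeError("alternating projections did not converge")


def twfe_slope(y, d, state, grade_track, cluster):
    """Slope on d net of state and grade-track fixed effects, with a cluster-robust SE.

    Simplifications: no individual controls, no finite-sample SE correction.
    """
    y_w = two_way_demean(y, state, grade_track)
    d_w = two_way_demean(d, state, grade_track)
    sxx = sum(x * x for x in d_w)
    beta = sum(x * v for x, v in zip(d_w, y_w)) / sxx
    score = defaultdict(float)  # per-cluster sum of d_w * residual
    for x, v, g in zip(d_w, y_w, cluster):
        score[g] += x * (v - beta * x)
    se = sum(s * s for s in score.values()) ** 0.5 / sxx
    return beta, se


# Hypothetical toy data: three adolescents per state x grade-track cell.
weeks = {("BY", "G5"): 9.0, ("BY", "G6"): 13.0, ("BY", "G7"): 13.0,
         ("BW", "G5"): 13.0, ("BW", "G6"): 13.0, ("BW", "G7"): 13.0}
state_effect = {"BY": 0.2, "BW": -0.1}
grade_effect = {"G5": 0.0, "G6": 0.3, "G7": 0.5}
rows = [cell for cell in weeks for _ in range(3)]
states = [s for s, _ in rows]
grades = [g for _, g in rows]
d = [weeks[cell] for cell in rows]
y = [state_effect[s] + grade_effect[g] - 0.1 * weeks[(s, g)] for s, g in rows]
clusters = [f"{s}-{g}" for s, g in rows]

beta, se = twfe_slope(y, d, states, grades, clusters)
print(f"beta = {beta:.3f}, clustered SE = {se:.3f}")  # noise-free data: beta = -0.100
```

In applied work, a dedicated fixed-effects routine with small-sample corrections would be the natural choice. The loop above only makes the identifying variation visible: after both demeaning steps, the only variation left in `d` is the part not explained by additive state and grade-track effects.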
The following example of two neighboring states helps illustrate the idea underlying this empirical approach. Bavaria gave priority to entry grade levels, with higher grade levels following only subsequently. Thus, in Bavaria, a fifth grader (the entry grade level in secondary school) returned to school by 18 May 2020, while a sixth grader returned only by 15 June 2020 (4 weeks later). In contrast, neighboring Baden-Württemberg reopened schools for all lower grade levels in secondary schools (fifth, sixth, seventh, and eighth graders) “en bloc” on 15 June 2020. Comparing fifth and sixth graders in Bavaria nets out any state-specific effects of the pandemic and its related measures. The remaining difference in mental health, however, reflects not only the additional 4 weeks of school closure but possibly also age differences in mental health and in the way differently aged adolescents dealt with the pandemic and its measures. Comparing fifth and sixth graders in Baden-Württemberg, in turn, identifies these age differences in mental health during the COVID-19 pandemic, again net of any state-specific effects of the pandemic or its related measures. The double comparison thus isolates the effect of the 4 additional weeks of school closure, net of state-, grade-, and school track–specific differences in adolescents’ mental health during the COVID-19 pandemic.
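The double comparison in the Bavaria and Baden-Württemberg example is simple arithmetic. The reopening dates below come from the text. The mean well-being scores are purely hypothetical z-scores chosen for illustration, not results of the study.

```python
from datetime import date

# Extra weeks of closure for Bavarian sixth graders relative to fifth graders
extra_weeks = (date(2020, 6, 15) - date(2020, 5, 18)).days / 7  # 4.0

# Hypothetical mean well-being z-scores by (state, grade); not study results
mean_score = {("BY", 5): -0.40, ("BY", 6): -0.62, ("BW", 5): -0.45, ("BW", 6): -0.51}

# Within Bavaria: 4 extra weeks of closure plus age differences
diff_bavaria = mean_score[("BY", 6)] - mean_score[("BY", 5)]
# Within Baden-Württemberg: same reopening date, so age differences only
diff_bw = mean_score[("BW", 6)] - mean_score[("BW", 5)]

double_difference = diff_bavaria - diff_bw
effect_per_week = double_difference / extra_weeks
print(f"effect of {extra_weeks:.0f} extra weeks: {double_difference:.2f}")  # -0.16
print(f"implied effect per week: {effect_per_week:.3f}")                   # -0.040
```

Dividing by the 4 extra weeks converts the double difference into a per-week effect, the same unit in which $\beta$ in Eq. 1 is expressed.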
When drawing on both datasets, the COPSY data and the pre-pandemic wave of BELLA, we exploit the different time periods and rely on two specifications. The specification that includes the weeks of school closure takes the form

$$y_{iscxt} = \alpha + \beta_1 d_{scx} + \beta_2 c_t + \beta_3 \left(c_t \times d_{scx}\right) + X_{it}'\delta + \varepsilon_{iscxt}$$

where $y_{iscxt}$ is the dependent variable, comprising the different measures of the mental health of individual $i$ in state $s$ in grade $c$ who attends school track $x$ at time $t$ (which can take two values: pre-pandemic or during the pandemic). The independent variable $d_{scx}$ denotes the weeks of school closure mandated during the pandemic in the state where child $i$ resides and for the grade level and school track that child $i$ attends. Note that this variable reflects only the state that child $i$ lives in and the school track and grade level that he/she attends. Whether a child was actually exposed to the mandated weeks of school closure depends on whether the child is observed before or after the outbreak of the pandemic (and thus whether the observation belongs to the BELLA or the COPSY data). $c_t$ is a dummy equal to 1 for any observation belonging to COPSY (i.e., observed during the pandemic, when schools were closed) and 0 for any observation belonging to BELLA. We further control for adolescents’ age (in years) and sex (using a dummy equal to 1 if female), summarized in the matrix $X_{it}$, and include a constant $\alpha$. $\varepsilon_{iscxt}$ is an idiosyncratic error term.
Using this additional estimation model, we learn about the following parameters. The effect of one additional week of school closure is given by $\beta_3$. The overall COVID-19 effect net of school closures is given by $\beta_2$. $\beta_1$ captures preexisting level differences in adolescents’ mental health and the extent to which they may be correlated with the mandated weeks of school closure. As such, the estimated $\beta_1$ provides some insight into whether our identifying assumption holds, namely, that the weeks of school closure are independent of children’s mental health and thus of their ability to cope mentally with the school closures. The results of this specification are shown in table S7C, and $\beta_2$ is presented as red bars in Fig. 2B. The results for the overall deterioration in the various mental health measures over the pandemic (resulting from estimating Eq. 2 but dropping $d_{scx}$ and $c_t \times d_{scx}$) are shown in table S7B and as blue bars in Fig. 2B.
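The interpretation of $\beta_1$, $\beta_2$, and $\beta_3$ can be checked with a small sketch. Without further controls, OLS on the pooled specification with the full $c_t$ interaction is algebraically identical to fitting separate intercepts and slopes on $d_{scx}$ for the BELLA ($c_t = 0$) and COPSY ($c_t = 1$) observations. $\beta_2$ is then the intercept shift and $\beta_3$ the slope shift. The sketch drops the controls $X_{it}$ and uses simulated, noise-free data with known coefficients. It is not the study's estimation code.

```python
def simple_ols(x, y):
    """Intercept and slope of a bivariate OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pooled_coefficients(d, c, y):
    """Coefficients of y = a + b1*d + b2*c + b3*c*d, recovered from per-period fits."""
    pre = [(di, yi) for di, ci, yi in zip(d, c, y) if ci == 0]
    pandemic = [(di, yi) for di, ci, yi in zip(d, c, y) if ci == 1]
    a0, b0 = simple_ols(*zip(*pre))
    a1, b1 = simple_ols(*zip(*pandemic))
    return {"alpha": a0, "beta1": b0, "beta2": a1 - a0, "beta3": b1 - b0}


# Simulated data: mandated weeks are assigned to BELLA observations as well,
# but only bite for COPSY observations (c = 1).
weeks = [4.7, 6.0, 8.0, 10.1]
d = weeks + weeks
c = [0] * len(weeks) + [1] * len(weeks)
y = [0.1 - 0.02 * di + ci * (-0.3 - 0.03 * di) for di, ci in zip(d, c)]

coef = pooled_coefficients(d, c, y)
print({k: round(v, 4) for k, v in coef.items()})
```

A $\beta_1$ close to zero in such a pooled regression indicates that the mandated weeks of closure are unrelated to pre-pandemic mental health, which is the check described above.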
Sensitivity checks
To check the sensitivity of our baseline results, we run a series of alternative specifications presented in table S8 (B to J). Table S8A lists the baseline results from estimating Eq. 1. The remaining panels show the estimates resulting from the various robustness checks of our baseline specification. First, we use a more parsimonious approach and exclude all individual control variables contained in the matrix $X_i$ (table S8B). In table S8C, we aim to absorb any level differences in adolescents’ mental health across school tracks within states. We therefore control for a fully interacted set of state and school track fixed effects (instead of a set of state fixed effects only). In table S8D, we include the squared weeks of school closure as a further covariate in Eq. 1 to allow for nonlinear effects. In table S8 (E and F), we reconsider the duration of school closures. In table S8E, we use the survey end date (instead of the survey start date) to impute the duration of school closure for all adolescents who had not returned to school before 26 May 2020 (the start of COPSY). In table S8F, we adjust the duration of school closure for any school holidays taking place during the lockdown by subtracting the weeks of vacation from the duration of school closure. In table S8G, we replace the mandated weeks of closure (based on the individual state of residence, grade level, and school track) with parental reports on whether their adolescent child had returned to school or was still being taught at home. For this sensitivity check, we rely on a dummy variable taking the value of 1 if teaching takes place mainly or exclusively at home. Following the recommendation to use both self-reported and externally evaluated answers to mental health scales (53, 54), we reestimate Eq. 1 using parental reports on adolescents’ mental health as the dependent variable (table S8H). We can do so only for the KIDSCREEN-10 index and the HBSC-SCL scale, as the parental questionnaire does not contain the further screening instruments for mental health problems. For comparability, we restrict the sample to parents reporting on adolescent children aged 11 to 17. In table S8I, we assess whether our results are driven by seasonality in the pre-pandemic data when estimating Eq. 3. For that, we include dummies for the quarter in which the interviews were conducted. In table S8J, we address the concern of possible time trends in HRQoL or CES-DC by including a linear time trend (measured in years).
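The holiday adjustment used in table S8F amounts to subtracting the holiday days that fall inside the closure period. A minimal sketch follows. The closure dates and the two-week Easter break are hypothetical, and holiday periods are assumed not to overlap one another (overlapping periods would be double counted).

```python
from datetime import date


def overlap_days(a_start, a_end, b_start, b_end):
    """Number of days shared by the half-open intervals [a_start, a_end) and [b_start, b_end)."""
    return max(0, (min(a_end, b_end) - max(a_start, b_start)).days)


def holiday_adjusted_weeks(closure_start, closure_end, holidays):
    """Weeks of closure minus holiday days inside the closure (holidays must not overlap each other)."""
    closed_days = (closure_end - closure_start).days
    holiday_days = sum(
        overlap_days(closure_start, closure_end, start, end) for start, end in holidays
    )
    return (closed_days - holiday_days) / 7


# Hypothetical: closed 16 March, reopened 18 May 2020, two-week break from 6 to 20 April
easter_break = [(date(2020, 4, 6), date(2020, 4, 20))]
print(holiday_adjusted_weeks(date(2020, 3, 16), date(2020, 5, 18), easter_break))  # 7.0
```

Using half-open intervals means that a reopening date is not itself counted as a day of closure, which keeps adjacent periods from sharing a day.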
Sensitivity checks including pandemic severity and stringency of pandemic measures
To estimate the effect of state-level pandemic severity and the stringency of pandemic measures, we need to deviate from our baseline equation (Eq. 1) and forgo the set of state dummies (as these absorb anything that is constant at the state level). We therefore estimate the correlated random effects model in Eq. 4, following the Chamberlain-Mundlak approach. This approach keeps the dummy variables for school track–specific grade levels ($\Gamma_{cx}$) and the control variables for adolescents’ age (in years) and sex (using a dummy equal to 1 if female), summarized in the matrix $X_i$. The Chamberlain-Mundlak approach adds the state-level averages of the weeks of school closure ($\bar{d}_{scx}$), of the grade-track dummies ($\bar{\Gamma}_{scx}$), and of age and sex ($\bar{X}_{scx}$). The advantage of this approach is that it allows us to control for additional state-level variables. Note that the underlying idea of this approach resembles that of the two-way fixed-effects approach in Eq. 1. Both rely on netting out state-level averages of all independent variables and are identified by deviations from the state-level mean. As such, the resulting estimates for the effect of weeks of school closure $\beta$ are comparable across the two approaches (see tables S7A and S9A).
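The claim that the Mundlak approach and state fixed effects yield comparable estimates can be illustrated with a toy example. In a regression of $y$ on a constant, $d$, and the state mean $\bar{d}$, the coefficient on $d$ is numerically identical to the within (state fixed effects) estimator. The reason is that $d - \bar{d}$ is orthogonal to anything constant within states. This sketch is a simplification. It includes only one regressor and its state mean, omits the grade-track dummies and controls together with their state means, and uses hypothetical data.

```python
from collections import defaultdict


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients via the normal equations (adequate for tiny, well-conditioned examples)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def state_means(values, states):
    """Each observation's state-level average of `values`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, s in zip(values, states):
        sums[s] += v
        counts[s] += 1
    return [sums[s] / counts[s] for s in states]


# Hypothetical (state, weeks of closure, outcome) records
records = [("A", 5.0, 0.62), ("A", 7.0, 0.55), ("A", 9.0, 0.41),
           ("B", 8.0, 0.10), ("B", 10.0, 0.02), ("B", 12.0, -0.09),
           ("C", 6.0, 0.35), ("C", 11.0, 0.12)]
states = [s for s, _, _ in records]
d = [w for _, w, _ in records]
y = [v for _, _, v in records]
d_bar = state_means(d, states)

# Mundlak: regress y on a constant, d, and the state mean of d (no state dummies)
mundlak_beta = ols([[1.0, di, mi] for di, mi in zip(d, d_bar)], y)[1]

# Within (state fixed effects) estimator: demean d and y by state
y_bar = state_means(y, states)
d_w = [di - mi for di, mi in zip(d, d_bar)]
y_w = [yi - mi for yi, mi in zip(y, y_bar)]
within_beta = sum(a * b for a, b in zip(d_w, y_w)) / sum(a * a for a in d_w)

print(f"Mundlak: {mundlak_beta:.6f}  within: {within_beta:.6f}")
```

Because state dummies are no longer in the model, a state-level dummy such as $m_s$ can be appended as an extra column without being perfectly collinear, which is what makes the approach useful here.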
We then successively add dummies indicating the state-level stringency of a series of pandemic measures (see table S3 for the classification of the states for the various pandemic measures). To do so, we include a dummy variable $m_s$ that is equal to 1 if the respective state has high restrictions (at or above the median level) on a certain pandemic measure and 0 otherwise. The estimates for $\lambda$ can be interpreted as the effect of the respective pandemic measure on youth mental health above and beyond the effect of prolonged school closures.
Subgroup analysis
For the subgroup analysis in Fig. 3 and table S10, we adapt the baseline model of Eq. 1 and add interaction terms between the weeks of school closure and dummy variables for the respective subgroups. Note that the respective main effects are already included in $X_i$. To examine the effect of school closures by adolescents’ age (shown in Fig. 3A and table S10A), we add the interaction terms between the weeks of school closure $d_{scx}$ and a full set of age dummies ($\mu_{ij}$, which is equal to 1 if individual $i$ is aged $j$ and 0 otherwise). Note that this model (see Eq. 5) allows us to measure the effect of weeks of school closure on mental health for each age group $j$ separately.
In Fig. 3B and table S10B, we include the interaction of weeks of school closure with the female dummy $\mu_{if}$ as well as the interaction of weeks of school closure with the male dummy $\mu_{im}$. This allows us to identify the effect of weeks of school closure on mental health for boys ($\beta_m$) and girls ($\beta_f$) separately.
Last, we show the effect by living space per school-aged child (see Fig. 3C and table S10C). For this purpose, we divide the size of the apartment or house in square meters by the number of children in the household. Then, we perform a median split and create two dummy variables. $\mu_{ia}$ is equal to 1 if a child has a living space above the median and 0 otherwise, and $\mu_{ib}$ is equal to 1 if a child has a living space below the median and 0 otherwise. We then estimate an augmented model in which we add these two dummy variables indicating living space per school-aged child, as well as their interactions with the mandated weeks of school closure, to the baseline equation (see Eq. 7). This allows us to identify the effect of weeks of school closure on mental health separately for children with a lot of space ($\beta_a$) and children with less space ($\beta_b$).
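The regressors for the living-space subgroup analysis can be constructed as sketched below. The same pattern, weeks of closure multiplied by each group dummy, applies to the sex and age interactions. The household data are hypothetical. Assigning children exactly at the median to the "below" group is an assumption, because the text does not say how ties are handled.

```python
from statistics import median


def space_per_child(square_meters, n_children):
    """Living space in square meters divided by the number of children in the household."""
    return square_meters / n_children


def living_space_groups(spaces):
    """Median split; assumption: values equal to the median go to the 'below' group."""
    cutoff = median(spaces)
    above = [1 if s > cutoff else 0 for s in spaces]
    below = [1 - a for a in above]
    return cutoff, above, below


def interaction_columns(weeks, dummies):
    """Weeks of closure interacted with a group dummy (zero outside the group)."""
    return [w * m for w, m in zip(weeks, dummies)]


# Hypothetical households: (square meters, number of children)
spaces = [space_per_child(m2, k) for m2, k in [(120, 2), (90, 3), (150, 1), (70, 2)]]
cutoff, above, below = living_space_groups(spaces)
weeks = [9.0, 13.0, 10.1, 12.0]  # hypothetical mandated weeks of closure
d_above = interaction_columns(weeks, above)  # coefficient on this column: beta_a
d_below = interaction_columns(weeks, below)  # coefficient on this column: beta_b
print(cutoff, d_above, d_below)  # 47.5 [9.0, 0.0, 10.1, 0.0] [0.0, 13.0, 0.0, 12.0]
```

Because the two interaction columns partition the sample, each coefficient is a group-specific slope and no pooled weeks-of-closure term is needed alongside them.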
Crisis helpline call data analysis
To account for the autocorrelation of the error term in helpline calls, we estimate the following model
where $y_t$ is the total duration of calls on day $t$ for a certain group and topic. $\mu_i$ is a set of dummies indicating 14-day windows, where $\mu_1$ starts on Monday, 30 December 2019, and ends on Sunday, 12 January 2020, and $\mu_{22}$ starts on Monday, 19 October 2020, and ends on Sunday, 1 November 2020. The dummies $\gamma_m$, $\gamma_{sat}$, and $\gamma_{sun}$ are the same as in Eq. 8. $y_{t-j}$ is a 7-day moving average, and $\theta\varepsilon_{t-1}$ allows for an autoregressive process of order 1. All $\beta_i$ estimates are presented in table S12, and most $\beta_i$ estimates are shown in fig. S3. We abstain from reporting longer periods because infection rates started increasing rapidly in the autumn. This led to a high number of local school closures and, ultimately, to the second phase of nationwide school closures, which makes it infeasible to analyze the mental health effects of the initial school closures over longer periods.
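The assignment of calendar days to the 14-day window dummies $\mu_i$ follows directly from the dates given above. The short sketch below also computes a 7-day moving average of past call durations. Whether the study's moving average includes day $t$ is not stated, so the trailing window that excludes day $t$ is an assumption, and the call durations in the example are hypothetical.

```python
from datetime import date

FIRST_WINDOW_START = date(2019, 12, 30)  # Monday on which window mu_1 starts


def window_index(day):
    """Index i of the 14-day window mu_i that contains `day` (mu_1 starts 30 Dec 2019)."""
    offset = (day - FIRST_WINDOW_START).days
    if offset < 0:
        raise ValueError("day precedes the first window")
    return offset // 14 + 1


def trailing_moving_average(series, t, length=7):
    """Mean of the `length` daily values before day t (assumption: day t itself excluded)."""
    if t < length:
        raise ValueError("not enough history for the moving average")
    return sum(series[t - length:t]) / length


print(window_index(date(2020, 1, 12)))   # last day of mu_1 -> 1
print(window_index(date(2020, 10, 19)))  # first day of mu_22 -> 22
print(window_index(date(2020, 11, 1)))   # last day of mu_22 -> 22

daily_minutes = [10, 12, 9, 11, 13, 8, 14, 20]  # hypothetical daily call durations
print(trailing_moving_average(daily_minutes, t=7))  # 11.0
```

The window arithmetic confirms that the window boundaries stated in the text are mutually consistent. Twenty-one 14-day steps after 30 December 2019 land exactly on 19 October 2020.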