Norms and T-scores for screeners of alcohol use, depression and anxiety in the population of Suriname

de Beurs, Edwin; Jadnanansing, Raj; Etwaroo, Kajal; Blankers, Matthijs; Bipat, Robbert; Peen, Jaap; Dekker, Jack

doi:10.3389/fpsyt.2023.1088696

ORIGINAL RESEARCH article

Front. Psychiatry, 27 April 2023

Sec. Psychological Therapy and Psychosomatics

Volume 14 - 2023 | https://doi.org/10.3389/fpsyt.2023.1088696

This article is part of the Research Topic Psychometrics in Psychiatry 2022: Psychological Therapy and Psychosomatics

Edwin de Beurs¹,²*†

Raj Jadnanansing³†

Kajal Etwaroo³†

Matthijs Blankers²,⁴,⁵†

Robbert Bipat³†

Jaap Peen²†

Jack Dekker²,⁶†

¹Department of Clinical Psychology, Leiden Universiteit, Leiden, Netherlands
²Research department, Arkin Mental Health Care, Amsterdam, Netherlands
³Department of Physiology, Anton de Kom University, Tammenga, Suriname
⁴Department of Psychiatry, Amsterdam University Medical Center, Amsterdam, Netherlands
⁵Trimbos Institute, Utrecht, Netherlands
⁶Department of Clinical Psychology, Vrije Universiteit, Amsterdam, Netherlands

Background: There is a considerable gap between care provision and the demand for care for common mental disorders in low-and-middle-income countries. Screening for these disorders, e.g., in primary care, will help to close this gap. However, appropriate norms and threshold values for screeners of common mental disorders are lacking.

Methods: In a survey study, we gathered data on frequently used screeners for alcohol use disorders (AUDIT), depression (CES-D), and anxiety disorders (GAD-7, ACQ, and BSQ) in a representative sample from Suriname, a non-Latin American Caribbean country. A stratified sampling method was used by random selection of 2,863 respondents from 5 rural and 12 urban resorts. We established descriptive statistics of all scale scores and investigated unidimensionality. Furthermore, we compared scores by gender, age group, and education level with t-tests and Mann–Whitney U tests, using a significance level of p < 0.05.

Results: Norms and crosswalk tables were established for the conversion of raw scores into a common metric: T-scores. Furthermore, recommended cut-off values on the T-score metric for severity levels were compared with international cut-off values for raw scores on these screeners.

Discussion: The appropriateness of these cut-offs and the value of converting raw scores into T-scores are discussed. Cut-off values help with screening and early detection of those who are likely to have a common mental health disorder and may require treatment. Conversion of raw scores to a common metric in this study facilitates the interpretation of questionnaire results for clinicians and can improve health care provision through measurement-based care.

Background

Common Mental Disorders (CMDs), such as depression, anxiety, and Alcohol Use Disorders (AUDs), are highly prevalent worldwide (1). Global one-year prevalence rates are about 4.7% for depressive disorders (2), about 7.3% for anxiety disorders (3) and about 5.0% for AUDs (4). These common mental disorders are significantly associated with impaired quality of life, lower social functioning, and high societal costs (5). The 12-month prevalence estimates of major depression, anxiety and AUDs are about the same in high-income countries as in Low-to-Middle-Income Countries (LMICs) (3, 6).

In recent decades, empirically supported psychological treatments that are highly efficacious and efficient have been developed for these CMDs (7). In a meta-analysis, these empirically supported psychological treatments for depression and anxiety were also effective in LMICs (8). Therefore, global dissemination of these interventions in LMICs is advocated by the WHO. However, in LMICs, the availability of these treatments is limited. Chisholm et al. (9) estimated intervention coverage in LMICs as 14 and 10% for depression and anxiety, respectively, corresponding to a treatment gap of 86 to 90% for these disorders. According to the World Health Organization, the treatment gap for mental disorders is 30–50% in developed countries and 76–80% in LMICs (10).

A major obstacle to widespread use of these short screeners is that norms and cut-off values for potential “caseness” for these screeners for the population of Suriname have not been determined. An additional problem is that each instrument has its own scale, norm scores and cut-off value. Direct comparisons of scores on dimensions such as depression, anxiety, and alcohol dependence are not easily done. In addition to calculating the norm values for the Surinamese population, we have also established crosswalks (conversion tables and a figure) to a common metric for these instruments, the T-score, applicable in this target group.

The traditional way to provide normative values for measurement instruments is to compose for each measure a set of tables for various groups of respondents (distinct by clinical status, gender, or age) with raw score ranges and their meaning in levels from very low to very high (see Table 1 in the present paper). In addition, data on clinically meaningful cut-off scores are provided, such as a cut-off score for clinical level or “caseness” and a cut-off score for reliable change, aka the Reliable Change Index (11). However, more and more we see an international trend towards scoring measures on a common metric, usually a standardized score. This allows researchers to gather and compare data from various studies more efficiently. For clinicians, such a common metric is convenient to interpret scores from various outcome scales more easily and relay information from test scores to their patients. The T-score has been chosen by the PROMIS group as the common metric and several papers have been published with cross-walk tables for frequently used measures of depression (12), anxiety (13), and psychological distress (14).

TABLE 1

Table 1. Norms for all respondents, males, and females.

In sum, the study aim is two-fold. First, we provide normative data for the AUDIT, CES-D, GAD-7, ACQ, and BSQ for the Surinamese population; to this end, we generated age- and gender-specific normative data for these five measures. Second, based on Item Response Theory models for their scoring, we made crosswalks (tables and figures) to convert scores on these measures into a common metric (normalized T-scores) and we established formulas to convert raw scores into T-scores. Thus, we aim to facilitate the interpretation of test results in research and clinical practice.

Methods

Participants

The study was conducted in two districts of Suriname: Paramaribo (the capital of Suriname), a predominantly urban district, and Nickerie, a predominantly rural district. Respondents were recruited by the census bureau of Suriname. A stratified sampling method was used by random selection of respondents from 12 resorts of Paramaribo and 5 resorts of Nickerie, assuring a balanced geographical distribution of respondents. There were 2,863 participants in the study (15): 1,837 respondents with an urban background (Paramaribo; 1,065 women and 772 men) and 1,026 participants with a rural background (Nickerie; 593 women and 433 men). All questionnaire data were collected by trained interviewers. For more details on the study, we refer to Jadnanansing et al. (15). Table 2 provides demographic data for the participants.

TABLE 2

Table 2. Demographic data for all respondents and for the urban and rural samples.

Instruments

For all measurement instruments Dutch language versions were used.

Alcohol abuse and dependence: AUDIT

The Alcohol Use Disorders Identification Test [AUDIT; (16)] was developed by the World Health Organization to screen for problematic alcohol use. The AUDIT is a 10-item screening test to assess alcohol consumption, drinking behavior and drinking-related problems. Items are scored on a 5-point Likert frequency scale from 0 “Not at all” to 4 “Daily or almost daily.” The total score (AUDIT_TOT) has a theoretical range of 0 to 40. The cut-off score used for increased risk for problematic alcohol consumption is 8 (17). Based on a considerable number of studies, Peng and colleagues (17) concluded that the AUDIT comprises two factors: alcohol consumption (AUDIT_USE, items 1–3) and symptoms of alcohol dependence and problem consequences from drinking (AUDIT_PRO, items 4–10).
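
The scoring described above can be sketched in a few lines of Python. This is a minimal illustration and not the scoring code used in the study; the function name, the returned keys, and the convention that a total at or above the cut-off counts as increased risk are assumptions.

```python
def score_audit(items):
    """Return AUDIT total and subscale scores from ten item responses.

    `items` holds ten integers coded 0-4, item 1 first.
    Function name and dictionary keys are illustrative assumptions.
    """
    if len(items) != 10 or any(not 0 <= x <= 4 for x in items):
        raise ValueError("expected ten item scores in the range 0-4")
    use = sum(items[:3])       # AUDIT_USE: items 1-3 (consumption)
    problems = sum(items[3:])  # AUDIT_PRO: items 4-10 (dependence and problems)
    total = use + problems     # AUDIT_TOT: theoretical range 0-40
    return {
        "AUDIT_USE": use,
        "AUDIT_PRO": problems,
        "AUDIT_TOT": total,
        # Assumption: a total at or above the cut-off of 8 is flagged
        "increased_risk": total >= 8,
    }


print(score_audit([2, 1, 1, 0, 1, 0, 0, 1, 2, 0]))
```

The two subscale scores always add up to the total score, so a respondent can reach the cut-off through consumption alone, through problems alone, or through a mix of both.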

Depression: the CES-D

The Center for Epidemiological Studies-Depression (CES-D) scale was designed to measure the level of depressive symptomatology in the general population (18). Twenty items inquire about the frequency of symptoms that occurred in the past week, with response options from 0 “Not at all” to 3 “Nearly every day.” The total score ranges from 0 to 60 and the cut-off point that has typically been recommended for depression “caseness” is 16. However, more recently 20 has also been recommended as cut-off value (19).

Anxiety: GAD-7 and ACQ/BSQ

Two aspects common to anxiety were measured: generalized anxiety or excessive worry, and fear of fear. Generalized anxiety and worry were measured with the GAD-7 (20). This is a widely used measure, recommended by the International Consortium of Health Outcome Measurements (ICHOM) for treatment outcome measurement in anxiety disorders. Furthermore, it is routinely administered in all “Improving Access to Psychological Therapies” (IAPT) services in the UK (21). The GAD-7 comprises seven items describing feelings, such as “Trouble relaxing,” “Feeling nervous, anxious or on edge” and “Feeling afraid as if something awful might happen.” Items are scored on a 4-point Likert frequency scale (0 “Not at all” to 3 “Nearly every day”), resulting in a theoretical range in scores of 0 to 21. Kroenke et al. (22) suggested 5, 10, and 15 as cut-off scores for mild, moderate, and severe anxiety symptoms. When the GAD-7 is applied for screening, further evaluation is recommended when the score is 10 or higher (20).
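
As an illustration of how these cut-off scores translate into severity bands, a minimal Python sketch follows. The function name and the label "minimal" for totals below 5 are assumptions; the text above names only the mild, moderate, and severe bands.

```python
def gad7_severity(items):
    """Sum seven GAD-7 items (each coded 0-3) and return (total, band)."""
    if len(items) != 7 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("expected seven item scores in the range 0-3")
    total = sum(items)  # theoretical range 0-21
    if total >= 15:
        band = "severe"
    elif total >= 10:
        band = "moderate"  # screening: further evaluation recommended
    elif total >= 5:
        band = "mild"
    else:
        band = "minimal"   # assumed label; not named in the text
    return total, band


print(gad7_severity([1, 2, 1, 2, 1, 2, 1]))
```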

Fear of fear was measured with the Agoraphobic Cognitions Questionnaire [ACQ; (23)] and the Body Sensations Questionnaire [BSQ; (23)]. The ACQ was devised to measure maladaptive thoughts about the possible consequences of panic (the cognitive aspect). On 14 items respondents rate the frequency of these thoughts when feeling anxious or frightened. Each item is rated on a 5-point Likert frequency scale, ranging from 1 “Thought never occurs” to 5 “Thought always occurs.” In addition to a total score (ACQ_TOT), the ACQ measures two factors: ACQ_SC for social/behavioral concerns (e.g., loss of control, acting foolishly) and ACQ_PHY for physical concerns (e.g., having a heart attack, fainting). The scale discriminates well between patients and normal controls: Chambless et al. (23) reported a mean score of M = 2.32 (SD = 0.66) for outpatients with agoraphobia, and M = 1.60 (SD = 0.46) for a community sample. The BSQ measures fear of the bodily sensations that are commonly experienced during anxiety and panic attacks. The BSQ comprises 17 items, each describing a physical symptom, such as dizziness, palpitations, or breathlessness. Items are rated on a 5-point scale for how much anxiety they provoke, ranging from 1 “Not at all” to 5 “Extremely.” Chambless et al. (23) reported a mean score of M = 3.05 (SD = 0.86) for outpatients with agoraphobia, and M = 1.80 (SD = 0.59) for a community sample. The Dutch versions of the ACQ and BSQ have been psychometrically evaluated by Arrindell (24) and appeared reliable (internal consistency Cronbach’s α > 0.82 and α > 0.89 for the ACQ and BSQ, respectively) with good test–retest reliability (Pearson product-moment correlation r > 0.79 for both scales).

Statistical analysis

We calculated descriptive statistics, such as the mean, standard deviation, skewness, and kurtosis of all scale scores for the sample and compared scores by gender, age group, and urban or rural background with independent samples t-tests and Mann–Whitney U tests. We also compared mean scores of the Suriname respondents to community samples of other countries/cultures. We established norm tables in order to give meaning to scale scores. Finally, we established for all scales cross-walk tables to convert raw scores to a common metric: T-scores (25). These T-scores were based on thetas from IRT models. All analyses were performed with R. We used the mirt package of R, version 1.33.2 (26), to determine relevant item characteristics and to calculate scale scores (factor scores) from item responses. We used the “Graded Response Model for polytomous items” with the Expected-A-Posteriori sum score (EAPsum) as estimator, in accordance with the approach chosen by the PROMIS group and proposed by Fischer and Rose (27). We used the mirt package to assess the fit of a unidimensional model for each (sub)scale that was analyzed.
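
The T-score metric has a mean of 50 and a standard deviation of 10. Assuming that theta is expressed on a standard scale (mean 0, SD 1) in the reference population, the conversion is a linear rescaling, sketched below in Python. The function name is hypothetical, and the sketch covers only this final rescaling step, not the estimation of theta with mirt.

```python
def theta_to_t(theta):
    """Rescale an IRT theta (assumed mean 0, SD 1) to the T-score metric."""
    return 50.0 + 10.0 * theta


# A theta one and a half SD above the reference mean
print(theta_to_t(1.5))
```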

We evaluated uniform and nonuniform DIF (28) for gender, age (recoded into a binary variable: <45 and ≥45), and urbanicity (urban vs. rural). Both types of DIF were assessed with ordinal logistic regression (OLR) methods (29) using the R package lordif, version 0.3-3 (30). As measure of effect size, we used the change in McFadden’s pseudo R², following the suggestion of 0.02 as critical value for rejecting the hypothesis of no DIF (30). Unidimensionality was evaluated with lavaan (version 0.6-5; 31); we used the (scaled) fit statistics and set as requirements for unidimensionality CFI > 0.95, RMSEA < 0.08, and SRMR < 0.06. We examined for each scale the fit of the graded response model by inspecting item parameter estimates, using the item fit signed chi-square (S-χ²) statistic (32) as indicator of item misfit. Items with an S-χ² p < 0.001 are considered to have a poor fit in the IRT model. The assumption of monotonicity was evaluated by examining graphs of item mean scores as a function of rest scores (total raw score minus the item score) using the R package mokken (Version 23.0.6; 33). In addition, we evaluated the accompanying scalability coefficients (Mokken’s H) for the full scale and the individual items. Mokken’s H was interpreted as follows: 0.30 ≤ H < 0.40 low quality, 0.40 ≤ H < 0.50 moderate quality, and H ≥ 0.50 high quality (Mokken, 1971). Also, we investigated local independence (LID). Item pairs are locally independent when items show no association after controlling for the trait level. Investigation of LID was done with Yen’s Q3 statistic (34) in mirt.
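
To make the rest-score idea concrete, the Python sketch below computes item mean scores per rest score and labels a value of Mokken’s H with the bands given above. It is a simplified illustration on made-up data, not the procedure of the R package: it does not pool sparse rest-score groups and performs no significance testing. All function names are assumptions.

```python
from collections import defaultdict


def item_means_by_rest_score(responses, item):
    """Mean score on `item` per rest score (total raw score minus the item)."""
    groups = defaultdict(list)
    for row in responses:
        groups[sum(row) - row[item]].append(row[item])
    return {rest: sum(v) / len(v) for rest, v in sorted(groups.items())}


def is_monotone(means):
    """True if item means never decrease as the rest score increases."""
    values = list(means.values())
    return all(a <= b for a, b in zip(values, values[1:]))


def mokken_quality(h):
    """Label Mokken's H with the bands used in the text."""
    if h >= 0.50:
        return "high"
    if h >= 0.40:
        return "moderate"
    if h >= 0.30:
        return "low"
    return "below 0.30"  # the text gives no label for this range


# Made-up responses of six respondents to three items (illustration only)
toy = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1], [2, 1, 1], [2, 2, 2]]
means = item_means_by_rest_score(toy, item=0)
print(means, is_monotone(means))
```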

Finally, we created a cross-walk table and cross-walk figure to convert raw scores to IRT-based T-scores. We also established equations for this conversion with regression analysis (curve fitting). Linear, polynomial, exponential, logarithmic, power, rational, sigmoid, and hyperbolic equations (and exponential, logarithmic, and power equations with an added linear term) were fitted with Nonlinear Least Squares (nls and nls2) of the R package nlstools (version 2.0-0) (35). We compared the fit of these various equations by their Bayesian Information Criterion (BIC) value for each scale. The procedure is described in more detail and cross-validated by de Beurs et al. (36).
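
The model comparison can be illustrated with a small Python sketch. It swaps in ordinary least-squares polynomial fitting in plain Python for the R routines named above, compares only a linear and a cubic candidate, and uses made-up raw-score and T-score pairs, so none of the numbers are study results. The BIC is computed up to an additive constant, which is sufficient for ranking models fitted to the same data.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    n = degree + 1
    # Normal equations A c = b: A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / a[i][i]
    return coef


def predict(coef, x):
    return sum(c * x ** k for k, c in enumerate(coef))


def bic(xs, ys, coef):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    n = len(xs)
    rss = sum((y - predict(coef, x)) ** 2 for x, y in zip(xs, ys))
    k = len(coef) + 1  # polynomial coefficients plus the residual variance
    return n * math.log(rss / n) + k * math.log(n)


# Made-up crosswalk: raw scores 0-21 and a smooth, saturating T-score curve
raw = list(range(22))
t_made_up = [40 + 30 * (1 - math.exp(-x / 8)) for x in raw]
linear = polyfit(raw, t_made_up, 1)
cubic = polyfit(raw, t_made_up, 3)
print(bic(raw, t_made_up, linear), bic(raw, t_made_up, cubic))
```

The candidate with the lower BIC is preferred; the penalty term k·ln(n) guards against choosing a more flexible equation that improves the fit only marginally.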

Results

We first checked whether the samples of the included respondents were comparable to the populations of the two areas concerning gender and age. The gender distribution in the general population of Paramaribo (N = 140,679) is 51% women and 49% men; 53% are younger than 40 years and 47% are 40 years or older. In the present sample a significantly different gender distribution was obtained: 58% women (χ² = 35.6; df = 1; p < 0.001), and 54% of the respondents were ≥40 years (χ² = 34.9; df = 1; p < 0.001). In Nickerie (N = 34,233) the gender distribution is 47% women and 53% men; 52% are younger than 40 years and 48% are 40 years or older. The gender distribution in the present sample was 59% women (χ² = 46.9; df = 1; p < 0.001), and 54% of the respondents were ≥40 years (χ² = 13.3; df = 1; p < 0.001). Thus, elderly women were somewhat overrepresented in our samples.
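
The kind of comparison reported here can be sketched as a chi-square goodness-of-fit test. The Python snippet below is an illustration under assumptions: it takes the urban counts from the Participants section (1,065 women and 772 men) and the rounded population proportions of 51% and 49%, and the function name is hypothetical. It yields a value of about 35.8, close to but not identical with the reported 35.6; the unrounded population figures used in the study are not given in the text.

```python
def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic for observed counts."""
    n = sum(observed)
    return sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props)
    )


# Urban sample: women, men; population proportions as rounded in the text
chi2_urban = chi_square_gof([1065, 772], [0.51, 0.49])
print(round(chi2_urban, 1))
```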

Table 2 presents an overview of demographic characteristics of the participants. The urban and rural samples differed in marital status, number of children, ethnic background, work status, age, and educational level. The subsamples did not differ in the representation of the genders: elderly women were equally overrepresented in both.

Next, we compared the scores obtained in the current Surinamese sample with other normative samples from the USA, Germany, and the Netherlands. Peng et al. (17) offer means for the AUDIT based on an analysis of AUDIT data from 15 countries. Lowe et al. (37) provided normative data for the GAD-7 from a large sample of the German population. In addition, we use data for the GAD-7 from a USA African-American sample reported in the study of Parkerson et al. (38). Bouwman et al. (39) collected CES-D data in a substantial Dutch population-based sample. Chambless obtained data on the ACQ and BSQ from a small sample of females (n = 21); Craske et al. (40) collected ACQ data from a student sample (N = 173); de Beurs (41) obtained data from a representative sample of the Dutch general population (n = 438, of whom 263 were female, 60.0%).

Table 3 presents mean scores (and SDs) on the instruments from Surinamese respondents and from normative samples from the USA, the Netherlands and Germany. We compared mean scores in Table 3 by inspection only, as most means will differ statistically, given the large sample sizes. Scores of Surinamese respondents on the AUDIT (total score, alcohol use, and problems with alcohol) are lower than in the rest of the world according to the data of Peng et al. (17), especially among women. Depression scores are lower compared to the USA, but higher than in the Netherlands. Scores on the GAD-7 are elevated compared to the German normative sample and comparable to the USA sample. Fear of fear according to the ACQ and fear of body sensations according to the BSQ are similar to the Dutch normative sample, but lower compared to USA samples. Finally, the data reveal a substantial difference between men and women in problematic alcohol consumption (males > females), as well as in depression and anxiety (females > males). The data also suggest a larger gender difference in Suriname compared to USA and European samples.

TABLE 3

Table 3. Means and standard deviations on the instruments from Surinamese respondents and from other normative samples.

Table 4 presents mean scores, SD’s, skewness and kurtosis of scale scores from the current sample. Most instruments yielded skewed and peaked frequency distributions of scores, due to an excess of low scores, especially on the AUDIT and ACQ (zero score-inflation).

TABLE 4

Table 4. Means and standard deviations, range, kurtosis, and skewness of scores on the scales of Surinamese respondents.

Table 5 presents means by gender, age group and urban or rural background. We tested for differences with t-tests and Mann–Whitney U tests, given the non-normal distribution of scores on some measures. There was a significant difference between men and women on all measures. Males scored higher on problematic alcohol use [t(2851) = 23.69, p < 0.001], but lower on depression and anxiety. Age groups also differed significantly on all measures except the CES-D [t(2851) = 0.36, p = 0.72] and the ACQ_TOT [t(2851) = 1.49, p = 0.14], with younger respondents having higher scores across the board. Finally, urban and rural respondents did not differ on most measures, except for small differences on the CES-D, ACQ physical concerns and BSQ, with higher scores in the rural resorts. Most differences between means of subgroups were small, with the exception of the large gender difference on the AUDIT, and small to medium gender differences on the GAD-7, the ACQ, and the BSQ. These gender differences and differences between younger and older respondents justify the distinction of various groups for norms. Thus, we decided to provide separate norming tables for both genders and, in addition, we calculated T-scores separately for six age groups: 16–19, 20–29, 30–39, 40–49, 50–59, and 60 years and older.
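
The effect sizes in Table 5 are expressed as Cohen’s d. A minimal Python sketch of this statistic follows, assuming the common pooled-standard-deviation variant; the text does not state which variant was used, the function name is hypothetical, and the input values are made up for illustration.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Hypothetical group statistics, not taken from Table 5
print(cohens_d(10.0, 2.0, 50, 8.0, 2.0, 50))
```

By Cohen’s rules of thumb, values around 0.2, 0.5, and 0.8 are read as small, medium, and large differences, respectively.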

TABLE 5

Table 5. Means and standard deviations for men and women, younger and older, and urban and rural respondents, results of t-tests and Mann–Whitney U tests, and effect size (Cohen’s d).

Table 1 offers meaning to scores on the instruments by providing cut-off scores for seven norm levels: very low (the lowest 5%), low (the next 15%), below average (20%), average (20%), above average (20%), high (15%), very high (5%). Differentiation among low scores is hard on several instruments. This is especially the case with the AUDIT-problem score, as it allows only a distinction between a very high score and every level below it.
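
The seven norm levels correspond to cumulative percentile boundaries of 5, 20, 40, 60, 80, and 95. A minimal Python sketch of this binning is given below; the function name is hypothetical, and assigning a percentile that falls exactly on a boundary to the higher level is an assumption.

```python
import bisect

# Cumulative upper bounds of the first six norm levels (in percentiles)
BOUNDS = [5, 20, 40, 60, 80, 95]
LABELS = [
    "very low", "low", "below average", "average",
    "above average", "high", "very high",
]


def norm_level(percentile):
    """Map a percentile rank (0-100) to one of the seven norm levels."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    # Assumption: a value exactly on a boundary goes to the higher level
    return LABELS[bisect.bisect_right(BOUNDS, percentile)]


print(norm_level(50), norm_level(97))
```

When many respondents share the lowest raw score, several of the lower levels collapse onto that single score, which is why low scores are hard to differentiate on some instruments.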

Next, we established T-scores based on thetas from IRT models for the instruments. First, we investigated the fit of IRT models to obtain factor scores. We inspected for each scale the limited-information goodness-of-fit test statistic that mirt provides. Results of these analyses are summarized in Table 6 (M2 and additional fit indices). The signed chi-square (S-χ²) statistic was calculated as indicator of item misfit. Some items were found with a statistically significant S-χ², indicating poor item fit. Inspection of plots for item performance yielded satisfactory results. Plots for test information and empirical test plots (to check for unidimensionality) were inspected as well. Tests were most informative in the theta = −0.5 to 2.5 range, which is due to the high frequency of low scores in the present sample. The assumption of monotonicity was evaluated by examining graphs, and we evaluated the accompanying scalability coefficients (Mokken’s H) for the full scale and the individual items. Most scales appeared to have low to moderate quality according to Mokken’s H. These results are presented in Table 6 as well.

TABLE 6

Table 6. Information on IRT model fit indices and item fit statistics for the measurement instruments.

Furthermore, uniform and nonuniform DIF was investigated for gender, age, and urbanicity. Significant DIF was only found for the ACQ, where two items were flagged (items 4 and 9). Local independence (LID) of item pairs was investigated with Yen’s Q3, which mirt provides, and this information is included in Table 6. As suggested by Smits et al. (42), model fit was evaluated with Cohen’s (43) rules of thumb to interpret effect size: Q3 values between 0.24 and 0.36 imply moderate deviations, and Q3 values above 0.37 imply large deviations. For each scale only a few item pairs with a high Q3 value were found. Table 6 also shows the item pairs with the highest Q3 value for each instrument.

Unidimensionality of the factor structure of each subscale was investigated by comparing fit indices with the preset requirements (CFI > 0.95, RMSEA < 0.08, and SRMR < 0.06). Most scales showed adequate fit to a unidimensional model (see Table 6). Scales with fit indices that did not meet the criteria were the CES-D (SRMR = 0.071), the ACQ_TOT (SRMR = 0.085), and the BSQ (CFI = 0.81, RMSEA = 0.085). For the CES-D this may be due to the four positively stated items in this questionnaire, as these were the items showing misfit according to the item fit indicator, the signed chi-square (S-χ²) statistic (32).

Finally, we established equations to calculate normalized T-scores, which are included in a note under Table 7. For most scales, cubic polynomial functions fitted best. We validated these formulas by investigating the correspondence between theta-based T-scores and calculated T-scores with intraclass correlation coefficients (all in the range of ICC = 0.97 to 0.99) and by inspecting Bland–Altman plots. Formulas to calculate T-scores for the genders and age groups were also established and can be obtained from EdeB.
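
The preset requirements for unidimensionality reported above can be written as a simple check. The Python sketch below applies them to the fit indices that are quoted in the text for the three scales that missed a criterion; the remaining indices are in Table 6 and are not reproduced here. The function and variable names are assumptions.

```python
import operator

# Preset requirements for a unidimensional model, as stated in the text
REQUIREMENTS = {
    "CFI": (operator.gt, 0.95),
    "RMSEA": (operator.lt, 0.08),
    "SRMR": (operator.lt, 0.06),
}


def failed_requirements(fit):
    """Return the names of the supplied fit indices that miss their requirement."""
    return [
        name for name, value in fit.items()
        if not REQUIREMENTS[name][0](value, REQUIREMENTS[name][1])
    ]


# Only the indices quoted in the text; other values are in Table 6
reported = {
    "CES-D": {"SRMR": 0.071},
    "ACQ_TOT": {"SRMR": 0.085},
    "BSQ": {"CFI": 0.81, "RMSEA": 0.085},
}
for scale, fit in reported.items():
    print(scale, failed_requirements(fit))
```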

TABLE 7

Table 7. Crosswalk table from raw scores to theta-based T-scores.

Figure 1 shows for all scales the correspondence between raw scores on the scales and T-scores. Table 7 can be used to convert raw scores of all measures and subscales into T-scores. Formulas to calculate these T-scores are included in the note under Table 7. Formulas to calculate T-scores for the genders and age groups can be obtained from EdeB.

FIGURE 1

Figure 1. Crosswalk from raw scale scores to T-scores for the AUDIT, CES-D, GAD-7, ACQ, and BSQ.

Figure 1 displays raw scores based on summed item scores on the scales and how they relate to the T-score metric. As can be seen in Figure 1, the original raw scores show a difference in interval width, which illustrates the non-normal distribution of these scores. After conversion to T-scores the score intervals become equally spaced on the Y-axis. Figure 1 can also be used to convert raw scores of all measures and subscales into T-scores. It is based on calculated T-scores, applying the formulas from the note under Table 7.

Discussion

Data from a large representative sample from the general population of Suriname were collected to compare scores with other populations (from the USA and Europe) and to obtain norms on commonly used measures for alcohol use, depression and anxiety. Generally, scores appeared comparable to what has been found with these instruments on other continents. Also, we found differences between men and women and between younger and older respondents, similar to what has been reported in the literature (44, 45). Men report higher and more problematic alcohol use, which is cross-culturally a consistent finding (46). However, at least in the USA, the gender gap is closing as the difference is smaller for later birth cohorts (47). In an exploratory analysis, we investigated the effect of gender and age conjointly and we did find a significant interaction effect (alcohol use diminishes with age faster for men compared to women), but the effect size of this interaction was rather small (η² = 0.007).

Surinamese women reported higher levels of depression and anxiety compared to men. Regarding depression, Stevenson and Wolfers (48) mentioned in a review of gender studies into well-being the apparent paradox that living conditions for women have improved over the last 30 years, but their subjective well-being has declined, both in absolute numbers and relative to men. In line with their findings, we also found an effect of age, with the oldest age group scoring lower on the CES-D. Moreover, according to our findings, the gender gap was larger in Paramaribo than in Nickerie, because rural men tended to have elevated scores on the CES-D, bringing their score closer to the score of women. Regarding anxiety, men had lower scores and may indeed experience less anxiety, but this gender difference may have been amplified by a reporting bias: stereotypes and socialization may make men less inclined to acknowledge experiencing fear or anxiety (44), especially when data are gathered by (female) interviewers. Further research is needed to explain these gender and age differences. Meanwhile, the small to medium sized differences between men and women regarding depression and anxiety justify the use of distinct cut-off values for caseness, distinct norms, and distinct cross-walk tables to T-scores. However, the user of test results should be aware that T-scores calibrated on such subgroups will no longer reveal any difference between these subgroups.

Based on expert judgement, the PROMIS group has provided guidelines for the interpretation of scores on the T-score metric and proposed the following cut-points on the T-score metric: <55 normal; 55–60 mild; 60–70 moderate; >70 severe (49). Scores of 55 and higher are reason for concern and above 60 a moderate severity level of depression or anxiety is reached. Application of these values coincides well with known cut-off values on the measures investigated in this study. The raw-scores and T-scores that were used to make Figure 1 correspond well with what is published in the research literature. If we inspect the scores on the depression scale, the cut-off of 16 for “caseness” on the CES-D corresponds to a cut-off of T = 56.2 in the USA sample (12) and a score of T = 58.6 in our sample. A comparison of raw scores and T-scores at cut-off values for the GAD-7 anxiety scale yields similar results. Schalet et al. (13) provided a crosswalk table for the GAD-7 based on the US community sample used to calibrate PROMIS-instruments and their values coincide well with our present findings. If we look at the correspondence of recommended cut-off points for “caseness,” a score of 10 on the GAD-7 corresponds to a T-score of 62.3 in the USA sample and a T-score of 61.7 in our present sample. These values on the T-score scale (61.7 and 58.6 for the GAD-7 and CES-D, respectively) also underscore the appropriateness of cut-off scores for “caseness” on the T-score metric as suggested by the PROMIS group. De Beurs and colleagues proposed to use 55 as cut-off point for “caseness” in the Netherlands (50). However, a more formal evaluation of cut-off values awaits further investigation and requires information on the clinical status of respondents. The present study did not collect such data.
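
A minimal Python sketch of these cut-points follows, applied to the T-scores quoted above. The function name is hypothetical. Because the published bands share their boundary values, the assignment of exactly 60 to "mild" and exactly 70 to "moderate" is an assumption, based on the statements that moderate severity is reached above 60 and severe above 70.

```python
def t_score_band(t):
    """Classify a T-score with the cut-points proposed by the PROMIS group."""
    if t < 55:
        return "normal"
    if t <= 60:      # assumption: exactly 60 still counts as mild
        return "mild"
    if t <= 70:      # severe is defined as above 70
        return "moderate"
    return "severe"


# T-scores at the raw cut-offs: CES-D 16 and GAD-7 10 in the present sample
print(t_score_band(58.6), t_score_band(61.7))
```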

Finally, comparison of the norms in Table 1 and the results after conversion of raw scores to T-scores reveals the increased informative value of T-scores. The information in Table 1 results from binning raw scores into seven categories (very low for the lowest 5%, low for the next 15%, below average for the next 20%, average for the 40th to 60th percentile, etc.). Thus, Table 1 gives meaning to scores in a categorized manner, basically converting raw scores to percentile scores and binning these in seven categories. In contrast, T-scores are continuous. For instance, norms on the ACQ for men allow us to distinguish only three levels: very high, high, and everyone else with a lower than high score; T-scores yield much more detailed information. On the other hand, this may also give rise to a false sense of precision. For instance, scores on the AUDIT (with a theoretical range of 0 to 40) are highly skewed to the right, with many respondents obtaining the lowest possible score of 0 (34%) and most respondents having a score of 6 or lower (90%). Thus, only a few respondents have high scores and most score in a broad category of very low to average. According to Figure 1 these respondents obtain T-scores from 46.5 to 60.5. Figure 1 also reveals that the evaluated measures are predominantly useful for the pathological range; as noted before, especially the ACQ and the BSQ do not distinguish well in the healthy range, as these instruments assess aspects of anxiety mainly found in patients with panic disorder and evoke a 0-score from most community-based respondents, a phenomenon also known as zero inflation. This also explains the high T-score value for the lowest possible scores for most instruments. T-scores usually range from 20 to 80, which includes 99.7% of the cases when scores are distributed normally. However, application of these clinical measures in the general population yields T-scores in the range of 40 to 85 or higher.
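
The 99.7% figure follows from the definition of the T-score metric (mean 50, SD 10): the range of 20 to 80 spans three standard deviations on either side of the mean. A short Python check, with hypothetical function names, is given below.

```python
import math


def normal_cdf(x, mean=50.0, sd=10.0):
    """Cumulative distribution function of a normal T-score distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def share_between(low, high):
    """Proportion of a normal T-score distribution between two T-scores."""
    return normal_cdf(high) - normal_cdf(low)


print(round(share_between(20, 80), 4))  # 0.9973
```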

Strengths of the present study: a substantial number of respondents from the Surinamese population were included in the study, stratified to include urban as well as rural respondents, allowing us to establish norms for both genders and various age groups. A traditional approach to norming instruments was combined with a more modern IRT-based conversion of scores to T-scores. This worked out well for most instruments.

Limitations: Some instrument scores were highly skewed and peaked due to the large proportion of respondents with the lowest possible score on these measures. This is a common finding when self-report instruments for psychopathological constructs are administered in population samples. The requirement of a normal distribution of scores for some of the statistical tests we used, such as t-tests comparing subgroups in the sample, was not met on most scores. However, the nonparametric alternative statistical test (Mann–Whitney U test) resulted in highly similar findings. Furthermore, as mentioned in the results section, some scales (e.g., the CES-D, ACQ_TOT, and BSQ) did not meet all the requirements of good fit of IRT modelling. When this is the case, revision of the item content or revision of the scoring of items or scales may be in order. Revising internationally established instruments would be beyond the scope of the present study. Nevertheless, due to insufficient fit of IRT models, the resulting factor scores may be biased. Alternative approaches to obtain T-scores should be considered, such as percentile rank score conversion (51) or regression-based norming (52).

Conclusion

In sum, the present findings illustrate that internationally used cut-off values on self-report measures for case finding (in scores on the original metric and on the T-score metric) are appropriate for the population of Suriname. For most instruments, raw-score cut-off values for caseness correspond well to generally recommended cut-off values for T-scores. T-scores are a convenient way to express how extraordinary a raw test score is on a continuous scale with equal intervals, and we recommend their use as a common metric for test results. In future studies, additional screeners may be evaluated for their utility in the Surinamese context, such as screeners for other substance use disorders, adult Attention-Deficit/Hyperactivity Disorder, and Posttraumatic Stress Disorder, which may otherwise easily remain undetected. Furthermore, these measurement instruments should be employed for mental health triage and routine outcome monitoring. Finally, their application may stimulate dissemination and use of (guided) self-help eMental-Health applications on smartphones. This may help to bridge the existing treatment gap in Suriname, where stigma around mental health problems is still widespread and resources for mental health care are scarce.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving human participants were reviewed and approved by the Ministry of Public Health of Suriname (#VG2014-09). The patients/participants provided their written informed consent to participate in this study.

Author contributions

EB processed the data and drafted the manuscript. RJ co-designed the study, supervised the data acquisition in Nickerie, and critically reviewed the manuscript. KE collected the data and critically reviewed the manuscript. RB reviewed the study design and critically reviewed the manuscript. MB critically reviewed the manuscript. JP processed the data and critically reviewed the manuscript. JD supervised the whole study and critically reviewed the manuscript. All authors contributed to the article and approved the submitted version.

Funding

Funding for data collection for this study was obtained from the Dutch Ministry of Foreign Affairs under the project title “Dwarkasing R. and de Jonge M. (2014). Onderzoek naar alcoholgebruik, angst en depressieve klachten in Suriname, en aanbieden van zorg op maat en geïndiceerde e-mental health. Paramaribo, Amsterdam.”

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abbreviations

ACQ-PHY, Agoraphobic cognitions questionnaire – physical concerns; ACQ-SC, Agoraphobic cognitions questionnaire – social/behavioral concerns; ACQ-TOT, Agoraphobic cognitions questionnaire – total score; AUD, Alcohol use disorders; AUDIT_PRO, Alcohol use disorders identification test, problems; AUDIT_TOT, Alcohol use disorders identification test, total score; AUDIT_USE, Alcohol use disorders identification test, use score; BSQ, Body sensations questionnaire; CES-D, Center for epidemiologic studies depression scale; CFI, Comparative fit index; CMD, Common mental disorders; EAP, Expected a posteriori; GAD-7, Generalized anxiety disorder 7-item scale; IAPT, Improving access to psychological therapies; ICHOM, International consortium for health outcomes measurement; LMIC, Low-and-middle-income countries; PROMIS, Patient-reported outcomes measurement information system; RMSEA, Root mean square error of approximation; SRMR, Standardized root mean square residual.

References

1. Kessler, RC, Angermeyer, M, Anthony, JC, de Graaf, R, Demyttenaere, K, Gasquet, I, et al. Lifetime prevalence and age-of-onset distributions of mental disorders in the World Health Organization's world mental health survey initiative. World Psychiatry. (2007) 6:168–76. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2174588/

2. Ferrari, A, Somerville, A, Baxter, A, Norman, R, Patten, S, Vos, T, et al. Global variation in the prevalence and incidence of major depressive disorder: a systematic review of the epidemiological literature. Psychol Med. (2013) 43:471–81. doi: 10.1017/S0033291712001511

3. Baxter, A, Scott, K, Vos, T, and Whiteford, H. Global prevalence of anxiety disorders: a systematic review and meta-regression. Psychol Med. (2013) 43:897–910. doi: 10.1017/S003329171200147X

4. World Health Organization. Global status report on alcohol and health 2018 (9241565632). Geneva: World Health Organization (2019).

5. Grant, BF, Saha, TD, Ruan, WJ, Goldstein, RB, Chou, SP, Jung, J, et al. Epidemiology of DSM-5 drug use disorder: results from the national epidemiologic survey on alcohol and related conditions–III. JAMA Psychiatry. (2016) 73:39–47. doi: 10.1001/jamapsychiatry.2015.2132

6. Bromet, E, Andrade, LH, Hwang, I, Sampson, NA, Alonso, J, de Girolamo, G, et al. Cross-national epidemiology of DSM-IV major depressive episode. BMC Med. (2011) 9:90. doi: 10.1186/1741-7015-9-90

7. Patel, V, Xiao, S, Chen, H, Hanna, F, Jotheeswaran, AT, Luo, D, et al. The magnitude of and health system responses to the mental health treatment gap in adults in India and China. Lancet. (2016) 388:3074–84. doi: 10.1016/S0140-6736(16)00160-4

8. van't Hof, E, Cuijpers, P, Waheed, W, and Stein, DJ. Psychological treatments for depression and anxiety disorders in low- and middle-income countries: a meta-analysis. Afr J Psychiatry. (2011) 14:200–7. doi: 10.4314/ajpsy.v14i3.2

9. Chisholm, D, Sweeny, K, Sheehan, P, Rasmussen, B, Smit, F, Cuijpers, P, et al. Scaling-up treatment of depression and anxiety: a global return on investment analysis. Lancet Psychiatry. (2016) 3:415–24. doi: 10.1016/S2215-0366(16)30024-4

10. World Health Organization. Update of the mental health gap action Programme (mhGAP) guidelines for mental, neurological and substance use disorders, 2015. Geneva: World Health Organization (2015).

11. Jacobson, NS, and Truax, P. Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol. (1991) 59:12–9. doi: 10.1037//0022-006x.59.1.12

12. Choi, SW, Schalet, B, Cook, KF, and Cella, D. Establishing a common metric for depressive symptoms: linking the BDI-II, CES-D, and PHQ-9 to PROMIS depression. Psychol Assess. (2014) 26:513–27. doi: 10.1037/a0035768

13. Schalet, BD, Cook, KF, Choi, SW, and Cella, D. Establishing a common metric for self-reported anxiety: linking the MASQ, PANAS, and GAD-7 to PROMIS anxiety. J Anxiety Disord. (2014) 28:88–96. doi: 10.1016/j.janxdis.2013.11.006

14. Batterham, PJ, Sunderland, M, Slade, T, Calear, AL, and Carragher, N. Assessing distress in the community: psychometric properties and crosswalk comparison of eight measures of psychological distress. Psychol Med. (2018) 48:1316–24. doi: 10.1017/S0033291717002835

15. Jadnanansing, R, Blankers, M, Dwarkasing, R, Etwaroo, K, Lumsden, V, Dekker, J, et al. Prevalence of substance use disorders in an urban and a rural area in Suriname. Trop Med Health. (2021) 49:12. doi: 10.1186/s41182-021-00301-7

16. Babor, T, Higgins-Biddle, J, Saunders, J, and Monteiro, M. The alcohol use disorders identification test (AUDIT) manual: guidelines for use in primary care. Geneva: World Health Organization, Department of Mental Health and Substance Dependence (2001).

17. Peng, C-Z, Wilsnack, RW, Kristjanson, AF, Benson, P, and Wilsnack, SC. Gender differences in the factor structure of the alcohol use disorders identification test in multinational general population surveys. Drug Alcohol Depend. (2012) 124:50–6. doi: 10.1016/j.drugalcdep.2011.12.002

18. Radloff, LS. The CES-D scale: a self-report depression scale for research in the general population. Appl Psychol Meas. (1977) 1:385–401. doi: 10.1177/014662167700100306

19. Vilagut, G, Forero, CG, Barbaglia, G, and Alonso, J. Screening for depression in the general population with the center for epidemiologic studies depression (CES-D): a systematic review with meta-analysis. PLoS One. (2016) 11:e0155431. doi: 10.1371/journal.pone.0155431

20. Spitzer, RL, Kroenke, K, Williams, JB, and Löwe, B. A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch Intern Med. (2006) 166:1092–7. doi: 10.1001/archinte.166.10.1092

21. Clark, DM, Layard, R, Smithies, R, Richards, DA, Suckling, R, and Wright, B. Improving access to psychological therapy: initial evaluation of two UK demonstration sites. Behav Res Ther. (2009) 47:910–20. doi: 10.1016/j.brat.2009.07.010

22. Kroenke, K, Spitzer, RL, Williams, JB, Monahan, PO, and Löwe, B. Anxiety disorders in primary care: prevalence, impairment, comorbidity, and detection. Ann Intern Med. (2007) 146:317–25. doi: 10.7326/0003-4819-146-5-200703060-00004

23. Chambless, DL, Caputo, GC, Bright, P, and Gallagher, R. Assessment of fear of fear in agoraphobics: the body sensations questionnaire and the agoraphobic cognitions questionnaire. J Consult Clin Psychol. (1984) 52:1090–7. doi: 10.1037/0022-006X.52.6.1090

24. Arrindell, WA. The fear of fear concept: stability, retest artefact and predictive power. Behav Res Ther. (1993) 31:139–48. doi: 10.1016/0005-7967(93)90065-3

25. de Beurs, E, Boehnke, J, and Fried, EI. Common measures or common metrics? A plea to harmonize measurement results. Clin Psychol Psychoth. (2022) 29:1755–67. doi: 10.1002/cpp.2742

26. Chalmers, RP. mirt: a multidimensional item response theory package for the R environment. J Stat Softw. (2012) 48:1–29. doi: 10.18637/jss.v048.i06

27. Fischer, HF, and Rose, M. Scoring depression on a common metric: a comparison of EAP estimation, plausible value imputation, and full Bayesian IRT modeling. Multivar Behav Res. (2019) 54:85–99. doi: 10.1080/00273171.2018.1491381

28. Embretson, SE, and Reise, SP. Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates (2013).

29. Crane, PK, Gibbons, LE, Jolley, L, and van Belle, G. Differential item functioning analysis with ordinal logistic regression techniques: DIFdetect and difwithpar. Med Care. (2006) 44:S115–23. doi: 10.1097/01.mlr.0000245183.28384.ed

30. Choi, SW, Gibbons, LE, and Crane, PK. Lordif: an R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. J Stat Softw. (2011) 39:1–30. doi: 10.18637/jss.v039.i08

31. Rosseel, Y. Lavaan: an R package for structural equation modeling and more. Version 0.5–12 (BETA). J Stat Softw. (2012) 48:1–36. doi: 10.18637/jss.v048.i02

32. Orlando, M, and Thissen, D. Further investigation of the performance of S-X²: an item fit index for use with dichotomous item response theory models. Appl Psychol Meas. (2003) 27:289–98. doi: 10.1177/0146621603027004004

33. Van der Ark, LA. Mokken scale analysis in R. J Stat Softw. (2007) 20:1–19. doi: 10.18637/jss.v020.i11

34. Yen, WM. Scaling performance assessments: strategies for managing local item dependence. J Educ Meas. (1993) 30:187–213. doi: 10.1111/j.1745-3984.1993.tb00423.x

35. Baty, F, Ritz, C, Charles, S, Brutsche, M, Flandrois, J-P, and Delignette-Muller, M-L. A toolbox for nonlinear regression in R: the package nlstools. J Stat Softw. (2015) 66:1–21. doi: 10.18637/jss.v066.i05

36. de Beurs, E, Oudejans, S, and Terluin, B. A common measurement scale for scores from self-report instruments in mental health care: T scores with a normal distribution. Eur J Psychol Assess. (2022). doi: 10.1027/1015-5759/a000740

37. Löwe, B, Decker, O, Müller, S, Brähler, E, Schellberg, D, Herzog, W, et al. Validation and standardization of the generalized anxiety disorder screener (GAD-7) in the general population. Med Care. (2008) 46:266–74. doi: 10.1097/MLR.0b013e318160d093

38. Parkerson, HA, Thibodeau, MA, Brandt, CP, Zvolensky, MJ, and Asmundson, GJ. Cultural-based biases of the GAD-7. J Anxiety Disord. (2015) 31:38–42. doi: 10.1016/j.janxdis.2015.01.005

39. Bouwman, V, Adriaanse, MC, van ‘t Riet, E, Snoek, FJ, Dekker, JM, and Nijpels, G. Depression, anxiety and glucose metabolism in the general Dutch population: the new Hoorn study. PLoS One. (2010) 5:e9971. doi: 10.1371/journal.pone.0009971

40. Craske, MG, Rachman, SJ, and Tallman, K. Mobility, cognitions, and panic. J Psychopathol Behav Assess. (1986) 8:199–210. doi: 10.1007/BF00959832

41. de Beurs, E. The assessment and treatment of panic disorder with agoraphobia. [PhD thesis]. Amsterdam, the Netherlands: University of Amsterdam (1993).

42. Smits, N, Cuijpers, P, and van Straten, A. Applying computerized adaptive testing to the CES-D scale: a simulation study. Psychiatry Res. (2011) 188:147–55. doi: 10.1016/j.psychres.2010.12.001

43. Cohen, J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates (1988).

44. McLean, CP, and Anderson, ER. Brave men and timid women? A review of the gender differences in fear and anxiety. Clin Psychol Rev. (2009) 29:496–505. doi: 10.1016/j.cpr.2009.05.003

45. Salk, RH, Hyde, JS, and Abramson, LY. Gender differences in depression in representative national samples: meta-analyses of diagnoses and symptoms. Psychol Bull. (2017) 143:783–822. doi: 10.1037/bul0000102

46. Wilsnack, RW, Vogeltanz, ND, Wilsnack, SC, and Harris, TR. Gender differences in alcohol consumption and adverse drinking consequences: cross-cultural patterns. Addiction. (2000) 95:251–65. doi: 10.1046/j.1360-0443.2000.95225112.x

47. Keyes, KM, Grant, BF, and Hasin, DS. Evidence for a closing gender gap in alcohol use, abuse, and dependence in the United States population. Drug Alcohol Depend. (2008) 93:21–9. doi: 10.1016/j.drugalcdep.2007.08.017

48. Stevenson, B, and Wolfers, J. The paradox of declining female happiness. Am Econ J Econ Pol. (2009) 1:190–225. doi: 10.1257/pol.1.2.190

49. Cella, D, Choi, S, Garcia, S, Cook, KF, Rosenbloom, S, Lai, J-S, et al. Setting standards for severity of common symptoms in oncology using the PROMIS item banks and expert judgment. Qual Life Res. (2014) 23:2651–61. doi: 10.1007/s11136-014-0732-6

50. de Beurs, E, Carlier, IV, and van Hemert, AM. Approaches to denote treatment outcome: clinical significance and clinical global impression compared. Int J Methods Psychiatr Res. (2019) 28:e1797. doi: 10.1002/mpr.1797

51. Kolen, MJ, and Brennan, RL. Test equating, scaling, and linking: Methods and practices. 3rd ed. Hillsdale, NJ: Springer Science & Business Media (2014).

52. Timmerman, ME, Voncken, L, and Albers, CJ. A tutorial on regression-based norming of psychological tests with GAMLSS. Psychol Methods. (2020) 26:357–73. doi: 10.1037/met0000348

Keywords: screening, alcohol use disorder, depression, anxiety, T-scores, norms

Citation: de Beurs E, Jadnanansing R, Etwaroo K, Blankers M, Bipat R, Peen J and Dekker J (2023) Norms and T-scores for screeners of alcohol use, depression and anxiety in the population of Suriname. Front. Psychiatry. 14:1088696. doi: 10.3389/fpsyt.2023.1088696

Received: 01 February 2023; Accepted: 10 April 2023;
Published: 27 April 2023.

Edited by:

Mohsen Khosravi, Zahedan University of Medical Sciences, Iran

Reviewed by:

Anthony L. Vaccarino, Indoc Research, Canada
Zahra Ghiasi, Zahedan University of Medical Sciences, Iran
Isa Multazam Noor, YARSI University, Indonesia

Copyright © 2023 de Beurs, Jadnanansing, Etwaroo, Blankers, Bipat, Peen and Dekker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Edwin de Beurs, edwin.de.beurs@arkin.nl

^†These authors share first authorship

†ORCID: Edwin de Beurs, https://orcid.org/0000-0003-3832-8477

Raj Jadnanansing, https://orcid.org/0000-0002-2137-3653

Kajal Etwaroo, https://orcid.org/0000-0002-7325-2212

Matthijs Blankers, https://orcid.org/0000-0002-8821-3312

Robbert Bipat, https://orcid.org/0000-0001-8711-4737

Jaap Peen, https://orcid.org/0000-0002-4421-8744

Jack Dekker, https://orcid.org/0000-0003-3782-6431
