Clinical effectiveness of art therapy: quantitative systematic review

Lesley Uttley; Alison Scope; Matt Stevenson; Andrew Rawdin; Elizabeth Taylor Buck; Anthea Sutton; John Stevens; Eva Kaltenthaler; Kim Dent-Brown; Chris Wood

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Uttley L, Scope A, Stevenson M, et al. Systematic review and economic modelling of the clinical effectiveness and cost-effectiveness of art therapy among people with non-psychotic mental health disorders. Southampton (UK): NIHR Journals Library; 2015 Mar. (Health Technology Assessment, No. 19.18.)


Systematic review and economic modelling of the clinical effectiveness and cost-effectiveness of art therapy among people with non-psychotic mental health disorders.




Chapter 2 Clinical effectiveness of art therapy: quantitative systematic review

This chapter aims to provide an overview of the evidence examining the clinical effectiveness of art therapy in people with non-psychotic mental health disorders.

Literature search methods

Bibliographic database searching

Comprehensive literature searches were used to inform the quantitative, qualitative and cost-effectiveness reviews. A search strategy was developed to identify reviews, RCTs, economic evaluations, qualitative research and all other study types relating to art therapy. Methodological search filters were applied where appropriate. No other search limitations were used and all databases were searched from inception to present. Searches were conducted from May to July 2013. The full search strategies can be found in Appendix 2.

To ensure that the full breadth of literature for the non-psychotic population was included, it was considered pragmatic to search for all art therapy studies and then to exclude manually, during the sifting process, studies conducted in people with a psychotic disorder or a disorder in which symptoms of psychosis were reported. This allows the reviewer to view all potentially relevant records and manually exclude studies of samples with psychotic disorders. This approach contrasts with a search strategy that lists, in the search terms, all mental health disorders considered to be ‘non-psychotic’; the latter may fail to retrieve relevant studies of populations that are not indexed under the named disorders.

In addition to the range of conditions covered by the population, the studies retrieved frequently did not involve a clear-cut diagnosed ‘mental health disorder’, and their populations were not the clinical populations with common mental health problems that had been anticipated. At this point in the study identification process it would have been easy to exclude any study that did not include patients with a clinically diagnosed mental health disorder; had this approach been taken, only three studies would have been included in the quantitative review. Instead, a pragmatic approach was taken: the populations in which art therapy has been studied were identified, included and described, provided that the studies targeted mental health symptoms (see Chapter 1, Non-psychotic mental health population: definition).

Databases searched

MEDLINE and MEDLINE In-Process & Other Non-Indexed citations (OvidSP).
EMBASE (OvidSP).
Cochrane Database of Systematic Reviews (The Cochrane Library).
Cochrane Central Register of Controlled Trials (The Cochrane Library).
Database of Abstracts of Reviews of Effects (The Cochrane Library).
NHS Economic Evaluation Database (The Cochrane Library).
Health Technology Assessment Database (The Cochrane Library).
Science Citation Index (Web of Science via Web of Knowledge).
Social Sciences Citation Index (Web of Science via Web of Knowledge).
CINAHL: Cumulative Index to Nursing and Allied Health Literature (EBSCOhost).
PsycINFO (OvidSP).
AMED: Allied and Complementary Medicine Database (OvidSP).
ASSIA: Applied Social Sciences Index and Abstracts (ProQuest).

Sensitive keyword strategies using free-text and, where available, thesaurus terms, combined with Boolean operators and database-specific syntax, were developed to search the electronic databases. No date limits or language restrictions were applied to any database. All resources were searched from inception to May 2013.
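To illustrate how free-text synonyms and Boolean operators combine, the sketch below builds a query string in which synonyms for one concept are joined with OR and separate concepts are joined with AND. The search terms, the `*` truncation symbol and the helper names `or_block` and `build_query` are hypothetical and chosen for illustration only; the strategies actually used are given in Appendix 2, and each database has its own syntax.

```python
def or_block(terms: list[str]) -> str:
    """Join the synonyms for a single concept with OR, in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts: list[list[str]]) -> str:
    """Combine concept blocks with AND; a single block applies no further restriction."""
    return " AND ".join(or_block(block) for block in concepts)


# Hypothetical terms; not the review's actual search strategy.
art_therapy_terms = ['"art therapy"', '"art psychotherapy"', "art therap*"]
rct_filter_terms = ["randomized", "randomised", "trial"]

sensitive_query = build_query([art_therapy_terms])
filtered_query = build_query([art_therapy_terms, rct_filter_terms])
print(sensitive_query)
print(filtered_query)
```

Omitting a filter block widens retrieval at the cost of more records to sift, which is the trade-off behind a sensitive strategy.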

Grey literature searching

A number of sources were searched to identify any relevant grey literature. Relevant grey literature or unpublished evidence would include reports and dissertations describing the methods and results of the study in sufficient detail to permit quality assessment. Conference proceedings without a corresponding final report (published or unpublished) would not qualify for inclusion, as they are unlikely to contain sufficient information to permit quality assessment and their results often differ from those published in the final report.³⁹,⁴⁰

Sources searched

NHS Evidence (Guidelines): www.evidence.nhs.uk/.
The BAAT: www.baat.org/index.html.
UK Clinical Research Network Portfolio Database: public.ukcrn.org.uk/Search/Portfolio.aspx.
National Research Register Archive: www.nihr.ac.uk/Pages/NRRArchive.aspx.
Current Controlled Trials: www.controlled-trials.com/.
OpenGrey: www.opengrey.eu/.
Google Scholar: scholar.google.co.uk/.
Mind: www.mind.org.uk/.
International Art Therapy Organisation: www.internationalarttherapy.org/.
National Coalition of Arts Therapies Associations: www.nccata.org/.

Additional search methods

A hand search of the International Journal of Art Therapy (formerly Inscape) was conducted. Reference list checking and citation searching of the included studies were also undertaken. Other complementary search methods, such as pearl growing, were considered; however, because the search method employed was considered to be very inclusive, such methods were unlikely to generate additional relevant records.

Review methods

Screening and eligibility

The operational sifting criteria (eligibility criteria) were defined and verified by two reviewers (LU and AS). Titles and abstracts of all records generated from the searches were scrutinised by one assessor and checked by a second assessor to identify studies for possible inclusion in the quantitative review. All studies identified for inclusion at the abstract stage were obtained in full text for more detailed appraisal. Non-English studies were translated and included if relevant. For conference abstracts or clinical trial records without study data, authors were contacted via e-mail; however, no additional data were obtained in this way. There was no exclusion on the basis of quality. Studies found on full-text assessment not to be RCTs were excluded. Agreement on inclusion at the title/abstract sift was calculated for 20% of the total search results (n = 2015), giving a kappa statistic of 0.93. Where there was uncertainty about the inclusion of a study, the reviewers sought the opinion of team members with the relevant clinical, methodological or subject expertise to guide the decision.
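The kappa statistic corrects raw agreement for the agreement expected by chance, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected from each reviewer's own rates of inclusion and exclusion. The text does not name the kappa variant; the sketch below assumes Cohen's kappa for two reviewers, and the ten include/exclude decisions in the example are hypothetical, not the review's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of records given the same decision
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal proportions
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical sift of ten records (1 = include, 0 = exclude)
reviewer_1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
reviewer_2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.615
```

Although the two reviewers agree on 90% of these records, kappa is lower because most agreement on a heavily excluded sift is expected by chance.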

Accumulation of results

All references were accumulated in a database using Reference Manager Version 12 (Thomson Reuters, Philadelphia, PA, USA), enabling studies to be retrieved in categories by keyword searches and duplicates to be removed.
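Reference Manager's own duplicate-matching rules are not described here. As a rough illustration of the idea only, the sketch below treats two records as duplicates when their titles match after lower-casing and stripping punctuation and their publication years are equal. The record fields, titles and helper names are hypothetical, and this is not the algorithm used by Reference Manager.

```python
import re


def dedup_key(record):
    """Crude matching key: normalised title plus publication year (hypothetical rule)."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    title = " ".join(title.split())  # collapse runs of whitespace
    return (title, record["year"])


def remove_duplicates(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical records exported from two databases
records = [
    {"title": "Art therapy for depression: a trial", "year": 2007, "db": "MEDLINE"},
    {"title": "Art Therapy for Depression - A Trial.", "year": 2007, "db": "PsycINFO"},
    {"title": "Art therapy in dementia", "year": 2006, "db": "EMBASE"},
]
unique = remove_duplicates(records)
print(len(unique))  # 2
```

Exact-key matching of this kind can miss duplicates with minor title differences, so it is a simplification rather than a substitute for checking.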

Study appraisal

Two reviewers (LU and AS) performed data extraction independently for all included papers and discrepancies were resolved by discussion between reviewers. When necessary, authors of the studies were contacted for further information. Data were input into a data extraction template using Microsoft Excel (Microsoft Corporation, Redmond, WA, USA), which was designed for the purpose of this review and verified by two reviewers. Information related to study population, sample size, intervention, comparators, potential biases in the conduct of the trial, outcomes including adverse events, follow-up and methods of statistical analysis was abstracted from the published papers directly into the electronic data extraction spreadsheet.

The evidence generated from the comprehensive searches showed that most research in art therapy is conducted by or with art therapists. This suggests potential researcher allegiance towards the intervention, in that art therapists are likely to have a vested interest in the outcome of the study. For this reason it was deemed important to focus on the highest-quality evidence available. Non-randomised trials (i.e. those in which the researcher was able to select and allocate participants to treatment arms) were considered too low in methodological rigour to be included in this review. Including data from non-randomised studies would introduce bias, so the resulting data would not be robust enough to inform and contribute to the evidence base.⁴¹,⁴² The inclusion and exclusion criteria for the quantitative review are shown in Figure 2.

FIGURE 2

Eligibility criteria for the quantitative review.

Setting

Studies could be conducted in any setting, including primary, secondary, community based or inpatient.

Sessions

Study selection was not limited by the number of sessions, and studies that provided the intervention in a single session were included.

Timing of outcome assessment

Post-treatment outcomes and outcomes at reported follow-up points were extracted and summarised when reported.

Quality assessment strategy

Two reviewers independently assessed the quality of all included RCTs using a modified tool developed for this review, which adapted quality assessment criteria from the Cochrane risk of bias tool,⁴⁴ Centre for Reviews and Dissemination (CRD) guidance⁴⁵ and Critical Appraisal Skills Programme (CASP)⁴⁶ checklists. The modified tool combined relevant elements of these tools to allow a comprehensive and relevant quality assessment of the included trials. For each quality criterion and each study, the judgement and the reasons for it were stated explicitly and recorded. Risk of bias was judged to be low, high or unclear. Where insufficient detail was reported to make a judgement, risk of bias was recorded as unclear, and authors were not contacted for further details. Discrepancies in judgements were resolved by discussion between the two reviewers.
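The recording rules above (a low, high or unclear judgement with a stated reason, unclear by default when detail is missing, and discussion of discrepancies between two independent reviewers) can be sketched as a small data structure. The criterion names and judgements in the example are hypothetical; the modified tool's actual criteria are not listed here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Risk(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


@dataclass
class Judgement:
    risk: Risk
    reason: str  # explicit justification recorded for the criterion


def judge(reported_detail: Optional[str], adequate: bool) -> Judgement:
    """Record a judgement; missing detail defaults to UNCLEAR (authors not contacted)."""
    if reported_detail is None:
        return Judgement(Risk.UNCLEAR, "insufficient detail reported")
    return Judgement(Risk.LOW if adequate else Risk.HIGH, reported_detail)


def discrepancies(reviewer_a: dict, reviewer_b: dict) -> list:
    """Criteria on which the two reviewers' risk judgements differ (to be discussed)."""
    criteria = set(reviewer_a) | set(reviewer_b)
    return sorted(
        c for c in criteria
        if c not in reviewer_a or c not in reviewer_b
        or reviewer_a[c].risk != reviewer_b[c].risk
    )


# Hypothetical criteria and judgements for one trial
reviewer_a = {
    "random sequence generation": judge("computer-generated list", adequate=True),
    "allocation concealment": judge(None, adequate=False),
    "blinding of outcome assessment": judge("assessors aware of allocation", adequate=False),
}
reviewer_b = {
    "random sequence generation": judge("computer-generated list", adequate=True),
    "allocation concealment": judge(None, adequate=False),
    "blinding of outcome assessment": judge(None, adequate=False),
}
print(discrepancies(reviewer_a, reviewer_b))  # ['blinding of outcome assessment']
```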

Results of the quantitative review

The total number of records yielded by the electronic database searches after removal of duplicates was 10,073 (see Figure 3). An additional 197 records were identified from supplementary searches, giving a total of 10,270 records for screening. Of these, 10,221 records were excluded at title/abstract screening. Common reasons for exclusion from the review are shown in Table 1. A full list of the studies excluded from the quantitative review at the full-text stage (with reasons for exclusion) can be found in Appendix 3.
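The screening counts can be checked with simple arithmetic. The sketch below assumes that every record not excluded at title/abstract stage went on to full-text assessment; the text does not state that figure directly, so Figure 3 remains the authoritative count.

```python
# Counts as reported in the text (see Figure 3 for the full flow)
database_records = 10_073      # electronic databases, after duplicates removed
supplementary_records = 197    # supplementary searches
excluded_at_title_abstract = 10_221

screened = database_records + supplementary_records
# Assumes no records were lost between screening and full-text retrieval
full_text_assessed = screened - excluded_at_title_abstract
print(screened, full_text_assessed)  # 10270 49
```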

FIGURE 3

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram of studies included in the quantitative review.

TABLE 1

Common reasons for exclusion from the review

The grey literature searches yielded very few potentially relevant records that had not already been identified by the electronic searches. One record, a clinical trial record of an RCT of art therapy in personality disorder (CREATe) with the status ‘ongoing’, appeared highly relevant to the research question. However, e-mail contact with the principal investigator confirmed that the trial had been terminated because of poor recruitment.

Included studies: quantitative review

Fifteen RCTs, reported in 18 sources, were identified for inclusion in the review (see Table 2). For clarity, where a study with multiple sources is discussed, only one of the sources is cited.

TABLE 2

Description of 15 included RCTs

Ten of the 15 included studies were conducted in the USA and only one in the UK (see Tables 2 and 3). Eleven studies were conducted in adults (the primary focus of this review) and four in children. All trials had small final sample sizes, with the number of participants reported to be included in each study ranging from 18 to 111. The mean sample size was 52.

TABLE 3

Comparators across the 15 included studies

Three studies are of patients from the target population of people with non-psychotic mental health disorders.⁴⁷–⁴⁹ Of these three studies, only one was conducted in adults.⁴⁷

In the remaining 12 studies, the study population comprised individuals without a formal mental health diagnosis.⁵⁰–⁵⁹,⁶¹,⁶² These populations were mainly people with long-term medical conditions that were not reported to be accompanied by a mental health diagnosis; however, the outcomes targeted in these studies were mental health symptoms.

The total number of patients in the included studies is 777. Nine studies compared art therapy with an active control group and six studies compared art therapy with a wait-list control or treatment as usual.
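The summary figures given above (777 participants in total, a range of 18 to 111 and a mean of about 52) can be reproduced from the sample sizes stated in the individual study summaries later in this chapter. The dictionary keys are shorthand labels introduced here for convenience.

```python
# Sample sizes as given in the study summaries in this chapter
sample_sizes = {
    "Beebe 2010": 22, "Broome 2001": 97, "Chapman 2001": 85, "Gussak 2007": 44,
    "Hattori 2011": 39, "Kim 2013": 50, "Lyshak-Stelzer 2007": 29,
    "McCaffrey 2011": 39, "Monti 2006": 111, "Monti 2012": 18, "Puig 2006": 39,
    "Rao 2009": 79, "Rusted 2006": 45, "Thyme 2007": 39, "Thyme 2009": 41,
}

n = list(sample_sizes.values())
total = sum(n)
mean = total / len(n)
print(len(n), total, round(mean, 1), min(n), max(n))  # 15 777 51.8 18 111
```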

Two studies were reported to have been conducted in an inpatient setting⁴⁸,⁴⁹ and one in a prison.⁵⁹ Most studies were conducted in community or outpatient settings, although the precise setting for the intervention was not reported in six studies.⁵⁰,⁵²,⁵⁴–⁵⁶,⁶¹

Brief descriptions of the art therapy interventions are provided in Tables 4 and 5.

TABLE 4

Description of intervention and control in studies with active control

TABLE 5

Description of intervention in studies with non-active control

Across the 15 studies, the number of sessions ranged from 1 to 40, with a mean of nine sessions (see Tables 4 and 5). Most studies with an active control group were of ‘group’ art therapy. One study was a ‘brief’ intervention consisting of one individual session per participant.⁵⁶ Two studies did not state explicitly whether sessions were delivered in groups or individually.⁴⁷,⁵³ Of the studies with no active control, three were of group art therapy⁵⁸,⁵⁹,⁶¹ and three were of individual art therapy.⁴⁹,⁵⁵,⁶²

The symptoms or outcome domains under investigation and associated outcome measures are reported in Table 6.

TABLE 6

Outcome domains under investigation in the 15 included RCTs

Data synthesis

Heterogeneity of the included studies

The study populations are heterogeneous (Figure 4), highlighting the wide application of art therapy in this small number of included RCTs but also demonstrating the difficulty in obtaining a pooled estimate of treatment effect. In this respect the clinical profile of patients can be regarded as a potential treatment effect modifier.

FIGURE 4

Patient clinical profiles in the 15 included RCTs.

The control groups across the included studies are heterogeneous (Figure 5); therefore, estimates of treatment effect may differ depending on what art therapy is compared against. Creating a network meta-analysis that incorporated all relevant evidence for all comparators across all non-psychotic mental health disorders would be beyond the remit of this research project.

FIGURE 5

Comparator arms in the 15 included RCTs.

In addition, although common mental health symptoms were investigated across the included RCTs, most studies used different measurement scales to assess these outcomes (Table 7). As there are insufficient comparable data on outcome measures across studies, it is not possible to perform a formal pooled analysis.

TABLE 7

Instruments used in the 15 included RCTs

Potential treatment effect modifiers in the included studies

As well as the patient’s clinical profile, several other treatment effect modifiers can be identified from the included studies.

Experience/qualification of the art therapist

Twelve of the 15 included studies stated that the art therapy was delivered by one or more art therapists. One study, reported in three sources, used a ‘trained’ art therapist.⁶²–⁶⁴ One study reported the art therapist as ‘licensed’.⁵⁶ Two studies reported using a ‘qualified’ art therapist.⁴⁸,⁵⁷ Two studies reported using a ‘certified’ art therapist.⁵⁰,⁵³ One study, reported in two sources, used a ‘registered’ art therapist.⁶⁰,⁶¹ One study reported using ‘experienced art psychotherapists’.⁴⁷ Four studies simply stated ‘art therapist’ without reference to accreditation.⁴⁹,⁵²,⁵⁸,⁵⁹ One study stated that the sessions were run by one artist and two speech therapists,⁵¹ one that they were run by two mental health counsellors,⁵⁵ and one did not state whether or not an art therapist was involved.⁵⁴ Although the reporting of therapist accreditation varied considerably, in most studies the intervention was delivered by a person considered to be qualified as an art therapist.
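The categories above can be tallied to confirm that they account for all 15 studies. The sketch simply transcribes the reported counts; marking the last three rows as not delivered by an art therapist follows the text, which counts only the first seven categories among the twelve.

```python
# (description as reported, number of studies, delivered by an art therapist?)
delivery = [
    ("'trained' art therapist", 1, True),
    ("'licensed' art therapist", 1, True),
    ("'qualified' art therapist", 2, True),
    ("'certified' art therapist", 2, True),
    ("'registered' art therapist", 1, True),
    ("'experienced art psychotherapists'", 1, True),
    ("'art therapist', accreditation not stated", 4, True),
    ("one artist and two speech therapists", 1, False),
    ("two mental health counsellors", 1, False),
    ("not stated", 1, False),
]

art_therapist_studies = sum(n for _, n, by_art_therapist in delivery if by_art_therapist)
total_studies = sum(n for _, n, _ in delivery)
print(art_therapist_studies, total_studies)  # 12 15
```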

Individual versus group art therapy

The majority of RCTs are of group art therapy, with only 4 of the 15 RCTs examining individual art therapy.⁴⁹,⁵⁵,⁵⁶,⁶²

Age

Eleven RCTs are of adults and four RCTs are of children or adolescents.⁴⁸,⁴⁹,⁵⁰,⁵⁸

Gender

Five RCTs involved only women⁴⁷,⁵⁴,⁵⁵,⁶¹,⁶² and one RCT only men.⁵⁹ In the remaining nine RCTs the participants were of mixed gender.

Pre-existing physical condition

In nine studies patients had pre-existing physical conditions.⁵⁰,⁵¹,⁵⁴–⁵⁸,⁶¹,⁶² The remaining six studies involved people who were depressed,⁴⁷,⁵⁹ people with post-traumatic stress disorder (PTSD)⁴⁸,⁴⁹ or older people.⁵²,⁵³

Other

Other potential treatment effect modifiers which are not fully explored in the included RCTs include duration of disease (mental or physical), underlying reason for mental health disorder and patient preference for art therapy.

Owing to the degree of clinical heterogeneity across the studies and the lack of comparable data on outcome measures, meta-analysis was not appropriate. Therefore, the synthesis of data is limited to a narrative review to analyse the robustness of the data, which includes trial summaries as well as tabulation of results.

Study summaries

This section provides short overviews of each study with reference to statistically significant differences between groups that were reported in each of the studies.

Beebe et al. 2010⁵⁸

This was an RCT of art therapy versus wait-list control in children (n = 22) with asthma. Sessions lasting 60 minutes were provided once a week for 7 weeks. Outcomes were measured at baseline, immediately after completion of therapy and 6 months after the final session. Targeted variables were quality of life (QoL) and behavioural and emotional adaptation. Outcome measurement tools were the Paediatric QoL asthma module and the Beck Youth Inventories. Pre- and post-test scores were compared between groups using analysis of variance (ANOVA) and Dunnett’s test. Compared with baseline scores, the intervention group showed a significant reduction in 4 out of 10 QoL items at 7 weeks and in 2 out of 10 QoL items at 6 months. Significant improvement relative to the control group was found in two out of five items of the Beck Youth Inventories at 7 weeks and in one out of five items at 6 months.

Broome et al. 2001⁵⁰

This was a three-arm RCT of art therapy versus CBT (relaxation for pain) or attention control (fun activities) in children and adolescents (n = 97) with sickle cell disease. Group sessions were provided over 4 weeks. Outcomes were measured at baseline and at 4 weeks and 12 months. The targeted variable was coping, and the authors hypothesised that the use of coping strategies would increase after attending a self-care intervention. Outcome measures were scores on the Schoolagers’ Coping Strategies Inventory and the Adolescent Coping Orientation for Problem Experiences, and the numbers of emergency room visits, clinic visits and hospital admissions. The number of coping strategies used was analysed at the three time points using Pearson’s correlations, independent t-tests and ANOVA. Coping strategies increased in children and adolescents in all three groups, but data on the difference between the intervention and control groups were not reported.

Chapman et al. 2001⁴⁹

This RCT of brief art therapy versus treatment as usual was carried out in children (n = 85) hospitalised with PTSD. Each individual session lasted 1 hour, but the number of sessions was not reported. Outcomes were measured at baseline and at 1 week, 1 month, and 6 and 12 months (in children who were still symptomatic). The targeted symptom was PTSD, and the outcome measurement tool was the Children’s Post Traumatic Stress Disorder Index (PTSD-I). The method of statistical analysis was not described. No significant differences were found between groups, although a non-significant trend towards a greater reduction in PTSD-I scores was observed in the intervention group relative to the control group.

Gussak 2007⁵⁹

This was an RCT of art therapy versus no treatment in incarcerated adult males (n = 44). Eight weekly group sessions were provided. Outcomes were measured pre- and post-test (exact time points not reported). The targeted symptom was depression, and the outcome measure was the Beck Depression Inventory-Short Form (BDI-II). The change in BDI-II scores from pre-test to post-test was calculated, and differences between groups were analysed using independent-samples t-tests. At post-test, depression was significantly lower in the intervention group than in the control group.

Hattori et al. 2011⁵¹

This was an RCT of art therapy versus a ‘simple calculation’ control group in people with Alzheimer disease (n = 39). Twelve 45-minute weekly sessions were provided (individual/group not reported). Outcomes were measured at baseline and at 12 weeks. Targeted variables were mood, vitality, behavioural impairment, QoL, activities of daily living and cognitive function. Outcome measures were the Mini Mental State Examination (MMSE); the Wechsler Memory Scale-Revised; the Geriatric Depression Scale (GDS); the Apathy Scale (Japanese version); the Physical (PCS-8) and Mental (MCS-8) components of the Short Form questionnaire-8 items (SF-8); the Barthel Index; the Dementia Behaviour Disturbance Scale; and the Zarit Caregiver Burden Interview. The percentage of responders, defined as participants showing an improvement of 10% or more relative to their baseline score, was compared between groups using a chi-squared test. A significant improvement in the intervention group was seen in the MCS-8 subscale of the SF-8 and in the Apathy Scale. The control group showed a significant improvement in MMSE relative to the intervention group. No significant differences between groups were reported for the other measures.
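The responder analysis described above can be sketched as follows, under stated assumptions: a responder threshold of 10% improvement relative to baseline, the direction of ‘improvement’ set per instrument (on some scales a lower score is better), and a Pearson chi-squared test on a 2 × 2 table without continuity correction. The study's exact test options are not reported here, the function names are illustrative, and the counts in the example are invented, not data from Hattori et al.

```python
import math


def is_responder(baseline, follow_up, higher_is_better=True, threshold=0.10):
    """True if the score improved by at least `threshold` (10%) relative to baseline."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    change = follow_up - baseline if higher_is_better else baseline - follow_up
    return change / abs(baseline) >= threshold


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test, without continuity correction, for [[a, b], [c, d]].

    Rows are intervention and control; columns are responders and non-responders.
    Returns the statistic and its upper-tail p-value on 1 degree of freedom.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df the statistic is a squared standard normal, so P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts only: 12/20 responders with art therapy, 5/19 with control
statistic, p_value = chi_squared_2x2(12, 8, 5, 14)
print(round(statistic, 2), round(p_value, 3))
```

With small arms such as these, expected cell counts can be low, which is one reason some analysts prefer a continuity correction or an exact test.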

Kim 2013⁵²

This RCT in older adults (n = 50) compared art therapy with regular programme activities. Between 8 and 12 sessions lasting 60–75 minutes were provided over 4 weeks. Targeted variables were positive/negative affect, state–trait anxiety and self-esteem. Outcomes were measured using the Positive and Negative Affect Schedule, the State–Trait Anxiety Inventory (STAI) and the Rosenberg Self-Esteem Scale. Time points for measurement were not reported (assumed to be 4 weeks). Independent-groups t-tests were performed to compare pre- and post-test scores between groups. Significant improvements in all three outcomes were seen in the intervention group compared with the control group.

Lyshak-Stelzer et al. 2007⁴⁸

This RCT in adolescents (n = 29) with PTSD compared art therapy with arts and crafts activities. Sixteen weekly group sessions were provided. The targeted symptom was PTSD. Outcome measurement tools were the University of California, Los Angeles (UCLA) PTSD Reaction Index (Diagnostic and Statistical Manual of Mental Disorders – Fourth Edition, Child Version) (primary measure) and milieu behavioural measures (e.g. use of restraints). Measurement time points were not reported, but data at two years were provided. Pre- and post-test scores were compared between groups using repeated-measures ANOVA. The intervention was significantly better than control at reducing PTSD symptoms, according to the UCLA PTSD Reaction Index.

McCaffrey et al. 2011⁵³

This was an RCT in older adults (n = 39) of art therapy versus garden walking (individual and group), giving three arms. Twelve 60-minute sessions (group/individual not reported) were provided over 6 weeks. The targeted symptom was depression, and the outcome measurement tool was the GDS, administered at baseline and at 6 weeks. Pre- and post-test scores were compared between groups using repeated-measures ANOVA. Depression improved significantly from baseline in all three groups, with no significant differences between groups.

Monti and Peterson 2004;⁶⁰ Monti et al. 2006⁶¹

This RCT in women with cancer (n = 111) compared mindfulness-based art therapy with wait-list control. The trial was sized to have 80% power to detect a standardised effect size of 0.62. Eight 150-minute group sessions were provided over 8 weeks. Targeted variables were distress, depression, anxiety and QoL. Outcome measurement tools were the Symptom Checklist-90-Revised (SCL-90-R), the Global Severity Index (GSI) and the Short Form questionnaire-36 items (SF-36). Measurement was at baseline and at 8 weeks and 16 weeks. Pre- and post-test measures were compared between groups using mixed-effects repeated-measures ANOVA. A significant decrease in symptoms of distress and highly significant improvements in some areas of the QoL scale were observed in the intervention group compared with the control group.
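The stated power calculation can be approximated with the usual normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z₁₋β)² / d². The source gives only the power (80%) and the standardised effect size (0.62); the two-sided 5% significance level, equal allocation and absence of any allowance for attrition are assumptions made here, and `n_per_group` is an illustrative helper.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Participants per arm for a two-arm comparison of means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level (assumed)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


per_arm = n_per_group(0.62)
total_participants = 2 * per_arm
print(per_arm, total_participants)  # 41 82
```

Under these assumptions about 41 participants per arm would be needed; a calculation based on the t distribution gives a slightly larger figure. The text does not say how the trial's recruitment target was derived.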

Monti et al. 2012⁵⁴

This RCT of women with breast cancer (n = 18) compared mindfulness-based art therapy with educational support (control group). Eight 150-minute weekly group sessions were provided. The targeted symptom was anxiety, but the authors were interested in whether cerebral blood flow (CBF) correlated with experimental condition. The primary outcome measurement was CBF on functional magnetic resonance imaging (fMRI) and its correlation with anxiety measured using the SCL-90-R. Measurement was at baseline and within 2 weeks of the end of the 8-week programme. The method of statistical analysis was not described, and the effectiveness of the intervention was not the primary outcome. Anxiety was reduced in the intervention group but not in the control group. CBF on fMRI changed in certain brain areas in the art therapy group only. It should be noted that patients with a confirmed diagnosis of a psychiatric disorder were excluded from this study.

Puig et al. 2006⁵⁵

This was an RCT of art therapy versus delayed treatment in women with breast cancer (n = 39). Four 60-minute weekly sessions were provided. Targeted symptoms were anger, confusion, depression, fatigue, anxiety, activity and coping. Outcomes were measured with the Profile of Mood States and the Emotional Approach Coping Scale (EACS) before and 2 weeks after the intervention. Pre- and post-test scores were compared between groups using ANOVA. The intervention group showed significant improvements in the anger, confusion, depression and anxiety mood states, but fatigue and activity did not differ significantly between the groups. EACS coping scores increased in the intervention group but did not differ significantly from those in the delayed treatment control group.

Rao et al. 2009⁵⁶

In this RCT in adults with HIV/AIDS (n = 79), the intervention group received brief art therapy while the controls watched a videotape on the uses of art therapy. Only one 60-minute session of individual art therapy was provided. Targeted symptoms were anxiety and physical symptoms, including pain. The outcome measures were Edmonton Symptom Assessment Scale (ESAS) scores (primary outcome) and STAI scores, recorded before and immediately after the intervention or control session. Pre- and post-test scores were compared between groups using analysis of covariance (ANCOVA), adjusted for age, gender and ethnicity. The intervention group experienced significant improvements in physical symptoms (ESAS) compared with the control group, but anxiety did not differ significantly between the groups.

Rusted et al. 2006⁵⁷

In this RCT in adults with dementia (n = 45), art therapy was compared with an activity group control. Forty 60-minute weekly group sessions were provided. Targeted symptoms were depression, mood, sociability and physical involvement. Outcome measures were the Cornell Scale for Depression in Dementia, the Multi Observational Scale for the Elderly, the MMSE, the Rivermead Behavioural Memory Test, the Tests of Everyday Attention and the Benton Fluency Task. Measurements were recorded at baseline, 10 weeks, 20 weeks and 40 weeks, with follow-up at 44 and 56 weeks. Pre- and post-test scores were compared between groups using ANOVA with time of assessment as a repeated measure. At 40 weeks, the intervention group was significantly more depressed than the control group, but this effect was reduced at follow-up. However, the groups were not comparable at baseline, as the art therapy group was more depressed than the control group at the start of the study.

Thyme et al. 2007⁴⁷

This was an RCT of psychodynamic art therapy versus verbal dynamic psychotherapy in depressed female adults (n = 39). Ten 60-minute weekly sessions (individual/group not reported) were provided. Targeted symptoms were stress reactions after a range of traumatic events, mental health symptoms and depression. Outcome measures were the Impact of Event Scale, the Symptom Checklist-90 (SCL-90), the Beck Depression Inventory (BDI) and the Hamilton Rating Scale for Depression. Measurements were recorded at baseline, at 10 weeks and at a 3-month follow-up. All patients improved from baseline on all scales (p < 0.001), but there were no significant differences between art therapy and the comparator at either time point.

Thyme et al. 2009;⁶² Svensk et al. 2009;⁶³ Oster et al. 2006⁶⁴

This RCT in women with breast cancer (n = 41) compared art therapy with treatment as usual. Five 60-minute weekly individual sessions were provided. Targeted symptoms were depressive, anxiety, somatic and general symptoms, QoL and coping methods. Outcome measurement tools were the Structural Analysis of Social Behavior, the GSI, the SCL-90, the World Health Organization (WHO) QoL instrument (Swedish version), the European Organization for Research and Treatment of Cancer (EORTC) QoL Questionnaire-BR23 and the Coping Resources Inventory (CRI). Measurements were recorded at baseline and at 2 months and 6 months. Pre- and post-test scores were compared between groups using t-tests, ANOVA and linear regression. The intervention significantly improved depressive, anxiety, somatic and general symptoms compared with the control. On the WHOQoL, scores on the overall, general health and environmental domains at 6 months were significantly higher in the intervention group than in the control group. There were no significant differences between groups on the EORTC questionnaire. In the intervention group, only the ‘social’ dimension of the CRI increased relative to the control group.

Results

Findings of the included studies

The directions of statistically significant results from the 15 included RCTs are summarised in Table 8.

TABLE 8

Summary of the direction of findings from the 15 included studies

As can be seen in Table 8, 14 of the 15 included studies reported improvements from baseline in some outcomes in the art therapy groups. However, in four studies both the intervention and the control groups improved from baseline, with no significant difference between the groups.⁴⁷,⁴⁹,⁵⁰,⁵³ The control groups in these four studies were verbal psychodynamic psychotherapy,⁴⁷ treatment as usual,⁴⁹ CBT⁵⁰ and garden walking,⁵³ respectively.

In nine studies, art therapy was significantly better than the control group for some but not all outcome measures. Table 9 shows the results according to the mean change from baseline between groups in these nine studies.

TABLE 9

Nine included studies with statistically significant findings in the art therapy group in some but not all outcome measures

In one study,⁵² all outcomes were significantly better in the art therapy intervention group than in the control group. Table 10 shows the results from the Kim⁵² study.

TABLE 10

One included study with statistically positive findings for all outcomes in the art therapy group

In one study⁵⁷ of a sample of people with dementia, outcomes were worse for the art therapy group than for the control group, which was an activity control group. The study reported an unusual pattern of results, including a significant increase in anxious/depressed mood (p < 0.01) at 40 weeks that was not present at the 10- or 20-week time points and had dissipated by 44 and 56 weeks. The authors discuss several possible reasons for this result, including the high level of attrition; the reliance on observer ratings in the frail and elderly sample (and the resulting potential for observer bias); increased depression in response to the sessions ending; and the possibility that art therapy was contraindicated in this sample.

Narrative subgroup analysis of studies by mental health outcome domains

Table 11 presents the results for effectiveness of art therapy across relevant mental health outcome domains.

TABLE 11

Effectiveness of art therapy across mental health outcome domains

Depression

Among the nine studies examining depression,⁴⁷^,⁵¹^,⁵³^,⁵⁵^,⁵⁷^–⁵⁹^,⁶¹^,⁶² art therapy resulted in significant reduction in depression in six studies.⁴⁷^,⁵³^,⁵⁵^,⁵⁹^,⁶¹^,⁶² In four of these six studies,⁵⁵^,⁵⁹^,⁶¹^,⁶² art therapy was significantly more effective than the control. Data relating to significant differences are reported in Table 9.

Anxiety

Among the seven studies examining anxiety,⁵²^,⁵⁴^–⁵⁶^,⁵⁸^,⁶¹^,⁶² art therapy resulted in significant reduction of anxiety in six studies.⁵²^,⁵⁴^,⁵⁵^,⁵⁸^,⁶¹^,⁶² In these six studies, art therapy was significantly more effective than the control. Data relating to significant differences are reported in Tables 8 and 9.

Mood

Among the four studies examining mood or affect,⁵¹^,⁵²^,⁵⁵^,⁵⁷ art therapy resulted in significant positive improvements to mood in three studies.⁵¹^,⁵²^,⁵⁵ In these three studies, art therapy was significantly more effective than the control. Data relating to significant differences are reported in Tables 8 and 9.

Trauma

Among the three studies examining trauma,⁴⁷^–⁴⁹ art therapy resulted in significant reduction of symptoms of trauma in all studies. Although trauma symptoms improved from baseline, there was no significant difference between the art therapy and control groups in any of the three studies.

Distress

Among the three studies examining distress,⁴⁷^,⁶¹^,⁶² art therapy resulted in significant reduction of distress in all studies. In two studies,⁶¹^,⁶² art therapy was significantly more effective than the control group. Data relating to significant differences are reported in Table 9.

Quality of life

In the four studies examining QoL,⁵¹^,⁵⁸^,⁶¹^,⁶² art therapy resulted in significant improvements to some but not all components of the QoL measures in all studies. In all studies, art therapy was significantly more effective than the control. Data relating to significant differences are reported in Table 9.

Coping

Among the three studies examining coping,⁵⁰^,⁵⁵^,⁶² art therapy resulted in significant improvements to coping resources in all studies. In one study,⁶² art therapy was significantly more effective than the control. In another study, there was no difference between groups.⁵⁵ In the third study, significant differences between the art therapy and control groups were not reported.⁵⁰ Data relating to significant differences are reported in Table 9.

Cognition

In the one study examining cognition,⁵¹ the control group (simple calculations) exhibited significant improvements in cognitive function relative to the art therapy group. Data relating to significant differences are reported in Table 9.

Self-esteem

In the one study examining self-esteem,⁵² art therapy resulted in significant improvements in self-esteem relative to the control group. Data relating to significant differences are reported in Tables 9 and 10.

Adverse events

Adverse events were not reported in any of the included RCTs. However, three studies reported outcomes that may be indirectly related to the safety of art therapy. The Lyshak-Stelzer et al.⁴⁸ study reported no significant differences between groups in the number of incidents, seclusions, restraints or ‘PRN [pro re nata, as needed] orders’. The Broome et al.⁵⁰ study reported a decrease in emergency room visits, clinic visits and hospital admissions over time in both the art therapy and control groups. In addition, the Beebe et al.⁵⁸ study reported equal numbers of asthma exacerbations in each group, but these occurred after the trial had finished.

The lack of adverse event data in the included studies is not necessarily evidence that there were no adverse events in these trials. It may indicate only that adverse events were not recorded. Potential harms and negative effects of art therapy are further explored in the qualitative review (see Chapter 3).

Quality assessment: strength of the evidence

Table 12 illustrates the types of study designs and the number of studies included in the quantitative and qualitative reviews.

TABLE 12

Study designs and their inclusion into the review

Critical appraisal of the potential sources of bias in the included studies

Method of recruitment

Participants were typically convenience samples from existing clinical patient groups. Few details were provided on the inclusion/exclusion criteria of the patients in the studies, as can be seen from Table 13.

TABLE 13

Method of participant recruitment in the 15 included RCTs

Allocation bias: method of randomisation

Table 14 shows the descriptions of randomisation from the included RCTs. Randomisation usually refers to the random assignment of participants to two or more groups. Randomisation was not described in seven studies.⁴⁸^–⁵⁰^,⁵⁴^,⁵⁵^,⁵⁸^,⁵⁹ This information could simply be missing from the published journal papers; if the benefit of the doubt were applied, it could be assumed that proper randomisation was done but not reported, which would represent an unclear risk of bias. However, it could also be assumed that proper randomisation did not take place and that the method of allocating participants to the study arms was flawed, which would represent a high risk of bias. Therefore, there is an unclear/high risk that randomisation was not adequately performed in these seven studies.

TABLE 14

Description of randomisation from the included RCTs

Allocation bias: allocation concealment

In order to ensure that the sequence of treatment allocation was concealed, a robust method of allocation to the study arms should be undertaken and documented. Allocation concealment was not reported in any of the included studies. Lack of allocation concealment can destroy the purpose of randomisation, as it can permit selective assignment to the study arms.

Appropriate randomisation for allocation to study arms includes ‘simple’ randomisation (e.g. tossing a coin), which avoids the excessive stratification sometimes introduced to prevent imbalanced groups, and ‘distance’ randomisation, so that researchers are unable to influence allocation (e.g. a central randomisation service that notes basic patient details and issues a treatment allocation). Several of the eight randomisation methods described are likely to be open to allocation bias, either because they did not use distance randomisation or because the reports do not provide enough detail about the measures taken to ensure that allocation was truly concealed from the investigators. For example, the Hattori et al.⁵¹ study describes stratification by three variables. Stratifying by more than one variable can be problematic, and stratifying by more than two variables is not advisable.⁶⁵ In addition, the Kim⁵² study does not clearly describe how randomisation was undertaken. The sealed envelope technique employed in the McCaffrey et al.⁵³ study is intended to ensure that equal numbers receive the intervention and the control but is vulnerable to subterfuge. Few of the included RCTs reported adequate details of their methods of randomisation and, consequently, these studies, as reported, had an unclear risk of allocation bias.
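The contrast between simple randomisation and stratification can be illustrated with a short Python sketch. This is a hypothetical illustration only, not the procedure used by any included trial: the participant identifiers, the seed, the 40-participant sample and the three two-level stratification variables are all assumptions. The first function allocates each participant by an independent virtual coin toss, as a central ‘distance’ randomisation service might; the second counts how many strata result from crossing stratification variables.

```python
import random


def simple_randomise(participant_ids, seed):
    """Allocate each participant by an independent virtual coin toss.

    Hypothetical sketch: under 'distance' randomisation this would run at a
    central service, so recruiters could not predict or alter allocations.
    """
    rng = random.Random(seed)  # seeded so the sequence can be audited later
    return {pid: rng.choice(["art therapy", "control"]) for pid in participant_ids}


def count_strata(levels_per_variable):
    """Number of strata created by crossing the stratification variables."""
    total = 1
    for levels in levels_per_variable:
        total *= levels
    return total


# Hypothetical trial of 40 participants
allocation = simple_randomise([f"P{i:02d}" for i in range(1, 41)], seed=2013)
n_art = sum(1 for arm in allocation.values() if arm == "art therapy")
print(f"art therapy: {n_art}, control: {len(allocation) - n_art}")

# Three hypothetical two-level variables (e.g. sex, age band, severity)
print(count_strata([2, 2, 2]))  # 8
```

Under these assumptions, stratifying 40 participants by three two-level variables produces 8 strata averaging only 5 participants each, which suggests why stratifying a small trial by several variables can be problematic. Simple randomisation, by contrast, may leave the arms somewhat unequal in size in a small sample.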

Performance bias: blinding

Blinding of participants was not conducted in any of the included RCTs. Unlike in trials of pharmacological interventions, blinding participants to their experimental condition is understandably unfeasible in trials of psychological therapy. Therefore, although the lack of blinding means that the included trials are at risk of performance bias, they cannot be deemed to be of poor quality on this basis.

Performance bias: baseline comparability

Groups were reported to be comparable at baseline in 7 out of the 15 studies (Table 15).⁴⁸^,⁵¹^–⁵⁴^,⁵⁶^,⁶² Baseline comparability was unclear or not reported, and therefore could not be assessed, in five studies.⁴⁷^,⁴⁹^,⁵⁰^,⁵⁵^,⁵⁸ In three studies,⁵⁷^,⁵⁹^,⁶¹ patients in the art therapy group appeared to have more severe illness at baseline. These differences could reflect allocation bias resulting from flawed randomisation procedures in the studies.

TABLE 15

Baseline comparability between intervention and control groups in the included 15 RCTs

Performance bias: groups treated equally

As blinding was not possible, all studies are at risk of performance bias. In the case of the six studies⁴⁹^,⁵⁵^,⁵⁸^,⁵⁹^,⁶¹^,⁶² that had wait-list/treatment as usual controls rather than an active comparator group, it can be argued that the groups were not treated equally, as the control groups were not given the time and attention that an active control group would receive. Therefore, the risk of performance bias favouring the art therapy group is higher in these six studies.

Reporting bias: selective outcome reporting

No studies appeared to have collected data on outcomes that were not reported in the results.

Reporting bias: incomplete outcome data

In three studies,⁴⁸^,⁵⁴^,⁵⁷ outcome data were incomplete, indicating a high risk of reporting bias. The reasons for this were: data reported for completers only, who made up 20% of participants (80% withdrew or were excluded);⁴⁸ actual data not provided (only p-values reported);⁵⁴ and group numbers not provided at any time point.⁵⁷ In four studies, the risk of reporting bias was unclear because the outcome data reported were incomplete.⁴⁹^,⁵⁰^,⁵⁸^,⁵⁹

Detection bias

Blinding of clinical outcome assessment was reported to be conducted in only one study.⁵⁸ Therefore, 14 out of the 15 included RCTs are at unclear to high risk of detection bias, as assessors may have influenced the recording of clinical outcomes.

Researcher allegiance

In the Kim⁵² study there was only one author, and the two researchers are reported to be art therapists. The author is also a senior art therapist. The Gussak 2007⁵⁹ study also has only one author, who is a professor of art therapy. Trials that are published by one author are unlikely to have been conducted as collaborative projects adhering to standards of good clinical practice. The risk of researcher allegiance in these studies is, therefore, high.

The McCaffrey et al. 2011⁵³ study was funded by the owners of the gardens that were the basis of the comparator. The gardens are profit-making, and participants who completed the study were given 1 year’s free membership. The risk of researcher allegiance for the control group in this study can, therefore, be considered to be high.

As can be seen from Table 16, all studies had many domains at unclear risk of bias, and some studies had several domains at high risk of bias. In the context of this review, all risk of bias domains except blinding of participants are important for establishing the internal validity of these trials. Currently, the only domain at low risk of bias is selective outcome reporting. Owing to the risks of bias highlighted by the critical appraisal of these studies, it can be concluded that the included RCTs are generally of low quality.

TABLE 16

Summary of risk of bias (high, low or unclear) in the 15 included quantitative studies

Critical appraisal of other potential sources of confounding

Attrition

Withdrawals and exclusions are reported in Table 17.

TABLE 17

Withdrawals from the study across the included RCTs

As can be seen from Table 17, there were only four studies in which all participants completed the trial.⁵²^,⁵⁴^,⁵⁵^,⁵⁸ Although several studies reported substantial numbers of dropouts, only one study reported a sample size calculation based on an expected effect size.⁶¹ Because the sample sizes in the remaining 14 RCTs are small and the trials were not powered to allow for attrition, these dropouts substantially reduce the reliability of these RCTs. For example, in the Rusted et al.⁵⁷ study, attrition was 53.3%, meaning that final data are reported for only 9 people in the art therapy group and 12 in the activity control group. This small number of completers calls into question the reliability of this study’s results.
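The Rusted et al. attrition figure can be checked with simple arithmetic. The Python sketch below assumes that the 53.3% attrition rate is expressed relative to all participants randomised; under that assumption, the 21 completers (9 + 12) imply that about 45 people were randomised. This total is derived here for illustration and is not a figure reported in the text, and the function names are hypothetical.

```python
def attrition_rate(randomised, completers):
    """Proportion of randomised participants without final outcome data."""
    return (randomised - completers) / randomised


def implied_randomised(completers, attrition):
    """Back-calculate the number randomised from completers and attrition.

    Assumes attrition is a proportion of everyone randomised.
    """
    return completers / (1 - attrition)


completers = 9 + 12  # art therapy + activity control completers (Rusted et al.)
n = implied_randomised(completers, 0.533)
print(round(n))  # 45
print(f"{attrition_rate(45, completers):.1%}")  # 53.3%
```

The per-arm numbers randomised cannot be recovered this way, because the text reports only the overall attrition rate.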

Only 5 of the 11 studies in which dropouts occurred reported the breakdown of withdrawals between groups. Two studies⁵⁰^,⁵⁹ do not report the reasons for the withdrawals that occurred. In addition, attrition was not handled appropriately in the included RCTs: imputation of missing data was generally either not reported or reported as not conducted, except in one study.⁶² The risk of attrition bias in the 11 studies where dropouts occurred is, therefore, unclear.

Concomitant treatment

Co-therapy or concomitant medication was not reported in eight trials.⁴⁹^–⁵²^,⁵⁵^–⁵⁸ In a further two studies,⁵³^,⁶¹ participants were eligible to take part if in receipt of mental health treatment but the actual data for concomitant therapy (overall or between groups) are not reported.

In the Gussak⁵⁹ study, 93% (n = 25/27) of participants in the intervention group were taking medication for a mental illness, compared with 27% (n = NR) in the control group. In the Thyme et al.⁴⁷ study, it was reported that psychopharmacological treatment was an exclusion criterion. It is subsequently stated that ‘in the [art therapy] group, one participant were [sic] prescribed antidepressants during therapy (n = 1) and one between termination of therapy and the 3-month follow-up (n = 1), and in the [verbal therapy] group three during therapy (n = 1) [sic] and two after (n = 2). Two participants in VT accepted Body Awareness as an additional treatment during psychotherapy.’⁴⁷

In the Thyme et al. 2009⁶² study, the usage of antidepressants was self-reported, and therefore this information may be incomplete. In the Chapman et al.⁴⁹ study, ‘treatment as usual’ hospital care was defined as the normal and usual course of paediatric care including Child Life services, art therapy, and social work and psychiatric consultations. While only the Monti et al. 2012⁵⁴ study reports that use of psychotropic medication was an exclusion criterion, there is generally an unclear/high risk of confounding as a result of permitted additional treatment across the included studies.

Treatment fidelity

Sufficient measures to ensure treatment fidelity would include monitoring the therapy sessions through audio or video tapes to allow independent checking. No such measures to ensure that the intervention was being delivered consistently were reported in any of the studies. However, one study⁵⁸ does provide an appendix of the content of each session. In addition, one study⁶¹ provides the art therapy programme details in the first of the two resulting publications.⁶⁰ Most studies provided brief synopses of the intervention programme and content of the sessions.⁴⁸^,⁵⁰^,⁵²^,⁵⁴^–⁵⁶^,⁶² However, some studies provided scant details of what took place in the sessions.⁴⁷^,⁴⁹^,⁵¹^,⁵³^,⁵⁷^,⁶⁶ Moreover, Chapman et al.⁴⁹ do not even state how many sessions were provided. Therefore, the included RCTs have unclear risk of poor treatment fidelity.

The risk of bias assessment and the potential areas of confounding, including attrition, concomitant treatment and treatment fidelity, illustrate that the included trials are generally of low quality and, therefore, the results of the 15 RCTs that are included in the quantitative review should be interpreted with caution. Three studies⁴⁷^,⁵¹^,⁵⁶ can be considered to be of slightly better quality because they have no instances of high risk of bias (other than blinding, which is a common hurdle in trials of psychological therapy) and are at low risk of bias on at least four domains.

Discussion

Discussion of the quantitative review

The aim of the quantitative systematic review was to assess the evidence of clinical effectiveness of art therapy compared with control for treating non-psychotic mental health disorders. The limited available evidence showed that patients receiving art therapy had significant positive improvements in 14 out of 15 RCTs. In 10 of these studies, art therapy resulted in significantly more improved outcomes than the control, while in four studies art therapy resulted in an improvement from baseline but the improvement in the intervention group was not significantly greater than in the control group. In one study, outcomes were better in the control group than in the art therapy group. Relevant mental health outcome domains that were targeted in the included studies were depression, anxiety, mood, trauma, distress, QoL, coping, cognition and self-esteem. Improvements were frequently reported in each of these domains except for cognition.

Limitations of the quantitative evidence

Despite every possible effort to identify all relevant trials, the number of studies that qualified for inclusion was small. Although the searches yielded a large number of records on art therapy, very few studies were RCTs, demonstrating a slow uptake of the evidence-based medicine model in this field. The study samples are heterogeneous and few samples can be regarded strictly as the target population for this review – people diagnosed with a mental health condition. The limited selection of mental health disorders in the included study samples means that the external validity to the population with non-psychotic mental health disorders is limited. In addition, the sample sizes are small, and as yet there are no large-scale RCTs of art therapy in non-psychotic mental health disorders. The paucity of RCT evidence means that it is not possible to make generalisations about specific disorders or population characteristics.

The risk of bias assessment highlighted that, although all studies were reported to be RCTs, few studies reported how patients were randomised, and in the majority of studies there were several instances of high risk of bias. Areas of potential confounding frequently associated with the studies included attrition, concomitant treatment and treatment fidelity. Consequently, the internal validity of the included studies is threatened. Owing to the low quality of the 15 RCTs, the results included in the quantitative review should be interpreted with caution. As this systematic review did not search for and include direct evidence about other interventions for non-psychotic mental health disorders, it has not been possible to identify indirect evidence for the effect of art therapy in a mixed treatment comparison within the scope of this research. Therefore, the effectiveness of art therapy compared with other commonly used treatments that have been shown to be effective is unknown. In addition, the underlying mechanisms of action in art therapy remain unclear from this evidence. The qualitative systematic review that is presented in the next chapters will explore the factors that may contribute to the therapeutic action in art therapy.

Conclusions

From the limited number of studies identified, in patients with different clinical profiles, art therapy was reported to have statistically significant positive effects compared with control in a number of studies. The symptoms most relevant to the review question which were effectively targeted in these studies were depression, anxiety, low mood, trauma, distress, poor QoL, inability to cope and low self-esteem. The small evidence base, consisting of low-quality RCTs, indicated that art therapy was associated with an improvement from baseline in all but one study and was a more effective treatment for at least one outcome than the control groups in the majority of studies.

Copyright © Queen’s Printer and Controller of HMSO 2015. This work was produced by Uttley et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK279641
