Introduction
DNA barcoding is a molecular method for identifying specimens to species: a short, conserved gene region is amplified by PCR and sequenced, and the sequence is compared with a database of reference sequences. Fundamentally, the method determines whether the differences between a given gene sequence and the reference sequences are of the magnitude expected within a species or of that expected between different species. First described for species identification by Hebert and co-workers in 2003 (
Hebert et al. 2003), it has been rapidly embraced by the scientific community despite early reservations by certain taxonomists (
DeSalle et al. 2005;
Will et al. 2005). Due to the generic nature and universal workflow of the technique, DNA barcoding has been applied to a wide range of scientific disciplines, from food authenticity to conservation studies (
Holmes et al. 2009;
Francis et al. 2010;
Hrcek et al. 2011;
Adamowicz 2015). The most common applications of DNA barcoding are to aid with species description, resolution of species complexes, and the identification of specimens of unknown taxa to species (
Kress et al. 2015).
DNA barcoding as an identification tool is most often compared to “traditional” species identification based on morphological characters (
Hajibabaei et al. 2007), given the comparison to identified reference sequences. It can also be compared to numerous species-specific molecular detection or identification methods, such as polymerase chain reaction (PCR) (
Kiewnick et al. 2013), real-time PCR (qPCR) (
Huang et al. 2010), and isothermal amplification techniques such as loop-mediated isothermal amplification (LAMP) (
Kikuchi et al. 2009). However, the latter methods are typically targeted to a single species (or occasionally a small group of species), and therefore they are used in the context of answering the question “is this specimen species x?”. Techniques such as DNA barcoding provide a fundamental step change, instead answering the question “what species is this specimen?”.
Plant health within the United Kingdom (UK) is enacted by the National Plant Protection Organisation (NPPO, government designated organisations) in adherence to the European Council Directive 2000/29/EC (
EU 2000) (and its subsidiary legislation). The overarching aim of the legislation is to provide a legal framework for the protection of forests and natural landscapes and to enable productive agricultural and horticultural trades by preventing the introduction and spread of harmful pests and pathogens. The European and Mediterranean Plant Protection Organisation (EPPO) is an inter-governmental organisation with responsibilities for harmonisation and cooperation of plant protection within the region (
EPPO 2016). EPPO also maintains lists of pests recommended for regulation: the A1 list (species absent from the EPPO region), the A2 list (species present in the EPPO region but not widely distributed), and the alert list (species posing new potential phytosanitary risks) (
EPPO PM1/2 (24) 2015;
EPPO 2015). Contingent to delivering policy objectives is the establishment of diagnostic methods, which allow the accurate and rapid identification of pests and pathogens. The EC directive and national legislation (
UK 2015) list species that are actionable, thus intrinsically linking legislative action with taxonomy and identification.
The large numbers of specimens that need identifying, and the requirement for reference data for all the species from which these need to be discriminated, pose a challenge for DNA barcoding within the diagnostic laboratory. For every taxonomic group, the presence of a unique species-level barcode (which encompasses intraspecific variation but allows discrimination of interspecific variation) must be demonstrated, requiring access to different populations of the species. This is one of the major hurdles in the deployment of DNA barcoding in routine diagnostics, as there are often not enough sequences to describe the variation within a species or for closely related species (
Virgilio et al. 2012). Furthermore, there are important differences when utilising a scientific tool in a research context as opposed to a diagnostic context, especially if the consequences of an identification are significant; i.e., they may lead to the destruction of imported material, leading to significant financial losses. Currently, diagnostic laboratories are moving towards the use of only accredited and validated methods (
EPPO PM 7/98 (2) 2014) that are well established. DNA barcoding reference datasets (e.g., GenBank, BOLD—the Barcode of Life Data Systems,
Ratnasingham and Hebert 2007) are continually changing, and thus protocols need constant evaluation prior to use. This presents difficulties in terms of standardisation of methods and accreditation. Often, a “critical mass” of reference sequences for organismal groups is only obtained when there is a particular reason to study a group or if it is used as a model system by interested research groups (
Armstrong and Ball 2005). In the EU context, attempts to address these gaps have been undertaken, with the creation of a sequence database for EU regulated plant pests (Q-bank) (
Bonants et al. 2013). Nevertheless, with global trade, the range of species that may require identification, and therefore reference sequences, is vast.
Traditional morphological taxonomy (using morphological keys and diagnostic structures) is still the primary technique used for the identification of invertebrate samples, which in the hands of skilled and experienced taxonomic specialists can often result in a rapid and reliable identification. Molecular (as opposed to morphological) identification strategies may be needed for a number of reasons. Most commonly, key diagnostic structures may be damaged or missing, or there is no primary description or morphological key to the taxa or life stage being examined (
Shin et al. 2015). Another common application is to expedite identification following detection. Many important pests are intercepted as immature specimens, and often these cannot be identified as they lack definitive morphological characters. The traditional solution is to rear these to an identifiable life stage, such as the adult (
Ruiter et al. 2013). This, however, can be a lengthy process (often weeks) and frequently fails due to high levels of mortality. This is clearly not feasible when trade relations or, for example, the release of fresh produce at the border is at stake. The use of DNA barcoding in these situations can support a presumptive morphological identification and result in a much more rapid identification, or indeed generate an identification when one may not have been possible by morphological methods alone. Considering that identifications may reveal a new zoogeographic record for a regulated pest species, this could have a significant impact on, or even stop, trade from a country or region, with both financial and political consequences. Consequently, it is vital that the identification is robust. Multiple diagnostic methods, such as morphological identification and molecular tools, producing the same identification can add rigour and robustness to a finding to support any resulting action, and therefore combined diagnoses can be favourable to NPPOs.
In this paper, we will use a series of data-linked case studies to illustrate some of the varied scenarios in which DNA barcoding has been used by Defra plant health in England and Wales. This will illustrate the diverse applications, beyond the identification process itself, in which DNA barcoding can be used, as well as highlighting some of the limitations and drawbacks that still exist and hamper broader deployment of the technique. In case study 1, we will illustrate the use of DNA barcoding for the identification of a combined interception of a significant plant pest alongside its vector, which is also a plant pest. Case study 2 examines a series of examples that have highlighted potentially high-risk trade routes for plant pests. Case study 3 demonstrates the use of DNA barcoding for the identification of regulated plant pests, and case study 4 highlights the limitations of DNA barcoding posed by lack of reference sequences in understudied taxa, in this instance psyllids.
Materials and methods
Details of sample collection for each case study are presented in the Results and discussion section to provide a more succinct interpretation associated with each case study. The Materials and methods section outlines the identification and molecular methods used in the study.
Morphological identification and sample preparation
Vermiform endoparasitic nematodes were extracted from wood samples following the Baermann funnel method (
EPPO PM 7/119 (1) 2013), as recommended in the EPPO diagnostic protocol for
Bursaphelenchus xylophilus (
EPPO PM 7/4 (3) 2013) (Case study 1). Free-living
Meloidogyne males and infective juveniles were extracted from soil substrate using the Oostenbrink elutriator technique (
EPPO PM 7/119 (1) 2013) (Case study 3.1). Nematode specimens were fixed in TAF (
Hooper 1986) and mounted on glass slides for microscopic examination at 100× magnification. EPPO diagnostic protocols
EPPO PM 7/4 (3) 2013,
EPPO PM 7/41 (2) 2009, and reference works
Ryss et al. (2005) and
Perry et al. (2009), were used to facilitate morphological identification of
Bursaphelenchus and
Meloidogyne nematode specimens. Nematodes used for molecular analysis were mounted in water on glass slides, examined under a compound microscope at 400×, and then placed in 1.5 mL microcentrifuge tubes and immediately frozen.
Insect morphological identifications were made with reference to all available and relevant taxonomic keys, original descriptions, and preserved museum specimens as follows: psyllid genera
Aphalara,
Baopelma,
Cacopsylla,
Diaphorina,
Psylla,
Psyllopsis, and
Trioza with reference works
Hodkinson and White (1979),
Ossiannilsson (1992), and
Bantock and Botting (2013) (Case study 4);
Monochamus alternatus with reference works
Hope (1843),
Kojima (1931),
Gressitt (1942),
Duffey (1968),
Invasive.org (2010), and
CABI (2016) (Case study 1). The principle followed when taking samples for molecular analysis was to remove a representative amount of material without completely disrupting the most important diagnostic characters. For large adult insects (such as beetles), the right rear leg was taken, and for smaller softer-bodied insects (such as psyllids) the right middle and right hind legs were taken. For intact larval specimens, a section such as abdominal segments 1–3 was excised. For completely disrupted specimens that were too damaged for morphological identification, a representative sample of the most fleshy material was taken, as it has been found that heavily chitinised structures such as mandibles and other parts of the exoskeleton often fail to yield sufficient DNA. Samples for molecular analysis were placed in 1.5 mL microcentrifuge tubes and immediately frozen. The remains of the specimen were frozen and retained for future reference or further sampling if needed.
Samples were removed from the specimen whilst viewing in a Petri dish or other suitably sized container under a binocular dissecting microscope at magnifications of up to 160× and using a combination of fine entomological forceps, seekers, entomological pins, and a scalpel. Between specimens, all instruments were sterilised by flaming and were then cleaned with 70% ethanol.
Sample information is provided in
Table 1. Each sample (or sub-sample of larger specimens) was assigned a unique identifying number for the molecular study.
DNA extraction
For all samples apart from psyllids, DNA extractions were performed using a DNeasy® blood and tissue kit (QIAGEN, West Sussex, UK), following the manufacturer’s protocol for animal tissues using spin columns. For larger samples (i.e., beetle legs, abdomen sections), tissue was homogenised with a micro-pestle (STARLAB, Milton Keynes, UK) prior to overnight lysis. Smaller samples (e.g., nematodes, psyllid legs) were not subject to homogenisation and were re-suspended in homogenisation buffer prior to overnight lysis. The final elution volume was adjusted relative to the size of the sample tissue, ranging from 100 to 400 μL.
Alternatively for psyllid samples (Case study 4), DNA was extracted using a Chelex-100 resin based method (
Boonham et al. 2002). Single legs and wings of whole insects were removed using sterile fine forceps and placed in individual 0.6 mL microcentrifuge tubes. The tissue sample was homogenised using a sterile micro-pestle, 100 μL of molecular-grade water was added, and the sample was further homogenised. A slurry of 100 μL of a 50% w/w Chelex resin:molecular-grade water was added, the sample heated to 95 °C for 5 min, centrifuged for 5 min, and the supernatant transferred and stored at −30 °C prior to use.
PCR
For invertebrate samples, three primer pairs were used (separately); two of these amplify the “standard” cytochrome c oxidase subunit I (COI) barcode and the third a partial 3-prime section of the COI barcode region (
Table 2). Using this approach, the majority of samples are positive with one of the three primer pairs. On occasions when these do not amplify the sample, alternate primers for the same gene region are selected based on the suspected family; typically those described in
Simon et al. (1994 and
2006) are used as a starting point. PCR primers JB3/JB5 were used in PCR of
Meloidogyne species nematodes to amplify a 450-bp fragment of the COI gene (not the standard COI barcode region). For
Bursaphelenchus species nematodes, PCR primers LCO1490/HCO2198 were used as described for invertebrate samples. All primers were synthesised by Eurofins-MWG-Operon.
All PCR reactions were performed using a proofreading DNA polymerase in a GeneAmp® 9700 thermocycler (Applied Biosystems, California, USA). PCR reactions (25 μL) comprised 12.5 μL 2x bio-x-act short PCR mix (Bioline, London, UK), 400 nmol/L each primer, and 1 μL DNA (concentration as extracted). To streamline testing, PCR conditions were harmonised for the three invertebrate primer pairs so that all could be run in parallel in a single thermocycler at the same time as follows: 5 min at 94 °C; followed by 35 cycles of 30 s at 94 °C, 45 s at 50 °C, and 1 min at 72 °C; and 10 min at 72 °C. Cycling conditions for primers JB3/JB5 were as follows: 5 min at 95 °C; followed by 40 cycles of 1 min at 95 °C, 1 min 30 s at 41 °C, and 1 min at 72 °C; and 10 min at 72 °C.
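The reaction composition and cycling programmes above can be checked with simple arithmetic. The following Python sketch is illustrative only: the 10 μmol/L primer working stock is an assumption (the stock concentration is not stated here), the function and variable names are hypothetical, and the programmed time excludes ramping between temperatures.

```python
# Hypothetical primer working-stock concentration (umol/L); not stated in the text.
PRIMER_STOCK_UM = 10.0


def primer_volume_ul(final_nm, reaction_ul, stock_um=PRIMER_STOCK_UM):
    """Volume of primer stock (uL) that gives final_nm (nmol/L) in reaction_ul."""
    return final_nm * reaction_ul / (stock_um * 1000.0)


def reaction_mix(reaction_ul=25.0, master_mix_ul=12.5, dna_ul=1.0,
                 primer_final_nm=400.0):
    """Per-reaction volumes (uL); water makes up the remainder."""
    primer_ul = primer_volume_ul(primer_final_nm, reaction_ul)
    water_ul = reaction_ul - master_mix_ul - dna_ul - 2 * primer_ul
    return {
        "master_mix_2x": master_mix_ul,
        "forward_primer": primer_ul,
        "reverse_primer": primer_ul,
        "dna": dna_ul,
        "water": water_ul,
    }


def programmed_minutes(initial_s, cycles, step_seconds, final_s):
    """Total programmed hold time in minutes (ramp times are ignored)."""
    return (initial_s + cycles * sum(step_seconds) + final_s) / 60.0


# Cycling programmes as described in the text (all times in seconds).
INVERTEBRATE_COI = dict(initial_s=300, cycles=35, step_seconds=(30, 45, 60), final_s=600)
JB3_JB5 = dict(initial_s=300, cycles=40, step_seconds=(60, 90, 60), final_s=600)

if __name__ == "__main__":
    for component, volume in reaction_mix().items():
        print(f"{component}: {volume:.2f} uL")
    print(f"Invertebrate COI programme: {programmed_minutes(**INVERTEBRATE_COI):.2f} min")
    print(f"JB3/JB5 programme: {programmed_minutes(**JB3_JB5):.2f} min")
```

Under the assumed stock concentration, each primer contributes 1 μL and water 9.5 μL per 25 μL reaction; the programmed hold times are 93.75 min for the harmonised invertebrate programme and 155 min for the JB3/JB5 programme.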
PCR products (5 μL) were separated by gel electrophoresis in 1% agarose gel in 1x Tris-borate-EDTA buffer (89 mmol/L Tris, 89 mmol/L boric acid, 2 mmol/L EDTA), stained with ethidium bromide, and visualised using a UV transilluminator. PCR products were purified using the QIAquick® PCR purification Kit (QIAGEN, West Sussex, UK) prior to sequencing on both strands using the PCR primers by Eurofins-MWG-Operon. In instances when multiple COI primer sets generated a PCR amplicon, one of the full length barcode regions was selected in preference to the shorter amplicon. To generate reference sequences for a reference specimen of each
Meloidogyne species, the PCR product was cloned into the pGEM®-T easy vector system (Promega, Wisconsin, USA) following the manufacturer’s protocol. Where possible, multiple specimens or populations of each species (preferably a minimum of three) were subjected to DNA sequencing (see the supplementary data, Table S1, for sample information). Sequences were submitted to NCBI (see
Table 1 for accession numbers).
Sequence analysis
DNA sequences were proofread by eye in Sequence Scanner version 2 (Applied Biosystems, California, USA) and consensus sequences were created in MEGA version 4.1 (
Tamura et al. 2007) where each nucleotide position had been sequenced twice (single read sections were excluded from the consensus sequence). The IUPAC Ambiguity Code (
Cornish-Bowden 1985) was used for true polymorphic positions (not sequencing ambiguity). Alignments and analyses were performed using MEGA version 4.1 software using the neighbour-joining method with default values, Kimura 2-parameter distance metric, and 1000 replications for bootstrap analysis. Database searches (BLAST) were performed at the NCBI website (
http://www.ncbi.nlm.nih.gov/), BOLD website (
http://www.boldsystems.org/, using the “Species Level Barcode Records” database), and Q-bank website (
http://www.q-bank.eu/); in all cases, sequences were analysed using all three databases, apart from
Meloidogyne species, which were only analysed using Q-bank (due to this database containing more reliable reference sequences for the genus). To assess the reliability of the DNA barcode to produce a species-level identification, results were assessed in terms of percentage sequence identity to reference sequences, intra- and interspecies variation compared to reference sequences (using Kimura 2-parameter distance metric), representation of reference sequences for other species within the genus or family (as appropriate), and assessment of availability of reference sequences from species within the genus and those taxa known to be present in the country of origin of the sample (for elimination purposes). Database searches were performed on the date mentioned in each case study (2010–2015) and repeated in January 2016.
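As an illustration of the intra- and interspecies comparison described above, the following Python sketch computes the Kimura 2-parameter distance, d = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of transitions and transversions, and then compares the largest intraspecific distance with the smallest interspecific distance. It is a minimal sketch and not the MEGA implementation used in the study: the pairwise skipping of gaps and ambiguity codes is an assumption, and the sequences, species labels, and function names are hypothetical.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned, equal-length sequences.

    Positions where either base is not A, C, G, or T are skipped
    (an assumption of this sketch, not necessarily the MEGA setting).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    d = -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
    return max(0.0, d)


def barcode_gap(sequences_by_species):
    """Return (max intraspecific distance, min interspecific distance)."""
    labelled = [(species, seq)
                for species, seqs in sequences_by_species.items()
                for seq in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(k2p_distance(s1, s2))
    return (max(intra) if intra else None, min(inter) if inter else None)


# Hypothetical toy alignment; real barcodes are several hundred bases long.
TOY_REFERENCES = {
    "species_a": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGC"],
    "species_b": ["ACGTTCGTACGAACGTACGA", "ACGTTCGTACGAACGTACGG"],
}

if __name__ == "__main__":
    max_intra, min_inter = barcode_gap(TOY_REFERENCES)
    print(f"max intraspecific K2P: {max_intra:.4f}")
    print(f"min interspecific K2P: {min_inter:.4f}")
    print("barcode gap present:", max_intra < min_inter)
```

A gap between the two values in such a comparison only supports a species-level identification if the reference set adequately represents the relevant species, which is the point made by the remaining criteria listed above.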
General discussion
Within a diagnostic laboratory, DNA barcoding is an attractive proposition; it can provide an identification in the absence of the required taxonomic expert. Even in laboratories that draw upon a combination of morphological and molecular methods, there are numerous reasons for the deployment of barcoding, which are most often related to instances of damaged specimens or interception of life stages that cannot be morphologically identified. Using DNA barcoding to rule out a suspected high-risk species, for example, preventing “false alarms” by identifying native species, can often be as important as confirming findings of significant regulated quarantine species. With the understanding that performing DNA barcoding may not necessarily result in a species identification, the technique has many merits that will lead to its implementation and use in diagnostics laboratories.
The seminal work of Dr. Paul Hebert and co-workers (
Hebert et al. 2003) brought DNA barcoding to the scientific mainstream; the wide applicability of DNA barcoding has meant that it has been very broadly used across numerous scientific disciplines. The advent of any new diagnostic protocol is typically the start of a lengthy process of evolution, before those techniques that are truly widely applicable, robust, and practical make their way into diagnostic laboratories. Whilst sequencing is not a new concept, the neat conceptual framework of DNA barcoding makes it an attractive proposition for diagnostic laboratories, which often need generic methodologies to use alongside species-specific tests (
Boonham et al. 2008) and morphological identification.
Nevertheless, there are still limitations and drawbacks of DNA barcoding, primarily around the lack of required reference sequences. Furthermore, care must be taken when utilising methods such as DNA barcoding that rely upon species placement within phylogenetic trees and (or) percentage sequence similarities. In particular, robustness of identifications must be ensured when reference sequences for relevant taxa may not be present, and systems for determining when relevant taxa are not present or are not reliable must be in place (
Collins and Cruickshank 2013). Thorough analysis and contextualisation of an identification to ensure its congruence with other pertinent information such as origin, pathway, and past host and geographic records enable the suitability of DNA barcoding of a given sample to be determined on a case-by-case basis. The lack of reference sequences will gradually reduce over time as more sequences are made available; however, the quality of reference sequences, from correctly identified specimens, is essential, and there are numerous examples of sequences assigned to the incorrect species in public databases (
Shen et al. 2013). These erroneous sequences can make assessment of a potential barcode challenging, and compliance of all scientists using DNA barcoding with the recommended data standards for DNA barcode sequence records (
Hanner et al. 2009) will help to reduce these issues. The lack of whole-organism taxonomic expertise is, however, going to remain a rate-limiting step.
A more recent development of DNA barcoding is its combination with next-generation, or massively parallel, sequencing technologies leading to what has been called metabarcoding. Metabarcoding promises to provide a step change, in particular for the identification of organisms within mixtures or communities (
Shokralla et al. 2014). Whilst many studies have demonstrated proof of concept (
Ji et al. 2013), the technique is still in its infancy in terms of front-line application. Next-generation sequencing has decreased considerably in cost in recent years, but this has come at the expense of usable read-length. The most cost-effective sequencing platforms are MiSeq and HiSeq (Illumina), which would enable metabarcoding to be carried out cost effectively as a front-line service. However, due to the significant decrease in quality as the individual reads exceed 200 nucleotides, the maximum length of sequence that can be generated with an error rate low enough to enable accurate identification is approximately 450 nucleotides (2 × 300 nucleotides paired reads on a MiSeq). Whilst the methodology is well established for bacterial communities sequencing a 250 nucleotide fragment of the v4 region of the 16S rRNA gene (
Kozich et al. 2013), this is considerably shorter than the standard barcode length for invertebrates of 650 bp, and in some applications the reduction in length results in poor species-level resolution (
Liu et al. 2013). Some researchers have explored the use of shorter regions of the standard barcode, or have studied alternative genes that may provide improved species resolution with shorter fragments; however, this presents the perennial issue of lack of reference sequences (
Deagle et al. 2014). The large number of sequencing reads generated enables some solutions: the use of multiple, shorter overlapping amplicons to assemble longer amplicons, or amplification of the standard COI barcode followed by assembly of a barcode sequence from the reads at each end together with shotgun sequence of the whole amplicon (
Liu et al. 2013). Next-generation sequencing technologies are rapidly evolving (e.g., MinION from Oxford Nanopore), and it is probable that within a short period of time they will be able to generate suitable read-lengths for DNA barcoding, enabling the use of existing databases.
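The read-length constraint discussed above is essentially arithmetic, and can be sketched in Python as follows. This is a simplified illustration that ignores primer, adaptor, and quality trimming; the function names and the 50-nucleotide overlap between tiled amplicons are hypothetical.

```python
import math


def merged_length(read_len, overlap):
    """Fragment length reconstructed from two paired reads sharing `overlap` nt."""
    if not 0 <= overlap <= read_len:
        raise ValueError("overlap must lie between 0 and the read length")
    return 2 * read_len - overlap


def paired_read_overlap(amplicon_len, read_len):
    """Overlap (nt) of paired reads across an amplicon; negative means a gap."""
    return 2 * read_len - amplicon_len


def tiles_needed(target_len, amplicon_len, tile_overlap):
    """Number of overlapping amplicons of amplicon_len needed to cover target_len."""
    if tile_overlap >= amplicon_len:
        raise ValueError("tile overlap must be shorter than the amplicon")
    return max(1, math.ceil((target_len - tile_overlap) / (amplicon_len - tile_overlap)))


if __name__ == "__main__":
    # A 450-nt fragment read as 2 x 300 nt leaves a 150-nt overlap for merging.
    print(paired_read_overlap(450, 300))
    # A 650-bp barcode read as 2 x 300 nt leaves an unsequenced gap (negative overlap).
    print(paired_read_overlap(650, 300))
    # Tiling 650 bp with 450-nt amplicons, assuming a hypothetical 50-nt tile overlap.
    print(tiles_needed(650, 450, 50))
```

With these figures, a 650 bp barcode cannot be spanned by a single 2 × 300 nucleotide read pair, whereas two overlapping amplicons of about 450 nucleotides would cover it, consistent with the tiling approach mentioned above.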
It is interesting to note the trend towards using DNA barcoding to assign samples to molecular operational taxonomic units (MOTUs), as opposed to linking a DNA sequence to a species name assigned following morphological identification (
Blaxter et al. 2005). This trend is driven by the overwhelming diversity of undescribed species on earth and the taxonomic impediment to identifying and describing them (
Ratnasingham and Hebert 2013) and is facilitated by next-generation sequencing, and there are many potential benefits to this approach. Unfortunately, within a regulatory context this is unsatisfactory, as identification of samples to a named species is needed, using either conventional or molecular means. A move towards regulation of a species described only by a numbered sequence (MOTU), not including any linkage to traditional morphological species, would be a substantial step change in methodology that is hard to envisage in the near future.
The risks posed to plant health biosecurity are continually expanding due to impacts such as increases and changes in global trade, resulting in the increased movement of exotic pests and pathogens into new regions, as well as expansion of the EU and climate change affecting both crop ranges and pest ranges (
Armstrong and Ball 2005). The application of DNA barcoding can enable the identification of a wide range of pests and pathogens, and its use has resulted in the identification of quarantine-listed species and subsequent actions to contain and prevent potentially high-risk species from entering the UK. Furthermore, the use of the method has flagged potential trade routes that may pose a biosecurity risk as a pest pathway, which can then be subject to increased scrutiny. When employed within a defined scope that considers and understands the limitations of DNA barcoding, the method shows great potential as a tool that can be embedded within diagnostic laboratories.