Although only H1N1, H1N2, and H3N2 subtypes are endemic in swine around the world, much diversity can be found in the genes coding for the major surface proteins hemagglutinin (HA) and neuraminidase (NA) and in the other 6 internal gene segments. The swine influenza A viruses (IAV) that emerged coincident with the 1918 Spanish flu are classified as classical swine H1N1 (1). In the late 1990s, triple-reassortant H3N2 viruses containing gene segments derived from human seasonal H3N2, avian IAV, and the classical swine IAV were identified (2, 3). The HA persisted, evolving into phylogenetic clades (cluster IV [C-IV] clades A to F) (4). The triple-reassortant H3N2 viruses also reassorted with classical swine H1N1 viruses, resulting in the emergence of new HA and NA genetic clades of H1N1 and H1N2 viruses (5) that preserved the triple-reassortant internal gene (TRIG) constellation. Genetically distinct human seasonal H1 spilled into and established in swine in the early 2000s (6, 7). In 2009, a virus with genes from Eurasian avian H1N1, TRIG, and classical swine lineage genes emerged as a pandemic (H1N1pdm09) and continues to contribute to IAV diversity in swine (8, 9). More recently, two distinct human H3N2 viruses, H3.2010.1 and H3.2010.2, were transmitted to swine (10, 11). HA genes were paired with N2 genes derived from the 1998 or 2002 human seasonal-origin lineage (12) or N1 genes from the classical swine lineage or the pandemic lineage (13, 14). In 2018, a live-attenuated influenza virus (LAIV) vaccine became commercially available in the United States (15). The LAIV viruses contain HA, annotated as H3 cluster I or H1 gamma2-beta-like, and NA, annotated as N2 LAIV-98 or N1 LAIV-classical, expressed on a TRIG internal gene backbone, with all components isolated in the 1990s. Reassorted viruses with LAIV genes have been detected. Interspecies transmission episodes and the processes of antigenic shift and drift led to approximately 16 distinct HA clades, 4 NA lineages, and 3 internal gene lineages (16, 17).
We generated reference gene data sets and an analytical pipeline that assigns queried HA to genetic clade and queried NA and internal IAV genes to evolutionary lineages that are found in IAV from U.S. swine. Users need the reference data set and a FASTA file with query sequences from any IAV gene segment. The input data must be of good quality and substantial length (approximately 50% or greater of the gene of interest). The pipeline (Fig. 1A) processes query sequences by (i) identification to one of 8 segments using BLASTn, (ii) alignment to the reference gene segment data set, (iii) the inference of a maximum likelihood tree, (iv) classification to evolutionary lineage or genetic clade using patristic distance extracted from the inferred tree, and (v) generation of a summary classification file and annotated gene trees (Fig. 1B and C). The reference data set for each gene includes nonswine genes, allowing the pipeline to flag sequences that are not contemporary circulating U.S. swine IAV. Genes derived from interspecies transmission events are annotated by a nonswine classification, and reassortment events involving different lineages can be identified in the summary file as disparate lineages (e.g., a single strain containing a mix of human seasonal, TRIG, and pandemic genes). Classification uses patristic distances extracted from gene trees using DendroPy in Python (18) and smof for processing FASTA files (19). The shortest distance from a query gene to a reference gene is identified, and the reference gene annotation is assigned to the query. Using swine IAV data collected in the United States from 2014 to present (929 strains and 7,432 genes), the pipeline accurately captured classifications assigned by manual phylogenetic curation (7,428 genes classified correctly; 99.95% accuracy). Our approach is reliant upon a relevant reference data set; the provided reference genes are adequate for swine IAV in the United States and Canada but have limited utility for swine IAV in Europe and Asia. However, this tool maintains utility for international swine IAV researchers if they generate a custom reference data set with appropriate clade or lineage annotation. Moreover, if interspecies transmission events result in the establishment of new lineages, contemporary data that capture this diversity may be added to the reference files by pipeline users or at the repository.
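The nearest-reference assignment in step (iv) can be illustrated with a short sketch. It is not the pipeline's own code: the pipeline extracts patristic distances with DendroPy, whereas this example uses only the Python standard library and a small Newick parser so that it runs on its own. The tree, sequence names, clade labels, and function names are all hypothetical and chosen for illustration.

```python
import re


def parse_newick(text):
    """Parse a Newick string into parent links and a leaf-name index."""
    tokens = re.findall(r"[(),;]|[^(),;]+", text.strip())
    parent = {}   # node id -> [parent id or None, branch length to parent]
    leaves = {}   # leaf label -> node id
    stack, closed, next_id = [], None, 0
    for tok in tokens:
        if tok == "(":
            # Open a new internal node under the current one.
            parent[next_id] = [stack[-1] if stack else None, 0.0]
            stack.append(next_id)
            next_id += 1
            closed = None
        elif tok == ")":
            closed = stack.pop()
        elif tok in (",", ";"):
            closed = None
        else:
            name, _, length = tok.partition(":")
            length = float(length) if length else 0.0
            if closed is not None:
                # Label/length that follows ")" belongs to the closed node.
                parent[closed][1] = length
                closed = None
            else:
                parent[next_id] = [stack[-1] if stack else None, length]
                leaves[name.strip()] = next_id
                next_id += 1
    return parent, leaves


def root_path(node, parent):
    """Map each ancestor of `node` (itself included) to its distance from `node`."""
    dist, path = 0.0, {}
    while node is not None:
        path[node] = dist
        up, length = parent[node]
        dist += length
        node = up
    return path


def patristic(a, b, parent):
    """Sum of branch lengths on the tree path between nodes a and b."""
    pa, pb = root_path(a, parent), root_path(b, parent)
    # The closest shared ancestor minimizes the summed distance.
    return min(pa[n] + pb[n] for n in pa if n in pb)


def classify(newick, reference_clades, queries):
    """Assign each query the annotation of its nearest reference tip."""
    parent, leaves = parse_newick(newick)
    calls = {}
    for q in queries:
        nearest = min(
            reference_clades,
            key=lambda r: patristic(leaves[q], leaves[r], parent),
        )
        calls[q] = (reference_clades[nearest], nearest)
    return calls


# Hypothetical gene tree with three annotated references and two queries.
TREE = ("((ref_gamma:0.02,query1:0.01):0.05,"
        "(ref_delta1:0.03,query2:0.04):0.06,ref_pdm:0.20);")
REFERENCE_CLADES = {"ref_gamma": "gamma", "ref_delta1": "delta1", "ref_pdm": "pdm"}

calls = classify(TREE, REFERENCE_CLADES, ["query1", "query2"])
for query, (clade, ref) in calls.items():
    print(query, clade, ref)
```

In this toy tree each query sits next to a single reference, so the call is unambiguous. A nearest-reference rule of this kind can only return annotations that exist in the reference set, which is why the text above stresses a relevant reference data set.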