Perspective | Volume 110, Issue 1, P3-12, January 05, 2023

Genotype first: Clinical genomics research through a reverse phenotyping approach

      Summary

      Although genomic research has predominantly relied on phenotypic ascertainment of individuals affected with heritable disease, the falling costs of sequencing allow consideration of genomic ascertainment and reverse phenotyping (the ascertainment of individuals with specific genomic variants and subsequent evaluation of physical characteristics). In this research modality, the scientific question is inverted: investigators gather individuals with a genomic variant and test the hypothesis that there is an associated phenotype via targeted phenotypic evaluations. Genomic ascertainment research is thus a model of predictive genomic medicine and genomic screening. Here, we provide our experience implementing this research method. We describe the infrastructure we developed to perform reverse phenotyping studies, including aggregating a super-cohort of sequenced individuals who consented to recontact for genomic ascertainment research. We assessed 13 studies completed at the National Institutes of Health (NIH) that piloted our reverse phenotyping approach. The studies can be broadly categorized as (1) facilitating novel genotype-disease associations, (2) expanding the phenotypic spectra, or (3) demonstrating ex vivo functional mechanisms of disease. We highlight three examples of reverse phenotyping studies in detail and describe how using a targeted reverse phenotyping approach (as opposed to phenotypic ascertainment or clinical informatics approaches) was crucial to the conclusions reached. Finally, we propose a framework and address challenges to building collaborative genomic ascertainment research programs at other institutions. Our goal is for more researchers to take advantage of this approach, which will expand our understanding of the predictive capability of genomic medicine and increase the opportunity to mitigate genomic disease.

      Introduction

      Human genetics research successes have relied primarily on a phenotype-first approach, wherein individuals presenting with similar phenotypes, typically a disease state, are assessed for shared genetic variants.
      • Moresco E.M.Y.
      • Li X.
      • Beutler B.
      Going forward with genetics: recent technological advances and forward genetics in mice.
      Although this has led to an enormous body of knowledge in human genetics, the phenotype-first approach engenders an ascertainment bias that can affect our understanding of penetrance, expressivity, and variant pathogenicity.
      • Altshuler D.
      • Daly M.J.
      • Lander E.S.
      Genetic mapping in human disease.
      ,
      • Nannenberg E.A.
      • van Rijsingen I.A.W.
      • van der Zwaag P.A.
      • van den Berg M.P.
      • van Tintelen J.P.
      • Tanck M.W.T.
      • Ackerman M.J.
      • Wilde A.A.M.
      • Christiaans I.
      Effect of ascertainment bias on estimates of patient mortality in inherited cardiac diseases.
      ,
      • Ranola J.M.O.
      • Tsai G.J.
      • Shirts B.H.
      Exploring the effect of ascertainment bias on genetic studies that use clinical pedigrees.
      ,
      • Brohet R.M.
      • Velthuizen M.E.
      • Hogervorst F.B.L.
      • Meijers-Heijboer H.E.J.
      • Seynaeve C.
      • Collée M.J.
      • Verhoef S.
      • Ausems M.G.E.M.
      • Hoogerbrugge N.
      • van Asperen C.J.
      • et al.
      Breast and ovarian cancer risks in a large series of clinically ascertained families with a high proportion of BRCA1 and BRCA2 Dutch founder mutations.
      Genotype-phenotype associations are limited when researchers select participants strictly based on phenotype.
      • Decaudain A.
      • Vantyghem M.-C.
      • Guerci B.
      • Hécart A.C.
      • Auclair M.
      • Reznik Y.
      • Narbonne H.
      • Ducluzeau P.-H.
      • Donadille B.
      • Lebbé C.
      • et al.
      New metabolic phenotypes in laminopathies: LMNA mutations in patients with severe metabolic syndrome.
      ,
      • Young J.
      • Morbois-Trabut L.
      • Couzinet B.
      • Lascols O.
      • Dion E.
      • Béréziat V.
      • Fève B.
      • Richard I.
      • Capeau J.
      • Chanson P.
      • Vigouroux C.
      Type A insulin resistance syndrome revealing a novel lamin a mutation.
      ,
      • Bertrand A.T.
      • Chikhaoui K.
      • Yaou R.B.
      • Bonne G.
      Clinical and genetic heterogeneity in laminopathies.
      Such an approach inherently lacks the ability to determine the full phenotypic spectrum of a genomic disease.
      Genomic ascertainment offers the opportunity to address these limitations by selecting participants based on genomic variants of interest, rather than phenotypes of interest, to test a hypothesized genotype-phenotype relationship. In this research mode, genomically ascertained participants are evaluated for specific, measurable phenotypes and compared to individuals who do not test positive for the same genetic variant(s), a process called reverse phenotyping.
      • Schulze T.G.
      • McMahon F.J.
      Defining the phenotype in human genetic studies: forward genetics and reverse phenotyping.
      ,
      • McGuire S.E.
      • McGuire A.L.
      Don't throw the baby out with the bathwater: enabling a bottom-up approach in genome-wide association studies.
      In this perspective, we describe our genomic ascertainment research program. We analyze an initial series of projects that exploit this reverse phenotyping approach and compare it to the phenotypic ascertainment and clinical informatics approaches for discerning genotype-phenotype relationships. We also provide a framework that others can use to establish such a program at their institutions.

      Establishing a pilot program for genomic ascertainment research

      A genomic ascertainment research program relies on the availability of sequenced participants who are eligible to participate in follow-up research based on a genomic variant of interest (also known as “recall-by-genotype”). As a large-scale genetic sequencing study at the largest clinical research hospital in the United States, the ClinSeq study in the National Human Genome Research Institute (NHGRI) at the National Institutes of Health (NIH) was able to serve as a pilot program for genomic ascertainment research.
      • Biesecker L.G.
      • Mullikin J.C.
      • Facio F.M.
      • Turner C.
      • Cherukuri P.F.
      • Blakesley R.W.
      • Bouffard G.G.
      • Chines P.S.
      • Cruz P.
      • Hansen N.F.
      • et al.
      The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine.
      Individuals enrolled in ClinSeq underwent exome sequencing without a clinical indication and also consented to subsequent recontact for future research. Following the ClinSeq model, we identified other sequenced cohorts whose participants consented to recontact regarding enrollment in follow-up research. Because genomic ascertainment research is predicated on selecting research participants based on their genomic status agnostic to phenotype, it is essential that the sequenced datasets are not significantly enriched or depleted for a phenotype relevant to the hypothesis being tested.
      • Akhrif A.
      • Roy A.
      • Peters K.
      • Lesch K.-P.
      • Romanos M.
      • Schmitt-Böhrer A.
      • Neufang S.
      Reverse phenotyping: can the phenotype following constitutive Tph2 gene inactivation in mice be transferred to children and adolescents with and without ADHD?
      In addition to the ClinSeq study (1,474 exomes),
      • Biesecker L.G.
      • Mullikin J.C.
      • Facio F.M.
      • Turner C.
      • Cherukuri P.F.
      • Blakesley R.W.
      • Bouffard G.G.
      • Chines P.S.
      • Cruz P.
      • Hansen N.F.
      • et al.
      The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine.
      we collected genomic data from newborn trio sequencing studies at Inova Health System (4,724 genomes)
      • Bodian D.L.
      • Klein E.
      • Iyer R.K.
      • Wong W.S.W.
      • Kothiyal P.
      • Stauffer D.
      • Huddleston K.C.
      • Gaither A.D.
      • Remsburg I.
      • Khromykh A.
      • et al.
      Utility of whole-genome sequencing for detection of newborn screening disorders in a population cohort of 1,696 neonates.
      ,
      • Hauser N.S.
      • Solomon B.D.
      • Vilboux T.
      • Khromykh A.
      • Baveja R.
      • Bodian D.L.
      Experience with genomic sequencing in pediatric patients with congenital cardiac defects in a large community hospital.
      and from the National Institute of Allergy and Infectious Diseases (NIAID) Centralized Sequencing Program (557 exomes).
      • Similuk M.N.
      • Yan J.
      • Ghosh R.
      • Oler A.J.
      • Franco L.M.
      • Setzer M.R.
      • Kamen M.
      • Jodarski C.
      • DiMaggio T.
      • Davis J.
      • et al.
      Clinical exome sequencing of 1000 families with complex immune phenotypes: towards comprehensive genomic evaluations.
      Demographic information for each cohort is presented in Table S1. All participants had previously consented to broad data sharing of deidentified genomic and clinical data and recontact regarding future research opportunities. We centralized these resources in the establishment of NHGRI’s Reverse Phenotyping Core.
      Because the genome and exome sequence datasets were from distinct cohorts with different sequencing processes, these data needed to be harmonized to allow for consistent identification and searching for variants of interest. All samples were processed using the same bioinformatics pipeline, starting with sequence data in the form of FASTQ files. First, binary alignment map (BAM) files were created by aligning FASTQ files to the human genome (version GRCh37) using NovoAlign (https://www.novocraft.com/products/novoalign/). Variant call format (VCF) files were then generated using a modified GATK pipeline,
      • Van der Auwera G.A.
      • O'Connor B.D.
      Genomics in the Cloud: Using Docker, GATK, and WDL in Terra.
      ,
      • Poplin R.
      • Ruano-Rubio V.
      • DePristo M.A.
      • Fennell T.J.
      • Carneiro M.O.
      • Van der Auwera G.A.
      • Kling D.E.
      • Gauthier L.D.
      • Levy-Moonshine A.
      • Roazen D.
      • et al.
      Scaling accurate genetic variant discovery to tens of thousands of samples.
      which included base quality score recalibration, duplicate marking, joint calling of genotypes, and hard filtering of variant quality scores. Deidentified, unlinked single-nucleotide and short insertion-deletion (indel) variants were uploaded to a searchable web browser (accessible only to NIH intramural researchers) based on gnomAD site architecture (Figure S1).
      • Karczewski K.J.
      • Francioli L.C.
      • Tiao G.
      • Cummings B.B.
      • Alföldi J.
      • Wang Q.
      • Collins R.L.
      • Laricchia K.M.
      • Ganna A.
      • Birnbaum D.P.
      • et al.
      The mutational constraint spectrum quantified from variation in 141,456 humans.
      As bioinformatics tools advance in complexity, other variant types, such as copy-number variants, could also be annotated and made available as searchable variants. In addition, the super-cohort bioinformatics staff should offer researchers custom queries of genomic attributes that are not routinely analyzed and displayed.
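      The hard-filtering step at the end of the variant-calling pipeline described above can be sketched as follows. This is a minimal illustration, not the pipeline itself: the text does not state which annotations or thresholds were applied, so the values below are assumptions that mirror the widely cited generic GATK starting points for SNP hard filtering, and the function names are hypothetical.

```python
from typing import Callable, Dict, List

# Illustrative thresholds only (assumption): generic GATK-style starting
# points for SNP hard filtering. The thresholds actually used in the
# pipeline described in the text are not given.
SNP_HARD_FILTERS: Dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v < 2.0,    # variant quality normalized by depth
    "FS": lambda v: v > 60.0,   # strand bias (Fisher's exact test, Phred-scaled)
    "MQ": lambda v: v < 40.0,   # root-mean-square mapping quality
    "SOR": lambda v: v > 3.0,   # strand odds ratio
}


def failed_filters(info: Dict[str, float]) -> List[str]:
    """Return the names of annotations whose values trip a hard-filter rule.

    Annotations missing from `info` are skipped rather than treated as failures.
    """
    return [
        name
        for name, fails in SNP_HARD_FILTERS.items()
        if name in info and fails(info[name])
    ]


def passes_hard_filters(info: Dict[str, float]) -> bool:
    """A variant passes when no hard-filter rule is tripped."""
    return not failed_filters(info)
```

      Applying one fixed set of rules to every cohort is one way to keep variant calls comparable across datasets that were sequenced with different processes.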
      The key to this form of research is the formulation of a hypothesis in the general form of “given a deleterious variant (genotype) in gene X, I hypothesize that individuals with this variant will manifest Y phenotype.” The phenotype may be a medical history attribute, a physical finding, or a biomarker of some kind. Then, candidate variants are identified in the genomic ascertainment cohort via the searchable web browser. Critical to this step of the research is a hypothesis that can inform variant/genotype selection. The investigator must identify variants that have a substantial probability of being deleterious. This is easiest for heterozygous loss-of-function or haploinsufficiency pathogenetic models. Then, potential research participants harboring variants of interest can be recontacted and invited to participate in a reverse phenotyping study. In addition, if the phenotype is relatively common or subjective, it can be important to select suitable controls who do not harbor the variant, in numbers appropriate to power a statistical comparison. If participants consent to joining the study, clinical staff coordinate examinations and/or sample collection to generate phenotypic data de novo by performing targeted studies on these participants or molecular assays on their samples.
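      The recall step described above (identify carriers of the variant of interest, then draw non-carrier controls from the same cohort) can be sketched in a few lines. The data layout, participant identifiers, and function name are hypothetical; the sketch assumes genotypes are coded as the number of copies of the variant allele and that only participants who consented to recontact are eligible.

```python
import random
from typing import Dict, List, Set, Tuple


def select_participants(
    genotypes: Dict[str, int],
    consented: Set[str],
    controls_per_carrier: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Pick carriers and a random set of non-carrier controls.

    genotypes maps a participant ID to the count of variant alleles (0, 1, 2).
    Only participants who consented to recontact are considered.
    """
    eligible = sorted(pid for pid in genotypes if pid in consented)
    carriers = [pid for pid in eligible if genotypes[pid] > 0]
    non_carriers = [pid for pid in eligible if genotypes[pid] == 0]

    # Draw controls without looking at any phenotype data, so that the
    # comparison group stays phenotypically unselected.
    rng = random.Random(seed)
    n_controls = min(len(non_carriers), controls_per_carrier * len(carriers))
    controls = sorted(rng.sample(non_carriers, n_controls))
    return carriers, controls
```

      For example, `select_participants({"P1": 1, "P2": 0, "P3": 0}, {"P1", "P2", "P3"}, controls_per_carrier=2)` returns `P1` as the only carrier and both non-carriers as controls. The ratio of controls to carriers is a design choice that depends on how common or subjective the phenotype is.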

      Reverse phenotyping research at NHGRI

      The reverse phenotyping pilot program at the NIH has resulted in 13 published reverse phenotyping studies, which included recontact of 190 genomically ascertained participants with phenotypic assessment for 60 conditions (Table S2). Our reverse phenotyping studies can be sorted into three categories: testing a novel genotype-disease association, broadening the phenotypic spectrum of a known genotype-disease association, or performing ex vivo analyses of a trait. To illustrate how reverse phenotyping can be applied in collaborative clinical and basic science research, we have highlighted an example from each category. These studies demonstrate how the targeted reverse phenotyping analyses were essential to address the aims of the study.

      Validating a novel genotype-disease association

      For an investigator aiming to establish a previously unknown association of a genetic variant to a phenotype, reverse phenotyping from a phenotypically unselected cohort can validate the association. Beginning with a traditional phenotypic ascertainment approach, it was hypothesized that single-allele TPSAB1 (MIM: 191080) duplications or triplications were associated with elevated basal tryptase levels and systemic complaints.
      • Lyons J.J.
      • Yu X.
      • Hughes J.D.
      • Le Q.T.
      • Jamil A.
      • Bai Y.
      • Ho N.
      • Zhao M.
      • Liu Y.
      • O'Connell M.P.
      • et al.
      Elevated basal serum tryptase identifies a multisystem disorder associated with increased TPSAB1 copy number.
      A non-specific pattern of recurrent cutaneous symptoms, dysautonomia, gastrointestinal dysfunction, connective tissue abnormalities, and systemic venom reactions was initially observed as a shared phenotype across members of multiple clinically affected families. The association between TPSAB1 copy-number variation and these symptoms was considered novel as it had not been previously documented in OMIM. For the reverse phenotyping experiment, the investigators hypothesized that genomically ascertained individuals with TPSAB1 copy-number variants would also show elevated basal serum tryptase levels and systemic complaints.
      With no prior knowledge of the clinical status of any participants, investigators identified and recontacted nine individuals with TPSAB1 monoallelic duplications in our reverse phenotyping cohort. These nine individuals, along with 82 controls without TPSAB1 copy-number variation, underwent a targeted reverse phenotyping evaluation including basal serum tryptase levels and a standardized telephone interview. The genomically ascertained individuals had elevated basal serum tryptase levels and were significantly more likely to experience symptoms consistent with hereditary α-tryptasemia compared to matched controls, validating the variant-phenotype association.
      A key consideration in this example is the expanded opportunity that comes from recontact and targeted testing compared to relying only on information and diagnoses contained within clinical electronic health record (EHR) data. By focusing on a modest number of participants with a bespoke phenotyping tool, the research team could assess the presence or absence of clinical manifestations of elevated tryptase levels in a robust, systematic, and efficient manner. We expect that few, if any, individuals would have these specific data in their records without targeted questions asked on recontact. Thus, reliance on clinical information in the EHR would require a much larger sample size of participants to identify the association.
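      The sample-size argument above can be made concrete with a standard two-proportion calculation. The sketch below is illustrative only: the symptom frequencies and the fraction of true cases captured in the EHR are hypothetical numbers chosen for the example, not values from the study, and the normal-approximation formula assumes equal group sizes.

```python
import math
from statistics import NormalDist


def n_per_group(p_carrier: float, p_control: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-group sample size to detect a difference in proportions.

    Uses the usual normal-approximation formula for a two-sided test
    with equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_carrier * (1 - p_carrier) + p_control * (1 - p_control)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_carrier - p_control) ** 2)


# Hypothetical inputs: a symptom present in 50% of carriers and 10% of controls
# when asked about directly, but recorded in the EHR for only 20% of those
# who truly have it.
capture = 0.2
n_targeted = n_per_group(0.5, 0.1)
n_ehr = n_per_group(0.5 * capture, 0.1 * capture)
```

      Under these assumed numbers, `n_ehr` is several times larger than `n_targeted`, because incomplete recording shrinks the observable difference between carriers and controls.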

      Broadening the phenotypic spectrum of a known genotype-disease association

      The second category of studies involves expansion of the phenotypic range of a known genetic condition. Combined malonic and methylmalonic aciduria (CMAMMA) (MIM: 614265) is associated with an elevated ratio of methylmalonic acid (MMA) to malonic acid (MA) in urine and with clinical symptoms of metabolic acidosis, failure to thrive, seizures, and immunodeficiency.
      • Gregg A.R.
      • Warman A.W.
      • Thorburn D.R.
      • O'Brien W.E.
      Combined malonic and methylmalonic aciduria with normal malonyl-coenzyme A decarboxylase activity: a case supporting multiple aetiologies.
      The cause of CMAMMA had been shown to be biallelic variants in ACSF3 (MIM: 614245), a member of the acyl-CoA synthetase family.
      • Sloan J.L.
      • Johnston J.J.
      • Manoli I.
      • Chandler R.J.
      • Krause C.
      • Carrillo-Carrasco N.
      • Chandrasekaran S.D.
      • Sysol J.R.
      • O'Brien K.
      • Hauser N.S.
      • et al.
      Exome sequencing identifies ACSF3 as a cause of combined malonic and methylmalonic aciduria.
      The initial cohort of individuals presented with neurologic symptoms including seizures, memory problems, psychiatric disease, and/or cognitive decline.
      Although these phenotypically ascertained participants displayed striking features of the condition, the investigators hypothesized that less severe presentations of CMAMMA could be present in an undiagnosed population. Genomic ascertainment identified a 66-year-old individual with a homozygous ACSF3 genotype for a pathogenic variant that had been observed in some of the typically affected individuals (Figure S1). The available medical history of this individual was unremarkable and therefore inconsistent with the expected phenotype of a serious and typically severe metabolic disorder. If this evaluation were limited to data abstracted from the health record, the investigators would have likely concluded that this person was phenotypically unaffected and the biallelic variants identified were not pathogenic in this individual. However, because this individual was eligible for targeted reverse phenotyping, she was recontacted and assessed biochemically and clinically for features present in the original study cohort. She was found to have MMA/MA ratios that were among the highest in the CMAMMA study cohort and an updated clinical history that included incontinence and memory problems atypical for her age (but not severe enough to have led to a clinical diagnosis). An MRI was performed and showed multiple abnormalities that were consistent with metabolic infarcts. The detailed evaluation of this individual found previously unrecognized subtle manifestations of CMAMMA with late-onset penetrance. This study broadened the spectrum of ACSF3-related CMAMMA and illustrates how reverse phenotyping can expand the phenotypic range of a genetic condition. It is also important to recognize that this single individual provided a powerful validation of the hypothesis—given the extraordinary perturbations of her organic acid levels, an N of 1 was sufficient and controls were not necessary.

      Ex vivo phenotypic analysis with participant samples

      The third category of studies involves genomically ascertained participants who were recontacted to obtain samples for ex vivo assays to test a molecular phenotype in an unselected population. Researchers investigating the genetic cause of periodic fever, aphthous stomatitis, pharyngitis, and cervical adenitis (PFAPA) syndrome found that a common single-nucleotide variant (rs17753641) in IL12A (MIM: 161560) previously associated with the systemic inflammatory disorder Behçet’s disease (MIM: 109650) was also more likely to be observed in individuals with PFAPA syndrome.
      • Manthiram K.
      • Preite S.
      • Dedeoglu F.
      • Demir S.
      • Ozen S.
      • Edwards K.M.
      • Lapidus S.
      • Katz A.E.
      • Feder Jr., H.M.
      • et al.
      Genomic Ascertainment Cohort
      Common genetic susceptibility loci link PFAPA syndrome, Behçet’s disease, and recurrent aphthous stomatitis.
      These researchers hypothesized that peripheral blood mononuclear cells (PBMCs) from genomically ascertained individuals with the IL12A risk allele(s) would show increased IL-12p70 upon stimulation compared to PBMCs from donors homozygous for the reference allele. Forty-eight such individuals were recruited to test this hypothesis (25 homozygous for the reference allele, 19 heterozygous for the risk allele, and 4 homozygous for the risk allele). Indeed, there was an additive variant allele-dependent elevation in IL-12p70 (Manthiram et al., 2020).
      Targeting participant recruitment based on genotype allowed investigators to test a molecular phenotype ex vivo while minimizing the time and resources that would have been required to obtain the same number of samples using a genotype-blind approach. The minor allele frequency of the variant allele is 0–0.11 in eight subpopulations in gnomAD,
      • Karczewski K.J.
      • Francioli L.C.
      • Tiao G.
      • Cummings B.B.
      • Alföldi J.
      • Wang Q.
      • Collins R.L.
      • Laricchia K.M.
      • Ganna A.
      • Birnbaum D.P.
      • et al.
      The mutational constraint spectrum quantified from variation in 141,456 humans.
      and thus homozygotes are rare, which would have necessitated ascertaining and genotyping many individuals to find suitable participants. The ability to recontact individuals to have a fresh blood sample drawn was a critical component of this study, again demonstrating the importance of recontact and targeted phenotyping for this type of research. By recruiting from our phenotypically unselected cohort (as opposed to from a cohort of clinically affected individuals), the team was able to assess the functional effect of the SNP of interest while limiting potential confounding factors.
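      The rarity of homozygotes follows from simple arithmetic. As a rough sketch, assuming Hardy-Weinberg proportions (an assumption, not something stated in the text) and taking the upper end of the reported allele frequency range (0.11), the expected genotype fractions and the number of unselected individuals that would need to be genotyped to find a given number of homozygotes are:

```python
from typing import Dict


def expected_genotype_fractions(maf: float) -> Dict[str, float]:
    """Expected genotype fractions under Hardy-Weinberg proportions."""
    ref = 1.0 - maf
    return {
        "ref/ref": ref * ref,
        "ref/alt": 2.0 * ref * maf,
        "alt/alt": maf * maf,
    }


def expected_screened_for_homozygotes(maf: float, n_homozygotes: int) -> float:
    """Expected number of individuals to genotype to find n risk-allele homozygotes."""
    return n_homozygotes / (maf * maf)


fractions = expected_genotype_fractions(0.11)
needed = expected_screened_for_homozygotes(0.11, 4)
```

      With a minor allele frequency of 0.11, about 1.2% of individuals are expected to be homozygous for the variant allele, so finding four homozygotes would require genotyping roughly 330 people on average. In subpopulations with a lower allele frequency, the number rises sharply, since it scales with the inverse square of the frequency.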
      Altogether, these studies provide representative examples of how targeted reverse phenotyping can maximize the potential of collaborative clinical and basic science research through deep phenotyping of genomically ascertained participants.

      Comparison with other modes of research

      Genomic ascertainment research provides an opportunity to capitalize upon the large amount of genomic sequencing data currently being generated in a way that was unimaginable even just two decades ago. As illustrated above, reverse phenotyping can be particularly useful when examining rare phenotypes that are outside of routine assessment, require specialized tests, or involve asking targeted personal and family history questions. As such, it has several advantages compared to phenotypic ascertainment or clinical informatics research (Figure 1).
      Figure 1. Research approaches to formulate and test human genotype-phenotype relationships
      Clinical genetics research has primarily relied on three approaches to discern the relationship between genetic variation and human traits: phenotypic ascertainment, which uses a cohort of phenotypically similar individuals to identify an underlying genetic cause; clinical informatics, which tests associations between known genotypes and previously recorded clinical data typically stored in the electronic health record (EHR); and more recently reverse phenotyping, which targets recruitment to individuals with known genotypes to collect new phenotypic data.
      Phenotypic ascertainment research involves establishing a cohort based on shared phenotypic attributes, then working toward a genetic explanation. Similar to reverse phenotyping, phenotypic ascertainment research interfaces directly with participants to perform tailored exams, but it does so before any genetic association has been determined. This facilitates research on rare conditions in which symptoms may not be routinely assessed, but it does not enable assessments driven by the gene identification. Cohorts comprising affected individuals presenting for clinical care can be biased toward attributes that match the researchers’ biases about the nature of the phenotype or its severity.
      • Nannenberg E.A.
      • van Rijsingen I.A.W.
      • van der Zwaag P.A.
      • van den Berg M.P.
      • van Tintelen J.P.
      • Tanck M.W.T.
      • Ackerman M.J.
      • Wilde A.A.M.
      • Christiaans I.
      Effect of ascertainment bias on estimates of patient mortality in inherited cardiac diseases.
      ,
      • Ranola J.M.O.
      • Tsai G.J.
      • Shirts B.H.
      Exploring the effect of ascertainment bias on genetic studies that use clinical pedigrees.
      Furthermore, selecting cohort members based on shared disease-related traits may result in missed genotype-phenotype associations that exist outside the initial, limited investigational scope due to phenotypic heterogeneity.
      • Decaudain A.
      • Vantyghem M.-C.
      • Guerci B.
      • Hécart A.C.
      • Auclair M.
      • Reznik Y.
      • Narbonne H.
      • Ducluzeau P.-H.
      • Donadille B.
      • Lebbé C.
      • et al.
      New metabolic phenotypes in laminopathies: LMNA mutations in patients with severe metabolic syndrome.
      ,
      • Young J.
      • Morbois-Trabut L.
      • Couzinet B.
      • Lascols O.
      • Dion E.
      • Béréziat V.
      • Fève B.
      • Richard I.
      • Capeau J.
      • Chanson P.
      • Vigouroux C.
      Type A insulin resistance syndrome revealing a novel lamin a mutation.
      ,
      • Bertrand A.T.
      • Chikhaoui K.
      • Yaou R.B.
      • Bonne G.
      Clinical and genetic heterogeneity in laminopathies.
      Finally, genetic heterogeneity in phenotypically ascertained cohorts further complicates the ability to draw genotype-phenotype conclusions without a substantial number of participants.
      • Rylaarsdam L.
      • Guemez-Gamboa A.
      Genetic causes and modifiers of autism spectrum disorder.
      Reverse phenotyping mitigates these biases by performing phenotyping after a group of participants has been selected based on genotype alone (ideally from a phenotypically unselected cohort). As our examples have shown, genomic ascertainment can provide a more comprehensive picture of the phenotypic consequences of a genetic variant compared to that of phenotypic ascertainment.
      In clinical informatics research, clinical data that were previously collected (most commonly in the EHR as part of routine care) can be associated with genomic data to discover and test genotype-phenotype associations.
      • Abul-Husn N.S.
      • Kenny E.E.
      Personalized medicine and the power of electronic health records.
      ,
      • Pendergrass S.A.
      • Crawford D.C.
      Using electronic health records to generate phenotypes for research.
      ,
      • Sudlow C.
      • Gallacher J.
      • Allen N.
      • Beral V.
      • Burton P.
      • Danesh J.
      • Downey P.
      • Elliott P.
      • Green J.
      • Landray M.
      • et al.
      UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.
      ,
      • Gottesman O.
      • Kuivaniemi H.
      • Tromp G.
      • Faucett W.A.
      • Li R.
      • Manolio T.A.
      • Sanderson S.C.
      • Kannry J.
      • Zinberg R.
      • Basford M.A.
      • et al.
      The electronic medical records and genomics (eMERGE) network: past, present, and future.
      Mining EHR data generally reduces the time, energy, and resources required for data collection compared to research methods that require participant recruitment and prospective data collection.
      • Kirby J.C.
      • Speltz P.
      • Rasmussen L.V.
      • Basford M.
      • Gottesman O.
      • Peissig P.L.
      • Pacheco J.A.
      • Tromp G.
      • Pathak J.
      • Carrell D.S.
      • et al.
      PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability.
      ,
      • Wei W.Q.
      • Denny J.C.
      Extracting research-quality phenotypes from electronic health records to support precision medicine.
      Similar to reverse phenotyping research, clinical informatics research can utilize genomic ascertainment. Although this can be an effective way to investigate phenotypes commonly assessed and recorded in the EHR, it is not practical for rare traits that are infrequently or inaccurately assessed or for which the phenotyping is prohibitively expensive for a large cohort (Figure 1). For example, our reverse phenotyping study of GCKR variants required a 48- to 72-h inpatient evaluation of glucose homeostasis, which is affordable only in limited numbers.
      • Rees M.G.
      • Ng D.
      • Ruppert S.
      • Turner C.
      • Beer N.L.
      • Swift A.J.
      • Morken M.A.
      • Below J.E.
      • Blech I.
      • et al.
      NISC Comparative Sequencing Program
      Correlation of rare coding variants in the gene encoding human glucokinase regulatory protein with phenotypic, cellular, and kinetic outcomes.
      The nature of rare variant research is such that the potential number of research participants available is naturally limited.
      • Altshuler D.
      • Daly M.J.
      • Lander E.S.
      Genetic mapping in human disease.
      Consequently, deep phenotyping information from a small number of individuals (beyond what is present in the EHR) is crucial to maximizing power to detect an association of rare variants to quantifiable phenotypes (power calculations for reverse phenotyping and rare variant research are reviewed in detail in Corbin et al.).
      • Corbin L.J.
      • Tan V.Y.
      • Hughes D.A.
      • Wade K.H.
      • Paul D.S.
      • Tansey K.E.
      • Butcher F.
      • Dudbridge F.
      • Howson J.M.
      • Jallow M.W.
      • et al.
      Formalising recall by genotype as an efficient approach to detailed phenotyping and causal inference.
      Consistent with our reverse phenotyping examples, relying on EHR data limits the positive predictive value of genomic associations with rare phenotypes.
      • Kirby J.C.
      • Speltz P.
      • Rasmussen L.V.
      • Basford M.
      • Gottesman O.
      • Peissig P.L.
      • Pacheco J.A.
      • Tromp G.
      • Pathak J.
      • Carrell D.S.
      • et al.
      PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability.
      Disparate documentation methods, phenotypes below the level of clinical significance, or a lack of specificity in EHR data classification structure can lead to incomplete datasets, which decreases the specificity for identifying certain conditions.
      • Chan K.S.
      • Fowles J.B.
      • Weiner J.P.
      Review: electronic health records and the reliability and validity of quality measures: a review of the literature.
      The uniformity of targeted data collection in reverse phenotyping research alleviates these concerns over inconsistent data collection. When a genomic variant is predicted to be deleterious by in silico modeling, or a genotype-disease association has been posited based on phenotypic ascertainment research, reverse phenotyping can provide more information about the pathogenicity of the variant in phenotypically unselected populations. In these instances, a documented negative examination in a targeted reverse phenotyping study is a more robust finding than is the absence of a positive evaluation in an EHR. There are several examples of targeted reverse phenotyping studies in which negative examinations identified reduced penetrance of a condition
      • Johnston J.J.
      • Rubinstein W.S.
      • Facio F.M.
      • Ng D.
      • Singh L.N.
      • Teer J.K.
      • Mullikin J.C.
      • Biesecker L.G.
      Secondary variants in individuals undergoing exome sequencing: screening of 572 individuals identifies high-penetrance mutations in cancer-susceptibility genes.
      ,
      • Ng D.
      • Johnston J.J.
      • Teer J.K.
      • Singh L.N.
      • Peller L.C.
      • Wynter J.S.
      • Lewis K.L.
      • Cooper D.N.
      • Stenson P.D.
      • Mullikin J.C.
      • Biesecker L.G.
      NIH Intramural Sequencing Center NISC Comparative Sequencing Program
      Interpreting secondary cardiac disease variants in an exome cohort.
      or helped to refine or refute a claimed genotype-disease association.
      • Vester A.
      • Velez-Ruiz G.
      • McLaughlin H.M.
      • Lupski J.R.
      • Talbot K.
      • Vance J.M.
      • Züchner S.
      • Roda R.H.
      • Fischbeck K.H.
      • et al.
      NISC Comparative Sequencing Program
      A loss-of-function variant in the human histidyl-tRNA synthetase (HARS) gene is neurotoxic in vivo.
      ,
      • Patton J.
      • Brewer C.
      • Chien W.
      • Johnston J.J.
      • Griffith A.J.
      • Biesecker L.G.
      A genotypic ascertainment approach to refute the association of MYO1A variants with non-syndromic deafness.
      ,
      • Lyons J.J.
      • Stotz S.C.
      • Chovanec J.
      • Liu Y.
      • Lewis K.L.
      • Nelson C.
      • DiMaggio T.
      • Jones N.
      • Stone K.D.
      • Sung H.
      • et al.
      A common haplotype containing functional CACNA1H variants is frequently coinherited with increased TPSAB1 copy number.
      ,
      • Garnai S.J.
      • Brinkmeier M.L.
      • Emery B.
      • Aleman T.S.
      • Pyle L.C.
      • Veleva-Rotse B.
      • Sisk R.A.
      • Rozsa F.W.
      • Ozel A.B.
      • Li J.Z.
      • et al.
      Variants in myelin regulatory factor (MYRF) cause autosomal dominant and syndromic nanophthalmos in humans and retinal degeneration in mice.
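      One way to see why a documented negative examination carries evidential weight is to ask what penetrance remains plausible after examining a given number of carriers and finding none affected. The sketch below computes the exact one-sided binomial upper bound for that case. The function name `penetrance_upper_bound` is hypothetical, and the calculation assumes independent carriers and an examination that would detect the phenotype if present. It is an illustration, not the analysis used in the cited studies.

```python
def penetrance_upper_bound(n_examined: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on penetrance when 0 of n_examined carriers are affected.

    With zero affected carriers, the exact one-sided binomial bound is the
    penetrance p that solves (1 - p) ** n_examined == 1 - confidence.
    """
    if n_examined <= 0:
        raise ValueError("n_examined must be a positive integer")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1 - (1 - confidence) ** (1 / n_examined)


if __name__ == "__main__":
    # Hypothetical numbers of carriers with a documented negative examination.
    for n in (5, 10, 30, 100):
        print(n, round(penetrance_upper_bound(n), 3))
```

      The absence of a diagnosis in an EHR supports no comparable bound, because it is usually unknown whether the relevant examination was ever performed.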
      Certain cohorts, such as the UK Biobank, have employed a hybrid model of clinical informatics research by including a baseline of EHR data supplemented by additional phenotyping thought to be widely useful to researchers.
      • Sudlow C.
      • Gallacher J.
      • Allen N.
      • Beral V.
      • Burton P.
      • Danesh J.
      • Downey P.
      • Elliott P.
      • Green J.
      • Landray M.
      • et al.
      UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.
      Some biobank models have certain limitations such as an inability to recontact participants or to share individual-level data with both external researchers and participants. However, other models, including the Geisinger MyCode Community Health Initiative
      • Carey D.J.
      • Fetterolf S.N.
      • Davis F.D.
      • Faucett W.A.
      • Kirchner H.L.
      • Mirshahi U.
      • Murray M.F.
      • Smelser D.T.
      • Gerhard G.S.
      • Ledbetter D.H.
      The Geisinger MyCode community health initiative: an electronic health record-linked biobank for precision medicine research.
      and the Icahn School of Medicine at Mount Sinai’s BioMe BioBank program (https://icahn.mssm.edu/research/ipm/programs/biome-biobank), pair access to EHR data with the opportunity to recontact participants for bespoke reverse phenotyping, as was performed in our studies. We argue that this type of approach provides a complement to a strictly clinical informatics approach. Each method is optimal for disorders that have distinct attributes of frequency, penetrance, phenotypic pleiotropy, sensitivity, and variable expressivity.

      Considerations for implementing reverse phenotyping research

      Although reverse phenotyping is a promising approach to understanding how genomic alterations result in phenotypic diversity, it poses logistical challenges that must be addressed. Our experience highlights optimal strategies for implementing reverse phenotyping research (Table 1).
      Table 1. Establishing a reverse phenotyping program
      Principle | Action item
      Include broad data sharing and the ability to recontact participants in the genomic research informed consent process: consistently including these parameters increases the likelihood of capturing rare variants and maximizes opportunities for interested participants to partake in research. | Make broad genomic data sharing and future research recontact the default across all institutional consent forms. Include an opt-out option for genomic sequencing participants who wish to decline.
      Establish long-term, trusting relationships with study participants: success depends on the participants’ willingness to return for studies that may not directly benefit them. | (1) Send study updates via an annual participant-friendly newsletter. (2) Notify participants when results are published and explain the findings using layperson language.
      Generate and maintain institutional engagement and support for follow-up studies: establishing and maintaining a reverse phenotyping resource requires institutional commitment and material resources of staff, money, and time. To create and continue the excitement surrounding such a resource, share the importance and impact of your results for participants and the larger field of genetics research. | Promote research findings and collaborations by regularly presenting at institutional and cross-departmental seminars. Offer reverse phenotyping services to complement phenotypic ascertainment research studies.
      Define project scope and the roles of stakeholders during planning stage: it is crucial to know who will do what and for what duration of time. Be specific about the vision for the minimal criteria of success (the floor) and the pinnacle benchmarks (the ceiling) of the project. | Create a strategic plan at the outset of planning a reverse phenotyping resource, and secure written agreement for pledged commitments.
      Define what results will be returned, to whom, and by whom: lack of clarity around which study participants will receive results and who is responsible for delivery of these results can diminish participants’ trust and risk their participation in future studies. | (1) Establish a clear plan for deciding what genomic results will be returned based on thresholds of clinical and personal utility. Secure the resources to obtain CLIA-valid results and genetic counseling services. (2) Establish a plan for return of secondary findings for participants in the genomic ascertainment cohort, including CLIA validation and genetic counseling services (if not already returned by individual sequencing cohorts).
      Invest in adding new cohorts through networking with other investigators: there is power in numbers, especially for rare variants, so it is crucial to have a large pool of participants from which to draw. | Meet with regional institutions about pooling genomic data into one shared resource. Create a plan to mediate potential conflicts of research interests.
      Reverse phenotyping research requires broad sharing of exome- and genome-sequencing data to ensure that a sufficient number of participants with rare variants are available to adequately power phenotyping studies.
      • Corbin L.J.
      • Tan V.Y.
      • Hughes D.A.
      • Wade K.H.
      • Paul D.S.
      • Tansey K.E.
      • Butcher F.
      • Dudbridge F.
      • Howson J.M.
      • Jallow M.W.
      • et al.
      Formalising recall by genotype as an efficient approach to detailed phenotyping and causal inference.
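      The arithmetic behind the need for pooled cohorts can be sketched as follows: for a rare variant, the expected number of carriers scales with cohort size, and only a fraction of those carriers will enrol after recontact. The function names, the Hardy-Weinberg assumption, and the example allele frequency and enrolment fraction below are illustrative assumptions and are not figures from our program.

```python
import math


def carrier_probability(allele_freq: float) -> float:
    """Probability of carrying at least one copy, assuming Hardy-Weinberg proportions."""
    return 1 - (1 - allele_freq) ** 2


def expected_carriers(cohort_size: int, allele_freq: float) -> float:
    """Expected number of carriers in a sequenced cohort of the given size."""
    return cohort_size * carrier_probability(allele_freq)


def cohort_size_needed(target_participants: int, allele_freq: float,
                       enrolment_fraction: float) -> int:
    """Smallest cohort expected to yield the target number of enrolled carriers.

    enrolment_fraction is the assumed share of recontacted carriers who
    agree to take part in the reverse phenotyping study.
    """
    per_person = carrier_probability(allele_freq) * enrolment_fraction
    return math.ceil(target_participants / per_person)


if __name__ == "__main__":
    # Hypothetical variant with allele frequency 0.1%, half of carriers enrolling.
    print(expected_carriers(10_000, 0.001))
    print(cohort_size_needed(20, 0.001, 0.5))
```

      Even with generous enrolment assumptions, the required cohort grows roughly in inverse proportion to allele frequency, which is why individual sequencing cohorts rarely suffice on their own.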
      For this to be feasible, sequenced research participants must consent to sharing of deidentified genomic information. Consistent with our own experience, such a broad consent does not adversely affect study participation.
      • Sanderson S.C.
      • Brothers K.B.
      • Mercaldo N.D.
      • Clayton E.W.
      • Antommaria A.H.M.
      • Aufox S.A.
      • Brilliant M.H.
      • Campos D.
      • Carrell D.S.
      • Connolly J.
      • et al.
      Public attitudes toward consent and data sharing in biobank research: a large multi-site experimental survey in the US.
      Broad genomic data sharing is a priority in NIH-led genomics efforts and is becoming standard research practice.
      • Green E.D.
      • Gunter C.
      • Biesecker L.G.
      • Di Francesco V.
      • Easter C.L.
      • Feingold E.A.
      • Felsenfeld A.L.
      • Kaufman D.J.
      • Ostrander E.A.
      • Pavan W.J.
      • et al.
      Strategic vision for improving human health at The Forefront of Genomics.
      The Autism Sequencing Consortium and Autism Spectrum/Intellectual Disability Network is one example of how multiple research groups pooled genetic sequencing cohorts to increase phenotypic return in an instance where private and rare variants represent a high burden of disease.
      • Stessman H.A.
      • Bernier R.
      • Eichler E.E.
      A genotype-first approach to defining the subtypes of a complex disease.
      Such practices can be replicated in the phenotypically unselected cohorts needed for optimal genomic ascertainment research. Up until the last decade, these cohorts have largely been recruited from academic medical centers. Recent efforts by regional healthcare systems, such as Geisinger Health in central Pennsylvania, indicate that large sequencing cohorts with opportunities for follow-up research can be generated in other healthcare settings.
      • Carey D.J.
      • Fetterolf S.N.
      • Davis F.D.
      • Faucett W.A.
      • Kirchner H.L.
      • Mirshahi U.
      • Murray M.F.
      • Smelser D.T.
      • Gerhard G.S.
      • Ledbetter D.H.
      The Geisinger MyCode community health initiative: an electronic health record-linked biobank for precision medicine research.
      However, given the fragmented state of the United States healthcare system and the reliance on private genetic testing laboratories for generating clinical sequencing data, a publicly funded, healthcare system-agnostic program such as All of Us may be the most feasible way to scale this type of research to a national level.
      • Denny J.C.
      • Rutter J.L.
      • Goldstein D.B.
      • Philippakis A.
      • Smoller J.W.
      • Jenkins G.
      • Dishman E.
      All of Us Research Program Investigators
      The "All of Us" research program.
      In addition to the need for broad data sharing, the ability to recontact participants for follow-up research opportunities is fundamental to reverse phenotyping research. Upon initial enrollment in a sequencing study, participants should be given the opportunity to provide informed consent to be contacted for consideration of future research. Our experience is that participants are amenable to long-term relationships with research studies that expand the original research aims.
      • Biesecker L.G.
      • Mullikin J.C.
      • Facio F.M.
      • Turner C.
      • Cherukuri P.F.
      • Blakesley R.W.
      • Bouffard G.G.
      • Chines P.S.
      • Cruz P.
      • Hansen N.F.
      • et al.
      The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine.
      Participants are more willing to engage in these relationships when they are provided with an explanation of the scientific utility of their data.
      • Sanderson S.C.
      • Brothers K.B.
      • Mercaldo N.D.
      • Clayton E.W.
      • Antommaria A.H.M.
      • Aufox S.A.
      • Brilliant M.H.
      • Campos D.
      • Carrell D.S.
      • Connolly J.
      • et al.
      Public attitudes toward consent and data sharing in biobank research: a large multi-site experimental survey in the US.
      Therefore, it is crucial that investigators maintain active communication with study participants to build informed, long-term, and trusting relationships.
      Once a reverse phenotyping cohort has been established, operationalizing and implementing reverse phenotyping research requires intentional design. Our experiences with our reverse phenotyping program have taught us that it is important to elicit stakeholders’ visions, intentions, and desired outcomes at the outset of planning such a resource, then translate those desires into a concrete plan that defines roles, responsibilities, and accountability across the resource. To optimize use of this resource and to maximize chances of a robust reverse phenotyping research outcome, we suggest investigators submit a single-page application detailing the background and objectives of the reverse phenotyping study, the genomically ascertained variant(s) of interest, reverse phenotyping study design (including proposed clinical examinations, sample size, and power considerations), how the results will fit into a future publication, and whether results generated through the study (clinical or genomic) will be returned to participants. The reverse phenotyping resource executing these requests could take different shapes depending on the resources and staff available. At one end of the spectrum is a fully decentralized cohort-sharing consortium, in which genomic data are shared to facilitate participant discovery, but each research team is responsible for recontacting and coordinating clinical examinations of their own participants. Deidentified results or samples could then be shared with the requesting research team. At the other end of the spectrum is a full-service model such as the Reverse Phenotyping Core Facility at NHGRI, which has a dedicated staff to maintain the shared genomic data resource for multiple cohorts and also coordinates participant recontact, reverse phenotyping studies, return of clinically actionable results to participants, and delivery of study results to requesting investigators across multiple institutions.
      After identifying variants of interest for genomic ascertainment research, the participants harboring those variants must be recontacted and invited to participate in the reverse phenotyping study. This raises human participant research protection questions about when and how much information should be disclosed so the participant can make a truly informed decision and maximize the benefits while minimizing the harms of participation.
      United States National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research
      The Belmont Report : Ethical Principles and Guidelines for the Protection of Human Subjects of Research.
      Prior recall-by-genotype studies have reported that few, if any, participants express concerns about being recruited because of genomic variants as opposed to clinical characteristics.
      • Minion J.T.
      • Butcher F.
      • Timpson N.
      • Murtagh M.J.
      The ethics conundrum in Recall by Genotype (RbG) research: Perspectives from birth cohort participants.
      ,
      • Beskow L.M.
      • Namey E.E.
      • Cadigan R.J.
      • Brazg T.
      • Crouch J.
      • Henderson G.E.
      • Michie M.
      • Nelson D.K.
      • Tabor H.K.
      • Wilfond B.S.
      Research participants' perspectives on genotype-driven research recruitment.
      ,
      • Tabor H.K.
      • Brazg T.
      • Crouch J.
      • Namey E.E.
      • Fullerton S.M.
      • Beskow L.M.
      • Wilfond B.S.
      Parent perspectives on pediatric genetic research and implications for genotype-driven research recruitment.
      Participants favor being recontacted by a known entity in a stepwise manner (for example, via a written letter followed by phone call) so that they have time to prepare for the discussion.
      • Beskow L.M.
      • Fullerton S.M.
      • Namey E.E.
      • Nelson D.K.
      • Davis A.M.
      • Wilfond B.S.
      Recommendations for ethical approaches to genotype-driven research recruitment.
      ,
      • Wright M.F.
      • Lewis K.L.
      • Fisher T.C.
      • Hooker G.W.
      • Emanuel T.E.
      • Biesecker L.G.
      • Biesecker B.B.
      Preferences for results delivery from exome sequencing/genome sequencing.
      The degree to which participants should be informed of the reason underlying their genomic ascertainment depends on the specifics of the variant of interest and the research study. When the genetic variant in question has established clinical validity and utility, there is a clear imperative to validate the result (i.e., CLIA confirmation) and return it to the participant so they may use this information in their healthcare.
      • Ravitsky V.
      • Wilfond B.S.
      Disclosing individual genetic results to research participants.
      However, when the clinical validity and utility of a genetic variant are uncertain, opinions diverge on whether the variant should be shared with the participant. Clinical validity and utility may not be the only appropriate thresholds when deciding whether results should be returned to participants in this investigational setting.
      • Beskow L.M.
      • Fullerton S.M.
      • Namey E.E.
      • Nelson D.K.
      • Davis A.M.
      • Wilfond B.S.
      Recommendations for ethical approaches to genotype-driven research recruitment.
      ,
      • Ravitsky V.
      • Wilfond B.S.
      Disclosing individual genetic results to research participants.
      Other reasons to disclose the basis for genomic ascertainment include enabling the participant to use the information to make an informed decision about participating in the reverse phenotyping study. The personal meaning a participant may ascribe to a result, including its impact on the participant’s interpersonal relationships or personal identity, is also a valid reason for disclosure.
      • Beskow L.M.
      • Namey E.E.
      • Cadigan R.J.
      • Brazg T.
      • Crouch J.
      • Henderson G.E.
      • Michie M.
      • Nelson D.K.
      • Tabor H.K.
      • Wilfond B.S.
      Research participants' perspectives on genotype-driven research recruitment.
      ,
      • Beskow L.M.
      • Burke W.
      Offering individual genetic research results: context matters.
      However, some genomic ascertainment participants acknowledge that they rarely consider genetic information when making decisions about whether to participate in a reverse phenotyping study.
      • Minion J.T.
      • Butcher F.
      • Timpson N.
      • Murtagh M.J.
      The ethics conundrum in Recall by Genotype (RbG) research: Perspectives from birth cohort participants.
      ,
      • Michie M.
      • Cadigan R.J.
      • Henderson G.
      • Beskow L.M.
      Am I a control?: genotype-driven research recruitment and self-understandings of study participants.
      The appetite to receive uncertain results can be less than that for actionable results, so it is important to proceed cautiously and reveal information incrementally to respect a potential participant’s desire to not learn unwanted genetic information.
      • Beskow L.M.
      • Fullerton S.M.
      • Namey E.E.
      • Nelson D.K.
      • Davis A.M.
      • Wilfond B.S.
      Recommendations for ethical approaches to genotype-driven research recruitment.
      ,
      • Facio F.M.
      • Eidem H.
      • Fisher T.
      • Brooks S.
      • Linn A.
      • Kaphingst K.A.
      • Biesecker L.G.
      • Biesecker B.B.
      Intentions to receive individual results from whole-genome sequencing among participants in the ClinSeq study.
      ,
      • Beskow L.M.
      Genotype-driven recruitment and the disclosure of individual research results.
      However, it is important to recognize that genomic ascertainment researchers are in privileged positions with regard to accessing this information, and sharing this information with participants may be an empowering gesture acknowledging participants’ autonomy and research contribution.
      • Tabor H.K.
      • Brazg T.
      • Crouch J.
      • Namey E.E.
      • Fullerton S.M.
      • Beskow L.M.
      • Wilfond B.S.
      Parent perspectives on pediatric genetic research and implications for genotype-driven research recruitment.
      ,
      • Beskow L.M.
      • Burke W.
      Offering individual genetic research results: context matters.
      ,
      • Doernberg S.
      • Hull S.C.
      Harms of deception in FMR1 premutation genotype-driven recruitment.
      All ascertainment modes have inherent limitations. Reverse phenotyping mitigates some, but not all, of the important biases of phenotypic ascertainment. One important consideration is that the composition of the genomic ascertainment cohort limits the phenotypes that may be observed. For example, our super-cohort comprises predominantly healthy volunteers, so we would not expect to observe severe disease phenotypes unless such conditions had a later age of onset. Relying on participants to return to a clinical center for reverse phenotyping also limits our ability to recruit very ill participants. Conversely, participants who are so subtly affected that they believe they are not at risk for the health condition under investigation may decline requests for post hoc phenotyping. This study design is also limited in its ability to identify conditions with childhood lethality, although including newborn trios partly circumvents this concern. Because nearly 70% of our super-cohort was recruited from newborn trios, the cohort is also, to some extent, selected against heritable conditions that cause infertility.
      The ideal reverse phenotyping cohort would include participants of all ages, all sex and gender attributes, all racial and ethnic backgrounds, and the full range of health and disease. In practice, this is impossible. With respect to age and health, the ethical propriety and practicality of subjecting young or seriously ill individuals to research-indicated phenotyping impose further limits. In addition, the ideal cohort attributes for this type of research depend on the specific scientific question being asked and therefore cannot be known in advance (at the time of cohort establishment). Because of these issues, the only recommendation that can be made is to strive to make such cohorts as large and diverse as possible.

      Conclusion

      Reverse phenotyping is an approach to clinical genomics research that has the potential to transform how we understand the effects of genetic variation on human health and disease. Although clinical informatics approaches are well suited to answer questions about the penetrance and phenotypic variability of more commonly documented phenotypic attributes, targeted reverse phenotyping allows investigators to more deeply evaluate rare phenotypes, perform highly specific and consistent phenotypic assessments across a set of individuals with a variant, and interrogate rare variants in an economical and robust manner. When comparing reverse phenotyping to other modes of research (specifically, phenotypic ascertainment and clinical informatics research), the best approach is dependent on the specific hypothesis. Each approach has an important role in research, and ideally they are all used in a complementary fashion to improve our knowledge of the phenotypic consequence of human genetic variation. Genomic medicine offers the potential to identify future disease risk based on the presence of a variant in an individual, and to realize this potential we need clinical research that models the prediction of disease based on genotype alone. Using genomic ascertainment research to understand the implications of genomic variants in phenotypically unselected populations will be critical as we prepare to implement genomic screening for an increasing number of health conditions.
      • Murray M.F.
      • Evans J.P.
      • Angrist M.
      • Uhlmann W.R.
      • Lochner Doyle D.
      • Fullerton S.M.
      • Ganiats T.G.
      • Hagenkord J.
      • Imhof S.
      • Rim S.H.
      A Proposed Approach for Implementing Genomics-Based Screening Programs for Healthy Adults.
      In this regard, targeted reverse phenotyping is an essential tool to increase our ability to predict phenotype from genotype, understand the molecular taxonomy of disease, and eventually provide precision medicine.

      Acknowledgments

      We are grateful to our research participants, without whom this work would not be possible. We thank the investigators at NIH who have utilized reverse phenotyping in their research and the BioWulf high-performance computing cluster. We would especially like to thank Daniel MacArthur and the gnomAD team for generously allowing us to use the gnomAD browser site architecture. This work was supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. A.E.K. was supported by the National Institutes of Health Director Challenge Fund. L.G.B., A.E.K., and J.O. and the ClinSeq program are supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health grants HG200359-13 and HG200387-08. C.T. and C.M.W. are supported by the Intramural Research Program of the National Human Genome Research Institute grant HG200418-02. J.E.P., S.Z., S.S., and T.G.W. are supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (ZIC HG200345). M.S. is supported by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases. The content represents the views of the authors and does not necessarily represent the official views of the National Institutes of Health.

      Declaration of interests

      L.G.B. is a member of the Illumina Medical Ethics Board and receives honoraria from Cold Spring Harbor Laboratory Press, royalties from Invitae, Inc., and research support from Merck, Inc.

      References

        • Moresco E.M.Y.
        • Li X.
        • Beutler B.
        Going forward with genetics: recent technological advances and forward genetics in mice.
        Am. J. Pathol. 2013; 182: 1462-1473
        • Altshuler D.
        • Daly M.J.
        • Lander E.S.
        Genetic mapping in human disease.
        Science (New York, N.Y.). 2008; 322: 881-888
        • Nannenberg E.A.
        • van Rijsingen I.A.W.
        • van der Zwaag P.A.
        • van den Berg M.P.
        • van Tintelen J.P.
        • Tanck M.W.T.
        • Ackerman M.J.
        • Wilde A.A.M.
        • Christiaans I.
        Effect of ascertainment bias on estimates of patient mortality in inherited cardiac diseases.
        Circ. Genom. Precis. Med. 2018; 11: e001797
        • Ranola J.M.O.
        • Tsai G.J.
        • Shirts B.H.
        Exploring the effect of ascertainment bias on genetic studies that use clinical pedigrees.
        Eur. J. Hum. Genet. 2019; 27: 1800-1807
        • Brohet R.M.
        • Velthuizen M.E.
        • Hogervorst F.B.L.
        • Meijers-Heijboer H.E.J.
        • Seynaeve C.
        • Collée M.J.
        • Verhoef S.
        • Ausems M.G.E.M.
        • Hoogerbrugge N.
        • van Asperen C.J.
        • et al.
        Breast and ovarian cancer risks in a large series of clinically ascertained families with a high proportion of BRCA1 and BRCA2 Dutch founder mutations.
        J. Med. Genet. 2014; 51: 98-107
        • Decaudain A.
        • Vantyghem M.-C.
        • Guerci B.
        • Hécart A.C.
        • Auclair M.
        • Reznik Y.
        • Narbonne H.
        • Ducluzeau P.-H.
        • Donadille B.
        • Lebbé C.
        • et al.
        New metabolic phenotypes in laminopathies: LMNA mutations in patients with severe metabolic syndrome.
        J. Clin. Endocrinol. Metab. 2007; 92: 4835-4844
        • Young J.
        • Morbois-Trabut L.
        • Couzinet B.
        • Lascols O.
        • Dion E.
        • Béréziat V.
        • Fève B.
        • Richard I.
        • Capeau J.
        • Chanson P.
        • Vigouroux C.
        Type A insulin resistance syndrome revealing a novel lamin A mutation.
        Diabetes. 2005; 54: 1873-1878
        • Bertrand A.T.
        • Chikhaoui K.
        • Yaou R.B.
        • Bonne G.
        Clinical and genetic heterogeneity in laminopathies.
        Biochem. Soc. Trans. 2011; 39: 1687-1692
        • Schulze T.G.
        • McMahon F.J.
        Defining the phenotype in human genetic studies: forward genetics and reverse phenotyping.
        Hum. Hered. 2004; 58: 131-138
        • McGuire S.E.
        • McGuire A.L.
        Don't throw the baby out with the bathwater: enabling a bottom-up approach in genome-wide association studies.
        Genome Res. 2008; 18: 1683-1685
        • Biesecker L.G.
        • Mullikin J.C.
        • Facio F.M.
        • Turner C.
        • Cherukuri P.F.
        • Blakesley R.W.
        • Bouffard G.G.
        • Chines P.S.
        • Cruz P.
        • Hansen N.F.
        • et al.
        The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine.
        Genome Res. 2009; 19: 1665-1674
        • Akhrif A.
        • Roy A.
        • Peters K.
        • Lesch K.-P.
        • Romanos M.
        • Schmitt-Böhrer A.
        • Neufang S.
        Reverse phenotyping: can the phenotype following constitutive Tph2 gene inactivation in mice be transferred to children and adolescents with and without ADHD?
        Brain Behav. 2021; 11: e02054
        • Bodian D.L.
        • Klein E.
        • Iyer R.K.
        • Wong W.S.W.
        • Kothiyal P.
        • Stauffer D.
        • Huddleston K.C.
        • Gaither A.D.
        • Remsburg I.
        • Khromykh A.
        • et al.
        Utility of whole-genome sequencing for detection of newborn screening disorders in a population cohort of 1,696 neonates.
        Genet. Med. 2016; 18: 221-230
        • Hauser N.S.
        • Solomon B.D.
        • Vilboux T.
        • Khromykh A.
        • Baveja R.
        • Bodian D.L.
        Experience with genomic sequencing in pediatric patients with congenital cardiac defects in a large community hospital.
        Mol. Genet. Genomic Med. 2018; 6: 200-212
        • Similuk M.N.
        • Yan J.
        • Ghosh R.
        • Oler A.J.
        • Franco L.M.
        • Setzer M.R.
        • Kamen M.
        • Jodarski C.
        • DiMaggio T.
        • Davis J.
        • et al.
        Clinical exome sequencing of 1000 families with complex immune phenotypes: towards comprehensive genomic evaluations.
        J. Allergy Clin. Immunol. 2022; 150: 947-954
        • Van der Auwera G.A.
        • O'Connor B.D.
        Genomics in the Cloud: Using Docker, GATK, and WDL in Terra.
        O'Reilly Media, 2020
        • Poplin R.
        • Ruano-Rubio V.
        • DePristo M.A.
        • Fennell T.J.
        • Carneiro M.O.
        • Van der Auwera G.A.
        • Kling D.E.
        • Gauthier L.D.
        • Levy-Moonshine A.
        • Roazen D.
        • et al.
        Scaling accurate genetic variant discovery to tens of thousands of samples.
        bioRxiv. 2018; 201178 (Preprint at https://doi.org/10.1101/201178)
        • Karczewski K.J.
        • Francioli L.C.
        • Tiao G.
        • Cummings B.B.
        • Alföldi J.
        • Wang Q.
        • Collins R.L.
        • Laricchia K.M.
        • Ganna A.
        • Birnbaum D.P.
        • et al.
        The mutational constraint spectrum quantified from variation in 141,456 humans.
        Nature. 2020; 581: 434-443
        • Lyons J.J.
        • Yu X.
        • Hughes J.D.
        • Le Q.T.
        • Jamil A.
        • Bai Y.
        • Ho N.
        • Zhao M.
        • Liu Y.
        • O'Connell M.P.
        • et al.
        Elevated basal serum tryptase identifies a multisystem disorder associated with increased TPSAB1 copy number.
        Nat. Genet. 2016; 48: 1564-1569
        • Gregg A.R.
        • Warman A.W.
        • Thorburn D.R.
        • O'Brien W.E.
        Combined malonic and methylmalonic aciduria with normal malonyl-coenzyme A decarboxylase activity: a case supporting multiple aetiologies.
        J. Inherit. Metab. Dis. 1998; 21: 382-390
        • Sloan J.L.
        • Johnston J.J.
        • Manoli I.
        • Chandler R.J.
        • Krause C.
        • Carrillo-Carrasco N.
        • Chandrasekaran S.D.
        • Sysol J.R.
        • O'Brien K.
        • Hauser N.S.
        • et al.
        Exome sequencing identifies ACSF3 as a cause of combined malonic and methylmalonic aciduria.
        Nat. Genet. 2011; 43: 883-886
        • Manthiram K.
        • Preite S.
        • Dedeoglu F.
        • Demir S.
        • Ozen S.
        • Edwards K.M.
        • Lapidus S.
        • Katz A.E.
        • Feder Jr., H.M.
        • et al.
        • Genomic Ascertainment Cohort
        Common genetic susceptibility loci link PFAPA syndrome, Behçet’s disease, and recurrent aphthous stomatitis.
        Proc. Natl. Acad. Sci. USA. 2020; 117: 14405-14411
        • Rylaarsdam L.
        • Guemez-Gamboa A.
        Genetic causes and modifiers of autism spectrum disorder.
        Front. Cell. Neurosci. 2019; 13: 385
        • Abul-Husn N.S.
        • Kenny E.E.
        Personalized medicine and the power of electronic health records.
        Cell. 2019; 177: 58-69
        • Pendergrass S.A.
        • Crawford D.C.
        Using electronic health records to generate phenotypes for research.
        Curr. Protoc. Hum. Genet. 2019; 100: e80
        • Sudlow C.
        • Gallacher J.
        • Allen N.
        • Beral V.
        • Burton P.
        • Danesh J.
        • Downey P.
        • Elliott P.
        • Green J.
        • Landray M.
        • et al.
        UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.
        PLoS Med. 2015; 12: e1001779
        • Gottesman O.
        • Kuivaniemi H.
        • Tromp G.
        • Faucett W.A.
        • Li R.
        • Manolio T.A.
        • Sanderson S.C.
        • Kannry J.
        • Zinberg R.
        • Basford M.A.
        • et al.
        The electronic medical records and genomics (eMERGE) network: past, present, and future.
        Genet. Med. 2013; 15: 761-771
        • Kirby J.C.
        • Speltz P.
        • Rasmussen L.V.
        • Basford M.
        • Gottesman O.
        • Peissig P.L.
        • Pacheco J.A.
        • Tromp G.
        • Pathak J.
        • Carrell D.S.
        • et al.
        PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability.
        J. Am. Med. Inform. Assoc. 2016; 23: 1046-1052
        • Wei W.Q.
        • Denny J.C.
        Extracting research-quality phenotypes from electronic health records to support precision medicine.
        Genome Med. 2015; 7: 41
        • Rees M.G.
        • Ng D.
        • Ruppert S.
        • Turner C.
        • Beer N.L.
        • Swift A.J.
        • Morken M.A.
        • Below J.E.
        • Blech I.
        • et al.
        • NISC Comparative Sequencing Program
        Correlation of rare coding variants in the gene encoding human glucokinase regulatory protein with phenotypic, cellular, and kinetic outcomes.
        J. Clin. Invest. 2012; 122: 205-217
        • Corbin L.J.
        • Tan V.Y.
        • Hughes D.A.
        • Wade K.H.
        • Paul D.S.
        • Tansey K.E.
        • Butcher F.
        • Dudbridge F.
        • Howson J.M.
        • Jallow M.W.
        • et al.
        Formalising recall by genotype as an efficient approach to detailed phenotyping and causal inference.
        Nat. Commun. 2018; 9: 711
        • Chan K.S.
        • Fowles J.B.
        • Weiner J.P.
        Review: electronic health records and the reliability and validity of quality measures: a review of the literature.
        Med. Care Res. Rev. 2010; 67: 503-527
        • Johnston J.J.
        • Rubinstein W.S.
        • Facio F.M.
        • Ng D.
        • Singh L.N.
        • Teer J.K.
        • Mullikin J.C.
        • Biesecker L.G.
        Secondary variants in individuals undergoing exome sequencing: screening of 572 individuals identifies high-penetrance mutations in cancer-susceptibility genes.
        Am. J. Hum. Genet. 2012; 91: 97-108
        • Ng D.
        • Johnston J.J.
        • Teer J.K.
        • Singh L.N.
        • Peller L.C.
        • Wynter J.S.
        • Lewis K.L.
        • Cooper D.N.
        • Stenson P.D.
        • Mullikin J.C.
        • Biesecker L.G.
        • NIH Intramural Sequencing Center (NISC) Comparative Sequencing Program
        Interpreting secondary cardiac disease variants in an exome cohort.
        Circ. Cardiovasc. Genet. 2013; 6: 337-346
        • Vester A.
        • Velez-Ruiz G.
        • McLaughlin H.M.
        • Lupski J.R.
        • Talbot K.
        • Vance J.M.
        • Züchner S.
        • Roda R.H.
        • Fischbeck K.H.
        • et al.
        • NISC Comparative Sequencing Program
        A loss-of-function variant in the human histidyl-tRNA synthetase (HARS) gene is neurotoxic in vivo.
        Hum. Mutat. 2013; 34: 191-199
        • Patton J.
        • Brewer C.
        • Chien W.
        • Johnston J.J.
        • Griffith A.J.
        • Biesecker L.G.
        A genotypic ascertainment approach to refute the association of MYO1A variants with non-syndromic deafness.
        Eur. J. Hum. Genet. 2017; 25: 147-149
        • Lyons J.J.
        • Stotz S.C.
        • Chovanec J.
        • Liu Y.
        • Lewis K.L.
        • Nelson C.
        • DiMaggio T.
        • Jones N.
        • Stone K.D.
        • Sung H.
        • et al.
        A common haplotype containing functional CACNA1H variants is frequently coinherited with increased TPSAB1 copy number.
        Genet. Med. 2018; 20: 503-512
        • Garnai S.J.
        • Brinkmeier M.L.
        • Emery B.
        • Aleman T.S.
        • Pyle L.C.
        • Veleva-Rotse B.
        • Sisk R.A.
        • Rozsa F.W.
        • Ozel A.B.
        • Li J.Z.
        • et al.
        Variants in myelin regulatory factor (MYRF) cause autosomal dominant and syndromic nanophthalmos in humans and retinal degeneration in mice.
        PLoS Genet. 2019; 15: e1008130
        • Carey D.J.
        • Fetterolf S.N.
        • Davis F.D.
        • Faucett W.A.
        • Kirchner H.L.
        • Mirshahi U.
        • Murray M.F.
        • Smelser D.T.
        • Gerhard G.S.
        • Ledbetter D.H.
        The Geisinger MyCode community health initiative: an electronic health record-linked biobank for precision medicine research.
        Genet. Med. 2016; 18: 906-913
        • Sanderson S.C.
        • Brothers K.B.
        • Mercaldo N.D.
        • Clayton E.W.
        • Antommaria A.H.M.
        • Aufox S.A.
        • Brilliant M.H.
        • Campos D.
        • Carrell D.S.
        • Connolly J.
        • et al.
        Public attitudes toward consent and data sharing in biobank research: a large multi-site experimental survey in the US.
        Am. J. Hum. Genet. 2017; 100: 414-427
        • Green E.D.
        • Gunter C.
        • Biesecker L.G.
        • Di Francesco V.
        • Easter C.L.
        • Feingold E.A.
        • Felsenfeld A.L.
        • Kaufman D.J.
        • Ostrander E.A.
        • Pavan W.J.
        • et al.
        Strategic vision for improving human health at the forefront of genomics.
        Nature. 2020; 586: 683-692
        • Stessman H.A.
        • Bernier R.
        • Eichler E.E.
        A genotype-first approach to defining the subtypes of a complex disease.
        Cell. 2014; 156: 872-877
        • Denny J.C.
        • Rutter J.L.
        • Goldstein D.B.
        • Philippakis A.
        • Smoller J.W.
        • Jenkins G.
        • Dishman E.
        • All of Us Research Program Investigators
        The "All of Us" research program.
        N. Engl. J. Med. 2019; 381: 668-676
        • United States
        • National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research
        The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research.
        National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1978
        • Minion J.T.
        • Butcher F.
        • Timpson N.
        • Murtagh M.J.
        The ethics conundrum in Recall by Genotype (RbG) research: Perspectives from birth cohort participants.
        PLoS One. 2018; 13: e0202502
        • Beskow L.M.
        • Namey E.E.
        • Cadigan R.J.
        • Brazg T.
        • Crouch J.
        • Henderson G.E.
        • Michie M.
        • Nelson D.K.
        • Tabor H.K.
        • Wilfond B.S.
        Research participants' perspectives on genotype-driven research recruitment.
        J. Empir. Res. Hum. Res. Ethics. 2011; 6: 3-20
        • Tabor H.K.
        • Brazg T.
        • Crouch J.
        • Namey E.E.
        • Fullerton S.M.
        • Beskow L.M.
        • Wilfond B.S.
        Parent perspectives on pediatric genetic research and implications for genotype-driven research recruitment.
        J. Empir. Res. Hum. Res. Ethics. 2011; 6: 41-52
        • Beskow L.M.
        • Fullerton S.M.
        • Namey E.E.
        • Nelson D.K.
        • Davis A.M.
        • Wilfond B.S.
        Recommendations for ethical approaches to genotype-driven research recruitment.
        Hum. Genet. 2012; 131: 1423-1431
        • Wright M.F.
        • Lewis K.L.
        • Fisher T.C.
        • Hooker G.W.
        • Emanuel T.E.
        • Biesecker L.G.
        • Biesecker B.B.
        Preferences for results delivery from exome sequencing/genome sequencing.
        Genet. Med. 2014; 16: 442-447
        • Ravitsky V.
        • Wilfond B.S.
        Disclosing individual genetic results to research participants.
        Am. J. Bioeth. 2006; 6: 8-17
        • Beskow L.M.
        • Burke W.
        Offering individual genetic research results: context matters.
        Sci. Transl. Med. 2010; 2: 38cm20
        • Michie M.
        • Cadigan R.J.
        • Henderson G.
        • Beskow L.M.
        Am I a control?: genotype-driven research recruitment and self-understandings of study participants.
        Genet. Med. 2012; 14: 983-989
        • Facio F.M.
        • Eidem H.
        • Fisher T.
        • Brooks S.
        • Linn A.
        • Kaphingst K.A.
        • Biesecker L.G.
        • Biesecker B.B.
        Intentions to receive individual results from whole-genome sequencing among participants in the ClinSeq study.
        Eur. J. Hum. Genet. 2013; 21: 261-265
        • Beskow L.M.
        Genotype-driven recruitment and the disclosure of individual research results.
        Am. J. Bioeth. 2017; 17: 64-65
        • Doernberg S.
        • Hull S.C.
        Harms of deception in FMR1 premutation genotype-driven recruitment.
        Am. J. Bioeth. 2017; 17: 62-63
        • Murray M.F.
        • Evans J.P.
        • Angrist M.
        • Uhlmann W.R.
        • Lochner Doyle D.
        • Fullerton S.M.
        • Ganiats T.G.
        • Hagenkord J.
        • Imhof S.
        • Rim S.H.
        A proposed approach for implementing genomics-based screening programs for healthy adults.
        NAM Perspectives, 2019