
Volume 5, Issue 1 p. 58-69
RESEARCH ARTICLE
Open Access

Development of single nucleotide polymorphism markers and genetic diversity in guava (Psidium guajava L.)

Luis Diaz-Garcia

Corresponding Author

Campo Experimental Pabellon, Instituto Nacional de Investigaciones Forestales, Agricolas y Pecuarias (INIFAP), Aguascalientes, Mexico

Correspondence

Luis Diaz-Garcia, Campo Experimental Pabellon, Instituto Nacional de Investigaciones Forestales, Agricolas y Pecuarias (INIFAP), Aguascalientes 20676, Mexico.

Email: [email protected]

José S. Padilla-Ramírez

Campo Experimental Pabellon, Instituto Nacional de Investigaciones Forestales, Agricolas y Pecuarias (INIFAP), Aguascalientes, Mexico

First published: 18 July 2022
Citations: 2

Funding information: This work was supported by INIFAP through project 16203834876.

Abstract

Societal impact statement

There are many understudied fruits of great economic and productive importance for regional agricultural hubs. Guava is an American fruit species with attractive nutritional and adaptability characteristics. However, in many countries like Mexico, its cultivation is still limited to small-to-medium size plantings and home gardens and depends on poorly controlled germplasm that diminishes productivity and expansion. Our study provides valuable insight to better understand guava diversity and generate high-yielding, high-quality, and better adapted materials. Supporting the study and development of understudied crops will provide us with more resources for facing climate change adversities and for diversifying cropping systems.

Summary

  • Guava (Psidium guajava L.) is a fruit crop species native to tropical and subtropical regions of America with great productive and economic potential due to its extensive environmental adaptability, nutritional value, and medicinal properties. However, the lack of molecular resources for accelerated breeding, limited knowledge about its evolutionary and domestication history, and unfavorable policies have limited its genetic improvement and broader adoption as a commercial fruit crop.
  • Here, we present the first diversity study in guava employing genome-wide single nucleotide polymorphisms (SNPs). Forty-eight accessions collected from Mexico and other countries were examined with more than 6000 high-quality SNP markers, which represents a marker density increase of ~30x compared with previous studies. Relationships between genetic groupings and geographic origins were not apparent in this diverse guava collection using principal component and structure analyses.
  • Extensive germplasm exchange among guava-producing regions and limited varietal control at commercial plantations might have contributed to ambiguities when defining the true origin and identities of existing germplasm materials. However, an analysis of domestication syndrome traits (fruit size and sugar) by wild or improved germplasm revealed several putative genomic regions under selection.
  • Knowledge about germplasm origins and genetic relatedness, in conjunction with reliable molecular resources and better agronomic practices, is necessary to support breeding efforts and facilitate broader adoption of orphan crops such as guava, which have increasingly important roles in light of climate change adversities and in diversifying diets and food systems.

1 INTRODUCTION

Guava (Psidium guajava L., Myrtaceae; 2n = 2x = 22) is a fruit crop species native to tropical and subtropical regions of Mexico, Central and South America (Arévalo-Marín et al., 2021). Its evolutionary and domestication history is poorly understood, and several hypotheses regarding guava's origin are still under debate (Aranguren et al., 2008; Arévalo-Marín et al., 2021; Hastorf, 2006; Landrum, 2021; Nakasone & Paull, 1998; Pearsall, 2008; Risterucci et al., 2005). Important guava producers in the world include Mexico, Brazil, China, Egypt, and India, with more than 2 million tons per year (Altendorf, 2018). Guava has great productive and economic potential due to its extensive adaptability (Fischer & Melgarejo, 2021), nutritional value, and medicinal properties (Pérez Gutiérrez et al., 2008). Moreover, it has high concentrations of vitamins A, B1, and B2 and minerals such as calcium, magnesium, and potassium, among others. In particular, guava fruit has a higher vitamin C content (up to 400 mg/100 g dry weight) than other major fruit crops such as oranges (Rajan & Hudedamani, 2019). Despite guava's great potential, unfavorable conditions have limited the genetic improvement and broader adoption of this crop. For example, worldwide commercialization (e.g., exportation to Europe and the United States), which could potentially increase revenue, has been marginal due to unmet phytosanitary regulations, the short shelf life for fresh fruit consumption, and the lack of marketing, among others. Moreover, limited varietal control and low availability of elite germplasm with improved fruit quality characteristics (e.g., better flavor, lower seed content, and larger fruit size) and disease and pest resistance have also negatively impacted guava's production and commercialization at both local and global levels (Carabalí-Muñoz et al., 2021; Padilla-Ramirez et al., 2012).

The genus Psidium is of American origin, although some species such as Psidium guineense, Psidium cattleyanum, and P. guajava can be found as weedy plants in many regions of the world (Landrum, 2021). Psidium is a monophyletic group (Flickinger et al., 2020; Lucas et al., 2007; Vasconcelos et al., 2017) with at least 60 species (Landrum, 2021). Three Psidium species complexes have been proposed based on morphological, ecological, and geographical characteristics (Landrum, 2003, 2005, 2021). P. guajava is part of the P. guajava complex, together with P. guineense (closest relative to P. guajava; Salywon, 2003), Psidium guyanense, Psidium nutans, Psidium rostratum, and Psidium rutidocarpum. Based on recent archeological and species richness evidence, it is believed that the guava ancestor originated in the savannas and semi-deciduous forests of South America during the Miocene, followed by a geographic expansion during the Holocene by humans and local megafauna (Arévalo-Marín et al., 2021; Clement, 1989; Iriarte et al., 2020; Landrum, 2017; Neves & Heckenberger, 2019; Piperno, 2011).

The lack of molecular resources has limited genetic diversity studies in guava. Only a handful of diversity surveys have been conducted using low-density marker panels, and these studies have generally been unable to reconstruct perceived associations between genetic diversity, phenotypic diversity, and ecogeographic origin in Mexican (Sánchez-Teyer et al., 2010), Brazilian (Fernandes-Santos et al., 2010), and Pakistani guava germplasm (Mehmood et al., 2014). Few comprehensive studies have explored the global genetic diversity and differentiation of guava germplasm, although it undoubtedly exists (Mehmood et al., 2014). These global diversity studies have been constrained by the use of unbalanced germplasm collections (i.e., collections largely dominated by a single geographic origin) and scarce and unreliable molecular markers (i.e., RAPDs). Genomic- and marker-assisted selection (MAS) methodologies could enable accelerated guava breeding, particularly for disease resistance and fruit postharvest quality traits that could allow for expansion of commercial guava production. Some quantitative trait loci (QTL) mapping studies have been conducted on guava for fruit and leaf morphology, fruit quality, and vegetative traits (Padmakar et al., 2016; Ritter et al., 2010; Valdés-Infante et al., 2003). However, the implementation of MAS strategies will ultimately require the validation of the marker–trait associations through more reliable and denser genetic marker panels, sampling of more diverse genetic backgrounds, and more comprehensive and reliable phenotyping.

The recent publication of the draft and chromosome-level genome assemblies of guava (Feng et al., 2021; Thakur et al., 2021), the draft genome of Psidium friedrichsthalianum (Rojas-Gómez et al., 2022), and the wider availability of genomic profiling methods should allow accelerated progress in understanding guava genetic diversity, history, and genetic architecture of phenotypic variation. Previous studies have shown little to no association between genetic diversity and ecogeography in Mexican germplasm; however, it is still unclear to what extent this lack of association is maintained when genome-wide information and denser marker sets are utilized. Moreover, the recent publication of the guava genome offers a great opportunity to investigate which genomic regions could have been targeted during the domestication history of the species. Here, we present the first genetic diversity study on guava germplasm using genome-wide molecular markers generated through genotyping-by-sequencing (GBS; Elshire et al., 2011). Through a combination of complementary and highly informative approaches, we aimed to develop single nucleotide polymorphism (SNP) marker information to further explore the population structure and genetic diversity of a well-represented collection of Mexican guava and the putative genomic regions associated with phenotypic selection during the domestication and genetic improvement of the crop. To put the Mexican guava collection under examination in context, we included additional samples from seven other countries where guava is commonly grown. Finally, we discuss the ambiguity of defining the nature (wild, domesticated, or improved germplasm) and true geographic origin of guava germplasm and the potential for this information to improve broader adaptation, productivity, and applicability of guava in global food systems within the context of emerging climate adversities.

2 MATERIALS AND METHODS

2.1 Plant material

In 2007 and 2008, the Institute for Forestry, Agriculture and Livestock Research (INIFAP) carried out an extensive collection of guava germplasm, sampling commercial plantations, home gardens, and native environments in 11 of the 32 states in Mexico (Padilla-Ramírez & González-Gaona, 2010). A germplasm collection was established at INIFAP's research station in Zacatecas, Mexico (21°44.7′N, 102°58.0′W, 1508 masl) using either cuttings or seeds. The climate type for this location, according to García (2004), is BS1(h’)w, or warm semiarid, with maximum, minimum, and mean annual temperatures of 31.0°C, 9.7°C, and 20.3°C, respectively. Precipitation is between 500 and 550 mm per year, with most of the rain falling between July and September. The soil is characterized by an alkaline pH (8.2), with inorganic nitrogen and organic matter of 28.9 ppm and 1.1%, respectively; potassium and calcium content is high, with values over 2300 and 5800 ppm, respectively (Padilla-Ramírez et al., 2014). Materials in the plantation were arranged in rows separated by 3 m, with a 1.5 m distance between trees. Agronomic practices were as described in Padilla-Ramírez and González-Gaona (2010). Forty-eight materials (Dataset S1) were used in this study to explore the genetic diversity of guava, which included 35 accessions collected in Mexico and 8 from other parts of the world, including Bolivia (n = 1), Brazil (2), Colombia (1), Cuba (1), Honduras (1), India (1), and South Africa (1). The remaining five accessions, of unknown origin, were included to maximize phenotypic and genetic diversity in the study. Three of the five accessions with an unknown origin were wild species related to guava, Feijoa spp. (also known as Acca spp.), P. friedrichsthalianum (reports of diploids, tetraploids, and hexaploids; Rojas-Gómez et al., 2020), and Psidium cattleianum (2n = 4x = 44; Souza et al., 2015). The ploidy of these three species was not confirmed.
The germplasm collection studied here exhibited substantial variation in fruit size, pulp and skin color, shape, seed content, and metabolites (see below for more details; Figure 1).

Figure 1. Examples of the fruit variability in the guava germplasm examined in this study. Sample name and origin are shown at the top of each photograph; labels below each fruit are for internal control purposes.

2.2 Genotyping

Genotyping was performed using GBS (Elshire et al., 2011) following Diaz-Garcia et al. (2020). DNA was extracted from young and healthy leaf tissue at LANGEBIO, Mexico, using an in-house protocol. Extracted DNA was digested with BglII and DdeI restriction enzymes and sequenced with Illumina (San Diego, CA, USA) NextSeq at LANGEBIO. GBS reads were processed in the Tassel 5 GBS v2 Pipeline (Bradbury et al., 2007) using the recently published guava genome assembly as the reference for SNP calling (Feng et al., 2021); bowtie2 (Langmead & Salzberg, 2012) was used to align filtered reads (“tags”) against the reference genome, and only uniquely mapping reads were kept using samtools (Li et al., 2009). The resulting variant call format (VCF) file was imported into R for further processing and analysis using custom scripts. SNP markers with <5X or >1000X coverage were removed. Moreover, markers with excessive missing data (>10%), with minor allele frequency (MAF) < 5%, or with more than two alleles were also discarded.
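The site filters described above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' custom R scripts: the function name `keep_site`, the 0/1/2 genotype coding, and the choice to apply the coverage thresholds to the mean depth across samples are all assumptions made here for the example.

```python
def keep_site(alleles, genotypes, depths,
              min_depth=5, max_depth=1000, max_missing=0.10, min_maf=0.05):
    """Return True if a site passes the biallelic, depth, missingness and MAF filters.

    alleles   : allele strings observed at the site, e.g. ["A", "G"]
    genotypes : per-sample alternate-allele counts (0, 1, 2), or None if missing
    depths    : per-sample read depths at the site
    """
    # Discard sites with more than two alleles.
    if len(alleles) != 2:
        return False
    # Assumption: coverage thresholds are applied to the mean depth per sample.
    mean_depth = sum(depths) / len(depths)
    if mean_depth < min_depth or mean_depth > max_depth:
        return False
    # Discard sites with more than 10% missing genotype calls.
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    if missing_rate > max_missing:
        return False
    # Minor allele frequency from diploid alternate-allele counts.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= min_maf
```

Applying `keep_site` to every record of a parsed VCF would give a filtered marker set; how depth is summarized per site is the detail most likely to differ from the original analysis.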

2.3 Genetic diversity and selection

Guava genetic diversity and population structure were assessed with complementary methodologies. Genetic relationships among accessions were investigated through principal component analysis (PCA) in R. Similarly, an analysis of population structure was conducted using a Bayesian approach with R package LEA (Frichot & François, 2015), where different numbers of clusters (K) were tested based on a cross-entropy criterion. A phylogenetic tree was constructed for the wild species using Euclidean genetic distances and clustering; the dendrogram was then visualized using R.
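As an illustration of the distance step behind the dendrogram, the following sketch computes pairwise Euclidean distances between accessions. It assumes genotypes coded as alternate-allele counts (0, 1, 2) and skips sites missing in either accession; both choices, and the function names, are assumptions for this example rather than details reported in the text.

```python
import math

def euclidean_distance(g1, g2):
    """Euclidean distance between two genotype vectors coded 0/1/2.

    Sites that are missing (None) in either accession are skipped.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs))

def distance_matrix(genotypes):
    """Pairwise distances for a dict mapping accession name -> genotype vector."""
    names = sorted(genotypes)
    return {(a, b): euclidean_distance(genotypes[a], genotypes[b])
            for a in names for b in names}
```

The resulting matrix could then be passed to any hierarchical clustering routine to draw a tree.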

To identify putative selective sweeps, a whole-genome scan was performed using the cross-population composite likelihood ratio test or XP-CLR (Chen et al., 2010). Putative selective sweeps were evaluated by comparing wild/native and improved germplasm sets, which are further described in Section 3.2. A sliding window of 1 Mb and steps of 5 Kb were used for XP-CLR scanning. Because no genetic map is available for guava, genetic positions were approximated by multiplying physical positions by 10e-10, which assumes uniform recombination between markers. The top 5% highest XP-CLR values were considered as putatively selected regions, as in Liu et al. (2020). In addition, FST estimates were computed across the genome with the same germplasm groups using the pegas package (Paradis, 2010). XP-CLR and FST estimates were visualized in R with the ggplot2 package (Wickham, 2011).
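Two of the steps above are simple enough to sketch directly: the approximation of genetic positions from physical positions and the selection of an empirical cutoff for the scan statistic. The snippet below is a hedged illustration in Python; the function names are hypothetical, and the scaling factor and the top fraction are taken as written in the text (note that the literal `10e-10` evaluates to 1e-9).

```python
def approx_genetic_position(physical_bp, factor=10e-10):
    """Approximate a genetic position as physical position times a constant.

    This assumes uniform recombination along the chromosome.
    """
    return physical_bp * factor

def top_fraction_threshold(scores, fraction=0.05):
    """Smallest score that still belongs to the top `fraction` of values."""
    ranked = sorted(scores, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return ranked[n_top - 1]
```

Windows whose XP-CLR value is at or above the returned threshold would be flagged as putatively selected under this scheme.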

2.4 Phenotypic evaluation

The germplasm collection was evaluated during the 2017 and 2018 seasons for 13 phenotypic traits, including fruit weight, pericarp weight (after removing seeds), polar diameter, equatorial diameter, sepal size, normalized sepal size (sepal size/equatorial diameter), titratable acidity, Brix, number of seeds, total and individual seed weight, pericarp-to-total seed weight ratio, and aspect ratio (polar/equatorial diameter). Harvest and fruit phenotyping took place during September, October, and November. Measurements were conducted on five fruits per accession (to ensure all materials had a consistent number of fruits) in an unreplicated design. Best linear unbiased estimates (BLUEs) were calculated with a linear model in R; only genotype and year effects were included in the model.
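Three of the 13 traits are ratios of directly measured variables. A minimal sketch of how they follow from the definitions given above, with a hypothetical function name and argument names (units are whatever the raw measurements use):

```python
def derived_traits(polar_diameter, equatorial_diameter, sepal_size,
                   pericarp_weight, total_seed_weight):
    """Compute the ratio traits defined in the text for one fruit."""
    return {
        # sepal size / equatorial diameter
        "normalized_sepal_size": sepal_size / equatorial_diameter,
        # polar / equatorial diameter
        "aspect_ratio": polar_diameter / equatorial_diameter,
        # pericarp weight / total seed weight
        "pericarp_to_seed_ratio": pericarp_weight / total_seed_weight,
    }
```

For example, a fruit with a polar diameter of 60, an equatorial diameter of 50, and a sepal size of 10 has an aspect ratio of 1.2 and a normalized sepal size of 0.2.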

3 RESULTS

3.1 SNP discovery and genotyping in guava

A collection of 48 Psidium accessions, including 45 P. guajava and 3 wild species (Feijoa spp., P. friedrichsthalianum, and P. cattleianum) collected from Mexico and other parts of the world, was genetically characterized using GBS (Figure 2a,b). An average of 2.98 million filtered reads were generated for each accession, resulting in 19,533 unfiltered, unique SNPs (average read depth = 34.80). The genome assembly for the “New Age” cultivar (Feng et al., 2021), widely grown in China, was used as a reference for calling SNPs. Overall, 78.23% of the GBS tags aligned to the reference (58.35% aligned one time and 19.85% aligned >1 time; only uniquely mapping reads were used for SNP calling), which might indicate a recent divergence between American and Asian germplasm. For comparison purposes, the GBS tags were also mapped against the related P. friedrichsthalianum genome (Rojas-Gómez et al., 2022), a diploid Costa Rican native species (~1 Gb), with only a 37.74% mapping rate. Although this low mapping rate might be due to genome misassembly, it may also suggest a closer relationship between American and Asian P. guajava than between P. guajava and P. friedrichsthalianum. After filtering sites by coverage (considering SNPs only with >5x and <1000x depth), missing data (<10%), MAF (>5%), and biallelic status, only 6712 SNP markers remained for further examination. As expected, in this marker set (which considers all 48 accessions), all three wild species were filtered out, likely because of the genetic dissimilarity between species (Datasets S2 and S3). To further analyze genetic diversity in the wild species, the marker filtering (without applying MAF > 5% because of the small sample size, n = 3) was repeated separately, which resulted in 1155 markers. These 1155 markers included both polymorphic (157) and non-polymorphic markers (998) within the three wild species.
However, the 998 non-polymorphic sites in the wild species were polymorphic in the P. guajava accessions and were useful to assess the relationship between P. guajava and the wild species. Of the 1155 markers found across Psidium species, only 297 (25.71%) were also included in the 6712 filtered markers found for the P. guajava samples (Figure 2c). The relationship between Feijoa spp., P. friedrichsthalianum, P. cattleianum, and P. guajava was explored through a phylogenetic tree using the set of 157 markers that were polymorphic in the wild species (Figure 2d); all three species within the Psidium genus were grouped as in previous studies (Flickinger et al., 2020; Vasconcelos et al., 2017). In general, markers were well distributed across the chromosomes in both P. guajava (one SNP per 65.37 Kb; Figure 2e) and wild species (one SNP per 378.57 Kb; Figure 2f).

Figure 2. Genomic characterization of a diverse collection of guava (Psidium guajava) and three wild species (Feijoa spp., Psidium friedrichsthalianum, and Psidium cattleianum). (a,b) Geographic distribution of 43 accessions with known origin. For some accessions, the altitude (meters above sea level) of the collection site is colored. (c) Summary of the markers discovered in P. guajava and the wild species. (d) Phylogenetic tree based on 157 common markers in P. guajava and its wild relatives. Marker density along the genome for markers in P. guajava (e) and wild species (f) using a 1 Megabase (Mb) bin size.

3.2 Classification of native and improved P. guajava germplasm

In many crop germplasm collections, there is a priori knowledge of the improvement status of the accessions—wild, landrace, and improved/elite (Cao et al., 2019; Diaz-Garcia et al., 2020; Iorizzo et al., 2013; Jeong et al., 2019; Julca et al., 2020; Li et al., 2020; Wu et al., 2018). Guava has a long evolutionary and domestication history; however, the improvement status of much of the cultivated germplasm and germplasm within collections remains unknown or ambiguously assigned. Consequently, the relationships between improved/cultivated and wild/native P. guajava germplasm are poorly understood (Arévalo-Marín et al., 2021). Standard domestication syndrome traits for fruit species—fruit size and sugar/acid balance—can be used to assign an improvement status (e.g., wild vs. cultivated) (Denham et al., 2020). However, guava trees in natural habitats or home gardens of unknown pedigree or origin often have fruit quality and morphological characteristics comparable with improved materials, further complicating the classification of guava germplasm.

The germplasm collection studied here included accessions collected from commercial plantations, natural habitats, and native settings or home gardens (see Dataset S1 for more information). Instead of using this passport information to classify accessions, two groups were formed based on two uncorrelated domestication syndrome traits (Dataset S4), sugar content (Brix) and size (polar diameter; Figure 3a). The two groups of accessions, called herein (1) poor quality (PQ) and (2) good quality (GQ), were defined as follows. The PQ group accessions were defined as those with polar diameter < 58 mm and Brix below a regression line with y-intercept = 14 and slope = −0.05 (Figure 3b). The rationale for the non-zero slope threshold was that early selection efforts in guava might have targeted fruit size first, independently of fruit sweetness as has been observed in other fruit crops (Cao et al., 2019; Guo et al., 2019; Liao et al., 2021). The GQ germplasm was defined as accessions with polar diameter > 58 mm and Brix above the y-intercept = 14 and slope = −0.05 regression line. Consequently, the PQ germplasm had smaller and less sweet fruit, whereas GQ germplasm had larger and sweeter fruit. The number of accessions in the GQ and PQ groups was 15 and 11, respectively. A previous report (Jiménez-Lozano et al., 2009) found that fruit polar diameter on exclusively wild guava germplasm (n = 22) was between 25.8 and 63.8 mm, which is consistent with the threshold used here. In addition, the same study also described Brix to range between 2.4% and 12.4%, which, in general, is the same interval occupied by the PQ group described here. Based on our classification, fruit and pericarp weight, equatorial diameter, and pericarp-to-seed ratio clearly separated PQ from GQ germplasm in the same magnitude and direction as the polar diameter and Brix domestication syndrome traits (Figure 3c). 
Other traits such as seed number, total seed weight, pericarp-to-seed ratio, and aspect ratio moderately separated both germplasm groups in the same direction (GQ germplasm had greater values). Conversely, sepal size, normalized sepal size, and titratable acidity separated PQ and GQ germplasm in the opposite direction (the PQ group had greater values). The PQ and GQ groups might be associated with wild/native and domesticated/improved germplasm, respectively (in the same way that other diversity studies label wild and domesticated germplasm), particularly because the fruit quality and domestication domains overlap. Importantly, we are aware of the influence of seasonality on many agronomic traits in fruit crops; therefore, more replication might be required to estimate genotypic values more accurately.
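The PQ/GQ rule described in this section can be written as a short function. This is a sketch under one stated assumption: that the regression line (y-intercept = 14, slope = −0.05) expresses Brix as a function of polar diameter in millimeters, as suggested by the axes of Figure 3b. The function name and the "unclassified" label for accessions meeting neither definition are illustrative.

```python
def classify_accession(polar_diameter_mm, brix,
                       size_cutoff=58.0, intercept=14.0, slope=-0.05):
    """Assign an accession to the PQ or GQ group, or leave it unclassified."""
    # Brix value of the threshold line at this fruit size (assumed x-axis: mm).
    brix_line = intercept + slope * polar_diameter_mm
    if polar_diameter_mm < size_cutoff and brix < brix_line:
        return "PQ"  # smaller and less sweet fruit
    if polar_diameter_mm > size_cutoff and brix > brix_line:
        return "GQ"  # larger and sweeter fruit
    return "unclassified"
```

Under these assumptions, a 40 mm fruit with 9 °Brix falls in PQ (the line sits at 12 °Brix for that size), a 70 mm fruit with 12 °Brix falls in GQ (line at 10.5 °Brix), and a large but less sweet fruit remains unclassified.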

Figure 3. Phenotypic variability in guava germplasm. (a) Genetic correlation between fruit quality traits in 45 Psidium guajava accessions. In the upper triangle, Pearson's correlations are shown. The distribution for each trait is shown in the diagonal. In the lower triangle, correlation plots and regression lines are displayed. Red and blue were added to highlight positive and negative correlations. (b) Genetic relationship between polar diameter and Brix, and the definition of poor-quality (PQ) and good-quality (GQ) germplasm (horizontal and vertical red lines); standard deviations are shown as vertical and horizontal gray lines for each accession. (c) Variability in PQ (n = 11) and GQ (n = 15) germplasm sets. Trait abbreviations, which are also described in panel (a), are the following: Ac, acid; AR, aspect ratio; Bx, Brix; ED, equatorial diameter; FW, fruit weight; NSS, normalized sepal size; PD, polar diameter; PSR, pericarp-to-seed ratio; PW, pericarp weight; SN, seed number; SS, sepal size; SW, seed weight; and TSW, total seed weight. Symbols ** and * correspond to significant differences at p < .005 and p < .05, respectively, based on an F test.

3.3 Genetic diversity and population structure of guava

PCA using all 6712 SNPs in the P. guajava accessions revealed three partially defined groups composed of 6, 8, and 29 accessions—henceforth, Groups A1, A2, and A3, respectively (Figure 4a). Two accessions were not clearly associated with the three groups. Groups A1 and A2 were separated from Group A3 through PC1, whereas all groups were separated through PC2. Additionally, Group A3 represented a large portion of the variation explained by PC1 (i.e., many accessions scattered through PC1). Group A2, which included eight accessions, was composed mostly of GQ germplasm, except for one unclassified accession (however, its polar diameter and Brix were close to thresholds used to define PQ and GQ groups). Both A1 and A3 had a mix of GQ, PQ, and unclassified accessions; however, A3 included a larger number of PQ accessions. Although PC3 explained 6.6% of the variation, this axis did not help identify any additional genetic groups (Figure 4b). PCA of phenotypic data (13 quantitative traits) did not show clear clustering of accessions; however, a gradient through PC1 (70.7% of the variance) and PC2 (28.8% of the variance), with larger fruit in one extreme and smaller fruit in the other, was observed (Figure 4c).

Figure 4. Genetic diversity and population structure in Psidium guajava accessions. (a,b) Principal component analysis (PCA) on 45 P. guajava accessions based on 6712 single nucleotide polymorphism (SNP) markers. (c) PCA on 13 phenotypic traits. (d) Admixture in P. guajava groups resolved at different K values; purple, green, and orange squares show unclassified, good-quality, and poor-quality germplasm groups, respectively; below each admixture barplot, proportions by group are shown as boxplots. (e) Cross-entropy values for different K values.

A similar grouping pattern was observed in the structure analysis. Accessions in the GQ and PQ germplasm sets had different admixture profiles (Figure 4d). For K = 2, each germplasm group had a different dominant ancestry, as illustrated in the boxplot of Figure 4d; unclassified materials showed a similar admixture pattern to PQ germplasm. With larger K values (3 and 4), additional groups were observed that did not completely agree with the proposed classification (GQ vs. PQ) based on the phenotypic evaluation. For example, based on K = 4, three admixture patterns appeared: (1) accessions with a clear dominant ancestry (>90%), (2) accessions with a partially dominant ancestry or mixed proportions of the other three ancestries, and (3) accessions with equally mixed proportions of two, three, or four ancestries. Based on a cross-entropy criterion (Figure 4e), the most likely number of groups was four, which might be consistent with Groups A1, A2, and A3 identified through PCA (considering A3 members were scattered through PC1, this group might be split in two) and which, to some extent, resembles the differentiation between PQ and GQ.

3.4 Correspondence between geographic origin, phenotypes, and genetic differentiation

Previous molecular studies of Mexican guava germplasm have reported the lack of association between genetic grouping and geographical origin, mostly when including only local germplasm (Hernández-Delgado et al., 2007; Sánchez-Teyer et al., 2010). Likewise, associations between geographic origin and phenotypic diversity in Brazilian germplasm have not been observed (Fernandes-Santos et al., 2010). Only a few studies have included accessions from different countries, in which case, there are clearer relationships between genetic diversity and geographic origin (Mehmood et al., 2016). One of the problems of working with local, understudied germplasm is the lack of confidence regarding the true origin (e.g., an accession of unknown origin might be introduced to a new location) and nature (e.g., wild, native, and improved) of collected germplasm, which might obscure the true relationship between genetic differentiation and geography.

In our study, the genetic and phenotypic variation between accessions was not clearly associated with geographic origin (i.e., latitude or longitude; Figure 5a,b). Accessions from Nayarit and Sinaloa, two coastal states in the Pacific with similar weather patterns, were separated from the rest of the Mexican accessions, which might suggest a latitudinal gradient of diversity across Mexico; however, larger sample sizes and more intense collection through central Mexico are required to validate this finding. By analyzing groups of accessions based on the state they were collected, molecular markers discriminated slightly better (Figure 5c) compared with phenotypic grouping (Figure 5d); however, PC1–3 using phenotypic data explained close to 100% of the variation compared with only 37% for the molecular markers.

Figure 5. Relationship between genetic diversity and geographic origin. Geographic origin of 40 guava accessions and its relationship with (a) genetic diversity and (b) 13 phenotypic variables; different point colors and sizes denote variation through principal components (PC) 1 and 2, respectively. PC1–3 scores as a function of origin (Mexican states ordered from east to west), based on (c) molecular markers and (d) phenotypic data.

3.5 Candidate selective sweeps for fruit size and Brix

By comparing PQ and GQ germplasm sets, 11 potential XP-CLR selective signals were identified (Figure 6a). XP-CLR significant regions were located in the first seven chromosomes, with sizes ranging from 10 kb to 1.69 Mb (one out of the 11 had just one significant marker; therefore, its size was 0), and a total length of 4.96 Mb (~1% of the genome size). FST values were also estimated; however, considering that 21/22 of the significant regions did not overlap with any of the XP-CLR signals, the FST analysis was not further discussed. XP-CLR significant regions harbored 117 genes with varying functions. Annotation through Gene Ontology Functional Enrichment Annotation Tool (GO FEAT) (Araujo et al., 2018) identified protein products or gene ontology (GO) terms for 130 genes; 56 genes were reported as uncharacterized proteins or with no associated GO term. Protein functions and GO terms were diverse; therefore, further studies with larger population sizes are required to propose functional categories or gene families under selection. Linkage disequilibrium (LD) decay distance varied considerably between GQ and PQ germplasm (Figure 6b); for PQ germplasm, LD decay to r2 = .25 occurred over approximately twice the distance required to achieve the same LD decay in GQ germplasm. LD50, which measures the LD decay distance when the r2 drops to half its maximum value, was 1.29 Mb for PQ and 1.21 Mb for GQ. Larger sample sizes for both groups might be required to estimate more accurate LD decay patterns.
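The two LD summaries used above (the distance at which r² reaches .25, and LD50) can both be read off a decay curve. The sketch below assumes the curve is available as (distance, r²) points sorted by increasing distance and uses linear interpolation between neighboring points; the function names and the interpolation choice are assumptions for illustration, not the procedure reported by the authors.

```python
def ld_decay_distance(curve, target_r2):
    """Distance at which r2 first falls to `target_r2`, or None if never reached.

    curve : list of (distance, r2) tuples sorted by increasing distance.
    """
    for (d0, r0), (d1, r1) in zip(curve, curve[1:]):
        if r0 > target_r2 >= r1:
            # Linear interpolation inside the segment that crosses the target.
            return d0 + (r0 - target_r2) * (d1 - d0) / (r0 - r1)
    return None

def ld50(curve):
    """Distance at which r2 drops to half of its maximum value."""
    max_r2 = max(r2 for _, r2 in curve)
    return ld_decay_distance(curve, max_r2 / 2)
```

Running both functions on separate curves for each germplasm group would give the kind of between-group comparison described in this section.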

Figure 6. Putative selective sweeps and linkage disequilibrium (LD) in wild/native versus improved germplasm. (a) For the cross-population composite likelihood ratio test (XP-CLR; in green), the threshold was computed based on the 99th percentile of the empirical distribution. For the fixation index (FST; in yellow), the threshold was computed based on a 1000-permutation test resampling accession and group pairs. Significant peaks were merged when separated by less than 1 Megabase (Mb). (b) LD decay in wild/native, improved, and all guava germplasm; LD decay distances to r² = .25 are shown with dashed lines.
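The peak-merging rule stated in the caption (significant peaks closer than 1 Mb are merged) can be sketched as follows for the significant positions on a single chromosome. The function name and the tuple layout are hypothetical; a region made of a single significant marker gets a size of 0, in line with the description of the XP-CLR regions in Section 3.5.

```python
def merge_peaks(positions_bp, max_gap_bp=1_000_000):
    """Merge significant positions on one chromosome into regions.

    Positions separated by less than `max_gap_bp` join the same region.
    Returns a list of (start, end, size) tuples in base pairs.
    """
    regions = []
    for pos in sorted(positions_bp):
        if regions and pos - regions[-1][1] < max_gap_bp:
            regions[-1][1] = pos  # extend the current region
        else:
            regions.append([pos, pos])  # start a new region
    return [(start, end, end - start) for start, end in regions]
```

Summing the third element of each tuple across chromosomes gives the total length covered by candidate regions.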

Because of the small sample size of this germplasm collection, genome-wide association analysis was not performed. Several QTLs for fruit weight, Brix, and other fruit quality characteristics have been reported previously (Padmakar et al., 2016; Ritter et al., 2010; Valdés-Infante et al., 2003). Comparing fruit size and metabolite QTL locations with genomic regions under selection might help to link putative selective sweeps and phenotypes under selection during guava domestication/improvement. However, transferring QTL locations from old linkage maps to the new guava genome assembly was not possible, both because no information was available regarding the physical identity (chromosome) of the linkage groups (i.e., no correspondence between linkage group and chromosome numbering) and because marker sequences that could serve as common anchoring points were lacking.

4 DISCUSSION

Understanding the evolutionary history, domestication, and diversification of crop species can support breeding and conservation efforts currently conducted by researchers and breeders. Compared with previous studies in Mexican, Brazilian, and Pakistani guava germplasm, our genetic diversity survey employed considerably more genetic markers (>30x). Denser marker sets provide a more precise picture of the genetic diversity of guava and of the genomic regions potentially associated with domestication, which can then be studied and targeted in ongoing genetic improvement. Through a combination of complementary and highly informative approaches, we demonstrated that few relationships between local geographical origin and genetic clustering are present in Mexican guava, likely due to historical germplasm interchange between regions. Previous work (Arévalo-Marín et al., 2021), which exhaustively reviewed natural and anthropological evidence of guava in the Neotropics, suggested that guava domestication occurred in southwestern Amazonia, with a subsequent expansion by both humans and megafauna as early as pre-Columbian times. Furthermore, several studies have emphasized food use by indigenous people as a main dispersal agent not only for guava but also for other fruit species, which ultimately expanded their natural distribution beyond their putative original range (Arévalo-Marín et al., 2021; Clement, 1999; Pfordt et al., 2020). The more recent introduction of domestic animals by Europeans and the transfer and exchange of guava fruits in post-Columbian times provided a large-scale avenue for guava expansion (Arévalo-Marín et al., 2021; Janzen & Martin, 1982) and, in part, might explain the lack of association between genetic diversity and geographical origin observed here.

The small sample size studied here could explain the lack of correlation between geographic origin and genetic clustering, although this is unlikely given the unstructured nature of the germplasm under examination. Studies examining larger germplasm collections composed of breeding materials, commercial lines, landraces, and native or wild accessions often find strong clustering among accessions that mirrors geographic origin or group membership (Serba et al., 2019; Shi et al., 2017). However, this differentiation sometimes emerges from using highly structured datasets with subgroups composed of numerous, closely related accessions (e.g., breeding lines) and unbalanced sample sets. Additional samples from the INIFAP collection are currently being genotyped to further evaluate guava genetic diversity.

The extensive exchange of guava germplasm in Mexico, the Americas, and across the globe, combined with the lack of varietal control in commercial plantings, continues to be a barrier to establishing comprehensive, effectively diverse, and well-documented germplasm banks. The true nature of guava diversity and its association with geography and ecology remain obscure. The SNP discovery and diversity analyses conducted herein represent progress towards identifying selection pressure mechanisms and patterns of germplasm exchange driving genetic variation (Figure 2). However, the interpretability of the results in a broader context is still limited. The establishment of international collaboration networks to study larger germplasm sets, such as the ones maintained by the U.S. National Plant Germplasm System and the Tropical Agriculture Research and Higher Education Center (CATIE), might help to disentangle guava's evolutionary and domestication history and provide the molecular resources to facilitate accelerated breeding.

Many regions in the world are suitable for guava production. In the Americas, guava's cultivated distribution extends from northern Mexico to Argentina. Elsewhere, countries including India, China, and Egypt have favorable environments and have shown great potential for expanded guava production. Because of its medicinal properties, nutritional value, flavor, and adaptability, improved guava materials could have a large impact in local regions where guava has a vital role in the food security and livelihood of resource-poor farmers (Tadele, 2019). Furthermore, through the generation of new knowledge and improved germplasm, favorable policies, education, and adoption campaigns, orphan, minor, and/or understudied crops might contribute to global food and nutrition security (Mabhaudhi et al., 2019). However, the development and adoption of modern breeding approaches (e.g., MAS and identification of wild relatives suitable for trait introgression) are critical to accelerating current efforts in guava breeding and addressing major bottlenecks limiting the availability, productivity, and adoption of elite germplasm.

ACKNOWLEDGMENTS

We thank Brandon Schlautman and Juan Zalapa for their critical reading of the manuscript and useful discussions.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS

LDG conceived the idea. LDG and JSPR conceptualized and supervised the project. LDG analyzed the data and drafted the manuscript. JSPR conducted germplasm collection and phenotypic evaluation. LDG and JSPR revised and approved the final manuscript.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available in the supporting information of this article.