Origin of the different strains.
We used 11 reference strains. Strain 774 came from the Centraalbureau voor Schimmelcultures (Delft, The Netherlands), with reference number CBS 138T; strains 918, 949, 957, 960, 962, 965, 970, and 971 were provided by the Institut Pasteur (Paris, France) with reference numbers IP 3, 811, 13, 20, 14, 12, 32, and 9, respectively; and strains 1109 and 1110 came from SANOFI (Montpellier, France) with reference numbers SANOFI 2223 and 2231. Information concerning the hospital strains (anatomic location, geographical origin, antifungal resistance, prophylaxis, and HIV+ versus HIV− status) is presented in Table 1. All strains were identified by the ID32 C (bioMérieux SA, Marcy l'Etoile, France) protocol. The hospital strains were isolated, one per person, from 52 patients.
Enzymatic and RAPD protocols.
Enzymatic extracts were obtained as previously described (28). Starch gel electrophoresis and enzymatic assays were performed as described previously (27, 33). Activity data were obtained for the following 26 enzymes: aldolase (EC 4.1.2.13), alpha-glyceraldehyde dehydrogenase (EC 1.1.1.8), creatine kinase (EC 2.7.3.2), diaphorase (EC 1.6.4.3), alcohol dehydrogenase (EC 1.1.1.1), esterase (EC 3.1.1.1), fructokinase (EC 2.7.1.4), fumarase (EC 4.2.1.2), glucose phosphate isomerase (EC 5.3.1.9), glucose-6-phosphate dehydrogenase (EC 1.1.1.49), isocitrate dehydrogenase (EC 1.1.1.42), leucine aminopeptidase (EC 3.4.11), lactate dehydrogenase (EC 1.1.1.27), malic enzyme (EC 1.1.1.40), mannose-6-phosphate isomerase (EC 5.3.1.8), purine nucleoside phosphorylase (EC 2.4.2.1), peptidase A (EC 3.4.13, substrate Val-Leu), peptidase B (EC 3.4.13, substrate Leu-Gly-Gly), peptidase C (EC 3.4.13, substrate Lys-Leu), peptidase D (EC 3.4.13, substrate Phe-Pro), phosphoglucomutase (EC 2.7.5.1), glyceraldehyde phosphate dehydrogenase (EC 1.2.1.12), octopine dehydrogenase (EC 1.5.1.11), acid phosphatase (EC 3.1.3.2), sorbitol dehydrogenase (EC 1.1.1.14), and superoxide dismutase (EC 1.15.1.1).
Creatine kinase, fructokinase, leucine aminopeptidase, mannose-6-phosphate isomerase, peptidase 3, phosphoglucomutase, and superoxide dismutase enzymatic activities were each expressed by two loci. Thus, data were obtained for 33 genetic loci. Alleles were numbered according to their anodal mobility.
To test for a possible correlation between the two independent sets of genetic markers (MLEE and RAPD), a subset of 20 of the available strains was selected. These strains were chosen to span the existing enzymatic variability (see below). Each strain was cultured in two vials containing Sabouraud agar (Difco) for 48 h at 27°C. Cultures were then suspended and ground in a cell homogenizer under CO2-monitored cold conditions (dry ice). DNA was extracted according to a standard protocol (19). Forty primers belonging to the E and F kits (Operon Technologies, Inc., Alameda, Calif.) were tested. A band was considered polymorphic if it was present (amplified) in at least two strains but not in all strains. From these primers we retained 24 bands (assumed to be loci) that were reproducible (a given individual always displayed the same profile after each PCR). These 24 bands were obtained by using 14 primers (E4, E6, E14, E17, E18, F1, F2, F4, F6, F10, F12, F13, F14, and F16). Thus, one primer provides one or more bands, each of which is interpreted as a locus with two alleles (band present = 1, band absent = 0). We used this information for the comparative analysis between MLEE and RAPD data.
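As an illustration only, the polymorphism criterion stated above (band present in at least two strains but not in all) can be sketched in Python. The band labels, the strain subset, and the dictionary layout are hypothetical and were not part of the original protocol; reproducibility across PCR runs is not modeled here.

```python
def polymorphic_bands(profiles):
    """Return the bands scored as polymorphic.

    profiles maps a strain name to a dict {band label: 1 (present) or 0 (absent)}.
    A band is kept if it is present in at least two strains but not in all.
    """
    strains = list(profiles)
    bands = sorted({band for profile in profiles.values() for band in profile})
    kept = []
    for band in bands:
        # A band missing from a profile is treated as absent (0).
        present = sum(profiles[s].get(band, 0) for s in strains)
        if 2 <= present < len(strains):
            kept.append(band)
    return kept


# Hypothetical scoring of three bands in four strains.
example = {
    "774": {"E4-a": 1, "E4-b": 1, "F1-a": 1},
    "918": {"E4-a": 1, "E4-b": 1, "F1-a": 0},
    "949": {"E4-a": 0, "E4-b": 1, "F1-a": 0},
    "957": {"E4-a": 0, "E4-b": 1, "F1-a": 0},
}
print(polymorphic_bands(example))
```

In this toy input, only the band carried by exactly two strains passes the filter; the band shared by all strains and the band found in a single strain are discarded.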
Data analysis.
The genetic distance used to build a dendrogram linking the different strains of C. glabrata was the Cavalli-Sforza and Edwards (4) chord distance matrix, which is the most appropriate for tree construction (38). The distances were computed by the GENETIX v.4 software package (Laboratoire Génome et Populations, CNRS UPR 9060, Université de Montpellier II, Montpellier, France). The distance matrix obtained was then used to build a dendrogram (neighbor-joining method) (35) computed by the software NJTree v2.0 of the RESTSITE v1.1 package (24). This dendrogram provided a visual picture to illustrate the genetic structure of our sample.
Linkage disequilibrium between loci gives a clue regarding the reproductive regime of the population under investigation.
Nonrandom association between each pair of loci was tested by the exact test for genotypic disequilibrium provided by the software GENEPOP v3.2.a (30). For each locus pair, an unbiased estimate of the P value of the probability test (or Fisher exact test) was obtained by the Markov chain method (30). The total number of iterations (randomizations) was set to 10^6. The probability of each randomized table was computed, and the P value was calculated as the sum of the probabilities of all tables (with the same marginal values as the observed one) with a probability lower than or equal to that of the observed table.
Additionally, the level of significance for nonrandom association between multilocus repeated genotypes was tested by the combinatorial probability of sampling a given genotype as often as or more often than that actually observed (d1) (41). A multilocus standardized index of linkage disequilibrium, I^S_A, was also computed by using LIAN 3.0 software (12), which tests the null hypothesis of no linkage by a Monte Carlo simulation (10,000 permutations) on the variance of genetic distances between isolates (V_D) (12, 13).
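The logic of a Monte Carlo test on the variance of pairwise distances can be sketched as follows. This is a minimal illustration and not the LIAN 3.0 implementation: it assumes that the distance between two haploid isolates is the number of loci at which they differ, that the null distribution is obtained by shuffling alleles among isolates independently at each locus, and that the variance is taken over all pairs; the function names and the toy data are hypothetical.

```python
import random
from itertools import combinations


def variance_of_distances(genotypes):
    """Variance, over all pairs of isolates, of the number of differing loci."""
    dists = [sum(a != b for a, b in zip(g1, g2))
             for g1, g2 in combinations(genotypes, 2)]
    mean = sum(dists) / len(dists)
    return sum((d - mean) ** 2 for d in dists) / len(dists)


def linkage_permutation_test(genotypes, permutations=1000, seed=0):
    """Return (observed variance, one-sided Monte Carlo P value)."""
    rng = random.Random(seed)
    observed = variance_of_distances(genotypes)
    columns = [list(col) for col in zip(*genotypes)]  # one list per locus
    hits = 0
    for _ in range(permutations):
        shuffled = []
        for col in columns:
            col = col[:]
            rng.shuffle(col)  # breaks associations between loci
            shuffled.append(col)
        resampled = list(zip(*shuffled))
        if variance_of_distances(resampled) >= observed:
            hits += 1
    # The observed data set is counted as one of the randomizations.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical sample: two clones, six loci, complete linkage.
clonal = [(1,) * 6] * 10 + [(2,) * 6] * 10
print(linkage_permutation_test(clonal, permutations=999, seed=1))
```

With complete linkage, pairwise distances are either 0 or equal to the number of loci, so the observed variance is far larger than in the shuffled data sets and the P value is small.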
The significance of genetic differentiation between strains obtained at the two different hospitals, between HIV+ and HIV− patients, or between strains found in the respiratory tract and those obtained in the anal or urogenital spheres was tested by the G-based exact test for population differentiation (11). The test is performed after 15,000 permutations of genotypes between samples by the software F-Stat 2.9 ( http://www.unil.ch/izea/softwares/fstat.html ) (9). For each permutation, the log-likelihood statistic (G) is computed, at each locus, from the allelic contingency table among populations. The P value corresponds to the proportion of randomized G values that were greater than or equal to the G of the observed data. This software was also used to compute unbiased estimates of Fst (a standardized measure of genetic differentiation) (42) for each locus and over all loci. The Fst value varies between 0 (no differentiation) and 1 (all samples fixed for a different allele).
Nei's 1972 genetic distance (21) was used for the study of correlation between distance matrices because it best estimates the branch length for electrophoretic data (38). The distances were computed by the GENETIX v.4 software package.
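For reference, Nei's 1972 standard genetic distance is D = −ln(Jxy / sqrt(Jx · Jy)), where Jx, Jy, and Jxy are the sums over loci of the products of allele frequencies within and between the two units compared. The following Python sketch shows that calculation under stated assumptions: it is not the GENETIX code, the function name and data layout are hypothetical, and for two haploid strains each allele frequency is simply 0 or 1.

```python
import math


def nei_1972_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance between two units.

    freqs_x and freqs_y are lists with one dict per locus,
    each mapping an allele label to its frequency.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return math.inf  # no allele shared at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


# Two hypothetical haploid strains sharing the allele at one of two loci.
strain_a = [{1: 1.0}, {2: 1.0}]
strain_b = [{1: 1.0}, {3: 1.0}]
print(nei_1972_distance(strain_a, strain_b))
```

For haploid strains the normalized identity reduces to the proportion of loci carrying the same allele, so the example evaluates to −ln(1/2).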
Allelic frequencies cannot be computed for RAPD markers. For these markers, genetic distances between strains were computed with Nei and Li's distance (23). The RAPDistance Package of Armstrong and coworkers (available from the Research School of Biological Sciences, Canberra, ACT 2601, Australia [ http://life.anu.edu.au/molecular/software/RAPDistance ]) was used. The correlation between the two half matrices obtained (MLEE and RAPD) was then tested by a Mantel test (20), which is appropriate for matrix comparisons. A mixture of individuals from differentiated populations may generate linkage disequilibria (Wahlund effect). In order to control for the possible influence of geographical distances, Mantel tests were also made between genetic distances (MLEE or RAPD) and the matrix of geographical distances, coded 0 (local), 1 (Paris-Delft), 2 (Paris-Montpellier), and 3 (Montpellier-Delft). Furthermore, each genetic distance matrix (MLEE and RAPD) was regressed against these geographical distances, and the residuals were kept. Theoretically, the residuals represent the part of the variance not explained by geographical distances. The matrix of these residuals was used for an additional Mantel test, intended to correct for geographical influences. Mantel tests were performed by using GENEPOP v3.2.a.
Since multiple testing inflates the type I error, we applied the sequential Bonferroni procedure when required (32). For example, if one kind of test is repeated 100 times on a population that fulfills the null hypothesis, about five of these tests are expected to be significant at the 5% level by chance alone. A technique to avoid this problem is the sequential Bonferroni procedure, where the desired significance level (say, α) is divided by the number of remaining tests. Thus, for n tests, the lowest P value (among the n available) is compared to the corrected level α/n, the second lowest P value is compared to α/(n − 1), etc. The P values significant under the sequential Bonferroni procedure are then the i + 1 lowest ones, each of which stays below its corresponding corrected significance level, α/(n − i). As a complement, the proportion of tests significant at the 5% level was compared to the expected 0.05 proportion by an exact binomial test performed by using S-Plus 2000 (Professional Release 2; MathSoft, Inc.). This alternative procedure allows testing of whether the proportion of significant tests is equal to or below the 5% expected under the null hypothesis (at the 5% level of significance). This approach may be useful for procedures involving many tests that are not very powerful individually (small sample sizes), as in linkage disequilibrium testing between pairs of loci. In such cases, the sequential Bonferroni procedure may indeed be too conservative.
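Both procedures are simple enough to sketch in a few lines of Python. The sketch below is illustrative only and was not used in this study (the binomial test was run in S-Plus 2000); it assumes a one-sided binomial test against the expected proportion of 0.05, and the function names are hypothetical.

```python
from math import comb


def sequential_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking the P values that remain significant."""
    n = len(p_values)
    order = sorted(range(n), key=lambda k: p_values[k])  # ascending P values
    significant = [False] * n
    for rank, idx in enumerate(order):
        # The (rank + 1)th lowest P value is compared to alpha / (n - rank).
        if p_values[idx] <= alpha / (n - rank):
            significant[idx] = True
        else:
            break  # all larger P values are declared nonsignificant
    return significant


def binomial_upper_tail(k, n, p=0.05):
    """Probability of observing k or more significant tests out of n by chance."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Four hypothetical P values: 0.001 <= 0.05/4 and 0.012 <= 0.05/3,
# but 0.03 > 0.05/2, so testing stops there.
print(sequential_bonferroni([0.001, 0.04, 0.03, 0.012]))
```

In the example, three of the four tests are individually significant at the 5% level, but only the two lowest P values survive the sequential correction.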
The power of the tests was evaluated by running simulations of asexual clonal haploids with the software EASYPOP v1.6 (IZEA; Lausanne University, Lausanne, Switzerland [ http://www.unil.ch/izea/softwares/easypop.html ]). The computer simulation was performed with an island model (47) of 100 subpopulations of 100 haploid individuals each, with a migration rate of 0.04. The 33 loci (as in our data) displayed a random mutation rate of 10^-5 into five possible allelic states. All of these parameters were found by a trial-and-error process involving several simulations with different parameter sets until we observed equilibrium values similar to those observed in our real samples in terms of the mean number of polymorphic loci (13.8 versus 13 in the real samples), unbiased heterozygosity (0.088 versus 0.11), and differentiation between samples (Fst = 0.12 versus 0.11). Each population began in a monomorphic state with a strict clonal mode of reproduction. The simulation ran for 10,000 generations (sufficient to reach a stable equilibrium between drift, migration, and mutation), after which 30 samples of 22 and 30 individuals (corresponding to Paris and Montpellier, respectively) were randomly sampled from 2 of the 100 subpopulations. These simulations provided a null hypothesis for a case of 100% clonality. In other words, this supplied 30 data sets equivalent to the C. glabrata samples available to us but with a 100% clonal mode of reproduction. Such data sets allowed us to test the power of detection of linkage disequilibria in sample sizes of 22 and 30 isolates drawn from a strictly asexual species in order to compare them to what we observed in C. glabrata isolates from Paris and Montpellier, where the reproductive mode is unknown.