Genomewide analysis of Drosophila GAGA factor target genes reveals context-dependent DNA binding

February 24, 2003
100 (5) 2580-2585

Abstract

The association of sequence-specific DNA-binding factors with their cognate target sequences in vivo depends on the local molecular context, yet this context is poorly understood. To address this issue, we have performed genomewide mapping of in vivo target genes of Drosophila GAGA factor (GAF). The resulting list of ≈250 target genes indicates that GAF regulates many cellular pathways. We applied unbiased motif-based regression analysis to identify the sequence context that determines GAF binding. Our results confirm that GAF selectively associates with (GA)n repeat elements in vivo. GAF binds to upstream regulatory regions, but less so to downstream regions. Surprisingly, GAF binds abundantly to introns but is virtually absent from exons, even though the density of (GA)n is roughly the same. GAF binds to last introns as frequently as to first introns, suggesting that GAF may regulate not only transcription initiation but possibly also elongation. We provide evidence for cooperative binding of GAF to closely spaced (GA)n elements and explain the lack of GAF binding to exons by the absence of such closely spaced GA repeats. Our approach for revealing determinants of context-dependent DNA binding will be applicable to many other transcription factors.
Transcription factors control gene expression patterns by binding to specific sequence elements in regulatory regions of the genome. The sequence specificity of a transcription factor is often inferred from in vitro experiments, but in vitro specificity is an unreliable predictor of in vivo binding (reviewed in ref. 1). In many cases, in vivo association of a transcription factor with its consensus sequence is strongly influenced by the presence or absence of other factors and by the local chromatin structure. This interplay between local molecular context and the binding of a transcription factor is still poorly understood. Here, we describe how a combination of large-scale mapping of target genes and bioinformatics approaches can reveal several aspects of the molecular context that determine the target specificity of a DNA-binding protein.
Drosophila GAGA factor (GAF) is a sequence-specific DNA-binding factor with several functions. Mutations in the GAF-encoding Trl gene affect viability and display distinct developmental phenotypes (2). GAF is involved in both gene activation (3–6) and gene repression (7, 8) and plays a role in the modulation of chromatin structure (4, 9) and mitotic chromosome segregation (10).
In vitro, GAF binds to the sequence (GA)n, with optimal binding requiring at least 2.5 GA repeats (11, 12). In agreement with this, the solution structure of the DNA-binding domain of GAF complexed to a GAGAG-containing DNA element shows contact sites of the protein with all five base pairs (11). Mutational analysis has confirmed that (GA)n motifs are necessary for GAF antirepressor activity (13). Studies of GAF binding in the native chromatin context have identified a number of in vivo GAF target sequences that indeed contain (GA)n elements (14–16). Taken together, these data strongly argue that GAF binds to (GA)n sequences.
In the sequenced portion of the Drosophila genome, GAGAG elements occur on average once every 652 bp (data not shown). If GAF bound to every such element, virtually every gene would have several GAF molecules bound in its immediate vicinity. However, staining of larval salivary gland polytene chromosomes with GAF-specific antibodies shows a clear banded pattern rather than uniform staining (13, 17). Thus, GAF is unlikely to bind to every GAGAG element in the genome. The observation that GAF binds to the AAGAG satellite repeat only during mitosis (18) further suggests that GAF binding can be modulated by local molecular features, the nature of which is unknown.
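As a rough point of comparison for the observed GAGAG frequency, the sketch below computes the spacing expected if bases were independent and identically distributed with a chosen GC fraction. The i.i.d. base model, the GC values, and the function names are illustrative assumptions, not part of the original analysis. The text also does not state whether the 652-bp figure counts GAGAG on one strand or on both, so both options are exposed.

```python
# Expected spacing of GAGAG (optionally plus its reverse complement CTCTC)
# under an assumed model of independent bases with a given GC fraction.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def base_probabilities(gc: float) -> dict:
    """Per-base probabilities for GC fraction `gc`, assuming G = C and A = T."""
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}


def motif_probability(motif: str, gc: float) -> float:
    """Probability that a given position starts `motif` on one strand."""
    p = base_probabilities(gc)
    prob = 1.0
    for base in motif:
        prob *= p[base]
    return prob


def expected_spacing(motif: str, gc: float, both_strands: bool = True) -> float:
    """Mean distance in bp between occurrences, optionally counting the reverse complement."""
    rate = motif_probability(motif, gc)
    if both_strands:
        revcomp = "".join(COMPLEMENT[b] for b in reversed(motif))
        if revcomp != motif:  # a palindrome would otherwise be counted twice
            rate += motif_probability(revcomp, gc)
    return 1.0 / rate


for gc in (0.40, 0.42, 0.50):
    print(f"GC = {gc:.2f}: one GAGAG/CTCTC every {expected_spacing('GAGAG', gc):.0f} bp")
```

At GC = 0.5 this gives one match every 512 bp on both strands combined; because GAGAG contains three G/C and two A/T bases, lowering the GC fraction within this range lengthens the expected spacing.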
Here, we report the large-scale identification of in vivo GAF target loci in the Drosophila genome. We measured binding of GAF to thousands of loci by using the recently described “chromatin profiling” approach (16). We expressed a fusion protein consisting of GAF linked to Dam methyltransferase in Drosophila Kc cells and subsequently used a DNA microarray-based method to detect the resulting GAF-directed adenine methylation pattern. When corrected for the methylation pattern obtained with untethered Dam, this GAF-directed methylation pattern reflects the in vivo binding pattern of GAF (16, 19).
We previously reported that methylation by tethered Dam spreads in cis over 2–5 kb from a protein binding sequence (19). On the one hand, this limits the mapping resolution of the chromatin profiling technique to a few kb. On the other hand, it allows for the use of conventional cDNA arrays to detect binding of proteins to upstream and downstream regulatory sequences, provided that the binding sites are located within the methylation spreading distance from transcribed regions. As we demonstrate below, unbiased bioinformatics analysis of such binding profiles can be used to uncover some of the rules that govern context-dependent binding of transcription factors.

Materials and Methods

Chromatin Profiling Experiments.

Chromatin profiling of GAF was performed as described (16) by using spotted microarrays containing the Drosophila Gene Collection (release 1) (20) and 430 additional cDNA and genomic fragments. All measured ratios were log2-transformed and normalized to the median value of the entire array. Data from three independent experiments (one with reversed dye orientation) were averaged. A total of 331 cDNA and genomic DNA fragments were spotted in duplicate on the arrays, and the two spots of each pair were highly correlated (r = 0.97; mean difference between the two spots 0.06 ± 0.07), confirming the reproducibility of our measurements. To test whether log ratios were significantly different from 0, we used the Cyber-T algorithm (21), followed by a correction for multiple testing (22), setting the estimated false discovery rate to 0.05.
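A minimal sketch of the array-level processing described above: log2 ratios centered on the array median, replicate averaging with a sign flip for the dye-swapped array, and Benjamini–Hochberg control of the false discovery rate (ref. 22). The channel convention for the dye swap and all function names are assumptions. The Cyber-T test itself (ref. 21), a Bayesian regularized t-test, is not reproduced; the sketch simply takes one P value per spot as input.

```python
import math
from statistics import median


def normalize_array(ch1, ch2):
    """log2(ch1 / ch2) per spot, centered on the median of the whole array."""
    logs = [math.log2(a / b) for a, b in zip(ch1, ch2)]
    m = median(logs)
    return [x - m for x in logs]


def combine_experiments(arrays, dye_swapped):
    """Average per-spot log ratios across replicate arrays.

    In a dye-swapped array the GAF-Dam sample is assumed to be in the second
    channel, so its log ratios are sign-flipped before averaging.
    """
    signed = [[-x for x in a] if swapped else list(a) for a, swapped in zip(arrays, dye_swapped)]
    return [sum(values) / len(values) for values in zip(*signed)]


def benjamini_hochberg(pvalues, fdr=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            k_max = rank  # step-up: keep the largest rank that passes its threshold
    return sorted(order[:k_max])
```

The step-up rule rejects every hypothesis up to the largest passing rank, even if some smaller P values individually miss their own threshold.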

Construction of Sequence Files.

EST and genomic sequences were obtained from the Berkeley Drosophila Genome Project (BDGP), release 2. Using megablast against the BDGP database, we identified unique matching genomic regions for 5,459 ESTs. For each of these, we determined the precise chromosomal coordinates of the 5′ and 3′ boundaries of the matching region. REDUCE and GAGAG spacing analyses were restricted to microarray data obtained from 4,402 ESTs that matched genomic regions <10 kb in size and for which at least 10 kb of upstream and downstream flanking sequence was available. Coordinates of introns, exons, and nontranscribed sequences were obtained from BDGP genome annotation files. Perl scripts written for this purpose are available on request.
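The locus filter described above can be expressed as a simple predicate. This is a hypothetical sketch: the `Locus` record, 0-based end-exclusive coordinates, and reading "<10 kb" as a strict inequality are assumptions, and the original Perl scripts are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    """Genomic region matched by one EST (0-based, end-exclusive coordinates)."""
    est_id: str
    chrom: str
    start: int
    end: int


def usable_for_analysis(locus, chrom_lengths, max_len=10_000, flank=10_000):
    """True if the region is shorter than max_len and has >= flank bp on both sides."""
    if locus.end - locus.start >= max_len:
        return False
    return locus.start >= flank and locus.end + flank <= chrom_lengths[locus.chrom]


def select_loci(loci, chrom_lengths):
    """Keep only loci that pass the size and flanking-sequence criteria."""
    return [locus for locus in loci if usable_for_analysis(locus, chrom_lengths)]
```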

REDUCE Analysis and Analysis of GAGAG Spacing.

The sequences of the probed loci (optionally including flanking sequence on both sides), as well as the sequences of the introns, exons, or intergenic regions they contain, were determined by using the Berkeley Drosophila Genome Project whole-genome sequence and annotation (GFF) files (release 2) and dedicated Perl scripts. REDUCE analysis was performed as described (23) by using software available at http://bussemaker.bio.columbia.edu/reduce (see also Supporting Text, which is published as supporting information on the PNAS web site, www.pnas.org). In the linear model underlying REDUCE, each occurrence of a given motif is assumed to contribute equally to the localization of GAF at a given locus; the coefficients in the model are determined by a least-squares fit to the chromatin profiling log ratios. All motifs from a large class are scored by how much their inclusion in the model would improve the fit to the data.
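In the model of ref. 23, the log ratio A_g of locus g is written as A_g = C + Σ_μ F_μ N_μg, where N_μg counts occurrences of motif μ in the locus sequence and C and F_μ are fitted coefficients. The sketch below illustrates only the single-motif scoring step under that model: for each k-mer it fits A_g = C + F·N_g by least squares and reports the fraction of variance explained, which for a one-motif fit equals the squared Pearson correlation r². It is a simplified illustration, not the REDUCE software, which fits several motifs in the combined model and estimates significance. Function names are hypothetical, and because Table 2 lists (GA)n and (CT)n motifs separately, motifs are counted on the given strand only.

```python
from itertools import product


def count_motif(seq, motif):
    """Occurrences of motif on the given strand, overlaps included."""
    n, i = 0, seq.find(motif)
    while i != -1:
        n += 1
        i = seq.find(motif, i + 1)
    return n


def single_motif_fit(counts, log_ratios):
    """Least-squares fit of A_g = C + F * N_g.

    Returns (C, F, fraction of variance explained); for one motif the last value is r**2.
    """
    n = len(counts)
    mean_n = sum(counts) / n
    mean_a = sum(log_ratios) / n
    sxx = sum((x - mean_n) ** 2 for x in counts)
    syy = sum((a - mean_a) ** 2 for a in log_ratios)
    if sxx == 0 or syy == 0:
        return mean_a, 0.0, 0.0  # constant counts or constant data: no improvement possible
    sxy = sum((x - mean_n) * (a - mean_a) for x, a in zip(counts, log_ratios))
    f = sxy / sxx
    c = mean_a - f * mean_n
    rss = sum((a - c - f * x) ** 2 for x, a in zip(counts, log_ratios))
    return c, f, 1.0 - rss / syy  # improvement over the intercept-only model


def rank_motifs(sequences, log_ratios, k):
    """Score every k-mer by the variance it explains; best first as (score, F, motif)."""
    scores = []
    for letters in product("ACGT", repeat=k):
        motif = "".join(letters)
        counts = [count_motif(s, motif) for s in sequences]
        _, f, score = single_motif_fit(counts, log_ratios)
        scores.append((score, f, motif))
    scores.sort(reverse=True)
    return scores
```

Scanning all 7-mers over thousands of loci this way is slow in pure Python; the point of the sketch is the scoring rule, not efficiency.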

Results

Identification of ≈250 Target Loci of GAF.

Previously, we reported the use of the DamID chromatin profiling approach to screen ≈300 Drosophila genes for GAF binding (16). Here, we extended this approach to >6,000 genes. Briefly, a fusion protein consisting of Dam methyltransferase and the 519-aa isoform (17) of GAF was expressed in Drosophila Kc cells. This leads to preferential methylation of GAF binding sites in the genome. Methylated genomic DNA fragments were purified, fluorescently labeled, and used to probe microarrays containing 6,280 unique cDNA fragments. Methylated DNA purified from cells expressing unfused Dam was labeled with a different fluorochrome and used as a reference probe. Normalized fluorescence ratios represent the targeted/untargeted methylation ratios, and therefore the relative GAF binding to the probed loci (16).
Because we used cDNA microarrays in our assay, we could directly measure methylation levels only in the exons of the probed loci. However, targeted methylation “spreads” in cis over ≈2–5 kb (19). Binding of GAF-Dam to upstream, downstream, and intronic sequences may therefore be detected as increased methylation of nearby exon sequences, provided that the GAF binding sites are located within a few kilobases of a probed exon. We define a “target gene” as a gene for which the corresponding cDNA probe on the array detects a significantly elevated GAF-Dam/Dam methylation ratio (Fig. 1, Table 3, which is published as supporting information on the PNAS web site, and Materials and Methods).
Figure 1
Distribution of GAF-Dam/Dam methylation ratios (log2 values; average of three independent experiments). To illustrate the skewed distribution toward positive values, a Gaussian distribution (drawn curve) was fitted to values below the mode of the histogram (gray bars). Black bars show counts of statistically significant GAF target loci (see Materials and Methods).
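One simple way to obtain a reference curve like the one in Fig. 1 is to take the histogram mode as the Gaussian mean, estimate σ from the values below the mode only, and scale the curve to twice the number of values below the mode. This half-normal estimate and the bin width are assumptions; the fitting procedure actually used for Fig. 1 is not described in detail.

```python
import math


def histogram_mode(values, bin_width):
    """Center of the most populated histogram bin (bins start at multiples of bin_width)."""
    counts = {}
    for v in values:
        b = math.floor(v / bin_width)
        counts[b] = counts.get(b, 0) + 1
    best = max(counts, key=counts.get)
    return (best + 0.5) * bin_width


def fit_left_half_gaussian(values, bin_width=0.1):
    """Fit N(mu, sigma) with mu = histogram mode, using only values below the mode.

    Returns (mu, sigma, n_gauss); n_gauss = 2 * (number of values below the mode)
    is the size of the symmetric population implied by the left half.
    """
    mu = histogram_mode(values, bin_width)
    below = [v for v in values if v < mu]
    if not below:
        raise ValueError("no values below the mode")
    sigma = math.sqrt(sum((v - mu) ** 2 for v in below) / len(below))
    return mu, sigma, 2 * len(below)


def expected_count(mu, sigma, n_gauss, lo, hi):
    """Expected number of values in [lo, hi) under the fitted Gaussian."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return n_gauss * (cdf(hi) - cdf(lo))
```

Observed counts in excess of `expected_count` on the positive side of the mode then indicate the skew toward GAF-bound loci.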
At an estimated false discovery rate of 0.05, we identified 262 cDNA probes for which GAF-targeted methylation levels were significantly elevated (Table 4, which is published as supporting information on the PNAS web site). Of these, 219 probes corresponded to 208 unique previously annotated genes and three repetitive elements (some genes were represented by two cDNAs). The remaining 43 positive cDNAs matched genomic loci that had not been annotated. Importantly, the GAF binding patterns are strikingly different from the binding patterns of six other Drosophila proteins [HP1, HP1c, Su(var)3–9, dMyc, dMad, and an ortholog of mammalian Max; B.v.S., F. Greil, J.D., A. Orian, and R. Eisenman, unpublished work], supporting the specificity of the chromatin profiling technique.
The identified GAF target genes cover a broad variety of functions and include genes that encode proteins involved in growth and development, signaling, heat shock response, and metabolic pathways (Table 1). Thus, GAF may regulate a wide range of cellular processes and pathways. Our data confirm the previously reported GAF binding to heat shock protein genes (14), but do not detect the very weak binding reported for the 28S rDNA and histone gene loci (14). Binding of GAF to other individual loci in our list of target genes may still need to be confirmed by independent methods such as chromatin immunoprecipitation (14); however, the global analysis of our GAF binding patterns presented below strongly supports the specificity of our mapping technique.
Table 1
Gene functional categories with three or more GAF targets

Gene Ontology category: GAF target genes in category

Biological processes
  Amino acid metabolism: SelD, Gad1, slgA
  Catabolism: Gad1, eff, slgA
  Cell communication: PGRP-LA, Dad, grk
  Cell growth and/or maintenance: Fer1HCH, Cdk4, stg, PGRP-LA, LamC, SelD, Gad1, eff, ash2, Alhambra, slgA, chic
  Defense response: PGRP-LA, CG3829, spz, Myd88
  Developmental processes: mfas, grk, fray
  Heat shock response: Hsp22, Hsp26, Hsp27
  Imaginal disc growth factor: Idgf2, Idgf3, Idgf1
  Ligand binding or carrier: Fer1HCH, Nrv2, CG6783, Ras85D, dome, Cyt-c-d, chic
  Mitosis: LamC, eff, eIF-4E, stg
  Protein amino acid dephosphorylation: CG11597, stg, PP2A–B′, dome, puc
  Protein amino acid phosphorylation: Cdk4, par-1, for, par-1, Pak3, gish, fray, CG10967, for, Lk6
  Signal transduction: Dad, grk, Myd88
Molecular functions
  Actin binding: cpb, bif, chic
  DNA binding: Eli, Tis11, E2f, D1, HmgD, Alhambra, Rbf
  Establishment/maintenance of chromatin architecture: HmgD, ash2, eIF-4E
  Glutathione transferase: CG17531, CG17533, CG5224, CG1681
  Heat shock protein: Hsp22, Hsp26, Hsp27
  Hydrolase: Nrv2, CG11597, Idgf3, Faa, PP2A–B′, CG8689, Ras85D, aay, puc, CG9026, ESTS:172F5T, CG10992
  Hydrolase, acting on glycosyl bonds: Idgf3, CG8689, ESTS:172F5T
  Integral plasma membrane protein: Nrv2, PGRP-LA, CG3829
  Ion transporter: Nrv2, BcDNA:LD28120, Sip1
  Kinase: SelD, CG1216, CG4798
  Ligand: Idgf3, grk, spz
  Oxidoreductase: Gs1, BEST:CK02318, Mdh, CG6199, slgA, CG15093, Sptr
  Oxidoreductase CH-OH group of donors: Mdh, CG15093, Sptr
  Phosphatase: CG11597, PP2A–B′, aay, puc, stg, dome
  Protein serine/threonine kinase: Cdk4, for, par-1, fray, CG10967, for, Lk6
  Serpin: sp1, Spn43Aa, sp4
  Signal transducer: Idgf3, dome, grk, spz
  Transcription factor: E2f, E11, CG11799, CG11867, Iola, NK7.1
  Transferase: CG1140, Ugt35a, CG8782, SelD, CG5431, CG1681, CG1216, CG4798, Ugt35b
  Translation initiation factor: eIF-4a, eIF-4E, Syx1A
  Transporter: Nrv2, BcDNA:LD28120, Sip1
  UDP-glucuronosyltransferase: Ugt35a, Ugt86Da, Ugt35b
Standardized gene annotations were taken from the Drosophila Gene Ontology database (release February 2002). A complete list of target genes can be found in Table 4. 

REDUCE Analysis Identifies in Vivo Binding Motifs of GAF.

To confirm the in vivo binding sequence of GAF, we used an unbiased bioinformatics method. REDUCE is a motif-based regression method originally designed to discover regulatory elements from microarray expression data (23). Here, we applied the same algorithm to find sequence motifs whose occurrence correlates with the chromatin profiling data for GAF. A major advantage of REDUCE is that it analyzes the entire set of probed loci and does not rely on clustering or prior partitioning into “target” and “nontarget” loci. Instead, REDUCE uses the full quantitative dataset obtained from one or more chromatin profiling experiments. Moreover, the output of REDUCE includes, for each sequence motif, the correlation strength (represented as a t value) and its statistical significance (P value), corrected for the parallel testing of many motifs.
To account for the cis-spreading of targeted methylation, we performed REDUCE analysis on the sequences of the genomic regions corresponding to the cDNAs on the microarray, including introns and 2 kb of flanking sequence added to both the 5′ and 3′ ends (Fig. 2A). Thus, we made no prior assumptions about the location of GAF binding sites relative to the transcribed regions, but instead analyzed the complete genomic regions where binding of GAF-Dam would in principle be detectable.
Figure 2
(A) Sequences used for REDUCE analysis. For a given gene (filled bars), a corresponding cDNA probe (open bars) detects only methylation levels of matching exon sequences. Because targeted methylation spreads in cis, we included introns and a variable amount of 5′ and 3′ flanking sequences (dashed lines) in the REDUCE analysis (gray bar labeled “Probed locus”). In Table 2 and Fig. 3 the amount of flanking sequence was set to 2 kb. (B and C) Determination of the optimal length of flanking sequences. Density of GAGAG motifs (B) and t value corresponding to the Pearson correlation between GAF binding and the number of occurrences of the GAGAG motif (C) for the probed loci plus 0, 1, 2, 4, or 8 kb of flanking sequence on each side.
For every possible sequence motif up to 7 nt long, we tested whether its occurrence in the probed genomic regions correlates with GAF binding. The results show that (GA)n repeats indeed are strongly correlated with GAF binding (Table 2 and Table 5, which is published as supporting information on the PNAS web site). When ranked by correlation, all of the top 20 motifs contain (GA)n repeats or their (CT)n complements. This finding demonstrates the specificity of our chromatin profiling technique and confirms that GAF binds selectively to (GA)n motifs in the native chromatin context.
Table 2
Most significant motifs found by REDUCE analysis, for probed loci with 2-kb flanking sequence

Rank  Motif    r²       t       −log10(P)  Matches  Loci with match
   1  AGAGAG   0.08411  19.646  >16        6,958    2,677
   2  GAGAG    0.08353  19.572  >16        22,454   4,084
   3  AGAGA    0.07613  18.61   >16        25,640   4,120
   4  GAGAGA   0.07043  17.846  >16        6,675    2,720
   8  AAGAGAG  0.05598  15.788  >16        1,940    1,384
  11  GAGAGAG  0.05345  15.406  >16        2,762    1,217
  13  AGAGAGC  0.04817  14.585  >16        1,848    1,350
  14  AGAGAGA  0.04673  14.353  >16        2,876    1,309
  16  AAGAGA   0.04087  13.382  >16        7,993    3,373
  17  GAGAGAA  0.03944  13.137  >16        1,731    1,333
  22  AAAGAGA  0.03592  12.514  >16        3,054    2,023
  26  GAGAGCG  0.03245  11.872  >16        2,125    1,528
  28  AGAGAGT  0.03229  11.842  >16        1,178    981
  35  AGAGAA   0.03012  11.424  15.5       8,221    3,418
  36  GAGAGC   0.02949  11.302  15.2       7,275    3,269
  42  CGAGAGA  0.02823  11.051  14.6       1,268    1,055

   5  CTCTC    0.06802  17.514  >16        21,401   4,062
   6  TCTCT    0.0619   16.653  >16        24,773   4,115
   7  CTCTCT   0.06175  16.632  >16        7,068    2,711
   9  CTCTCTT  0.05526  15.68   >16        1,861    1,339
  10  GCTCTCT  0.05501  15.641  >16        1,772    1,307
  12  TCTCTC   0.04922  14.751  >16        6,624    2,665
  15  TCTCTT   0.04443  13.979  >16        7,390    3,298
  18  CTCTCTC  0.03852  12.976  >16        2,742    1,150
  19  CTCT     0.03762  12.818  >16        88,116   4,205
  20  GCTCTC   0.03709  12.723  >16        6,627    3,103
  23  CGCTCTC  0.03579  12.49   >16        1,858    1,325
  31  TCTCTTT  0.03069  11.537  15.7       2,923    1,970
  33  TCTCTCT  0.03017  11.435  15.5       2,971    1,300
  34  CTCTTT   0.03017  11.435  15.5       8,940    3,540
  44  TTCTCT   0.02798  11      14.4       7,486    3,313
  46  CGCTCT   0.0274   10.882  14.2       5,904    2,986

  30  AAACAA   0.03103  11.601  15.9       25,931   4,137
  45  AACAAA   0.02752  10.906  14.2       25,282   4,122
  49  ACAAA    0.02648  10.693  13.7       61,740   4,202
  57  AAAACAA  0.02563  10.515  13.3       10,584   3,556

  24  TTGTT    0.03248  11.878  >16        57,441   4,204
  27  TTGTTT   0.03232  11.847  >16        23,214   4,121
  43  TGTTT    0.02816  11.036  14.5       58,404   4,202
  50  TTTGT    0.02648  10.692  13.7       59,213   4,203
  52  TTTGTT   0.02636  10.667  13.6       22,839   4,111

  32  TACATA   0.03065  11.528  15.7       14,749   3,790
  40  ATACATA  0.02898  11.201  14.9       6,679    2,766
  54  ATACA    0.02587  10.565  13.4       37,679   4,191
  58  ACATAT   0.02559  10.507  13.3       13,943   3,864

  39  TATGTA   0.02907  11.217  14.9       14,268   3,822
  53  ATATGTA  0.02605  10.603  13.5       6,207    2,891
A variety of (GA)n motifs displayed highly significant correlation (P < 10⁻¹²) with GAF binding, but only if n ≥ 2. No significant correlation was found for the trinucleotide motif GAG (P = 0.2), which was previously reported to bind GAF in vitro (24). Thus, in the native chromatin context, at least two GA repeats are necessary for GAF recruitment, and 2.5 or 3 repeats appear to be optimal. Interestingly, the REDUCE algorithm found roughly equal correlation values for (GA)n and (CT)n motifs. Because a (CT)n motif on the sense strand of a probed locus is a (GA)n motif on the antisense strand, this finding indicates that GAF binds with approximately the same frequency in either orientation relative to the direction of transcription.
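The orientation argument can be checked directly: counting CTCTC on the sense strand gives the same number as counting GAGAG on the reverse complement. A minimal sketch with hypothetical helper names, assuming uppercase sequences given in the sense orientation:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Number of (possibly overlapping) occurrences of motif in seq."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def strand_counts(sense_seq: str, motif: str = "GAGAG"):
    """Occurrences of motif on the sense strand and on the antisense strand."""
    sense = count_overlapping(sense_seq, motif)
    antisense = count_overlapping(reverse_complement(sense_seq), motif)
    return sense, antisense
```

The antisense count always equals the sense-strand count of the reverse-complement motif, which is why the (GA)n and (CT)n rows of Table 2 can be read as the two orientations of the same element.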
A few additional motifs display a weaker but highly significant correlation with GAF binding, in particular several variants of TnGTm or its reverse complement AnCAm, and (AT)nACA(TA)m or its reverse complement (TA)nTGT(AT)m (Table 2). Given the high specificity of GAF for (GA)n repeats in vitro, these unrelated motifs are unlikely to recruit GAF directly. Rather, we speculate that these motifs bind other sequence-specific factors that provide a favorable context for GAF binding to (GA)n motifs.
Although the overall correlation between GAF binding and the occurrence of (GA)n elements is highly significant, it is far from perfect. For example, the number of GAGAG matches in a probed locus (in both orientations) and the GAF-binding log ratio correlate with only r = 0.34. This imperfect correlation may partly reflect random noise in our GAF mapping data. In addition, because a variety of (GA)n motif variants and a few other motifs correlate with GAF binding, each individual motif can account for only part of the variation in GAF binding. Below we demonstrate a further explanation: specific regions in the genome, such as exons and 3′ downstream regions, bind GAF poorly, even though (GA)n elements occur at approximately the same frequency in these regions.

GAF Binds to Nontranscribed and Transcribed Regions.

Because our previous estimate of cis-spreading of targeted methylation (19) was of limited accuracy, we tested how the correlation between GAF binding and the presence of (GA)n elements was affected by including more or less flanking sequence in the REDUCE analysis. Throughout the remainder of this article, we focus on the interaction of GAF with the motif GAGAG, which is the minimal high-affinity binding motif in vitro (11). This motif was one of the highest ranking motifs in the REDUCE analysis (Table 2). We use tGAF:GAGAG, the t statistic of the Pearson correlation between the number of occurrences of GAGAG in the sequence and the observed GAF binding.
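For reference, the standard conversion from a Pearson correlation coefficient r, computed over n probed loci, to a t statistic with n − 2 degrees of freedom is shown below. We assume, although the text does not state it, that tGAF:GAGAG is obtained this way. Here N_g is the number of GAGAG occurrences in locus g and A_g its GAF-Dam/Dam log ratio.

```latex
r = \frac{\sum_{g}\,(N_g - \bar{N})\,(A_g - \bar{A})}
         {\sqrt{\sum_{g}(N_g - \bar{N})^2}\;\sqrt{\sum_{g}(A_g - \bar{A})^2}},
\qquad
t_{\mathrm{GAF:GAGAG}} = r\,\sqrt{\frac{n-2}{1-r^2}}
```

Because t grows with √(n − 2), the same r yields a larger t when more loci are analyzed, so t values are comparable only between analyses over similar numbers of loci.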
The results (Fig. 2 B and C) show that inclusion of ≈2 kb of flanking sequence maximizes tGAF:GAGAG. This finding indicates that GAF associates with GAGAG elements located upstream or downstream of the probed exons. Adding >2 kb of flanking sequence weakens the correlation, presumably because binding of GAF to sites >2 kb away from the probed regions does not add significantly to the methylation levels of the probed exons. This finding is in agreement with our previous estimate of 2- to 5-kb cis-spreading of targeted methylation (19).
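The flank-length scan of Fig. 2 B and C can be sketched as follows: extend each locus by a given number of bases on both sides, count GAGAG in both orientations, and convert the Pearson correlation with the log ratios into a t statistic. For simplicity all loci are placed on a single chromosome string; the function names, the clipping at chromosome ends, and counting both orientations are assumptions for illustration.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_both_strands(seq, motif="GAGAG"):
    """Occurrences of motif plus its reverse complement (overlaps allowed)."""
    rc = motif.translate(COMPLEMENT)[::-1]
    total = 0
    for m in {motif, rc}:  # a set avoids double counting palindromic motifs
        total += sum(1 for i in range(len(seq) - len(m) + 1) if seq.startswith(m, i))
    return total


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def probed_sequence(chrom_seq, start, end, flank):
    """Locus [start, end) plus `flank` bp on each side, clipped to the chromosome."""
    return chrom_seq[max(0, start - flank):min(len(chrom_seq), end + flank)]


def t_by_flank(chrom_seq, loci, log_ratios, flanks=(0, 1000, 2000, 4000, 8000), motif="GAGAG"):
    """t statistic of the motif-count vs. binding correlation for each flank length."""
    n = len(loci)
    result = {}
    for flank in flanks:
        counts = [count_both_strands(probed_sequence(chrom_seq, s, e, flank), motif)
                  for s, e in loci]
        r = pearson_r(counts, log_ratios)
        if abs(r) >= 1:
            result[flank] = math.copysign(math.inf, r)  # perfect correlation
        else:
            result[flank] = r * math.sqrt((n - 2) / (1 - r * r))
    return result
```

The flank length at which the returned t peaks plays the role of the ≈2-kb optimum described above.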
Strikingly, if flanking sequences are left out completely, tGAF:GAGAG is still highly significant. This finding demonstrates that there is considerable binding of GAF to GAGAG elements within transcribed regions.

GAF Binding to Nontranscribed Regions.

To study the interaction of GAF with nontranscribed regions in more detail, we first determined the correlation between GAF binding and the occurrence of GAGAG elements in predicted nontranscribed sequences located within 2 kb of the probed loci (as indicated in Fig. 2A). As expected, we found a highly significant correlation tGAF:GAGAG in these nontranscribed regions (Fig. 3 Left).
Figure 3
Binding of GAF to GAGAG elements in subregions of target genes. (Left) t value calculated as in Fig. 2C; for each bar, the statistical significance is indicated by the value of −log10(P). (Right) GAGAG density. GAGAG occurrences were counted only in specific subregions of the probed loci as indicated in Fig. 2A.
We then investigated whether GAF binds preferentially to upstream or downstream nontranscribed regions by comparing the respective contributions of these regions to the observed correlation. We separated all intergenic regions within 2 kb of probed loci into three categories: between two divergent genes (exclusively upstream), between two convergent genes (exclusively downstream), and between two tandem genes (mixed upstream/downstream). For each category we determined the value of tGAF:GAGAG (Fig. 3). The results show that GAF preferentially associates with GAGAG elements in upstream intergenic regions, compared with downstream intergenic regions (Δt = 6.9; P = 5 × 10⁻¹²). As expected, the category of regions that can be regarded as both upstream and downstream (“tandem”) showed intermediate levels of correlation.
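The three intergenic categories follow from the strands of the two genes flanking each region. A sketch, assuming '+'/'-' strand annotations and a hypothetical input format in which each locus lists its nearby intergenic regions together with their GAGAG counts; each resulting per-locus count vector could then be correlated with the GAF binding log ratios to obtain a per-category t value.

```python
def classify_intergenic(left_strand: str, right_strand: str) -> str:
    """Category of the region between two adjacent genes (left = lower coordinate)."""
    if left_strand == "-" and right_strand == "+":
        return "divergent"   # both 5' ends face the region: upstream of both genes
    if left_strand == "+" and right_strand == "-":
        return "convergent"  # both 3' ends face the region: downstream of both genes
    if left_strand == right_strand and left_strand in ("+", "-"):
        return "tandem"      # upstream of one gene, downstream of the other
    raise ValueError(f"unexpected strands: {left_strand!r}, {right_strand!r}")


def counts_per_category(loci_regions):
    """Split per-locus GAGAG counts by intergenic category.

    loci_regions: one list per locus of (left_strand, right_strand, gagag_count)
    tuples for the intergenic regions near that locus (hypothetical input format).
    """
    out = {c: [0] * len(loci_regions) for c in ("divergent", "convergent", "tandem")}
    for i, regions in enumerate(loci_regions):
        for left, right, count in regions:
            out[classify_intergenic(left, right)][i] += count
    return out
```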
Note that the observed correlations can be interpreted as an indication of the average relative binding of GAF per GAGAG element. Our observations therefore imply that GAGAG elements in downstream regions are occupied less frequently than those in upstream regions. Because upstream and downstream noncoding regions harbor GAGAG elements at almost equal density (Fig. 3 Right), we conclude that GAF preferentially binds to upstream regions.

GAF Is Excluded from Exons.

Using the same approach, we tested whether GAF preferentially binds to GAGAG elements located in introns or exons (Fig. 3). Strikingly, we found a clear correlation between GAF binding and the occurrence of GAGAG in introns, yet no such correlation was detectable in exons (tGAF:GAGAG(exon only) = −0.3). Thus, GAF binds significantly to GAGAG elements in introns, yet fails to interact with GAGAG elements in exons. A more detailed multivariate analysis suggests that this exclusion from exons is particularly strong in relatively long exons (see Table 6, which is published as supporting information on the PNAS web site).

GAF Binds to Introns Throughout Transcribed Regions.

GAF is often bound near promoter regions, where it can facilitate initiation of transcription (25). Enhancer elements can be located within introns, and it is therefore possible that intron-associated GAF facilitates transcriptional initiation. However, one report has suggested that in some genes GAF binds throughout the transcribed region and may control transcript elongation (14). If intron-associated GAF plays a role in elongation, GAF would be expected to bind to introns irrespective of their distance from the promoter. We tested this by comparing the binding of GAF to all first and last introns of the probed loci (Fig. 3). Strikingly, the value of tGAF:GAGAG in last introns is at least as high as in first introns. This finding indicates that GAF binding to introns is not limited to promoter-proximal introns, consistent with a role for GAF in transcript elongation.
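Identifying first and last introns requires accounting for the strand, because on the minus strand the first intron in transcription order is the one with the highest coordinates. A minimal sketch, assuming exon coordinates as 0-based, end-exclusive (start, end) pairs; how the original scripts treated genes with a single intron (where first and last coincide) or alternative transcripts is not stated.

```python
def introns_in_transcription_order(exons, strand):
    """Introns between consecutive exons, listed 5' to 3' along the transcript.

    exons: (start, end) pairs, 0-based and end-exclusive; strand: '+' or '-'.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    ordered = sorted(exons)
    introns = [(left_end, right_start)
               for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])]
    return introns if strand == "+" else introns[::-1]


def first_and_last_intron(exons, strand):
    """(first, last) intron in transcription order, or (None, None) if there are none."""
    introns = introns_in_transcription_order(exons, strand)
    if not introns:
        return None, None
    return introns[0], introns[-1]
```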

GAF Binds Preferentially to Closely Spaced Pairs of GAGAG Motifs.

The striking difference in GAF binding between exons and introns suggested that the association of GAF with GAGAG is modulated by additional molecular cues. In principle, GAF binding could be either selectively inhibited in exons (for example, by a chromatin conformation that renders GAGAG elements inaccessible) or selectively enhanced in introns and upstream intergenic regions (by cooperative interactions). In vitro, GAF is able to form oligomeric complexes and displays cooperative binding to closely spaced (GA)n elements (26, 27). We therefore investigated whether such cooperative binding could explain the observed regional differences in GAF binding.
To test whether GAF preferentially binds to clustered GAGAG elements in vivo, we ranked loci by their level of GAF binding and compared the spacing of GAGAG elements in the 500 loci with the strongest GAF binding to that in the 500 loci with the weakest GAF binding. The results (Fig. 4) reveal that GAF target loci are indeed enriched in GAGAG elements that are spaced by less than ≈20 bp. The degree of clustering of GAGAG elements in target loci is much higher than can be attributed to random spacing (Fig. 4A), and the 500 control loci with the weakest GAF binding do not show clustered GAGAG elements (Fig. 4B). Taken together with previously reported in vitro binding studies (26, 27), this result strongly suggests that cooperative GAF binding occurs in the native chromatin context. Note that the clustering of GAGAG pairs is significant only at odd distances, suggesting that there is evolutionary pressure to preserve the even/odd character of (GA)n repeats even over distances up to at least 10 bp.
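A sketch of the spacing comparison: find GAGAG matches, tabulate the gap between every pair of matches, and pool the histograms over the strongest and weakest loci. Two choices here are assumptions, since the text does not define them: all pairs (not only consecutive matches) are counted, and the gap is measured from the end of one match to the start of the other, ignoring overlapping pairs. With this definition, two GAGAG matches in the same (GA)n register are separated by an odd number of bases. Function names are hypothetical.

```python
from collections import Counter


def motif_starts(seq, motif="GAGAG"):
    """0-based start positions of all (possibly overlapping) matches of motif."""
    starts, i = [], seq.find(motif)
    while i != -1:
        starts.append(i)
        i = seq.find(motif, i + 1)
    return starts


def gap_counts(starts, motif_len=5, max_gap=20):
    """Histogram of gaps (bases between the end of one match and the start of another).

    Every pair of matches is considered; overlapping pairs (negative gap) and
    gaps >= max_gap are ignored.
    """
    counts = Counter()
    starts = sorted(starts)
    for i, a in enumerate(starts):
        for b in starts[i + 1:]:
            gap = b - (a + motif_len)
            if gap >= max_gap:
                break  # starts are sorted, so later partners are even farther away
            if gap >= 0:
                counts[gap] += 1
    return counts


def pooled_gap_counts(sequences, motif="GAGAG", max_gap=20):
    """Sum the gap histograms over a set of locus sequences."""
    total = Counter()
    for seq in sequences:
        total.update(gap_counts(motif_starts(seq, motif), len(motif), max_gap))
    return total


def extreme_loci(sequences, log_ratios, n=500):
    """Return (strongest, weakest) n locus sequences ranked by GAF log ratio."""
    order = sorted(range(len(sequences)), key=lambda i: log_ratios[i])
    weakest = [sequences[i] for i in order[:n]]
    strongest = [sequences[i] for i in order[-n:]]
    return strongest, weakest
```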
Figure 4
Spacing distribution of GAGAG elements in 500 loci with strongest GAF binding (A, average methylation log ratio = 0.96) and 500 loci with weakest GAF binding (B, average methylation log ratio = −0.44), for intergenic regions (filled bars), introns (shaded bars), and exons (open bars). Lines indicate the spacing distribution after randomization of the positions of all GAGAG elements in each locus (average of >1,000 random simulations is shown).
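The randomized reference lines in Fig. 4 can be approximated by placing the same number of motif starts uniformly at random within a locus of the same length and averaging the resulting gap histogram over many simulations. This sketch assumes uniform placement at distinct positions and ignores sequence constraints on where real GAGAG matches can occur; the gap is measured from the end of one match to the start of another, with overlapping pairs ignored (an assumed definition). Names and defaults are hypothetical.

```python
import random
from collections import Counter


def gap_counts(starts, motif_len=5, max_gap=20):
    """Histogram of non-negative gaps < max_gap over all pairs of motif starts."""
    counts = Counter()
    starts = sorted(starts)
    for i, a in enumerate(starts):
        for b in starts[i + 1:]:
            gap = b - (a + motif_len)
            if gap >= max_gap:
                break  # sorted starts: later partners are farther away
            if gap >= 0:
                counts[gap] += 1
    return counts


def randomized_gap_profile(seq_len, n_motifs, motif_len=5, max_gap=20, n_sim=1000, seed=0):
    """Mean gap histogram after placing n_motifs starts uniformly at random in a locus."""
    rng = random.Random(seed)
    positions = range(seq_len - motif_len + 1)
    total = Counter()
    for _ in range(n_sim):
        starts = rng.sample(positions, n_motifs)  # distinct positions; overlaps allowed
        total.update(gap_counts(starts, motif_len, max_gap))
    return {gap: total[gap] / n_sim for gap in range(max_gap)}
```

For each locus, one would call this with the locus length and its observed number of GAGAG matches, then sum the profiles over loci to obtain the expected curve.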
Comparative analysis shows that in the 500 probed loci with high GAF binding, intergenic regions and introns contain 40.3% and 43.3%, respectively, of all 641 pairs of GAGAG elements spaced <10 bp apart, whereas exons harbor only 16.4% (see Fig. 4A). Because 45% of the DNA in these loci consists of exon sequences, closely spaced GAGAG elements are significantly underrepresented in exons (P = 3 × 10⁻⁵⁵; binomial distribution). This lack of clustered GAGAG motifs may largely explain the absence of GAF binding to exons.
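The binomial test can be reproduced approximately from the rounded figures in the text: about 105 of the 641 close pairs (16.4%) lie in exons, whereas 45% would be expected from the exon fraction of the sequence. The sketch computes the one-sided lower tail P(X ≤ k) for X ~ Binomial(n, p) in log space; the one-sided form is an assumption. Because the percentages are rounded and the exact test setup is not specified, the printed P value need not match the reported 3 × 10⁻⁵⁵ exactly.

```python
import math


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k + 1)
    ]
    m = max(log_terms)
    return math.exp(m) * sum(math.exp(t - m) for t in log_terms)


n_pairs = 641
exon_pairs = round(0.164 * n_pairs)  # 16.4% of 641 close pairs
p_value = binomial_lower_tail(exon_pairs, n_pairs, 0.45)  # 45% exonic DNA
print(f"{exon_pairs} of {n_pairs} pairs in exons "
      f"(expected {0.45 * n_pairs:.0f}); P = {p_value:.1e}")
```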

Discussion

The genomewide analysis presented here confirms that GAF is a pleiotropic regulatory protein that binds in vivo to (GA)n motifs. A few other, unrelated motifs also correlate with GAF binding, suggesting that certain other sequence-specific factors may provide a favorable context for GAF binding. The identity of these factors is presently unknown.
Importantly, we find that not all (GA)n elements in the genome are occupied by GAF. GAGAG elements in exons, but not introns, appear to be devoid of GAF, and GAGAG elements in downstream nontranscribed regions show weaker binding of GAF than in upstream regions. Furthermore, we provide evidence for cooperative binding of GAF to (GA)n elements that are closely spaced. The underrepresentation of such closely spaced (GA)n elements in exons provides a possible explanation for the lack of binding of GAF to exons. Recently, a genomewide study in yeast showed that the Rap1p protein abundantly associates with its cognate binding sequence in promoter regions, but rarely binds to the same sequence in ORFs (28). Exclusion from exons may therefore be a more general phenomenon. Although a “genomewide mechanism that marks promoter regions in chromatin” was proposed to explain the Rap1p distribution (28), our results argue that the spacing of binding motifs can be a major determinant of protein targeting.
Our finding that GAF binds to introns irrespective of the intron-promoter distance is in agreement with a previously suggested role for GAF in transcriptional elongation (14). Interestingly, GAF is able to recruit a chromatin remodeling complex (13), which may facilitate the passage of the elongation complex through nucleosome-packaged genes.
We have demonstrated here that chromatin profiling combined with REDUCE analysis provides a powerful tool for studying context-dependent binding of transcription factors. Even though the chromatin profiling data were obtained with conventional cDNA arrays, the combination with REDUCE allowed us to reveal the preferred sequence motifs of GAF, as well as the distribution of GAF over functionally distinct subregions of genes, such as introns, exons, and nontranscribed regions. Thus, REDUCE increases the resolution and analytical power of chromatin profiling. This approach should be applicable to a variety of DNA-binding factors.

Abbreviation

GAF, GAGA factor

Acknowledgments

We thank Ineke van der Kraan for technical assistance, Steve Henikoff for critically reading the manuscript, and Roel van Driel for support. B.v.S. was in part supported by the Royal Netherlands Academy of Sciences. H.J.B. was partly supported by National Institutes of Health Grant 1P20LM007276-01.

Supporting Information

8000Table3.xls
8000Table4.xls
8000Table5.xls

References

1
M D Biggin Nat Genet 28, 303–304 (2001).
2
G Farkas, J Gausz, M Galloni, G Reuter, H Gyurkovics, F Karch Nature 371, 806–808 (1994).
3
G E Croston, L A Kerrigan, L M Lira, D R Marshak, J T Kadonaga Science 251, 643–649 (1991).
4
Q Lu, L L Wallrath, H Granok, S C Elgin Mol Cell Biol 13, 2802–2814 (1993).
5
J A Weber, D J Taxman, Q Lu, D S Gilmour Mol Cell Biol 17, 3799–3808 (1997).
6
M Okada, S Hirose Mol Cell Biol 18, 2455–2561 (1998).
7
K Hagstrom, M Muller, P Schedl Genetics 146, 1365–1380 (1997).
8
R K Mishra, J Mihaly, S Barges, A Spierer, F Karch, K Hagstrom, S E Schweinsberg, P Schedl Mol Cell Biol 21, 1311–1318 (2001).
9
G Wall, P D Varga-Weisz, R Sandaltzopoulos, P B Becker EMBO J 14, 1727–1736 (1995).
10
K M Bhat, G Farkas, F Karch, H Gyurkovics, J Gausz, P Schedl Development (Cambridge, UK) 122, 1113–1124 (1996).
11
J G Omichinski, P V Pedone, G Felsenfeld, A M Gronenborn, G M Clore Nat Struct Biol 4, 122–132 (1997).
12
R C Wilkins, J T Lis Nucleic Acids Res 25, 3963–3968 (1997).
13
T Tsukiyama, P B Becker, C Wu Nature 367, 525–532 (1994).
14
T O'Brien, R C Wilkins, C Giardina, J T Lis Genes Dev 9, 1098–1110 (1995).
15
H Strutt, G Cavalli, R Paro EMBO J 16, 3621–3632 (1997).
16
B van Steensel, J Delrow, S Henikoff Nat Genet 27, 304–308 (2001).
17
C Benyajati, L Mueller, N Xu, M Pappano, J Gao, M Mosammaparast, D Conklin, H Granok, C Craig, S Elgin Nucleic Acids Res 25, 3345–3353 (1997).
18
J S Platero, A K Csink, A Quintanilla, S Henikoff J Cell Biol 140, 1297–1306 (1998).
19
B van Steensel, S Henikoff Nat Biotechnol 18, 424–428 (2000).
20
G M Rubin, L Hong, P Brokstein, M Evans-Holm, E Frise, M Stapleton, D A Harvey Science 287, 2222–2224 (2000).
21
A D Long, H J Mangalam, B Y Chan, L Tolleri, G W Hatfield, P Baldi J Biol Chem 276, 19937–19944 (2001).
22
Y Benjamini, Y Hochberg J R Stat Soc B 57, 289–300 (1995).
23
H J Bussemaker, H Li, E D Siggia Nat Genet 27, 167–171 (2001).
24
R C Wilkins, J T Lis Nucleic Acids Res 26, 2672–2678 (1998).
25
H Granok, B A Leibovitch, C D Shaffer, S C Elgin Curr Biol 5, 238–241 (1995).
26
K R Katsani, M A Hajibagheri, C P Verrijzer EMBO J 18, 698–708 (1999).
27
M L Espinás, E Jimenez-Garcia, A Vaquero, S Canudas, J Bernues, F Azorin J Biol Chem 274, 16461–16469 (1999).
28
J D Lieb, X Liu, D Botstein, P O Brown Nat Genet 28, 327–334 (2001).

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 100 | No. 5
March 4, 2003
PubMed: 12601174

Submission history

Received: July 22, 2002
Accepted: December 30, 2002
Published online: February 24, 2003
Published in issue: March 4, 2003

Authors

Affiliations

Bas van Steensel, Jeffrey Delrow, and Harmen J. Bussemaker
Netherlands Cancer Institute, 1066 CX, Amsterdam, The Netherlands; DNA Array Facility, Fred Hutchinson Cancer Research Center, Seattle, WA 98109; Department of Biological Sciences and Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027; and Swammerdam Institute for Life Sciences, University of Amsterdam, 1018 TV, Amsterdam, The Netherlands

Notes

To whom correspondence should be addressed.
Communicated by Robert N. Eisenman, Fred Hutchinson Cancer Research Center, Seattle, WA