An Inr‐DPE Pol II promoter drives flam piRNA cluster transcription
To identify the
flam TSS, we performed 5′RACE experiments on four independent RNA extracts from
Drosophila ovaries and ovarian somatic stem (OSS) cells (Supplementary Table S1). From the capped RNA fraction, a TSS located at position 21,502,918 (FlyBase version FB2011_08) was identified in all the independent amplifications from both ovary and OSS cell RNA extracts (Fig 1A). Ten other TSSs were occasionally amplified, but each was found in only one of the experiments performed. These data suggest that the
flam transcripts are initiated from a major promoter located 1733 bp upstream of
DIP1.
To gain a better understanding of the core promoter of
flam, we examined the motifs located upstream and downstream of the TSS. Based on the consensus initiator element (Inr) sequence TCAGTY obtained by computational analysis of thousands of
Drosophila core promoters
11,12, we found that only the major TSS contains a consensus Inr sequence TCAGTT. In this Inr element, the A nucleotide corresponds to the +1 position of the core promoter (Fig 1B). Further analysis did not reveal a consensus TATA box, where the upstream T is usually located at −31 or −30 nt relative to the A +1 (or G +1) position in the Inr. However, a CGTG tetramer was identified at +23 to +26 bp relative to the major TSS as a downstream promoter element (DPE), which is typically over‐represented in many
Drosophila TATA‐less promoters. Like many
Drosophila and mammalian promoters
13,14, a wide area in the vicinity of the major
flam TSS (from −50 to +70 bp) displays a significant increase in GC content, which is known as a “GC hill.” Aside from this major TSS, no other TSSs identified in this experiment displayed such promoter characteristics. Overall, these data designate the TSS located at 21,502,918 as the main promoter of the
flam piRNA cluster.
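The motif search described above can be illustrated with a minimal sketch. It assumes only the Inr consensus TCAGTY (Y = C or T) and the DPE position (+23 to +26, CGTG) quoted in the text; the function names and the toy sequence are hypothetical and are not taken from the original analysis.

```python
import re

# Inr consensus from the text: TCAGTY, where the IUPAC code Y means C or T.
# A lookahead is used so that overlapping candidate sites are all reported.
INR_PATTERN = re.compile(r"(?=TCAGT[CT])")


def find_inr_sites(seq):
    """Return the 0-based index of the A (+1) for every TCAGTY match."""
    seq = seq.upper()
    # The A is the third base of the motif, hence the +2 offset.
    return [m.start() + 2 for m in INR_PATTERN.finditer(seq)]


def has_dpe(seq, a_index, motif="CGTG", start=23, end=26):
    """Check whether `motif` occupies +start..+end relative to the A at +1."""
    # +1 corresponds to a_index itself, so +23 is a_index + 22.
    first = a_index + start - 1
    last_exclusive = a_index + end
    return seq.upper()[first:last_exclusive] == motif


# Hypothetical toy sequence (not the flam sequence): an Inr followed by
# filler so that CGTG falls exactly at +23..+26.
toy = "GG" + "TCAGTT" + "A" * 18 + "CGTG" + "AA"
for a_index in find_inr_sites(toy):
    print(a_index, has_dpe(toy, a_index))
```

A candidate TSS would be retained as an Inr‐DPE promoter only when both checks succeed for the same +1 position.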
To assess the potential of the
flam Inr core promoter to drive transcription, the promoter region (SFI) including 515 bp upstream of the TSS and 101 bp of the transcribed sequence was cloned upstream of the luciferase reporter gene at the ATG start codon of the coding region. Transcriptional activities were measured in transient transfection experiments in OSS cells. Our results indicate that this
flam fragment is sufficient to promote high‐level expression of the
luciferase reporter gene since an almost 30‐fold enhancement of transcription of the firefly
luciferase gene was observed compared to the empty plasmid (Fig
1C). We then generated a new reporter, SFIΔInr, that lacks the Inr sequence. Deleting the Inr significantly decreased luciferase expression compared with the wild‐type SFI reporter. These results confirm the importance of the Inr sequence for promoting transcription of the
flam locus.
The presence of an Inr core promoter and a cap structure indicates that RNA polymerase II (Pol II) could be responsible for flam transcription. To test this hypothesis, we treated OSS cells with alpha‐amanitin, an inhibitor of Pol II initiation and elongation. Transcription efficiency of the flam locus was determined by RT‐qPCR using primer pairs spanning three different regions of flam. 18S ribosomal RNA, which is transcribed by RNA polymerase I (Pol I), was used as a reference gene for normalization.
We found up to tenfold decreases in
flam‐derived long RNAs in cells cultured in the presence of alpha‐amanitin, indicating that
flam transcription is indeed Pol II dependent (Fig
2A). The amount of
rp49 transcripts (known to be transcribed by Pol II) is shown as a positive control. Moreover, using Pol I or Pol III inhibitors
15,16, we confirmed that
flam transcripts are indeed products of Pol II (
Supplementary Fig S1).
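The normalization of RT‐qPCR signals to a reference gene can be sketched as follows. The text does not state which quantification method was used; the widely used 2^(−ΔΔCt) calculation is assumed here purely for illustration, with 100% amplification efficiency, and all Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target RNA (treated vs control) by 2^(-ddCt).

    Assumes perfect doubling per cycle for both target and reference.
    """
    # Normalize the target to the reference gene within each condition.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target appears 3 cycles later after treatment,
# while the reference (e.g. 18S rRNA) is unchanged.
fold = relative_expression(28.0, 10.0, 25.0, 10.0)
print(fold)  # 0.125, i.e. an eightfold decrease
```

Under these assumptions, a tenfold decrease corresponds to a ΔΔCt of about 3.3 cycles.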
Then, we performed ChIP‐qPCR experiments using an antibody against the initiating form of Pol II. We found that Pol II was more strongly recruited immediately downstream of the
flam TSS than elsewhere within the gene body (Fig
2B). Thus, Pol II is the polymerase involved in
flam piRNA cluster transcription. These results extend findings obtained in mouse testes, in which piRNA precursor transcripts have been described as canonical Pol II transcripts bearing 5′ caps and 3′ poly(A) tails
10.
The transcription factor, Cubitus interruptus, is required to activate transcription of the flam locus
To identify cis‐regulatory sequences, we constructed serially deleted promoter‐
luciferase reporter plasmids containing various lengths of the
flam promoter region from either −1,624 bp (SF), −515 bp (SFI) or −356 bp (SFII) upstream to +101 bp downstream of the TSS. When the SF construct was used for transfection, efficient reporter activity was detected (Fig
3A). Deletion of the region from −1,624 to −515 (SFI) did not result in any significant change in promoter activity. In contrast, further deletion to −356 (SFII) caused an eightfold decrease in promoter activity compared to the SFI construct. Finally, an NC construct, corresponding to SFI in which the
flam fragment between −515 and −356 was replaced by a 159‐bp fragment of a non‐promoting sequence, confirmed that the region located downstream of position X: 21,502,403 (−515 bp) and upstream of position X: 21,502,562 (−356 bp) contains critical
cis‐elements required for the transcriptional activation of the locus.
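The correspondence between TSS‐relative positions and genomic coordinates used above can be checked with a short sketch. It assumes that flam lies on the strand where coordinates increase downstream and that there is no position 0 (the TSS is +1 and the base immediately upstream is −1); both assumptions are inferred from the coordinates quoted in this section rather than stated in it, and the function names are hypothetical.

```python
FLAM_TSS = 21_502_918  # major TSS on X, as reported in the text


def to_genomic(relative_pos, tss=FLAM_TSS):
    """Convert a TSS-relative position (+1 = TSS, no position 0) to a coordinate."""
    if relative_pos == 0:
        raise ValueError("position 0 does not exist in this convention")
    if relative_pos < 0:
        return tss + relative_pos
    return tss + relative_pos - 1


def to_relative(genomic_pos, tss=FLAM_TSS):
    """Convert a genomic coordinate back to a TSS-relative position."""
    offset = genomic_pos - tss
    return offset if offset < 0 else offset + 1


print(to_genomic(-515))                     # 21502403
print(to_genomic(-356))                     # 21502562
print(to_genomic(-356) - to_genomic(-515))  # 159
```

The 159‐bp difference matches the size of the fragment replaced in the NC construct.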
Within the (−515; −356) region, nine potential transcription factor‐binding sites were identified using Genomatix MatInspector (Fig 3B). Based on the modENCODE dataset, four of the corresponding factors are expressed in OSS cells: Broad (Br), Big‐brother (BgB), Doublesex (Dsx) and Cubitus interruptus (Ci). To specifically analyse the involvement of these factors in
flam transcription, we performed successive deletions of each of their predicted binding sites (Fig
3C). The expression of each construct significantly decreased when compared with the SFI control but the most severe reduction (tenfold) was observed with SFI deleted for the Ci binding site, which was similar to the levels seen with the SFII construct. This suggests that the Ci binding site is necessary for the activation of
flam transcription.
Several lines of evidence further implicated Ci in regulating
flam transcription. First, Ci is expressed in follicle cells from the germarium to stage 6 egg chambers (Fig
4A) (
Supplementary Fig S2)
17. Second, ChIP assays showed that Ci recruitment is 10‐ to 12‐fold higher around the TSS and the predicted Ci binding site than elsewhere in the locus (Fig
4B). Third, mutant clones generated by mitotic recombination using flies [
y‐hs‐flp;
FRT42D P[
Ci+] /
FRT42D hs‐MYC 45;
Ci94/
Ci94] indicated that the
flam transcript level decreases in
Ci mutants in a manner similar to the decrease observed for
transcripts of ptc, a gene known to be activated by Ci but not a source of piRNAs
18 (
Supplementary Fig S2). Fourth, siRNA‐mediated knockdown of
Ci in OSS cells led to a decrease in
flam transcripts two days post‐transfection (Fig
4C). In contrast, the production of piRNAs and the TE mRNA levels were not significantly affected (
Supplementary Fig S3). However, an upregulation of TE expression was observed four days post‐transfection (Fig 4D). The delay between disruption of flam transcription and TE deregulation is possibly due to the stability and abundance of
flam piRNAs.
Finally, evidence that Ci is involved in
flam transcription was also provided by an analysis of the
flam mutation present in the BG line
19. In this line, a P‐element insertion at the 5′ end of
flam results in an absence of the precursor transcripts encoded by
flam 1. When examined in detail, we found that the P‐insertion occurred at position X:21,502,538 (−380 bp from the TSS), a position that disrupts the Ci binding site. Considered together, these data strongly suggest a role for Ci in the activation of
flam transcription.
In
Drosophila somatic follicle cells, the major sources of piRNAs are the
flam locus and cluster 2. We therefore examined the cluster 2 promoter and found an Inr consensus sequence (21,390,615) 108 bp upstream of the first piRNA, and a Ci binding site 2,846 bp upstream of the Inr (
Supplementary Fig S4). Furthermore, Ci mutants led to a decrease in
cluster 2 expression (Fig
4C and
Supplementary Fig S2). These data suggest that Ci might also contribute to the transcription of other piRNA clusters in these cells.
A comparative analysis of the
flam promoter region across several
Drosophila species,
D. sechellia, D. simulans, D. yakuba, D. erecta, was then performed. These species diverged from a common ancestor approximately 10 million years ago
20,21. We found that
flam orthologs are located on the pericentromeric X‐chromosome close to the
DIP1 gene in
D. simulans and
D. erecta, similar to
D. melanogaster, whereas they are still assigned to scaffolds in
D. yakuba and
D. sechellia (
Supplementary Table S2). A multiple alignment revealed two highly conserved regions located at positions (−14;+37) and (−398; −372) relative to the
D. melanogaster flam TSS. The first (−14;+37) corresponds to the Inr‐DPE core promoter suggesting a high conservation of its function. The second (−398; −372) includes the Ci binding site (
Supplementary Fig S4). Then, we plotted uniquely mapping piRNAs that could be assigned to the putative
D. erecta flam locus
5. We found that, as in D. melanogaster, the density of piRNAs is very low close to the flam presumptive promoter and increases sharply 1 kb downstream (
Supplementary Fig S4). This analysis of the
flam promoter sequence across several
Drosophila species supports the conclusion that the Inr‐DPE and the Ci binding site are necessary motifs for
flam transcription.
The flam transcript is alternatively spliced and gives rise to multiple flam precursors
The
flam piRNA cluster has been proposed to produce a long single‐stranded precursor RNA that is processed into primary piRNAs in the cytoplasmic Yb bodies
6,22. We sought to better characterize this proposed long precursor. Fragments amplified from the 5′RACE experiments described above to localize the TSS were systematically sequenced. This allowed the identification of an intron located between bases +432 and +2067 relative to the flam TSS. Then, RT‐PCR experiments were performed using a 5′ primer located within either the first or the second exon, and 3′ primers designed along the 180 kb of this cluster. Figure
5A shows structures of
flam transcripts deduced from sequencing of RT‐PCR products. Different patterns of intron splicing were detected. The intron sizes are extremely diverse and range from 0.7 kb to 158 kb. Interestingly, the first exon (exon 1: 21,502,918…21,503,349) was found to be constitutively spliced since it is always present within the processed RNAs. By contrast, downstream of this first common exon, the other exons differ indicating that they result from alternative splicing. Analysis of
flam spliced transcripts revealed that the majority of the intron boundaries obey the GT‐AG rule (
Supplementary Tables S3 and
S4).
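The GT‐AG rule mentioned above states that an intron begins with the dinucleotide GT and ends with AG on the transcribed strand. A minimal sketch of such a check is given below; the coordinate convention (0‐based, end‐exclusive) and the toy sequence are assumptions made for illustration and do not correspond to the flam sequence.

```python
def obeys_gt_ag_rule(genomic_seq, intron_start, intron_end):
    """Return True if the intron starts with GT and ends with AG.

    `intron_start` is the 0-based index of the first intronic base and
    `intron_end` is the index just past the last intronic base. The
    sequence must be given in the orientation of the transcript.
    """
    intron = genomic_seq[intron_start:intron_end].upper()
    # An intron shorter than 4 nt cannot hold both dinucleotides.
    if len(intron) < 4:
        return False
    return intron.startswith("GT") and intron.endswith("AG")


# Hypothetical toy sequence: exon "AAA", intron "GTCCCCAG", exon "TTT".
toy = "AAAGTCCCCAGTTT"
print(obeys_gt_ag_rule(toy, 3, 11))  # True
print(obeys_gt_ag_rule(toy, 2, 11))  # False: intron would start with AG
```

Applying the same test to each intron deduced from the RT‐PCR products would give the fraction of boundaries that follow the rule.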
To verify our findings, we interrogated publicly available RNA‐seq libraries
23 and found that very few reads map to intron 1 compared to the number of reads mapping to exon 1 or exon 2 (Fig
5B). We found that 84% and 16% of reads mapped to the first exon–exon and intron–exon junctions, respectively (Fig
5C). Then, we extended this analysis to 21 major piRNA clusters expressed in ovaries and found that seven of them contain introns (
Supplementary Fig S5). These data suggest that several piRNA clusters including
flam are transcribed as a long primary multi‐kilobase RNA transcript before being spliced.
To determine whether these spliced RNAs are processed into piRNAs, we sequenced small RNAs from OSS cells and searched for reads that align uniquely to the identified
flam spliced junctions. Reads spanning exon junctions were identified. Furthermore, we found that piRNAs encompassing the exon 1/intron 1 junction are under‐represented compared to piRNAs matching the splice junction (Fig
5C). These results further indicate that
flam transcripts are processed into piRNAs after the precursor is spliced. Although the diversity of alternatively spliced transcripts of
flam is likely underestimated, it can be predicted that the multiple splicing events contribute to create a high diversity of
flam precursors.
In the flamKG mutant, the KG transgene is located at position 21,505,285 downstream of the TSS, at the beginning of intron 2. Nevertheless, homozygous
flamKG mutant females exhibit atrophic ovaries like
flamBG females
24. This ovarian phenotype has been attributed to an absence of
flam transcription. Although the loss of flamBG transcription can be explained by disruption of the Ci binding site, the reason why flam transcription is also affected in the flamKG mutant remains obscure. It can be proposed that either the correct transcription of
flam or the stability of its transcripts is affected. We have shown that the
KG transgene is located at the border of the second intron. Disruption of this site might prevent its recognition as a donor site. Since almost all the spliced transcripts detected in WT
flam alleles contain this splice border, it can be anticipated that this donor site plays a crucial role in generating the pool of alternatively spliced RNAs. The
flam mutation due to the
KG insertion would then lead to unstable
flam transcripts and thus, as for the
BG insertion, to a phenotype of atrophic ovaries.
Overall,
flam precursors have two characteristics: first, they display distinct structures resulting from alternative splicing, and second, they all share the first exon at their 5′ end. Future work is needed to elucidate the function of this common 5′ end. A likely hypothesis is that it helps to transfer RNA precursors from their site of transcription to Dot COM at the nuclear membrane facing the cytoplasmic Yb bodies, where they are processed to piRNAs. Recently, UAP56, a helicase of the exon junction complex (EJC), has been shown to play a role in the transport of germline precursor piRNA transcripts to the nuclear pore
25. It remains to be clarified whether the recruitment of the EJC necessary for
flam splicing also plays a role in the stabilization, surveillance and transport of the
flam precursors.
Many TE families are known to originate from recent horizontal transfer between
Drosophila species
26. Recently, we have reported that many of these new TEs preferentially insert within heterochromatic regions such as the
flam locus
27. Thus, the dynamic nature of this piRNA cluster suggests that novel motifs for splicing are constantly gained or lost resulting in distinct pools of
flam precursors. Such stochastic splicing depending on structural modifications affecting piRNA loci might help genomes to rapidly react against new TE invasions.