Transcriptional and epigenetic analyses of the DMD locus reveal novel cis-acting DNA elements that govern muscle dystrophin expression

Please cite this article as: Samuele Gherardi, Matteo Bovolenta, Chiara Passarelli, Maria Sofia Falzarano, Paolo Pigini, Chiara Scotton, Marcella Neri, Annarita Armaroli, Hana Osman, Rita Selvatici, Francesca Gualandi, Alessandra Recchia, Marina Mora, Pia Bernasconi, Lorenzo Maggi, Lucia Morandi, Alessandra Ferlini, Giovanni Perini, Transcriptional and epigenetic analyses of the DMD locus reveal novel cis-acting DNA elements that govern muscle dystrophin expression, (2017), doi: 10.1016/j.bbagrm.2017.08.010


Introduction
Mutations in the dystrophin gene (DMD) cause Duchenne Muscular Dystrophy (OMIM *310200), an X-linked muscle disorder, characterized by the complete absence of the dystrophin protein.
Milder allelic forms include Becker Muscular Dystrophy (BMD, OMIM *300376), whose disease course is mitigated by residual dystrophin expression, and X-linked dilated cardiomyopathy (XLDC, OMIM *302405), characterized by predominant heart involvement.
Other milder allelic disorders, such as quadriceps myopathy and isolated high CK, can also occur.
The DMD gene consists of 79 exons and 78 introns encoding at least seven distinct isoforms, whose transcription is driven by seven different promoters. Moreover, these promoters are regulated in a tissue- and time-specific manner [1]. Three of them drive the transcription of full-length isoforms that share 78 exons but each have a unique first exon. The three full-length isoforms are named Dp427b, Dp427m and Dp427p, where b (brain), m (muscle) and p (Purkinje) indicate the tissue in which each is specifically or predominantly expressed: brain (cerebral cortex and hippocampus), striated muscle (including skeletal and cardiac muscle) and cerebellar Purkinje cells, respectively. Nevertheless, Dp427b shows ectopic expression sites and is present in some brain compartments as well as in the heart; Dp427m is expressed in muscle and heart, whereas Dp427p is exclusively present in brain, depending on the developmental stage [1].

A C C E P T E D M A N U S C R I P T
The large genomic size of DMD is due to the presence of extremely large introns, which account for roughly 99% of the DMD genomic locus [3].
For all these reasons, transcription of the DMD locus is known to be a very complex process, which nevertheless remains poorly understood. This is somewhat surprising considering that in recent years novel therapies have emerged to treat DMD, acting on splicing modulation or translation modification [4,5], and there is no doubt that these therapeutic protocols would benefit greatly from a deeper understanding of the transcription and splicing regulatory mechanisms that the DMD locus undergoes. Nonetheless, relatively few studies have been published on the regulation of DMD transcription. Among these, Tennyson et al. estimated the time required to complete DMD gene transcription at 16 hours [6]. Studies carried out in mouse (in vivo) and in human skeletal muscles (in vitro) identified one enhancer element, named Dystrophin muscle enhancer-1 (DME-1) [7,8], which shows tissue-specific regulation, exerting its activity only on the Dp427m isoform expressed in skeletal muscle [9], through specific DNA sequences [10] and specific muscle transcription factors such as MyoD [11]. More recently, it has been shown that dystrophin production is under the control of a variety of RNA molecules (miRNAs or lncRNAs).
Among these, some miRNAs, named dystromirs, can regulate dystrophin expression in trans by binding to the 3'UTR of the cognate transcript, and are increased in patients' muscle and plasma, possibly representing prognostic biomarkers [12]. In addition, we have recently identified five novel long non-coding RNAs, transcribed within the DMD locus, that downregulate the basal transcription status of dystrophin [13]. Even more recently, dystrophin splicing has been studied in detail and shown to involve recursive multistep splicing, a further demonstration of the complex regulation of the processing of this extremely large gene [14].
The long DMD locus transcription time (16 hours) suggests that the mechanisms through which RNA polymerase II accomplishes this task might be very complex. For instance, the set of cis and trans regulatory elements that control transcription from the many dystrophin promoters is not entirely known. Furthermore, it is not clear whether RNA pol II proceeds at a constant pace over the 2.2 Mb region or instead pauses at specific sites, and whether these pausing sites are functional to the proper maturation of the dystrophin transcripts, as recently observed for genes in yeast [15].
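To give a sense of scale, the two figures quoted above (a locus of about 2.2 Mb transcribed in about 16 hours) can be turned into an average elongation rate. The short sketch below does only that arithmetic; it assumes, purely for illustration, a constant polymerase speed with no pausing, and the function name is hypothetical.

```python
def mean_elongation_rate(locus_bp, hours):
    """Return the average rate as (kb per minute, bp per second)."""
    kb_per_min = (locus_bp / 1000) / (hours * 60)
    bp_per_sec = locus_bp / (hours * 3600)
    return kb_per_min, bp_per_sec


# Values taken from the text: ~2.2 Mb locus, ~16 h transcription time.
kb_min, bp_s = mean_elongation_rate(2.2e6, 16)
print(f"{kb_min:.2f} kb/min, {bp_s:.1f} bp/s")  # 2.29 kb/min, 38.2 bp/s
```

Any pausing along the locus would mean that the polymerase moves faster than this average between pause sites.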
Based on these premises, we have analyzed the activity of RNA pol II during the transcription of the dystrophin gene using a ChIP-chip approach. Moreover, the major modifications of the RNA pol II CTD (carboxyl-terminal domain) (phosphorylation on Serine 5 and/or Serine 2) suggestive of transcript elongation or pausing were monitored and correlated with the chromatin context.
Our findings unveil a surprisingly powerful processivity of the RNA pol II along the entire 2 [...]

The full microarray data have been deposited in the NCBI GEO as series GSE66571.

Cloning and Luciferase assay:
The pGL3-basic and Renilla-TK vectors were obtained from Promega. The promoter region of the Dp427m isoform, along with the regions corresponding to intron 52 or exon 62, were obtained by PCR and cloned into the pGL3-basic vector. Specific primer pairs are listed in S1 Tab. The activity of firefly or Renilla luciferase was measured with a dual luciferase assay kit (Promega) according to the manufacturer's instructions.
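The reporter figure legend states that a co-transfected Renilla reporter was used to normalize luciferase activity and that results are the mean +/- SE of 3 independent transfections. A minimal sketch of that calculation is given below. The readings are hypothetical numbers, the function names are illustrative, and the t-test mentioned in the legend is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def relative_activity(firefly, renilla):
    """Normalize each firefly reading to its co-transfected Renilla reading."""
    return [f / r for f, r in zip(firefly, renilla)]


def mean_and_se(values):
    """Mean and standard error of the mean across independent transfections."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical raw luminescence readings, three independent transfections each.
promoter_only = relative_activity([1000, 1100, 900], [500, 500, 500])
with_element = relative_activity([2100, 1900, 2000], [500, 500, 500])

# Express the construct carrying the candidate element relative to promoter alone.
baseline, _ = mean_and_se(promoter_only)
fold = [value / baseline for value in with_element]
fold_mean, fold_se = mean_and_se(fold)
print(f"fold activation: {fold_mean:.2f} +/- {fold_se:.2f}")
```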

Chromosome Conformation Capture:
1 × 10^7 cells were resuspended in 10 ml 1X PBS supplemented with 10% FBS and incubated with 1% formaldehyde for 10 min at room temperature.
To stop the formaldehyde crosslinking reaction, 0.5 ml of 2.5 M glycine was added and cells were incubated on ice for 10 min. Cells were spun at 225g for 8 min and the pellet was incubated on ice for 20 min with 5 ml of Cell Lysis buffer (10 mM Tris-HCl pH 8; 10 mM NaCl; 5 mM MgCl2; 0.1 mM EGTA; 1X complete protease inhibitor, Roche). After centrifugation (10 min at 800g), nuclei were collected and resuspended in 0.6 ml of 1.13X NEB buffer 4 + BSA. Nuclei were incubated with 0.3% SDS for 1 h at 37 °C while shaking at 900 rpm; Triton X-100 was then added to a final concentration of 2% and nuclei were incubated for 1 h at 37 °C while shaking at 900 rpm. 800 U of XbaI restriction enzyme were added and the reaction was incubated overnight at 37 °C at 900 rpm.
The day after, digested nuclei were incubated with 1.6% SDS at 65 °C for 25 min at 900 rpm. After SDS incubation, digested nuclei were resuspended in 6 ml of 1X NEB T4 ligase buffer and incubated with 1% Triton X-100 for 1 h at 37 °C with gentle shaking. Then, nuclei were incubated with 300 U NEB T4 ligase for 6 hours at 16 °C. After ligation, cross-linking was reversed by overnight incubation with 130 mg Proteinase K (Roche) at 65 °C. DNA was purified by phenol/chloroform/isoamyl alcohol extraction and ethanol precipitated. As negative controls, non-cross-linked cells were used. qPCR (TaqMan® Environmental Master Mix 2.0, Life-Technologies) with primer pairs and probes (Tab S1) specifically amplifying the looped product as well as the control product was performed according to the manufacturer's instructions. Moreover, intra- and extra-chromosomal negative controls were used. For each amplicon, the qPCR reactions were terminated when the samples with the lowest threshold cycle number were in the late exponential phase and loaded on agarose gels for relative quantification of specific amplicons.
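Several steps of this protocol are specified as a stock volume added to a sample rather than as a final concentration. As a small worked example, the final glycine concentration in the quenching step can be estimated as follows, assuming that volumes are simply additive and that the 10 ml cell suspension is the only other volume present; the helper name is hypothetical.

```python
def final_concentration(stock_conc, stock_vol, sample_vol):
    """Concentration after adding stock_vol of stock to sample_vol (additive volumes)."""
    return stock_conc * stock_vol / (sample_vol + stock_vol)


# Quenching step from the protocol: 0.5 ml of 2.5 M glycine into 10 ml of cells.
glycine_molar = final_concentration(2.5, 0.5, 10.0)
print(f"{glycine_molar * 1000:.0f} mM glycine")  # 119 mM glycine
```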

Patients:
A total of nine patients with Becker muscular dystrophy were selected from a cohort, using as inclusion criteria the presence of deletion mutations comprising exons 34, 45, 34-45 or none of these regions. In the BMD patient with a 13-34 deletion, the intron-34 breakpoint was distal to the DMI34 region, as assessed by PCR analysis. Muscle from a healthy subject was used as a control. Skeletal muscle biopsies were obtained from the Telethon Biobank of the C. Besta Neurological Institute of Milan (Italy).

Protein analysis by Western Blotting:
For Western blotting analysis, proteins were extracted by homogenizing either snap-frozen muscle tissue or cryostat-cut sections as reported by Anthony et al. [18]. An amount of 40-60 µg of total protein was loaded onto a 6% polyacrylamide gel and transferred onto nitrocellulose membranes. Membranes were incubated overnight at 4 °C with Dys2 (1:50, Leica) and sarcomeric alpha-actinin (as a loading control, 1:7500, Sigma) primary antibodies.
After several washes, membranes were incubated for 1 hour with secondary antibodies: anti-mouse IR800 (for dystrophin detection, 1:15000, Licor) and anti-mouse IR680 (for alpha-actinin detection, 1:15000, Licor). Membranes were imaged using a LiCor Odyssey scanner. For quantification, dystrophin intensity was normalized to alpha-actinin using ImageJ software and expressed as a percentage of control.
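The quantification described above (dystrophin normalized to alpha-actinin, then expressed as a percentage of control) amounts to a ratio of ratios. The sketch below shows one way to compute it; the band intensities are hypothetical values in arbitrary units and the function name is illustrative.

```python
def percent_of_control(dys, actinin, ctrl_dys, ctrl_actinin):
    """Dystrophin signal normalized to alpha-actinin, as a percentage of control."""
    sample_ratio = dys / actinin
    control_ratio = ctrl_dys / ctrl_actinin
    return 100.0 * sample_ratio / control_ratio


# Hypothetical band intensities for one patient lane and the control lane.
patient = percent_of_control(dys=300, actinin=1500, ctrl_dys=1000, ctrl_actinin=2000)
print(f"{patient:.0f}% of control")
```

Normalizing to the loading control first means that differences in the amount of protein loaded per lane do not inflate or deflate the percentage.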

Mapping RNA Polymerase II activity along the DMD locus through a ChIP-chip approach
To understand the dynamics of how RNA polymerase II (RNA Pol II) transcribes the DMD locus we adopted a ChIP-chip approach. RNA Pol II binding to DNA, along with its functional modifications in the Carboxyl-Terminal Domain (CTD), was correlated with histone marks such as Ac-H3 and H3K4me2, which characterize the open chromatin of actively transcribed genes. As a cell system, we used SJCRH30, a human rhabdomyosarcoma cell line that derives from a male striated muscle tumor and maintains several features of differentiated muscle cells, such as expression of the brain and muscle dystrophin isoforms (Dp427b and Dp427m) [20] at levels comparable to those of GAPDH, a housekeeping metabolic gene whose expression is often used as a reference in mRNA quantitation assays (Fig. S1). As a negative control we used HeLa cells which, with the sole exception of the ubiquitous Dp71 mRNA, do not express any muscle/brain dystrophin transcripts (Fig. S1). The immunoprecipitated DNA was then hybridized to a custom-made DMD locus-specific chip array. ChIP-on-chip raw data were statistically analysed as described in Materials and Methods. Results showed that RNA pol II was strongly associated with the promoter regions of the Dp427b, Dp427m, and Dp71 isoforms in SJCRH30 cells but, as expected, not with that of the Dp427p isoform, which is not expressed in this cell line (Fig. 1A and [...]

Figure S2 (SJCRH30) and S3 (HeLa). Raw data were normalized by blank subtraction and variance stabilization. Statistically significantly enriched probes (red square) were identified by the Whitehead Error Model. The Whitehead neighbourhood model was used to detect peaks (yellow box).

Aside from these observations, which were largely expected, we also found four additional DNA regions inside the DMD locus that were positive for some of the markers analyzed. First, RNA Pol II was strongly bound to two regions: one inside intron 52 (DMI52) and another around exon 62 (DME62) (Fig. 1C and Fig. S2C). The fact that both regions were also marked by the presence of RNA pol II p-Ser2, p-Ser5, H3ac and H3K4me2 suggested that these two regions might correspond to regulatory elements in proximity of putative transcription start sites. These two regions appear to function in muscle-like cells only, since the markers were absent in HeLa cells (Fig. 1D and Fig. S3A). Furthermore, in SJCRH30, two additional DNA regions in intron 34 (DMI34) and in exon 45 (DME45) were marked by pan-H3ac and H3K4me2 but not by RNA Pol II (Fig. 1E and Fig. S3B); these marks were again absent in HeLa cells (Fig 1F and supp. Fig 3C).
This suggested that the latter two DNA regions may have a muscle-specific function that is independent of RNA Pol II activity.

Function of DMI52 and DME62 regions
The fact that RNA pol II associates with DMI52 and DME62 suggested that these regions might correspond to either novel transcription start sites or pausing sites. To address this issue, we first validated the ChIP-on-chip data by performing ChIP in SJCRH30 cells and analyzing the association and distribution of RNA pol II and related CTD modifications along the two DNA sites. Results in Fig. 2 (-DRB panel) show that these two regions were indeed bound by RNA pol II. To discriminate whether the regions were involved in de novo transcription or pausing, we performed ChIP on cells pretreated with 5,6-dichlorobenzimidazole 1-β-D-ribofuranoside (DRB), an RNA polymerase II inhibitor that prevents the polymerase from switching from the stalled to a processive conformation [21]. Following treatment, only transcriptional start sites should remain enriched for RNA polymerase II, since a hypothetical pausing site would be depleted of processive polymerase. As a control, the promoter region of the GAPDH gene, which is actively and constitutively transcribed in these cells, was also analyzed. As shown in Fig. 2 [...] (Fig. S4), thereby confirming that this region may represent a genuine pausing site for the polymerase during the transcription of Dp427m or Dp427b mRNAs. In contrast, the RNA polymerase was still present on exon 62. In particular, enrichment was observed for Pol II and CTD-P-Ser5-Pol II but not for CTD-P-Ser2-Pol II, indicating that the region may be associated with a novel transcription start site. Indeed, both bioinformatics assembly of annotated ESTs (Supp. Fig. 5) and RT-PCR (Supp. Fig. 6) revealed the existence of a novel RNA, which starts from the inner region of exon 62 with the same orientation as the Dp427 transcripts. Bioinformatics analyses suggest that this is a non-coding RNA since no ORFs of adequate size were predicted.
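The reasoning behind the DRB experiment can be summarized as a small decision rule. The sketch below is only an illustration of that logic: the function name, the boolean inputs and the reduction of ChIP enrichment to present/absent calls are assumptions made here, not the analysis actually performed in the study.

```python
def classify_pol2_site(bound_untreated, bound_after_drb, ser5_after_drb, ser2_after_drb):
    """Interpret RNA pol II occupancy at a site before and after DRB treatment."""
    if not bound_untreated:
        return "no pol II association"
    if not bound_after_drb:
        # DRB blocks the switch to the processive form, so elongating
        # polymerase no longer reaches an internal site: consistent with pausing.
        return "candidate pausing site"
    if ser5_after_drb and not ser2_after_drb:
        # Pol II is retained and carries only the initiation-associated Ser5 mark.
        return "candidate transcription start site"
    return "inconclusive"


print(classify_pol2_site(True, False, False, False))  # pattern described for DMI52
print(classify_pol2_site(True, True, True, False))    # pattern described for DME62
```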

Bioinformatics analysis of the DMI52 region
The DMI52 pausing site was analyzed for the presence of putative transcription factor binding sites (TFBS). The LASAGNA webtool [22] scored 228 vertebrate TRANSFAC matrices [...] and signaling proteins (ELK1, ATF2, JUN) (Table S4). This is consistent with the function of DMI52 as a pausing site in the dystrophin gene, which is almost exclusively expressed in muscle (both skeletal and cardiac). Interestingly, circadian clock-related proteins are also represented in the Reactome pathways, reinforcing the link between muscle differentiation or regeneration and synchronization, as already published. STRING analysis (http://string-db.org) [24][25][26] of network nodes involving the 37 scored human TFs identified experimentally determined interactions among the TFs, supporting the bioinformatics analysis performed on the DMI52 pausing site (Fig S7).

Function of DMI34 and DME45 regions.
Since in SJCRH30 cells DMI34 and DME45 were strongly marked by H3K4me2, pan-acetylation and, more precisely, acetylation of H3K27, a specific marker of enhancer elements (Fig. S8), we speculated that these regions might function as regulatory elements, possibly with enhancer-like activities. To support this idea, the two DNA regions were separately cloned in both orientations into a reporter vector, downstream of the luciferase reporter whose activity was driven by the upstream Dp427m promoter. Constructs were transiently transfected into SJCRH30 and murine C2C12 cells and luciferase activity was monitored 24 hours after transfection. Results showed that DMI34 can stimulate transcription from the Dp427m promoter when cloned in either orientation, and predominantly in the inverse one (Fig. 3A). In contrast, DME45 showed very little effect or, at most, had a negative impact on transcription of the reporter, particularly in C2C12 cells (Fig. 3B).

Constructs were tested in SJCRH30 and C2C12 cells 24 h after transfection. A Renilla reporter co-transfected with each Luc reporter was used to normalize luciferase activity. Results are the mean +/- SE of 3 independent transfections in triplicate. Student's t-test was applied to determine statistically significant differences among tested conditions (* indicates p<0.05; n.s., non-significant).

Although the luciferase assay was informative, the results could not take into account the chromosomal distances between the DMI34/DME45 regions and the Dp427m promoter; it is known that transcriptional enhancers can work several tens or hundreds of thousands of bp from their target sites [27]. To demonstrate in vivo physical association of the intron-34/exon-45 regions with the Dp427m promoter we applied a Chromosome Conformation Capture (3C) assay to SJCRH30 and HeLa cells. Schemes of the experimental design of the assay are shown in Figs. 4A and 4D.
Results show that DMI34 could generate a PCR product specifically resulting from the natural juxtaposition of DMI34 with the Dp427m promoter (Fig. 4B). As expected, no PCR product was observed in HeLa cells, in which the Dp427m isoform is not expressed (Fig. 4C). Furthermore, no PCR products were observed when DMI34 was tested for interaction with other in cis [...]

[...] specific interactions were detected using specific TaqMan probes. To estimate the frequency of random collision between digested fragments, two intra-chromosomal (ChX-p21 and ChX-q22) and one inter-chromosomal (Ch 21) interactions were analysed. As negative controls, non-cross-linked cells were used. PCR products were then loaded on an agarose gel; each panel is representative of four independent experiments.

Consequences of intron 34 or exon 45 deletion in BMD patients.
To understand whether DMI34 and DME45 play a critical role in the regulation of muscle dystrophin in patients affected by dystrophinopathies, we analyzed the amount of dystrophin protein and transcript in BMD patients. The dystrophin protein in these BMD patients is shorter, as a consequence of in-frame deletions, and is therefore expressed in muscle, preserving most of its normal functions [28].
Nine patients clinically diagnosed with BMD were selected and grouped according to deletions [...] Fig. S9. Patients' phenotypes, and how they were grouped, are listed in Table 1.


A new DMD RNA pol II pausing site
In this study we have, for the first time, analyzed the dynamics by which RNA pol II passes through the DMD locus in the context of specific histone marks. We identified a single pausing site, in intron 52, within the 2.2 Mb encompassing the DMD locus, demonstrating that the RNA polymerase displays an extraordinary level of processivity across this huge locus yet still pauses at one internal position. This is the first report of an internal pausing site in the dystrophin gene. Following transcription initiation, RNA pol II can enter a paused or stalled state, generally immediately downstream of the transcription start site, before starting productive elongation. This is a widespread, physiologically regulated phenomenon, though not completely understood, and RNA pol II reactivation occurs via a number of elongation complexes [29].
RNA polymerase II pausing sites are indeed located just downstream of the promoter in a relevant proportion of human genes, and not in other regions. The DMI52 site is within an intron, positioned in the central part of the DMD gene. Although DMD has several promoters, DMI52 is not located in close proximity to any of the 3' DMD promoter regions, since Dp140 is driven by a promoter in intron 44 with the ATG codon in exon 51, whereas Dp116 has its promoter and unique first exon in intron 55.
Regions annotated as RNA pol II pausing sites share some sequence characteristics, such as binding motifs for transcription factors, generally GC-rich regions, and the presence of G4 (RNA G-quadruplex) motifs, which are transcriptional and epigenetic regulatory targets of transcription factors. The latter have recently been connected to human diseases such as amyotrophic lateral sclerosis [30]. RNA polymerase II pausing is tightly associated with pre-mRNA processing, both being co-transcriptional processes [31]. Recently, a relevant role of RNA loops (R-loops) in facilitating RNA pol II pausing prior to elongation has been pinpointed, and it is mediated by intense antisense transcription over the pausing elements [31]. The functional meaning of this DMD intron 52 pausing site is unknown, especially considering its unusual location. It might have a role in controlling the efficiency of transcription initiation and also in regulating alternative splicing. The DMD locus is alternatively spliced, and many lncRNAs are actively transcribed, especially around the region of the intron 52 pausing site (from intron 45 to intron 55) [13]. Indeed, we describe here an additional new lncRNA within intron 62. We can hypothesize that this might be a crucial region for the regulation of dystrophin transcriptional dynamics, requiring a pausing site and then a restart of elongation downstream. Since R-loops also promote chromatin architecture shaping, which controls the termination region, this region might also be implicated in the regulation of transcription termination and polyadenylation [32,33]. Very recently, ultra-deep transcript sequencing analyses have shown that dystrophin pre-mRNA undergoes multi-step non-sequential splicing, thus suggesting a highly regulated process in which control of RNA pol II processivity may be critical [34].
These authors described in great detail the intron removal dynamics of the DMD gene, identifying non-consecutive intron removal and exon blocks, which result from 3 or more joined exons flanked by unspliced introns. Interestingly, the two blocks containing exons 50-52 and 53-57 are spliced non-sequentially, with recursive splicing occurring within intron 52. This may reinforce the regulatory role that intron 52 exerts in transcription dynamics.

A novel lncRNA.
The ChIP-on-chip analyses also revealed the presence of a new promoter near exon 62 which drives transcription of a lncRNA (lncRNA62int) expressed in rhabdomyosarcoma cells. We did not detect lncRNA62int in our previous study [13], but it is possible that this new lncRNA is more abundant in rhabdomyosarcoma than in normal skeletal muscle, and may therefore have escaped our previous analysis. Indeed, an alternatively spliced isoform of this lncRNA has also been annotated in a published chondrosarcoma RNA library, suggesting that this type of lncRNA may be typically expressed in tumour tissues/cells. Again, this new lncRNA62int underlines the intense transcriptional activity that occurs in this region of the DMD gene.

Intron 34 dystrophin enhancer (DMI34)
Our epigenetic analysis has identified two putative transcriptional cis-DNA elements that may [...] Conversely, in vitro assays did not support a transcriptional regulatory role for DME45. BMD patients with deletions in this region have lower levels of dystrophin protein as compared to muscle samples with deletion of exon 34. Of course, this variability could be due to the different extent of the deletion in these two groups (many other exons are indeed missing). Therefore, we cannot draw conclusions about the possible function of DME45, and further studies are required to elucidate its possible role.
Our results highlight a profound complexity in the DMD gene epigenetic structure and unveil new transcriptional dynamics. We showed for the first time a genuine RNA pol II pausing site, interestingly not located adjacent to the promoter region.
We identified the new DMI34 enhancer, which is also of interest for its therapeutic implications.
Indeed, modulation of DMI34 may enhance DMD transcript production, with a beneficial effect on the amount of protein produced, and might therefore also be applicable to BMD patients, for whom no specific treatment is currently available [7].
In conclusion, our findings have started to provide a global picture on how the entire DMD locus is epigenetically assembled and dynamically transcribed by the RNA pol II.
These findings are important for elucidating the basic mechanisms that the DMD gene follows when physiologically expressed, and may contribute to a better understanding of disease severity and pathogenesis in both BMD and DMD patients.

Acknowledgments
This study was supported by the FP7 EU BIO-NMD project n. 241665 (AF), by the Emilia Romagna Region RARER project (AF and AR), and by the University of Bologna (RFO2011-2013 to GP). The Biobank of the C. Besta Neurological Institute, member of EuroBioBank and Telethon Network of Genetic Biobanks (GTB12001F to MM) is gratefully acknowledged for providing biological samples.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.