Amplicon‐based next‐generation sequencing: an effective approach for the molecular diagnosis of epidermolysis bullosa

Background Epidermolysis bullosa (EB) is caused by mutations in genes that encode proteins belonging to the epidermal-dermal junction assembly. Due to the extreme clinical/genetic heterogeneity of the disease, the current methods available for diagnosing EB involve immunohistochemistry of biopsy samples and transmission electron microscopy followed by single-candidate gene Sanger sequencing (SS), which are labour-intensive and expensive clinical pathways.
Objectives According to the recently published recommendations for the diagnosis and treatment of EB, the assessment of the mutational landscape is now a fundamental step for developing a comprehensive diagnostic path. We aimed to develop a customized, cost-effective amplicon panel for the complete and accurate sequencing of all the pathogenic genes already identified in EB, to minimize the processing time required for the execution of the test, and to refine the analysis pipeline to achieve cost-effective results from the perspective of a routine laboratory set-up. Next-generation sequencing (NGS) via the parallel ultra-deep sequencing of many genes represents a suitable method for reducing the processing time and costs of EB diagnostics.
Materials and methods We developed an EB disease-comprehensive AmpliSeq panel to perform NGS on an Ion Torrent Personal Genome Machine platform. The panel was run on 10 patients with known genetic diagnoses and was then employed in eight family trios with unknown molecular footprints.
Results The panel was successful in finding the causative mutations in all 10 patients with known mutations, fully confirming the SS data and providing proof of concept of the sensitivity, specificity and accuracy of this procedure. In addition to being consistent with the clinical diagnosis, it was also effective in the trios, identifying all of the variants, including those missed by SS and de novo mutations.
Conclusions NGS with AmpliSeq was shown to be an effective approach for the diagnosis of EB, resulting in a cost- and time-effective 72-h procedure.
What's already known about this topic?
• Skin microscopy and Sanger sequencing (SS) are valuable tools for the diagnosis of epidermolysis bullosa (EB), although in some cases they provide suboptimal specificity and sensitivity for an accurate diagnosis and are labour intensive and expensive.
• Whole-exome sequencing (WES) is an extremely effective technique able to refine and improve the diagnosis of EB, overcoming the limits of skin microscopy and SS.
• WES is a time-consuming procedure, is not yet cost-effective and needs particularly intensive data management.
What does this study add?
• Amplicon-based targeted next-generation sequencing (NGS) is an extremely accurate diagnostic tool.
• Amplicon-based targeted NGS meets the needs of medium-sized diagnostics laboratories, offering a very fast and cost-effective sample processing of EB.
• Amplicon-based NGS can be fully disease-customized and simplifies data interrogation and management.
Inherited epidermolysis bullosa (EB) is a family of rare genetic skin disorders characterized by structural and mechanical fragility of the skin and mucosal membranes. The primary feature of EB is the presence of recurrent skin blistering or erosions, which has a profound impact on the quality of life of patients with EB and, in the most severe forms, causes early death. [1][2][3][4] The overall prevalence of EB in the population is estimated to be 0.8 per 100 000 individuals in the United States and 1 per 100 000 in Italy 5,6 (http://www.snlg-iss.it/lgmr_epidermolisi_bollose; www.orphanet.com).
EB is caused by autosomal dominant or recessive mutations in the genes encoding the different proteins responsible for the assembly of the epidermal-dermal junction, and is classified into four major types: EB simplex (EBS), junctional EB (JEB), dystrophic EB (DEB) and Kindler syndrome. 4 The classification of EB is particularly complex due to the extreme clinical and genetic heterogeneity of the disease. Furthermore, each major type of EB can be further divided into specific subtypes based on the ultrastructural features, genetics, mode of inheritance and clinical manifestations. 4 Currently, the molecular diagnosis of EB represents the final phase of a labour-intensive, expensive and time-consuming clinical pathway, which includes the gathering of a detailed personal and family history and subtype classification via immunofluorescence mapping and/or transmission electron microscopy, performed on skin biopsies harvested from spontaneous or traction-induced blisters. The high locus and genetic heterogeneity (there are hundreds of known mutations in different genes), together with the very high costs of DNA Sanger sequencing (SS), have made it difficult to justify the use of such molecular analyses in the absence of a well-structured medical case (http://www.snlg-iss.it/lgmr_epidermolisi_bollose).
The intrinsic limitations of the conventional genetic diagnosis of EB could be overcome by the progress achieved by the current next-generation sequencing (NGS) technologies, which through the parallel sequencing of a panel of genes (instead of single candidate genes) would allow improvements in the processing time, cost and accuracy of the diagnosis and the identification of the genetic factors that influence the clinical expression of patients with EB.
The first goal of this study was to develop a customized, cost-effective amplicon panel (AmpliSeq) for the complete and accurate sequencing of all the pathogenic genes already identified in EB. The NGS panel was therefore handled using a semi-automated procedure for the library preparation and sequencing on an Ion Torrent Personal Genome Machine (PGM) platform: the first phase was performed on a small cohort of 10 patients (with known molecular and genetic diagnoses) to obtain proof of concept of the sensitivity, specificity and accuracy of this type of NGS procedure, and the results were then compared with the results obtained via SS.
The second phase consisted of eight family trios (proband, mother and father) with unknown molecular footprints to validate the massive parallel customized AmpliSeq panel on either EB-affected or healthy carriers; this second phase of work aimed to minimize the processing time required for the execution of the test and to refine the analysis pipeline to achieve cost-effective results from the perspective of a routine laboratory set-up.

Sample collection and DNA extraction
Patients previously diagnosed with EB by Italian or European Institutions were enrolled under informed consent with the support of DEBRA Südtirol-Alto Adige (www.debra.it).
The clinical data collection (when available), anonymity, storage, utilization, sample maintenance and informed consent procedures followed the indications reported by the Garante per la Protezione dei dati Personali, which were recently summarized in the 'Autorizzazione generale al trattamento dei dati genetici' (Gazzetta Ufficiale n.159, 11 Luglio 2011), the Autorizzazione generale 24/6/11 and in the Helsinki Declaration of 1975, as revised in 1983. The informed consent was elaborated according to the guidelines reported above.
The genomic DNA (gDNA) was extracted from fresh peripheral blood samples using the QIAGEN Blood and Tissue kit (Qiagen; Redwood, CA, U.S.A.). The quantity and purity of the nucleic acid samples were assessed using a Nanodrop 1000 instrument (Thermo Fisher Scientific; Freemont, CA, U.S.A.). All of the gDNAs had 260/280-nm absorbance ratios between 1.8 and 2.0 and 260/230-nm ratios in the range of 2.0-2.2.
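The purity window reported above can be expressed as a simple acceptance check. The following is a minimal sketch, not the laboratory's actual procedure: the `NanodropReading` class, the function name and the sample values are hypothetical, and only the two ratio ranges come from the text.

```python
from dataclasses import dataclass


@dataclass
class NanodropReading:
    """One spectrophotometer reading (hypothetical container)."""
    sample_id: str
    ratio_260_280: float
    ratio_260_230: float


def passes_purity_qc(reading, range_280=(1.8, 2.0), range_230=(2.0, 2.2)):
    """Return True when both absorbance ratios fall inside the reported ranges."""
    ok_280 = range_280[0] <= reading.ratio_260_280 <= range_280[1]
    ok_230 = range_230[0] <= reading.ratio_260_230 <= range_230[1]
    return ok_280 and ok_230


# Illustrative readings only; these are not data from the study.
readings = [
    NanodropReading("sample_A", 1.85, 2.10),
    NanodropReading("sample_B", 1.62, 2.05),  # low 260/280 ratio
]
for r in readings:
    print(r.sample_id, passes_purity_qc(r))
```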

AmpliSeq panel design and Ion Torrent Personal Genome Machine platform sequencing
The amplicon-sequencing analyses took advantage of the Ion AmpliSeq™ technology (Life Technologies Ltd; Paisley, U.K.), an ultra-high-multiplex polymerase chain reaction (PCR) amplification strategy that uses very low input gDNA for a simple and fast library construction for the affordable sequencing of specific human genes or genomic regions. A custom panel was designed with the help of the online Designer tool (https://www.ampliseq.com/browse.action), which was employed to generate optimized primer designs encompassing the coding DNA sequences with 5 bp of exon padding and the untranslated regions (UTRs) of 34 genes in the Human Reference Genome (hg19); these genes were selected from the NCBI databases or the Ingenuity™ (www.ingenuity.com) knowledge base as those described in EB or clinically related diseases (Table S1).
The primer pairs were divided into two pools to optimize the coverage and multiplex PCR conditions. The overall coverage rate was 95.43%. The customized Ion AmpliSeq panel was processed using the Ion AmpliSeq Library Kit 2.0 (Life Technologies Ltd) according to the manufacturer's recommendations, starting from 10 ng of gDNA/pool. The samples were barcoded with the Ion Express Barcode Kit (Life Technologies Ltd) to optimize the patient pooling on the same sequencing chip. The template preparation was performed using an Ion One Touch 2 System and an Ion One Touch ES following the latest version of the manufacturer's manuals. The template-positive Ion Sphere Particles (ISP+) were sequenced on a PGM (Life Technologies Ltd) using the 318 Chip v2 following the Ion PGM Sequencing 200 Kit v2 manual (Life Technologies Ltd). The average raw per-base accuracy was 99.3%. Concerning homopolymers, the per-base accuracy of the sequencing platform is greater than 97.5% for stretches up to 5 bp long (Life Technologies Ltd, application note, 2011, CO31484).

Variant detection
The variant detection, annotation and filtering were performed using an analysis pipeline that was generated and tested for functionality in our laboratory and set up for EB diagnosis. All the databases and software used were the latest available versions. The samples were processed using the Ion Torrent Variant Caller (www.lifetechnologies.com) for the variant detection. ANNOVAR 7 was used to retrieve the gene annotation from RefSeq, the rs identifiers from dbSNP, the allele frequency from the 1000 Genomes Project and the Exome Sequencing Project esp6500, and the functional predictions from dbNSFP v2.1, which compiles the prediction scores from six prediction algorithms: SIFT, Polyphen2, LRT, MutationTaster, MutationAssessor and FATHMM. The variants were further annotated using the Variant Effect Predictor, the COL7A1 mutation database, 8 the Human Intermediate Filament Database (www.interfil.org), 9 and the Leiden Open Variation Database (LOVD, http://www.lovd.nl/). 10 Finally, the splicing defect predictions were interrogated with NNSPLICE (www.fruitfly.org) 11 and dbscSNV, which is a database that compiles two ensemble prediction scores for all the potential human single-nucleotide variants within splicing consensus regions (−3 to +8 at the 5′ splice site and −12 to +2 at the 3′ splice site). Both the ensemble scores (ada_score | rf_score) are the probabilities of a variant being splice altering (optimal cut-off point > 0.6). 12 The variants with low coverage or low allele burden (< 50 reads or < 30%, respectively) were filtered out. The synonymous variants and variants having an allele frequency greater than 1% reported in 1000 Genomes or esp6500 were discarded as well. The information from the mother-father-proband trio was used to further filter the proband variants to identify the causative mutations. Therefore, the variants present in homozygosis in at least one of the nonaffected parents were filtered out.
Similarly, all the variants that were not congruent with any possible inheritance model (de novo, dominant, recessive, compound heterozygosis) were discarded.
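The filtering rules described in this section can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the in-house pipeline itself: the `Variant` record, its field names, the genotype labels and the example variants are all hypothetical; both parents are assumed to be unaffected; and the full inheritance-model congruence test is left out. Only the thresholds (50 reads, 30% allele burden, 1% population frequency) and the homozygous-parent rule come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_DEPTH = 50                # variants with < 50 reads are filtered out
MIN_ALLELE_FRACTION = 0.30    # variants with < 30% allele burden are filtered out
MAX_POPULATION_FREQ = 0.01    # variants with frequency > 1% are discarded


@dataclass
class Variant:
    """Hypothetical annotated proband variant."""
    name: str
    depth: int
    allele_fraction: float
    effect: str                       # e.g. "missense", "synonymous", "stopgain"
    population_freq: Optional[float]  # 1000 Genomes / esp6500; None if unreported
    mother_genotype: str              # "hom_ref", "het" or "hom_alt"
    father_genotype: str


def keep_variant(v: Variant) -> bool:
    """Apply the quality, annotation and trio filters in sequence."""
    if v.depth < MIN_DEPTH or v.allele_fraction < MIN_ALLELE_FRACTION:
        return False
    if v.effect == "synonymous":
        return False
    if v.population_freq is not None and v.population_freq > MAX_POPULATION_FREQ:
        return False
    # Assumption: both parents are unaffected, so a variant that is
    # homozygous in either parent cannot be causative on its own.
    if "hom_alt" in (v.mother_genotype, v.father_genotype):
        return False
    return True


def filter_proband_variants(variants: List[Variant]) -> List[Variant]:
    return [v for v in variants if keep_variant(v)]


# Made-up examples, one per filter.
candidates = [
    Variant("stopgain_het_maternal", 320, 0.48, "stopgain", None, "het", "hom_ref"),
    Variant("low_depth", 35, 0.50, "missense", None, "het", "hom_ref"),
    Variant("low_fraction", 400, 0.12, "missense", None, "hom_ref", "hom_ref"),
    Variant("synonymous", 300, 0.51, "synonymous", 0.002, "het", "het"),
    Variant("common_snp", 310, 0.49, "missense", 0.23, "het", "hom_ref"),
    Variant("parent_homozygous", 290, 1.00, "missense", 0.004, "hom_alt", "het"),
]
kept = [v.name for v in filter_proband_variants(candidates)]
print(kept)
```

Only the first example survives all four filters. A complete implementation would additionally test each surviving variant, or pair of variants in the same gene, against the de novo, dominant, recessive and compound-heterozygous models.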

Results
The work was organized in two phases. In the first phase, the EB AmpliSeq custom panel was tested on 10 patients with known molecular diagnoses: one EBS, four JEB and five DEB cases. The sequencing on a 318 Ion Chip followed the equimolar pooling of the samples. In compliance with the amplicon Ion Torrent guidelines for Mendelian-inherited diseases, the mean sequencing depth of the run was 276.7 (SD ± 31.6), and the mean uniformity was 94.6% (SD ± 1.06); 96% of the amplicons had a coverage of > 50 reads, providing a sufficient number of reads to detect the variant with statistical confidence.
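The run-level metrics quoted above can be derived from per-amplicon read depths, as in the following sketch. Two assumptions are made here that the text does not state: uniformity is taken as the fraction of targets covered at 0.2 times the mean depth or more (the definition commonly used in Ion Torrent coverage reports, where it is computed per base rather than per amplicon), and the function name and depth values are purely illustrative.

```python
from statistics import mean


def coverage_summary(amplicon_depths, min_reads=50, uniformity_factor=0.2):
    """Summarize one sample's coverage from a list of per-amplicon depths."""
    avg = mean(amplicon_depths)
    n = len(amplicon_depths)
    # Fraction of amplicons with more than `min_reads` reads.
    fraction_above_min = sum(d > min_reads for d in amplicon_depths) / n
    # Assumed uniformity definition: depth >= 0.2 x mean depth.
    uniformity = sum(d >= uniformity_factor * avg for d in amplicon_depths) / n
    return {
        "mean_depth": avg,
        "fraction_above_min": fraction_above_min,
        "uniformity": uniformity,
    }


# Five made-up amplicon depths for a single sample.
summary = coverage_summary([300, 280, 310, 40, 270])
print(summary)
```

With these example depths the mean is 240 reads, and four of the five amplicons pass both the 50-read and the uniformity threshold.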
This first part of the work allowed the generation of an analysis pipeline for data filtering and annotation of the germline variants, which was preparatory to the second phase of the study. Ion Torrent Variant Caller was employed for the variant detection, and the resulting variants were filtered for a minimum coverage and a minimum allele burden as described in the Materials and methods section. Moreover, the COL7A1 mutation database, the Human Intermediate Filament Database, and the LOVD were used to explore additional annotation and literature information, if present. Finally, NNSPLICE and dbscSNV were used to predict whether a variant falling near a splice site could lead to splicing defects.
In all 10 patients with known mutations, the AmpliSeq panel was successful in finding the causative mutations, fully confirming the SS data and resulting in a time-effective 72-h procedure (Table 1). Of note, the NGS analysis, while confirming the COL7A1 mutations in patient EB#10 (who was originally diagnosed with DEB), also extended the results, showing two additional missense mutations in the LAMA3 gene, which is usually implicated in JEB. Further analyses in the maternal and paternal gDNA confirmed the inheritance model, as shown in Table 2.
Based on these data, the second part of the work was focused on the NGS amplicon sequencing of the germline samples from eight family trios (proband, mother, father) in which the proband had received a clinical diagnosis of EB but comprehensive molecular testing data via SS were not available (Table 3).
The mean sequencing depth of the trio runs was 331.1 (SD ± 49), while the mean uniformity was 93.7% (SD ± 1.06). In addition, for these 24 samples, 95% of the amplicons had a coverage of > 50 reads.
The in-house analysis pipeline, which was set up in the first phase, allowed us to identify the causative mutations in the trios, including any de novo mutations (Table 3). The sequencing results confirmed the clinical diagnosis (and, where available, the partial DNA testing) that the patients had received.
In family EB#F7, the probable causative father-inherited mutation was found in the proband with a low coverage (allele burden of 16.5%), but it was not confirmed in the paternal sample. This mutation falls in intron 87 of the COL7A1 gene, 61 bp upstream of the beginning of the following exon (c.6901-61C>G), and is outside the sequence included in the amplicon panel.
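Why this variant lies outside the panel can be illustrated by comparing its intronic offset with the exon padding. The sketch below is a simplified illustration: the regular expression handles only simple coding-DNA substitutions written without spaces, the function names are hypothetical, and the check ignores all other design constraints of the panel. The 5-bp padding and the variant names are taken from the text; 100 bp is the padding length mentioned for the current designer version.

```python
import re

# Matches the intronic part of names such as c.6901-61C>G or c.-38+1G>A:
# an optional 5' UTR "-" or 3' UTR "*" prefix, the nearest exonic position,
# then a signed offset into the intron.
_INTRONIC = re.compile(r"^c\.(?:-|\*)?\d+([+-])(\d+)")


def intronic_offset(hgvs_name):
    """Return the distance (bp) from the nearest exon, or None if exonic."""
    match = _INTRONIC.match(hgvs_name)
    return int(match.group(2)) if match else None


def within_padding(hgvs_name, padding_bp):
    """True when the variant is exonic or lies inside the padded region."""
    offset = intronic_offset(hgvs_name)
    return offset is None or offset <= padding_bp


for name in ("c.6901-61C>G", "c.7023+1G>A", "c.-38+1G>A", "c.7344G>A"):
    print(name, within_padding(name, 5), within_padding(name, 100))
```

With 5 bp of padding, c.6901-61C>G is the only one of the four variants that falls outside the targeted sequence; with 100 bp it would be included.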
Two noteworthy families were EB#F8 and EB#F6. EB#F8's proband was diagnosed with JEB, and the SS revealed a heterozygous mutation (R635X) in the LAMB3 gene and confirmed that the mutation was maternal, but no paternal mutations were found on either the DNA coding sequences or the mRNA from the cultured keratinocytes. The NGS validated the maternal stop-gain mutation and identified a second heterozygous variant in the LAMB3 gene: a C/T mutation at the splice-site sequence of the 5′ UTR (c.-38+1G>A). According to the NNSPLICE prediction, this mutation may cause a break at the splice donor site of the 5′ UTR of the gene, and dbscSNV strengthened this hypothesis by showing high probabilities of this variant being splice altering (ada_score and rf_score of 0.985 and 0.856, respectively). The same heterozygous mutation was confirmed via NGS in the paternal gDNA, as shown in Table 3.
With regard to the EB#F6 family, the proband was reported to have only an ascertained (paternal) c.7344G>A variant in the COL7A1 gene. The NGS confirmed the paternal variant as altering the splicing and generating a premature termination codon, 13 and it also recognized a maternal mutation at the splice donor site in intron 90 (c.7023+1G>A, dbscSNV scores of 0.99|0.938), offering a plausible explanation of the inheritance model and the onset of DEB in this patient. These results are summarized, along with those of the other medical cases, in Table 3.
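The dbscSNV scores reported for these two families can be compared against the 0.6 cut-off as follows. This is a minimal sketch: the function name is hypothetical, and requiring both scores to exceed the cut-off is an assumption made here (the text gives the cut-off but not the rule for combining the two scores). The score values are those reported above.

```python
SPLICE_CUTOFF = 0.6  # cut-off point for ada_score and rf_score


def predicted_splice_altering(ada_score, rf_score, cutoff=SPLICE_CUTOFF,
                              require_both=True):
    """Flag a variant as likely splice altering from its two ensemble scores.

    Assumption: with require_both=True both scores must exceed the cut-off;
    with require_both=False a single score above the cut-off is enough.
    """
    above = (ada_score > cutoff, rf_score > cutoff)
    return all(above) if require_both else any(above)


reported = {
    "LAMB3 c.-38+1G>A (EB#F8)": (0.985, 0.856),
    "COL7A1 c.7023+1G>A (EB#F6)": (0.99, 0.938),
}
for variant, (ada, rf) in reported.items():
    print(variant, predicted_splice_altering(ada, rf))
```

Both variants exceed the cut-off on both scores, so they are flagged under either combination rule.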

Discussion
This AmpliSeq NGS approach, which simultaneously sequences a comprehensive disease-related gene panel, provides many advantages, among them the ability to depict a more complete diagnostic scenario. For example, in EB#10, the approach confirmed the COL7A1 mutations, but it also provided evidence of two additional missense mutations in the LAMA3 gene, revealing a possible, more articulated EB pathogenic onset. Nevertheless, it still remains to be established whether these laminin alpha 3 variants can modulate the disease activity of the EB subtype and alter the clinical presentation of the patient, i.e. whether they influence the genotype-phenotype correlation. Moreover, the AmpliSeq NGS approach was demonstrated to be highly informative for both confirming and identifying EB pathogenic mutations. In one of the family trios (EB#F8), the SS revealed a heterozygous mutation (R635X) in the LAMB3 gene 14-16 but showed no paternal mutation. The NGS validated the maternal stop-gain mutation and identified a second heterozygous variant in the LAMB3 gene: a C/T mutation that may cause a break at the splice donor site of the 5′ UTR of the gene. Interestingly, the AmpliSeq sequencing confirmed the findings of the transmission model, showing the same heterozygous mutation in the paternal gDNA. Remarkable results were also obtained for EB#F6, for which the SS reported a paternal missense mutation in the COL7A1 gene. 13,17 The NGS confirmed the paternal variant and also recognized the missing maternal variant (c.7023+1G>A) at a splice donor site, offering a complete diagnosis and inheritance model for this patient with DEB.
According to the recently published recommendations for the diagnosis and treatment of EB, 4 the assessment of the mutational landscape of patients with EB is a recommended step for a comprehensive diagnostic path; it currently represents the most precise method for ascertaining the EB subtype and for determining the transmission pattern of the disease. It is fundamental to establish whether the EB represents a sporadic event in the family, defining the reproductive risk for both the parents and the patient. Moreover, in a hopefully not too distant future, it will represent the fundamental prerequisite for a therapeutic approach based on gene or protein replacement or other cell-based therapies. [18][19][20][21][22][23] Recently a whole-exome sequencing (WES) approach showed excellent results for identifying mutations that remained elusive using the current SS method. 24 Even if WES has been demonstrated to be helpful in that regard, some nontrivial points remain: it is a time-consuming procedure; it is not yet cost effective; and it requires intensive data management. 25 Because a rapid diagnosis is often very important for optimizing clinical management and affordable costs are a prerequisite for the dissemination of any type of diagnostic procedure (particularly in those countries where this kind of screening is offered as a public health service), we chose an NGS approach that might address these EB requirements and designed an amplicon panel to be run on a low-priced medium-throughput NGS platform, such as PGM. The adoption of an amplicon panel instead of a WES procedure was intended to respond to the need of routine clinical molecular genetics laboratories for a fast sample processing and data interrogation method.
Moreover, an AmpliSeq panel offers the advantage of being fully customizable in terms of the coding and UTR sequences as well as the exon padding length (5 bp at the time we designed this EB panel and up to 100 bp in the current Ion Torrent pipeline version). A limited but disease-comprehensive number of genes (34 in this EB panel) makes it technically possible, even in a medium-throughput molecular genetics laboratory, to reach a mean sequencing depth sufficient to detect variants with statistical confidence, in other words, to attain the deep sequencing that is mandatory in a diagnostic procedure. It should also be noted that if a patient presents no mutations in any of the panel's genes, WES would represent the next essential step to identify any possible pathogenetic mutation. With these points in mind, this method is capable of performing a single sequencing run (managed by one operator on a 318 Ion Chip), including the bioinformatics, in 72 h, producing results for up to nine patients (or three family trios). The all-inclusive single-patient costs are approximately 350 Euros (2014 prices). The expertise of the clinician specialist will remain, of course, essential to weigh up the molecular fingerprinting results and will be critical for accurate genetic counselling and patient management; what is more, informative testing like that described here would be a valuable tool for the gene/disease clinical expert for developing a consistent and complete database, populated with specific clinical and genetic data. This could facilitate the understanding of the considerable variation that exists in the severity of this disease because of the influence of modifying genetic factors, i.e. it could improve the comprehension of the genotype-phenotype correlation.
Based on these preliminary results, the EB AmpliSeq panel can still be improved: the enzymes in the new Ion PGM Hi-Q sequencing technology, now available, promise a further improvement in the sequencing accuracy, including in homopolymeric stretches, which are particularly abundant in the collagens' coding sequences. The elevated frequency of the mutations in the intron/exon boundaries observed in our cohort suggests redefining the custom panel to include an exon padding of at least 100 bp. Additional genes reported in the literature, such as DSC3, JUP and DST, 4,26,27 may also be included to refine the panel, which currently uses about half of its maximum amplicon capacity. Developing a database populated with specific clinical and genetic data would help in improving the variant annotation and therefore in identifying the optimal candidates to target for gene therapy in EB.