Background The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical counterpart that models sets of TFBSs that work in concert in the gene regulations processes of higher eukaryotic organisms. Typically, a structured motif is composed of an ordered set of isolated (or simple) motifs, separated by a variable, but somewhat constrained number of "irrelevant" base-pairs. Discovering structured motifs in a set of DNA sequences is a computationally hard problem that has been addressed by a number of authors using either a direct approach, or via the preliminary identication and successive combination of simple motifs. Results We describe a computational tool, named SISMA, for the de-novo discovery of structured motifs in a set of DNA sequences. SISMA is an exact, enumerative algorithm, meaning that it nds all the motifs conforming to the specications. It does so in two stages: rst it discovers all the possible component simple motifs, then combines them in a way that respects the given constraints. We developed SISMA mainly with the aim of understanding the potential benets of such a 2-stage approach w.r.t. direct methods. In fact, no 2-stage software was available for the general problem of structured motif discovery, but only a few tools that solved restricted versions of the problem. We evaluated SISMA against other published tools on a comprehensive benchmark made of both synthetic and real biological datasets. In a signicant number of cases, SISMA outperformed the competitors, exhibiting a good performance also in most of the cases in which it was inferior. Conclusions A reection on the results obtained lead us to conclude that a 2-stage approach can be implemented with many advantages over direct approaches. Some of these have to do with greater modularity, ease of parallelization, and the possibility to perform adaptive searches of structured motifs. As another consideration, we noted that most hard instances for SISMA were easy to detect in advance. In these cases one may initially opt for a direct method; or, as a viable alternative in most laboratories, one could run both direct and 2-stage tools in parallel, halting the computations when the first halts.

Direct vs 2-stage approaches to structured motif finding / FEDERICO, Maria; LEONCINI, Mauro; MONTANGERO, Manuela; VALENTE, Paolo. - In: ALGORITHMS FOR MOLECULAR BIOLOGY. - ISSN 1748-7188. - ELETTRONICO. - 7:20:(2012), pp. N.A.-N.A.. [10.1186/1748-7188-7-20]

Direct vs 2-stage approaches to structured motif finding

FEDERICO, Maria;LEONCINI, Mauro;MONTANGERO, Manuela;VALENTE, Paolo
2012

Abstract

Background The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical counterpart that models sets of TFBSs that work in concert in the gene regulations processes of higher eukaryotic organisms. Typically, a structured motif is composed of an ordered set of isolated (or simple) motifs, separated by a variable, but somewhat constrained number of "irrelevant" base-pairs. Discovering structured motifs in a set of DNA sequences is a computationally hard problem that has been addressed by a number of authors using either a direct approach, or via the preliminary identication and successive combination of simple motifs. Results We describe a computational tool, named SISMA, for the de-novo discovery of structured motifs in a set of DNA sequences. SISMA is an exact, enumerative algorithm, meaning that it nds all the motifs conforming to the specications. It does so in two stages: rst it discovers all the possible component simple motifs, then combines them in a way that respects the given constraints. We developed SISMA mainly with the aim of understanding the potential benets of such a 2-stage approach w.r.t. direct methods. In fact, no 2-stage software was available for the general problem of structured motif discovery, but only a few tools that solved restricted versions of the problem. We evaluated SISMA against other published tools on a comprehensive benchmark made of both synthetic and real biological datasets. In a signicant number of cases, SISMA outperformed the competitors, exhibiting a good performance also in most of the cases in which it was inferior. Conclusions A reection on the results obtained lead us to conclude that a 2-stage approach can be implemented with many advantages over direct approaches. Some of these have to do with greater modularity, ease of parallelization, and the possibility to perform adaptive searches of structured motifs. As another consideration, we noted that most hard instances for SISMA were easy to detect in advance. In these cases one may initially opt for a direct method; or, as a viable alternative in most laboratories, one could run both direct and 2-stage tools in parallel, halting the computations when the first halts.
7:20
N.A.
N.A.
Direct vs 2-stage approaches to structured motif finding / FEDERICO, Maria; LEONCINI, Mauro; MONTANGERO, Manuela; VALENTE, Paolo. - In: ALGORITHMS FOR MOLECULAR BIOLOGY. - ISSN 1748-7188. - ELETTRONICO. - 7:20:(2012), pp. N.A.-N.A.. [10.1186/1748-7188-7-20]
Federico, Maria; Leoncini, Mauro; Montangero, Manuela; Valente, Paolo
File in questo prodotto:
File Dimensione Formato  
DirectVS2Stage.pdf

accesso aperto

Tipologia: Post-print dell'autore (bozza post referaggio)
Dimensione 627.45 kB
Formato Adobe PDF
627.45 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

Caricamento pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11380/812717
Citazioni
  • ???jsp.display-item.citation.pmc??? 3
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 5
social impact