Predicting metal-binding sites from protein sequence

Passerini, Andrea; Lippi, Marco; Frasconi, Paolo

doi:10.1109/TCBB.2011.94

Prediction of binding sites from sequence can significantly help toward determining the function of uncharacterized proteins on a genomic scale. The task is highly challenging due to the enormous number of alternative candidate configurations. Previous research has only considered this prediction problem starting from 3D information. When starting from sequence alone, only methods that predict the bonding state of selected residues are available. The sole exception consists of pattern-based approaches, which rely on very specific motifs and cannot be applied to discover truly novel sites. We develop new algorithmic ideas based on structured-output learning for determining transition-metal-binding sites coordinated by cysteines and histidines. The inference step (retrieving the best scoring output) is intractable for general output types (i.e., general graphs). However, under the assumption that no residue can coordinate more than one metal ion, we prove that metal binding has the algebraic structure of a matroid, allowing us to employ a very efficient greedy algorithm. We test our predictor in a highly stringent setting where the training set consists of protein chains belonging to SCOP folds different from the ones used for accuracy estimation. In this setting, our predictor achieves 56 percent precision and 60 percent recall in the identification of ligand-ion bonds. © 2011 IEEE.
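The abstract's inference step rests on a general fact: when the feasible outputs form a matroid and the score is a sum of per-element weights, picking the highest-scoring feasible element at each step yields an optimal output. The following is a minimal, hypothetical sketch of that principle. It is not the authors' implementation, and the paper's actual score function and matroid construction are not reproduced. The sketch makes several assumptions:

- A bond is encoded as a (residue id, ion id) pair.
- Each bond has an independent, made-up score.
- Only the stated "no residue coordinates more than one ion" constraint is enforced, which on its own is a partition matroid.
- For evaluation, predicted ion ids are assumed to be already aligned with the true ones.

The names `matroid_greedy`, `one_ion_per_residue` and `bond_precision_recall` are illustrative.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Set, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)
# Hypothetical encoding: a bond is (residue id, ion id), e.g. ("C12", 0).
Bond = Tuple[str, int]


def matroid_greedy(
    weights: Dict[T, float],
    is_independent: Callable[[FrozenSet[T]], bool],
) -> Set[T]:
    """Generic greedy for a maximum-weight independent set.

    Guaranteed optimal when `is_independent` is the independence oracle
    of a matroid and the total score is the sum of per-element weights.
    """
    chosen: Set[T] = set()
    # Consider candidate elements from the highest to the lowest weight.
    for element in sorted(weights, key=lambda e: weights[e], reverse=True):
        if weights[element] <= 0:
            break  # remaining elements cannot increase the total score
        # Keep the element only if the enlarged set is still independent.
        if is_independent(frozenset(chosen | {element})):
            chosen.add(element)
    return chosen


def one_ion_per_residue(bonds: FrozenSet[Bond]) -> bool:
    """Independence oracle: no residue is bonded to more than one ion
    (a partition matroid whose blocks are the residues)."""
    residues = [residue for residue, _ in bonds]
    return len(residues) == len(set(residues))


def bond_precision_recall(predicted: Set[Bond], true: Set[Bond]) -> Tuple[float, float]:
    """Precision and recall over ligand-ion bonds, assuming the ion ids in
    the prediction are already aligned with the ground-truth ion ids."""
    hits = len(predicted & true)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up scores for candidate Cys/His-ion bonds (illustration only).
    scores = {
        ("C12", 0): 2.0, ("C12", 1): 1.5, ("C15", 0): 1.8,
        ("H40", 1): 0.9, ("H40", 0): 0.4, ("C77", 1): -0.3,
    }
    predicted = matroid_greedy(scores, one_ion_per_residue)
    print(sorted(predicted))  # [('C12', 0), ('C15', 0), ('H40', 1)]
    true_bonds = {("C12", 0), ("C15", 0), ("H40", 0), ("C77", 1)}
    p, r = bond_precision_recall(predicted, true_bonds)
    print(f"{p:.2f} {r:.2f}")  # 0.67 0.50
```

In this naive encoding, adding a per-ion limit on the number of ligands would turn the constraint into the intersection of two partition matroids. That intersection is not a matroid in general, so this simple greedy would lose its optimality guarantee. How the paper obtains a matroid for its own output space is not shown here.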

Predicting metal-binding sites from protein sequence / Passerini, Andrea; Lippi, Marco; Frasconi, Paolo. - In: IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS. - ISSN 1545-5963. - 9:1(2012), pp. 203-213. [10.1109/TCBB.2011.94]

Year of publication: 2012
Journal: IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
Volume: 9
Issue: 1
First page: 203
Last page: 213
DOI: https://dx.doi.org/10.1109/TCBB.2011.94
WoS ID: WOS:000296782200017
Scopus ID: 2-s2.0-81455132670
PubMed ID: 21606549
All authors: Passerini, Andrea; Lippi, Marco; Frasconi, Paolo
Type: Journal article

Files in this record:

File	Size	Format
ieeetccb11.pdf (restricted access; type: AAM, author's revised version accepted for publication)	916.77 kB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.