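Computational methods to reconstruct intra-cellular interactions from single cell RNA-seq data (original title: Metodi computazionali per la ricostruzione delle interazioni intra-cellulari a partire da dati di single cell RNA-seq)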

Grandi, Francesco

Next Generation Sequencing (NGS) technologies now make it possible to obtain and analyze the genomic, transcriptomic, and epigenetic structure of a biological sample at the single-cell level (single-cell genomics). Single-cell molecular profiling techniques have recently developed significantly and are generating high-quality data of ever-increasing size. scRNA-seq is to date the most advanced tool for investigating cell identity, fate, and function within tissues. By analyzing these data, it is possible to identify cells sharing the same transcriptional profile, classify them into their respective cell types, and study their evolutionary dynamics based on their distribution along pseudo-temporal trajectories. Single-cell data appear to be an excellent signal for identifying and modelling the associations between epigenomic structures, regulatory circuits, and the transcriptional result. If properly modelled, they may provide essential information on the regulatory interactions and functions of the main molecular factors underlying complex phenotypes, and may lead the way to the discovery of the transcriptional determinants specifically active in each cell subpopulation. Reconstructing these interdependencies relies on innovative computational methods that can convert collections of single-cell signals into models of cell functions. Graphs and networks have been the most widely used mathematical frameworks for modelling gene regulatory interactions. In gene regulatory networks (GRNs), nodes represent regulatory factors (such as transcription factors, TFs) and target genes, while edges indicate relationships (activating or inhibiting) between them.
Applied to bulk data, GRNs have allowed the identification of the master transcriptional regulators of a given cell state; at the single-cell level, however, GRN reconstruction is challenged by the intrinsic characteristics of scRNA-seq data, such as their sparsity, dimensionality, and variability. As reported by recent benchmark studies, the performance of commonly used reconstruction methods is severely hampered when they are applied to simulated scRNA-seq data. Overall, novel and effective approaches for identifying gene regulatory interactions from single-cell transcriptional data are urgently needed. The aim of my work has therefore been to develop a new computational procedure able to model intra-cellular regulatory circuits starting from scRNA-seq data. The method is called zI and is based on Shannon's entropy for discrete values and on discretization of the expression matrix. The software was developed in Python and has been benchmarked on several datasets. Compared with the current gold standards, the method showed comparable and sometimes even better performance on synthetic data, and confirmed its effectiveness when applied to real scRNA-seq data from different cellular contexts. The efficacy of zI was further established by applying it, together with its main competitors, to the neoplastic cells of a Glioblastoma Multiforme (GBM) dataset. The reconstructed networks were then compared with the one obtained from the published composite pipeline "Rhabdomant", built on the same dataset. This comparison highlighted the ability of zI to retrieve most of the main interactors, confirming its effectiveness in identifying the main regulators of gene expression.
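
The abstract names two ingredients of zI, discretization of the expression matrix and Shannon's entropy for discrete values, but does not describe how they are combined. The following is only a minimal sketch of those two ingredients under stated assumptions: the equal-width binning, the number of bins, all function names, and the use of mutual information as a pairwise score are hypothetical choices made for illustration and are not taken from the zI implementation.

```python
import numpy as np

def discretize(expr, n_bins=2):
    """Map each gene (row) to integer levels 0..n_bins-1 with equal-width bins.

    Assumption: equal-width binning per gene; zI may discretize differently.
    """
    expr = np.asarray(expr, dtype=float)
    levels = np.zeros(expr.shape, dtype=int)
    for g, row in enumerate(expr):
        lo, hi = row.min(), row.max()
        if hi > lo:  # constant genes stay at level 0
            interior_edges = np.linspace(lo, hi, n_bins + 1)[1:-1]
            levels[g] = np.digitize(row, interior_edges)
    return levels

def shannon_entropy(labels):
    """Shannon entropy (in bits) of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def joint_entropy(x, y):
    """Shannon entropy (in bits) of the joint distribution of two label vectors."""
    pairs = np.stack([x, y], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); a standard score, assumed here for illustration."""
    return shannon_entropy(x) + shannon_entropy(y) - joint_entropy(x, y)

# Toy genes-by-cells matrix (made-up values, not real data):
# gene 1 follows gene 0 exactly, gene 2 is independent of gene 0.
expr = np.array([
    [0.0, 0.0, 5.0, 5.0],
    [0.0, 0.0, 5.0, 5.0],
    [0.0, 5.0, 0.0, 5.0],
])
levels = discretize(expr, n_bins=2)
mi_dependent = mutual_information(levels[0], levels[1])
mi_independent = mutual_information(levels[0], levels[2])
```

On this toy matrix the perfectly co-varying pair shares one bit of information, while the independent pair shares none, which is the kind of contrast an entropy-based score can use to rank candidate regulator-target edges.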

Metodi computazionali per la ricostruzione delle interazioni intra-cellulari a partire da dati di single cell RNA-seq / Francesco Grandi, 23 May 2023. 35th cycle, Academic Year 2021/2022.

Title in English	Computational methods to reconstruct intra-cellular interactions from single cell RNA-seq data
Defense date	23 May 2023
Supervisors affiliated with the University	VILLANI, Marco; BICCIATO, Silvio
Type	Doctoral thesis

Files in this item:

File	Size	Format
Thesis_Francesco Grandi.pdf (embargoed until 22/05/2026; description: final thesis, Grandi Francesco; type: doctoral thesis)	6.95 MB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.