BERT Classifies SARS-CoV-2 Variants / Ghione, G.; Lovino, M.; Ficarra, E.; Cirrincione, G. - 360:(2023), pp. 157-163. [10.1007/978-981-99-3592-5_15]

BERT Classifies SARS-CoV-2 Variants

Ghione G.; Lovino M.; Ficarra E.; Cirrincione G.
2023

Abstract

Medical diagnostics faced numerous difficulties during the COVID-19 pandemic. One of these has been the need for continuous monitoring of SARS-CoV-2 mutations. Genomic sequencing is the technique most frequently used to identify variants precisely, an approach made possible by the ongoing worldwide collection of viral RNA samples. However, variant identification techniques are often resource-intensive, so small medical laboratories may lack the necessary diagnostic capability. This work presents an effective deep learning strategy for identifying SARS-CoV-2 variants. It makes two contributions: (1) a fine-tuned architecture of Bidirectional Encoder Representations from Transformers (BERT) that identifies SARS-CoV-2 variants; (2) biological insights obtained by exploiting BERT self-attention. The approach analyzes the S gene of the virus to quickly recognize its variant. BERT is a transformer-based neural network originally developed for natural language processing; nonetheless, it has been applied successfully in many other domains, including genomic sequence analysis. BERT was therefore fine-tuned to adapt it to the RNA sequence domain, achieving a 98.59% F1-score on test data and successfully identifying the variants circulating at the time of the study. Because BERT relies on the self-attention mechanism, the interpretability of the model was also examined: by attending to specific regions of the S gene, BERT was found to extract relevant biological information about the variants. The presented approach therefore provides insights into the specific characteristics of SARS-CoV-2 RNA samples.
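The idea of reading biological information from BERT self-attention can be illustrated with a minimal, self-contained sketch. It assumes, as in DNABERT-style models, that the S gene sequence is split into overlapping k-mer tokens; the abstract does not state the tokenization the authors used, so `kmer_tokenize`, `self_attention_weights`, `attention_to_positions`, and the toy embeddings below are hypothetical helpers, not the authors' code. A single attention head is computed as softmax(X Xᵀ / √d) with identity query/key projections (a simplification: BERT learns separate projections per head), and special tokens such as [CLS] and [SEP] are ignored. In a real fine-tuned model the attention matrix would instead be read from the model, for instance from the `attentions` returned by a Hugging Face `transformers` BERT model called with `output_attentions=True`.

```python
import math
from typing import List


def kmer_tokenize(seq: str, k: int = 6) -> List[str]:
    """Split a nucleotide sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(x: List[List[float]]) -> List[List[float]]:
    """Single-head scaled dot-product attention weights softmax(X X^T / sqrt(d)).

    Simplification (assumption): queries and keys are the embeddings
    themselves, whereas BERT applies learned query/key projections.
    """
    d = len(x[0])
    return [
        softmax([sum(a * b for a, b in zip(q, kv)) / math.sqrt(d) for kv in x])
        for q in x
    ]


def attention_to_positions(attn: List[List[float]], seq_len: int, k: int) -> List[float]:
    """Project token-level attention back onto nucleotide positions.

    attn[q][t] is the weight that query token q assigns to key token t.
    Token t covers nucleotides t .. t+k-1, so each position's score is the
    mean attention received by the k-mers that cover it.
    """
    n_tokens = len(attn)
    # Attention received by each token, averaged over all query tokens.
    received = [sum(attn[q][t] for q in range(n_tokens)) / n_tokens
                for t in range(n_tokens)]
    scores = []
    for pos in range(seq_len):
        # Tokens whose k-mer window includes this position.
        covering = range(max(0, pos - k + 1), min(pos, n_tokens - 1) + 1)
        scores.append(sum(received[t] for t in covering) / len(covering))
    return scores


if __name__ == "__main__":
    seq = "ATGTTTGTTTTTCTTGTT"  # toy input, not a real sample
    k = 3
    tokens = kmer_tokenize(seq, k)
    # Hypothetical 2-D embedding per k-mer: (GC count, AT count).
    emb = [[float(sum(c in "GC" for c in t)), float(sum(c in "AT" for c in t))]
           for t in tokens]
    attn = self_attention_weights(emb)
    scores = attention_to_positions(attn, len(seq), k)
    top = sorted(range(len(seq)), key=lambda i: scores[i], reverse=True)[:3]
    print("most attended positions:", top)
```

Averaging over the covering k-mers is one simple way to map token-level attention back to single nucleotides; other aggregation choices (maximum instead of mean, per-head or per-layer analysis) are equally plausible and would highlight regions differently.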
Year: 2023
Series: Smart Innovation, Systems and Technologies
ISBN: 9789819935918, 9789819935925
Publisher: Springer Science and Business Media Deutschland GmbH
Authors: Ghione, G.; Lovino, M.; Ficarra, E.; Cirrincione, G.

Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1333848