Anomaly detection from log files using unsupervised deep learning / Bursic, S.; Cuculo, V.; D'Amelio, A. - 12232:(2020), pp. 200-207. (Paper presented at the World Congress on Formal Methods, held in Porto, Portugal, in 2019) [10.1007/978-3-030-54994-7_15].

Anomaly detection from log files using unsupervised deep learning

Cuculo V.;
2020

Abstract

Computer systems have grown so complex that manually inspecting system behaviour to detect malfunctions has become infeasible. Because these systems produce voluminous logs of their activity, machine-led log analysis is a growing need, and several solutions already exist. Most of them depend on hand-crafted features, require preprocessing of the raw logs and feature extraction, or use supervised learning, which needs a labeled log dataset that is not always easy to obtain. We propose a two-part deep autoencoder model with LSTM units that requires no hand-crafted features and no data preprocessing, since it works on raw text, and that outputs an anomaly score for each log entry. This score represents the rarity of a log event in terms of both its content and its temporal context. The model was trained and tested on a dataset of HDFS logs containing 2 million raw lines, half of which were used for training and half for testing. Although the model cannot match the performance of a supervised binary classifier, it could serve as a coarse filter for manual inspection of log files when no labeled dataset is available.
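The "coarse filter" use described in the abstract could be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the per-entry anomaly scores have already been produced by some model, and the function name `coarse_filter` and the `fraction` review budget are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def coarse_filter(scores: Sequence[float], fraction: float = 0.01) -> List[int]:
    """Return indices of the highest-scoring log entries, most anomalous first.

    `scores` holds one anomaly score per log entry (higher means rarer).
    `fraction` is a hypothetical review budget: the share of entries a
    human is willing to inspect. It is not a value taken from the paper.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not scores:
        return []
    # Always surface at least one entry, even for very small budgets.
    k = max(1, math.ceil(len(scores) * fraction))
    # Rank entry indices by descending anomaly score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

With five placeholder scores `[0.1, 0.9, 0.3, 0.95, 0.2]` and `fraction=0.4`, the function returns `[3, 1]`, the two entries a reviewer would look at first. No labels are needed at any point, which is the situation the abstract targets.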
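Publication year: 2020
Conference: World Congress on Formal Methods
Conference location: Porto, Portugal
Conference year: 2019
Volume: 12232
First page: 200
Last page: 207
Authors: Bursic, S.; Cuculo, V.; D'Amelio, A.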

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1300648