Anomaly detection from log files using unsupervised deep learning

Bursic, S.; Cuculo, V.; D'Amelio, A.

doi:10.1007/978-3-030-54994-7_15

Computer systems have grown in complexity to the point where manual inspection of system behaviour for the purpose of malfunction detection has become infeasible. As these systems output voluminous logs of their activity, machine-led analysis of those logs is a growing need, and several solutions already exist. These largely depend on hand-crafted features, require raw log preprocessing and feature extraction, or use supervised learning, which requires a labeled log dataset that is not always easy to procure. We propose a two-part deep autoencoder model with LSTM units that requires no hand-crafted features and no preprocessing of data, as it works on raw text, and that outputs an anomaly score for each log entry. This anomaly score represents the rarity of a log event in terms of both its content and its temporal context. The model was trained and tested on a dataset of HDFS logs containing 2 million raw lines, of which half were used for training and half for testing. While this model cannot match the performance of a supervised binary classifier, it could be a useful tool as a coarse filter for manual inspection of log files where a labeled dataset is unavailable.
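To make the "coarse filter" idea concrete, the sketch below shows one way per-line anomaly scores could be turned into a shortlist for manual inspection. It is not the authors' model: the LSTM autoencoder is replaced by a much simpler stand-in scorer (a smoothed character bigram model fitted on raw training lines), and it captures only content rarity, not temporal context. The function names, the start-of-line marker, and the 0.99 quantile threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

START = "\x02"  # assumed start-of-line marker, not part of the paper


def fit_char_bigrams(lines):
    """Count character bigrams over raw training lines (no parsing, no features)."""
    pair_counts, ctx_counts, alphabet = Counter(), Counter(), set()
    for line in lines:
        padded = START + line
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            ctx_counts[a] += 1
    # One extra vocabulary slot reserves probability mass for unseen characters.
    return pair_counts, ctx_counts, len(alphabet) + 1


def anomaly_score(line, model):
    """Mean negative log-likelihood per character; higher means rarer content."""
    pair_counts, ctx_counts, vocab = model
    padded = START + line
    if len(padded) < 2:
        return 0.0  # an empty line carries no evidence either way
    nll = 0.0
    for a, b in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen bigrams finite but heavily penalised.
        p = (pair_counts[(a, b)] + 1) / (ctx_counts[a] + vocab)
        nll -= math.log(p)
    return nll / (len(padded) - 1)


def coarse_filter(train_lines, test_lines, quantile=0.99):
    """Return (line, score) pairs whose score exceeds a training-set quantile."""
    model = fit_char_bigrams(train_lines)
    train_scores = sorted(anomaly_score(line, model) for line in train_lines)
    idx = min(len(train_scores) - 1, int(quantile * len(train_scores)))
    threshold = train_scores[idx]
    flagged = []
    for line in test_lines:
        score = anomaly_score(line, model)
        if score > threshold:
            flagged.append((line, score))
    return flagged
```

Under these assumptions, a call such as `coarse_filter(first_half, second_half)` mirrors the half-and-half split described in the abstract: the threshold is learned from unlabeled training lines only, and a reviewer would read just the flagged lines. Swapping `anomaly_score` for a reconstruction error from a sequence autoencoder would leave the thresholding step unchanged.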

Anomaly detection from log files using unsupervised deep learning / Bursic, S.; Cuculo, V.; D'Amelio, A. - 12232 (2020), pp. 200-207. (3rd World Congress on Formal Methods (FM), Porto, Portugal, October 7-11, 2019) [10.1007/978-3-030-54994-7_15].



Field	Value
Year of publication	2020
Conference title	3rd World Congress on Formal Methods (FM)
Conference location	Porto, Portugal
Conference dates	October 7-11, 2019
DOI	https://dx.doi.org/10.1007/978-3-030-54994-7_15
WoS code	WOS:001340715900015
Scopus code	2-s2.0-85089722040
Series	Lecture Notes in Computer Science
Volume number	12232
First page	200
Last page	207
All authors	Bursic, S.; Cuculo, V.; D'Amelio, A.
Type	Paper in conference proceedings

Files in this record:

File	Access	Type	Size	Format
Bursic2020_Chapter_AnomalyDetectionFromLogFilesUs.pdf	Restricted access	VOR (version of record published by the publisher)	393.07 kB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.