BigBench workload executed by using Apache Flink

Bergamaschi, Sonia; Gagliardelli, Luca; Simonini, Giovanni; Zhu, Song

doi:10.1016/j.promfg.2017.07.169

Many of the challenges faced in Industry 4.0 involve the management and analysis of huge amounts of data (e.g., sensor data management and machine-fault prediction in industrial manufacturing, web-log analysis in e-commerce). To handle so-called Big Data management and analysis, a plethora of frameworks has been proposed in the last decade. Many of them, such as MapReduce, Apache Hive, and Apache Flink, focus on the parallel processing paradigm. However, in this jungle of frameworks, evaluating the performance of these technologies is not a trivial task, and the outcome strictly depends on the application requirements. The aim of this paper is to compare two of the most widely used and promising frameworks for managing big data: Apache Flink and Apache Hive, both general-purpose distributed platforms under the umbrella of the Apache Software Foundation. To evaluate the two frameworks we use the BigBench benchmark, which was developed for Apache Hive. We re-implemented the most significant BigBench queries so that they run on Apache Flink, which lets us compare the results of the same queries executed on both frameworks. Our results show that Apache Flink, when properly configured, is able to outperform Apache Hive.

BigBench workload executed by using Apache Flink / Bergamaschi, S., Gagliardelli, L., Simonini, G., Zhu, S.. - In: PROCEDIA MANUFACTURING. - ISSN 2351-9789. - 11:(2017), pp. 695-702. (27th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM) Modena, ITALY JUN 27-30, 2017) [10.1016/j.promfg.2017.07.169].

Year of publication	2017
Conference title	27th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM)
Conference location	Modena, ITALY
Conference date	JUN 27-30, 2017
DOI	https://dx.doi.org/10.1016/j.promfg.2017.07.169
WoS code	WOS:000419072100080
Scopus code	2-s2.0-85029830180
Journal	PROCEDIA MANUFACTURING
Volume	11
First page	695
Last page	702
All authors	Bergamaschi, Sonia; Gagliardelli, Luca; Simonini, Giovanni; Zhu, Song
Type	Conference proceedings paper

Files in this item:

File	Access	Type	License	Size	Format
1-s2.0-S235197891730375X-main.pdf	Open access	VOR (version published by the publisher)	[IR] creative-commons	1.13 MB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.