Pushing ML Predictions into DBMSs

Paganelli, M.; Sottovia, P.; Park, K.; Interlandi, M.; Guerra, F.

doi:10.1109/TKDE.2023.3269592

In the past decade, many approaches have been proposed for executing ML workloads on a DBMS. However, most of them have looked at in-DBMS ML from a training perspective, whereas ML inference has been largely overlooked. We think that this is an important gap to fill for two main reasons: (1) in the near future, every application will be infused with some sort of ML capability; (2) behind every web page, application, and enterprise there is a DBMS, which makes in-DBMS inference an appealing solution for efficiency (e.g., less data movement), performance (e.g., cross-optimizations between relational operators and ML), and governance. In this paper, we study whether DBMSs are a good fit for prediction serving. We introduce a technique for translating trained ML pipelines containing both featurizers (e.g., one-hot encoding) and models (e.g., linear and tree-based models) into SQL queries, and we compare in-DBMS performance against popular ML frameworks such as scikit-learn and ML.NET. Our experiments show that, when pushed inside a DBMS, trained ML pipelines can achieve performance comparable to ML frameworks in several scenarios, while they perform quite poorly on text featurization and on (even simple) neural networks.
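
To make the idea of translating a trained pipeline into SQL concrete, here is a minimal sketch. It is not the authors' translation technique: it assumes a toy pipeline of a one-hot encoder followed by a linear model and renders it as a single SELECT, using SQLite through Python's standard `sqlite3` module. The function name `linear_pipeline_to_sql`, the `customers` table, and all weights are hypothetical and chosen only for illustration.

```python
import sqlite3


def linear_pipeline_to_sql(table, numeric_weights, categorical_weights, intercept):
    """Render one-hot encoding + a linear model as one SQL query (illustrative only).

    numeric_weights:     {column: weight}
    categorical_weights: {column: {category: weight}}, one weight per one-hot feature

    Identifiers and category values are interpolated directly into the SQL text,
    so this sketch is only suitable for trusted, hard-coded inputs.
    """
    terms = [repr(float(intercept))]
    for column, weight in numeric_weights.items():
        terms.append(f"({float(weight)!r} * {column})")
    for column, categories in categorical_weights.items():
        # A one-hot feature times its weight collapses to a CASE lookup.
        # Categories unseen at training time contribute 0 (all-zero one-hot vector).
        whens = " ".join(
            f"WHEN '{category}' THEN {float(weight)!r}"
            for category, weight in categories.items()
        )
        terms.append(f"(CASE {column} {whens} ELSE 0.0 END)")
    return f"SELECT id, {' + '.join(terms)} AS prediction FROM {table} ORDER BY id"


# Hypothetical data and model parameters, assumed for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age REAL, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 30.0, "rome"), (2, 40.0, "milan"), (3, 50.0, "paris")],
)

query = linear_pipeline_to_sql(
    "customers",
    numeric_weights={"age": 0.5},
    categorical_weights={"city": {"rome": 2.0, "milan": -1.0}},
    intercept=1.0,
)
predictions = conn.execute(query).fetchall()
print(query)
print(predictions)
```

In this sketch the featurizer never materializes a one-hot vector: each categorical column becomes a `CASE` expression that returns the weight of the matching category, so scoring happens in a single pass over the table without moving data out of the database.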

Pushing ML Predictions into DBMSs / Paganelli, M., Sottovia, P., Park, K., Interlandi, M., Guerra, F. - In: IEEE Transactions on Knowledge and Data Engineering. - ISSN 1041-4347. - 35:10(2023), pp. 10295-10308. [10.1109/TKDE.2023.3269592]

Publication year	2023
Journal	IEEE Transactions on Knowledge and Data Engineering
Volume	35
Issue	10
First page	10295
Last page	10308
DOI	https://dx.doi.org/10.1109/TKDE.2023.3269592
WoS ID	WOS:001068964300036
Scopus ID	2-s2.0-85159724035
PubMed ID	37954972
All authors	Paganelli, M.; Sottovia, P.; Park, K.; Interlandi, M.; Guerra, F.
Type	Journal article

Files in this item:

File	Access	Type	License	Size	Format
Pushing_ML_Predictions_into_DBMSs.pdf	Open access	VOR (publisher's published version)	[IR] creative-commons	3.52 MB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.