Dagger: A Data (not code) Debugger

Giovanni Simonini
2020

Abstract

With the democratization of data science libraries and frameworks, most data scientists manage and generate their data analytics pipelines using a collection of scripts (e.g., Python, R). This marks a shift from traditional applications that communicate back and forth with a DBMS that stores and manages the application data. While code debuggers have reached impressive maturity over the past decades, they fall short of helping users explore data-driven what-if scenarios (e.g., splitting the training set into two and building two ML models). Such scenarios, while doable programmatically, are a substantial burden for users to manage themselves. Dagger (Data Debugger) is an end-to-end data debugger that abstracts key data-centric primitives to enable users to quickly identify and mitigate data-related problems in a given pipeline. Dagger was motivated by a series of interviews we conducted with data scientists across several organizations. A preliminary version of Dagger has been incorporated into Data Civilizer 2.0 to help physicians at the Massachusetts General Hospital process complex pipelines.
Year: 2020
Conference: 10th Annual Conference on Innovative Data Systems Research, CIDR 2020
Location: Amsterdam, The Netherlands
Dates: January 12-15, 2020
Authors: Rezig, El Kindi; Cao, Lei; Simonini, Giovanni; Schoemans, Maxime; Madden, Samuel; Tang, Nan; Ouzzani, Mourad; Stonebraker, Michael
Dagger: A Data (not code) Debugger / Rezig, El Kindi; Cao, Lei; Simonini, Giovanni; Schoemans, Maxime; Madden, Samuel; Tang, Nan; Ouzzani, Mourad; Stonebraker, Michael. - (2020). (Paper presented at the 10th Annual Conference on Innovative Data Systems Research, CIDR 2020, held in Amsterdam, The Netherlands, January 12-15, 2020).
Files in this item:
There are no files associated with this item.

Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.
In case of copyright infringement, contact IRIS Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1191055
Citations
  • PubMed Central: N/A
  • Scopus: 15
  • Web of Science (ISI): N/A