A Representation Based on Essence for the CRISP-DM Methodology

Claudia Elena Durango Vanegas, Juan Camilo Giraldo Mejía, Fabio Alberto Vargas Agudelo, Dario Enrique Soto Duran


Cross-Industry Standard Process for Data Mining (CRISP-DM) is a methodology for developing data mining projects. It defines tasks at hierarchically structured levels of abstraction, which eases its implementation through a set of actions that support decision making. Essence is a theory that helps identify best practices and the essential, common, and universal elements of any software development endeavor. The literature offers several representation models of CRISP-DM, including a verbal model, a conceptual model, a process understanding model, and an ontology. However, we consider that these models omit some elements of the methodology, such as activities, work products, and roles. In this paper, we propose an Essence-based representation of CRISP-DM that incorporates the essential elements we believe are missing from existing representations. The proposed representation aims to improve the understanding of the best practices and the essential, common, and universal elements of CRISP-DM for future implementations in data mining projects. It also seeks to validate that Essence can be used in different data mining projects.


CRISP-DM methodology, data mining, representation model, Essence

