Data Scientist Introduccion (Summaries, Medicine)

Introduction to data science

Type: Summaries

2021/2022

Uploaded on 25/04/2022

bernardo-palacio-n 🇨🇴

Data Science Methodology

Methodology: A system of methods used in a particular area of study or activity.

By John Rollins, based on CRISP-DM

Introduction to the Cross-Industry Standard Process for Data Mining (CRISP-DM)

  • CRISP-DM is aimed at increasing the use of data mining across a wide variety of business applications and industries. The intent is to take case-specific scenarios and general behaviors and make them domain-neutral.
  1. Business Understanding: This is the most important stage because it is where the intention of the project is outlined. The Foundational Methodology and CRISP-DM are aligned here. It requires communication and clarity. The difficulty is that stakeholders have different objectives, biases, and ways of conveying information; they do not all see the same things, or see them in the same manner. Without a clear, concise, and complete picture of the project goals, resources will be needlessly expended.
  2. Data Understanding: Data understanding relies on business understanding. Data is collected at this stage of the process. What the business wants and needs determines what data is collected, from what sources, and by what methods. CRISP-DM combines the Data Requirements, Data Collection, and Data Understanding stages of the Foundational Methodology into this one stage.
  3. Data Preparation: Once the data has been collected, it must be transformed into a usable subset, unless it is determined that more data is needed. Once a dataset is chosen, it must be checked for questionable, missing, or ambiguous cases. Data Preparation is common to CRISP-DM and the Foundational Methodology.
  4. Modeling: Once prepared, the data is expressed through appropriate models that give meaningful insights and, ideally, new knowledge. This is the purpose of data mining: to create knowledge, that is, information that has meaning and utility. Models reveal patterns and structures within the data that provide insight into the features of interest. Models are selected using a portion of the data, and adjustments are made if necessary. Model selection is both an art and a science. Modeling is common to both the Foundational Methodology and CRISP-DM, and both require it for the subsequent stage.
  5. Evaluation: The selected model must be tested. This is usually done by running the trained model on a pre-selected test set. This shows how effective the model is on data it has not seen before. The results are used to determine the efficacy of the model and foreshadow its role in the next and final stage.
  6. Deployment: In the deployment stage, the model is used on new data outside the scope of the original dataset, and by new stakeholders. The new interactions at this phase may reveal new variables and new needs for the dataset and model. These challenges could prompt a revision of the business needs and actions, of the model and data, or of both.
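
As a rough illustration of stages 3 to 5 above (Data Preparation, Modeling, Evaluation), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy records, the function names, and the single-threshold "model" are hypothetical and are not prescribed by CRISP-DM or the Foundational Methodology. It only shows the general shape of the workflow: drop incomplete cases, hold out a test set, fit on the rest, and then score on data the model has not seen.

```python
import random

# Hypothetical toy records: (feature value, label). None marks a missing value.
RAW = [(0.5, 0), (1.0, 0), (None, 1), (1.5, 0), (2.0, 0),
       (3.0, 1), (3.5, 1), (4.0, None), (4.5, 1), (5.0, 1)]


def prepare(records):
    """Data preparation: drop records with a missing feature or label."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def split(records, test_fraction=0.25, seed=0):
    """Hold out a test set that the model never sees during fitting."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(threshold, records):
    """Share of records where the rule 'x >= threshold' matches the label."""
    hits = sum(1 for x, y in records if int(x >= threshold) == y)
    return hits / len(records)


def fit(train):
    """Modeling: pick the candidate threshold with the best training accuracy."""
    candidates = sorted(x for x, _ in train)
    return max(candidates, key=lambda t: accuracy(t, train))


clean = prepare(RAW)                 # stage 3: Data Preparation
train, test = split(clean)           # set aside the pre-selected test set
threshold = fit(train)               # stage 4: Modeling
print("records kept:", len(clean), "of", len(RAW))
print("threshold:", threshold)
print("test accuracy:", accuracy(threshold, test))  # stage 5: Evaluation
```

The test accuracy depends on which records land in the held-out set, so it can be lower than the training accuracy. That gap is the kind of signal the Evaluation stage is meant to surface before Deployment.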