

This document offers a concise introduction to data mining, covering key concepts such as the KDD process and essential data preprocessing techniques using the pandas library in Python. It details data cleaning, integration, reduction, transformation, mining, pattern evaluation, and knowledge representation. The document also explores missing-data handling techniques, including ignoring tuples, manual filling, using global constants, central tendency methods, and data mining algorithms. Furthermore, it introduces data warehousing and metadata concepts, along with data visualization tools and libraries.
Typology: Cheat Sheet


Data mining is the process of searching and analyzing a large batch of raw data in order to identify patterns and extract useful information.
# Import pandas in a Jupyter Notebook
import pandas as pd
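df = pd.read_csv("xyz.csv")  # load the CSV file into a DataFrame
df  # in a notebook, a bare name on the last line displays the DataFrame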
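Because "xyz.csv" is only a placeholder name, the minimal sketch below passes `pd.read_csv` an in-memory buffer instead, so it runs without any file on disk. The column names and values are hypothetical, chosen to resemble the weather columns used later on this sheet. Empty fields in the CSV text are read as NaN.

```python
import io

import pandas as pd

# Hypothetical weather-style data standing in for "xyz.csv"
csv_text = """Date,TempAvgF,humidity,VisibilityMiles
2024-01-01,50,60,10
2024-01-02,52,,9
2024-01-03,48,70,
"""

# read_csv accepts any file-like object, not only a path
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (3, 4): 3 rows, 4 columns
print(df.dtypes)  # columns with an empty field become float64 because of NaN
```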
df.shape  # (number of rows, number of columns)
df.head()   # first 5 rows (the default)
df.head(3)  # first 3 rows
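A short sketch of how the row count argument behaves on a small, made-up 10-row DataFrame: `head()` and `tail()` default to 5 rows, and passing a number overrides that.

```python
import pandas as pd

# Hypothetical 10-row DataFrame
df = pd.DataFrame({
    "day": range(1, 11),
    "humidity": [55, 60, 62, 58, 70, 65, 61, 59, 63, 66],
})

first_five = df.head()    # default n=5
first_three = df.head(3)  # first 3 rows
last_five = df.tail()     # last 5 rows
last_two = df.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```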
df.tail()  # last 5 rows (the default)
df["humidity"]  # select a single column (returns a Series)
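Selecting with a single label returns a Series, while a list of labels returns a DataFrame. The sketch below, with hypothetical columns, shows the difference.

```python
import pandas as pd

df = pd.DataFrame({"humidity": [60, 65, 70], "VisibilityMiles": [10, 9, 8]})

humidity = df["humidity"]                     # single label -> Series
subset = df[["humidity", "VisibilityMiles"]]  # list of labels -> DataFrame

print(type(humidity).__name__)  # Series
print(type(subset).__name__)    # DataFrame
```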
Missing data handling techniques:
• Ignore the tuple (drop the row)
• Fill in missing values manually
• Use a global constant for missing values (NaN)
• Use a measure of central tendency (mean, median, mode)
• Use a data mining algorithm to predict the most probable value
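The central tendency option can be sketched as follows, assuming a small hypothetical dataset: the mean for a roughly symmetric numeric column, the median for a column with an outlier, and the mode for a categorical column. Which statistic suits which column is a judgment call, not a fixed rule.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap per column
df = pd.DataFrame({
    "temp": [50.0, np.nan, 54.0, 52.0],
    "humidity": [60.0, 62.0, np.nan, 100.0],
    "events": ["Rain", None, "Rain", "Fog"],
})

filled = df.copy()
# Mean: reasonable for roughly symmetric numeric data
filled["temp"] = filled["temp"].fillna(filled["temp"].mean())
# Median: less sensitive to outliers such as the 100 here
filled["humidity"] = filled["humidity"].fillna(filled["humidity"].median())
# Mode: works for categorical data; mode() may return several values, so take the first
filled["events"] = filled["events"].fillna(filled["events"].mode()[0])

print(filled)
```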
df.isnull()  # True wherever a value is missing
df.isnull().sum()  # number of missing values in each column
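Counting works because `isnull()` returns booleans and `sum()` treats True as 1. A sketch on hypothetical data, including a whole-frame total:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "humidity": [60, np.nan, 70, np.nan],
    "VisibilityMiles": [10, 9, np.nan, 8],
})

mask = df.isnull()             # boolean DataFrame, True where a value is missing
per_column = mask.sum()        # missing values per column
total = int(per_column.sum())  # missing cells in the whole DataFrame

print(per_column)
print(total)  # 3
```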
df.dropna()  # drop every row that contains a missing value
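"Ignoring the tuple" maps to `dropna`. The sketch below, on hypothetical data, compares dropping rows with any NaN, restricting the check to one column with `subset`, and dropping columns with `axis=1`. Note that `dropna` returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "humidity": [60, np.nan, 70, 65],
    "VisibilityMiles": [10, 9, np.nan, 8],
})

all_complete = df.dropna()                       # keep only fully populated rows
humidity_known = df.dropna(subset=["humidity"])  # check only the humidity column
no_gappy_cols = df.dropna(axis=1)                # drop columns instead of rows

print(len(df), len(all_complete), len(humidity_known))  # 4 2 3
```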
df.fillna(0)  # replace every missing value with the constant 0
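Filling with a global constant is simple, but the constant then takes part in later statistics. The sketch below, with made-up numbers, shows that `mean()` skips NaN while a filled-in 0 pulls the mean down. It also shows that `fillna` returns a new DataFrame rather than modifying `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "humidity": [60.0, np.nan, 70.0],
    "VisibilityMiles": [np.nan, 9.0, 8.0],
})

# fillna returns a new DataFrame; df itself still contains NaN
zero_filled = df.fillna(0)

# mean() skips NaN, but the zeros are counted
print(df["humidity"].mean())           # 65.0
print(zero_filled["humidity"].mean())  # about 43.33
```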
df["VisibilityMiles"].mean()  # column mean (missing values are skipped)
df.fillna({"VisibilityMiles": 9})  # fill only this column, here with 9
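Rather than typing the mean in by hand, the computed means can be passed straight to `fillna`. When `value` is a Series, pandas matches it to columns by label, the same way a dict is matched. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "VisibilityMiles": [10.0, np.nan, 8.0, 9.0],
    "humidity": [60.0, 62.0, np.nan, 64.0],
})

# Per-column means (NaN skipped by default), used as per-column fill values
col_means = df.mean()
filled = df.fillna(col_means)

print(filled)
```

On a DataFrame that also holds text columns, `df.mean(numeric_only=True)` restricts the calculation to numeric columns.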
df.ffill()  # forward fill: copy the last valid value down (older pandas: df.fillna(method="ffill"), deprecated since 2.1)
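Forward fill suits ordered data such as daily readings. The sketch below, using hypothetical values, compares `ffill`, its mirror `bfill`, and the `limit` parameter, which caps how many consecutive gaps are filled. A trailing gap has no later value, so `bfill` leaves it as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with gaps
df = pd.DataFrame({"VisibilityMiles": [10.0, np.nan, np.nan, 7.0, np.nan]})

forward = df.ffill()         # carry the last valid value down
backward = df.bfill()        # pull the next valid value up
limited = df.ffill(limit=1)  # fill at most one consecutive NaN

print(forward["VisibilityMiles"].tolist())   # [10.0, 10.0, 10.0, 7.0, 7.0]
print(backward["VisibilityMiles"].tolist())  # [10.0, 7.0, 7.0, 7.0, nan]
print(limited["VisibilityMiles"].tolist())   # [10.0, 10.0, nan, 7.0, 7.0]
```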
Further reading: the official pandas documentation on data preprocessing, in particular its guide to working with missing data.