Introduction to Data Mining: Techniques and Applications using Pandas, Cheat Sheet of Data Mining

This document offers a concise introduction to data mining, covering key concepts such as the KDD process and essential data preprocessing techniques using the pandas library in Python. It details data cleaning, integration, reduction, transformation, mining, pattern evaluation, and knowledge representation. The document also explores missing data handling techniques, including ignoring tuples, manual filling, using global constants, central tendency methods, and data mining algorithms. Furthermore, it introduces data warehousing and metadata concepts, along with data visualization tools and libraries.

Typology: Cheat Sheet

2022/2023

Available from 05/06/2025

fattah-mahmud-nihal

Intro to Data Mining
Data mining is the process of searching and analyzing a large batch of raw data in
order to identify patterns and extract useful information.
KDD Process (Knowledge Discovery from Data)
1. Data Cleaning
2. Data Integration
3. Data Reduction
4. Data Transformation
5. Data Mining
6. Pattern Evaluation
7. Knowledge Representation
Hands-on with pandas (a Python library used for data preprocessing)
# import pandas (for example, in a Jupyter Notebook)
import pandas as pd
# read data from a .csv file
df = pd.read_csv("xyz.csv")
df
# check the number of rows and columns, returned as (rows, columns)
df.shape
# show the first 5 rows (the default) or the first 3 rows
df.head()
df.head(3)
# show the last 5 rows (the default)
df.tail()
# check all the values of one specific column
df["humidity"]
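The file name xyz.csv is a placeholder, so the inspection calls cannot be run as written. A minimal sketch, assuming a small made-up weather table (the column names and values are illustrative only and do not come from the original dataset), reads CSV text from memory and applies the same calls:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "xyz.csv"; the values are made up.
csv_text = """day,temperature,humidity
1,38,52
2,36,58
3,43,62
4,39,71
5,41,66
6,35,80
"""

# read_csv accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

shape = df.shape            # (number of rows, number of columns)
first_three = df.head(3)    # first 3 rows
last_rows = df.tail()       # last 5 rows by default
humidity = df["humidity"]   # a single column, returned as a Series

print(shape)
print(first_three)
print(last_rows)
print(humidity)
```

Because the table has six rows, head(3) returns three of them and tail() returns the last five.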

Missing data handling techniques:
● Ignore the tuple
● Fill missing values manually
● Use a global constant for missing values (NaN)
● Central tendency (Mean, Median, Mode)
● Data mining algorithm (most probable value)
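Of the techniques listed, the central tendency approach is straightforward to try in pandas. A minimal sketch, assuming a made-up column that contains two missing values and one outlier (the name and values are illustrative only), compares the mean, median, and mode as fill values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two missing values and one outlier (40.0)
s = pd.Series([10.0, 10.0, 9.0, np.nan, 8.0, 40.0, np.nan], name="VisibilityMiles")

# These statistics skip NaN by default
mean_value = s.mean()
median_value = s.median()
mode_value = s.mode().iloc[0]  # mode() returns a Series, so take its first entry

# fillna returns a new Series; s itself is unchanged
filled_mean = s.fillna(mean_value)
filled_median = s.fillna(median_value)
filled_mode = s.fillna(mode_value)

print(mean_value, median_value, mode_value)
print(filled_median)
```

The mean is pulled toward the outlier while the median is not, which is one reason the median is often preferred for skewed columns.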

# check, value by value, whether the dataframe has nulls (returns True/False for each cell)
df.isnull()
# count the null values in each column
df.isnull().sum()
# drop rows with null values (returns a new dataframe; df itself is unchanged)
df.dropna()
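A short sketch of these three calls on a made-up table (the column names and values are assumptions for illustration). It also shows that dropna() leaves the original dataframe untouched unless the result is assigned to a variable:

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing value in each column
df = pd.DataFrame({
    "temperature": [38.0, np.nan, 43.0, 39.0],
    "humidity": [52.0, 58.0, np.nan, 71.0],
    "events": ["Rain", None, "Snow", "Rain"],
})

null_mask = df.isnull()          # same shape as df, True where a value is missing
null_counts = df.isnull().sum()  # number of missing values per column
cleaned = df.dropna()            # keeps only rows with no missing values

print(null_counts)
print(cleaned)
print(len(df), len(cleaned))
```

Only rows 0 and 3 are complete, so cleaned has two rows while df still has four.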

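# fill all null values with 0
df.fillna(0)
# see the mean value of a column (column names are case-sensitive)
df["VisibilityMiles"].mean()
# fill null values in a specific column with a specific number
df.fillna({"VisibilityMiles": 9})
# forward-fill the missing values (each gap takes the last valid value above it)
# df.ffill() replaces the older df.fillna(method="ffill"), which is deprecated in recent pandas versions
df.ffill()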
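A minimal sketch comparing these fill strategies side by side on a made-up table (the values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical table with gaps in both columns
df = pd.DataFrame({
    "VisibilityMiles": [10.0, np.nan, 8.0, np.nan, np.nan, 7.0],
    "humidity": [52.0, 58.0, np.nan, 71.0, 66.0, 80.0],
})

zero_filled = df.fillna(0)                         # every gap becomes 0
column_filled = df.fillna({"VisibilityMiles": 9})  # only the named column is filled
forward_filled = df.ffill()                        # each gap copies the value above it

print(zero_filled)
print(column_filled)
print(forward_filled)
```

A dictionary passed to fillna only touches the listed columns, so humidity keeps its gap in column_filled. Forward filling cannot fill a gap in the first row, because there is no earlier value to copy.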

Further reading: the official pandas documentation on data preprocessing