Introduction to Data Mining: Techniques and Applications using Pandas, Cheat Sheet of Data Mining

Islamic University of Technology Data Mining

This document offers a concise introduction to data mining, covering key concepts like the KDD process and essential data preprocessing techniques using the pandas library in Python. It details data cleaning, integration, reduction, transformation, mining, pattern evaluation, and knowledge representation. The document also explores missing data handling techniques, including ignoring tuples, manual filling, using global constants, central tendency methods, and data mining algorithms. Furthermore, it introduces data warehousing and metadata concepts, along with data visualization tools and libraries.

Typology: Cheat Sheet

2022/2023

Available from 05/06/2025

fattah-mahmud-nihal 🇧🇩


Intro to Data Mining

Data mining is the process of searching and analyzing a large batch of raw data in order to identify patterns and extract useful information.

KDD Process (Knowledge Discovery from Data)

1. Data Cleaning

2. Data Integration

3. Data Reduction

4. Data Transformation

5. Data Mining

6. Pattern Evaluation

7. Knowledge Representation
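
The seven steps above can be sketched as a tiny pandas pipeline. This is only an illustration under stated assumptions: the two tables, their column names, the city names, and the threshold of 25 are invented for the example, and the "mining" step is just a per-group mean rather than a real mining algorithm.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
readings = pd.DataFrame({
    "city_id": [1, 1, 2, 2, 2],
    "temperature": [30.0, None, 22.0, 22.0, 25.0],
    "humidity": [70, 65, 80, 80, 75],
})
cities = pd.DataFrame({"city_id": [1, 2], "city": ["Dhaka", "Sylhet"]})

# 1. Data cleaning: drop exact duplicate rows and rows with missing values.
cleaned = readings.drop_duplicates().dropna()

# 2. Data integration: combine the readings with the city lookup table.
integrated = cleaned.merge(cities, on="city_id", how="left")

# 3. Data reduction: keep only the columns needed for the analysis.
reduced = integrated[["city", "temperature", "humidity"]]

# 4. Data transformation: rescale humidity from percent to the 0-1 range.
transformed = reduced.assign(humidity=reduced["humidity"] / 100)

# 5. Data mining: a deliberately simple "pattern", the mean temperature per city.
pattern = transformed.groupby("city")["temperature"].mean()

# 6. Pattern evaluation: keep only cities whose mean exceeds a chosen threshold.
interesting = pattern[pattern > 25]

# 7. Knowledge representation: present the result in a readable form.
print(interesting.to_string())
```

In practice each step is usually far more involved, but the order of operations stays the same.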

Hands-on with pandas (a Python library used for data preprocessing)

# import pandas in Jupyter Notebook

import pandas as pd

# read data from .csv file

df = pd.read_csv("xyz.csv")

# check the number of rows and columns of the dataframe

df.shape

# check the first 5 rows (the default) / the first 3 rows of the dataframe

df.head()

df.head(3)

# check the last 5 rows (the default) of the dataframe

df.tail()

# check all the values of one specific column

df["humidity"]
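
The inspection calls above can be tried without a file on disk. The following is a minimal sketch that assumes a small in-memory CSV in place of "xyz.csv"; the column names and values are made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "xyz.csv".
csv_text = """day,temperature,humidity
1,30,70
2,31,65
3,29,80
4,28,85
5,27,90
6,26,75
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)        # (rows, columns) as a tuple
print(df.head())       # first 5 rows by default
print(df.head(3))      # first 3 rows
print(df.tail())       # last 5 rows by default
print(df["humidity"])  # one column, returned as a Series
```

Note that `shape` is an attribute, not a method, so it takes no parentheses.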

Missing data handling techniques:

● Ignore the tuple
● Fill missing values manually
● Use global constant for missing values (NaN)
● Central tendency (Mean, Median, Mode)
● Data mining algorithm (most probable value)

# check which values of the dataframe are null (returns a True/False mask)

df.isnull()

# check total null values of each column

df.isnull().sum()
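
A small sketch of what the null checks return, assuming a made-up dataframe with a few missing entries:

```python
import numpy as np
import pandas as pd

# Hypothetical data with three missing entries.
df = pd.DataFrame({
    "temperature": [30.0, np.nan, 29.0],
    "humidity": [70.0, np.nan, np.nan],
})

# Boolean dataframe of the same shape: True where a value is missing.
mask = df.isnull()

# Number of missing values per column (True counts as 1).
per_column = df.isnull().sum()

# Number of missing values in the whole dataframe.
total = int(df.isnull().sum().sum())

# Single yes/no answer: does the dataframe contain any missing value?
any_missing = bool(df.isnull().values.any())

print(per_column)
print(total, any_missing)
```

`isnull` is a method, so the parentheses are required before chaining `.sum()`.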

# drop rows with null values

df.dropna()

# fill null attributes with 0

df.fillna(0)
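
One detail worth seeing in action: `dropna` and `fillna` return a new dataframe by default and leave the original untouched. A minimal sketch, assuming made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing entries.
df = pd.DataFrame({
    "temperature": [30.0, np.nan, 29.0],
    "humidity": [70.0, 65.0, np.nan],
})

# "Ignore the tuple": keep only rows without any missing value.
dropped = df.dropna()

# "Global constant": replace every missing value with 0.
filled = df.fillna(0)

print(len(dropped))                    # rows that survived
print(int(filled.isnull().sum().sum()))  # missing values left after filling
print(int(df.isnull().sum().sum()))      # the original is unchanged
```

To keep the result, assign it back, for example `df = df.dropna()`.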

# see the mean value of a column

df["VisibilityMiles"].mean()

# fill missing values of a specific column with a specific number

df.fillna({"VisibilityMiles": 9})
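
Instead of a hand-picked number, the fill value can come from a measure of central tendency. The sketch below assumes a made-up "VisibilityMiles" column; which measure suits real data depends on its distribution.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries.
df = pd.DataFrame({"VisibilityMiles": [10.0, np.nan, 8.0, 10.0, np.nan, 4.0]})
col = df["VisibilityMiles"]

# Missing values are skipped when computing these statistics.
mean_value = col.mean()
median_value = col.median()
mode_value = col.mode().iloc[0]  # mode() returns a Series; take the first mode

# Fill only this column, using a {column: value} mapping.
filled_mean = df.fillna({"VisibilityMiles": mean_value})
filled_median = df.fillna({"VisibilityMiles": median_value})
filled_mode = df.fillna({"VisibilityMiles": mode_value})

print(mean_value, median_value, mode_value)
print(filled_mean)
```

The median is usually less sensitive to outliers than the mean, and the mode also works for categorical columns.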

# forward-fill the missing values (fillna(method="ffill") is deprecated in recent pandas versions)

df.ffill()
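
Forward filling copies the last known value downward, so a missing value at the very top has nothing to copy from. A minimal sketch on made-up data, with `bfill` shown as the mirror image:

```python
import numpy as np
import pandas as pd

# Hypothetical column that starts with a missing value.
df = pd.DataFrame({"humidity": [np.nan, 70.0, np.nan, np.nan, 80.0]})

# Forward fill: each gap takes the previous non-missing value.
forward = df.ffill()

# Backward fill: each gap takes the next non-missing value.
backward = df.bfill()

print(forward)
print(backward)
```

A common pattern is to chain both, `df.ffill().bfill()`, when leftover gaps at the edges also need a value.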

Further reading: the official pandas documentation on data preprocessing
