Data Mining & Machine Learning at Purdue University, Lecture notes of Data Structures and Algorithms

Purdue University Data Structures and Algorithms

An overview of the CS37300 course on Data Mining & Machine Learning at Purdue University. It includes information on the syllabus, textbooks, workload, exams, computing resources, and Python resources. The document also explains the machine learning process, elements of data mining & machine learning algorithms, and provides an example of survival bias. It is a useful resource for students interested in data mining, data science, and machine learning.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023


Data Mining & Machine Learning

CS37300

Purdue University

August 23, 2022

Bruno Ribeiro


Course overview

Topics

Elements of data science algorithms
- Machine Learning
- Data Mining
- Statistics
Statistical basics and background
Data preparation and exploration
Predictive modeling
Methodology, evaluation
Descriptive modeling

Syllabus / Logistics

Syllabus and ALL necessary information (slides, notes, links) will be posted on our website:
https://www.cs.purdue.edu/homes/ribeirob/courses/Fall2022/

Workload

Homeworks (6 theory + programming assignments)
- Six assignments including written/math exercises and programming assignments in Python
- Python is an important language to learn in data mining, data science, and machine learning
Late policy: No late homework (grade = zero after deadline)
- Submission on Gradescope
- Firm deadlines (6:00pm) with no late penalty until 6:00am next day
Lowest homework score will be dropped from the average
- Do not skip a homework early: save the drop for emergencies
Exams
- Midterm and final exam

Grading

Grades will be posted on Brightspace: https://purdue.brightspace.com/d2l/home/599255

Attendance: 5%
ML Competition (Kaggle Competition): up to +5% (extra credit)
Homework: 45% (the lowest homework grade will be dropped from the average)
- A homework missed because of a serious, documented medical or family emergency will be automatically counted as a zero grade and treated as the dropped score (i.e., discarded from the average).
- Additional extensions (beyond one missed homework) will be granted if the documented emergency persists for 2+ homeworks.
- Students are advised not to drop a homework for non-emergency reasons: if an emergency happens later, the student will have two zero grades and one of them will count towards the average.
Midterm: 20%
Final exam: 30%
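
As a rough illustration of how the weights above combine, here is a minimal Python sketch. The function name `course_grade` and the assumption that every component is scored on a 0-100 scale are hypothetical and not part of the syllabus; the official computation is whatever the course staff apply on Brightspace.

```python
def course_grade(homeworks, midterm, final_exam, attendance, kaggle_bonus=0.0):
    """Hypothetical helper: combine component scores (each 0-100) using the
    weights listed above. kaggle_bonus is extra credit, capped at 5 points."""
    if len(homeworks) < 2:
        raise ValueError("need at least two homework scores to drop one")
    # Drop the single lowest homework score before averaging.
    kept = sorted(homeworks)[1:]
    hw_avg = sum(kept) / len(kept)
    base = (0.45 * hw_avg        # Homework: 45%
            + 0.20 * midterm     # Midterm: 20%
            + 0.30 * final_exam  # Final exam: 30%
            + 0.05 * attendance) # Attendance: 5%
    # Extra credit is clamped to the stated range of 0 to +5%.
    return base + min(max(kaggle_bonus, 0.0), 5.0)


# One zero is absorbed by the drop; a second zero counts towards the average.
one_zero = course_grade([100, 100, 100, 100, 100, 0], 100, 100, 100)
two_zeros = course_grade([100, 100, 100, 100, 0, 0], 100, 100, 100)
```

With one zero the homework average is unaffected, while a second zero lowers the average of the five kept scores, which is the reason the notes advise saving the dropped homework for emergencies.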

Computing Resources

Scholar Cluster

Software needed and cluster usage manual:
https://www.cs.purdue.edu/homes/ribeirob/courses/Fall2022/howto/cluster-how-to.html

Course introduction

Machine Learning

Machine learning: How can we build computer systems that automatically improve with experience? (Mitchell 2006)

[Figure: related fields: Databases, Artificial Intelligence, Visualization, Statistics]

[Figure: Data → Selection → Target data → Preprocessing → Processed data → Learning → Patterns → Interpretation/evaluation → Knowledge]

The machine learning process

Machine Learning Process

Application setup
- Acquire relevant domain knowledge
- Assess user goals
Data selection
- Choose data sources
- Identify relevant attributes
- Sample data
Data preprocessing
- Remove noise or outliers
- Handle missing values
- Account for time or other changes
Data transformation
- Find useful features
- Reduce dimensionality
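
To make the preprocessing and transformation steps above concrete, here is a minimal sketch on a single numeric attribute using only the Python standard library. The specific techniques (median imputation, Tukey's IQR rule for outliers, z-score standardization) and the function names are illustrative assumptions; the notes do not prescribe any particular method.

```python
import statistics


def impute_missing(values):
    """Handle missing values: replace None with the median of observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def remove_outliers(values, k=1.5):
    """Remove outliers: keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def standardize(values):
    """Transform: rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Made-up raw attribute with one missing entry and one extreme value.
raw = [10, 12, None, 11, 13, 500, 12]
cleaned = remove_outliers(impute_missing(raw))
features = standardize(cleaned)
```

The order matters in this sketch: imputing before outlier removal means the extreme value influences the fill only through the median, which is robust to it.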

Machine Learning Process

Data representation: Describe the data
Task specification: Outline the goal(s)
Knowledge representation: Describe the rules
Learning technique:
- Search: Identify a rule
- Evaluation function: Estimate confidence
Prediction technique: Apply the rule
Data mining system: Do above in combination
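
The elements listed above can be sketched with a deliberately tiny learner: a one-dimensional threshold rule (a "decision stump"). This is an illustrative assumption chosen for brevity, not an algorithm taken from the notes, and the names `predict`, `accuracy`, and `learn` are hypothetical.

```python
# Data representation: each example is one numeric attribute x with a 0/1 label y.
# Task specification: predict the label of a new x.
# Knowledge representation: a rule of the form "label is 1 if x >= threshold".


def predict(threshold, x):
    """Prediction technique: apply the threshold rule to one example."""
    return 1 if x >= threshold else 0


def accuracy(threshold, xs, ys):
    """Evaluation function: fraction of examples the rule classifies correctly."""
    correct = sum(predict(threshold, x) == y for x, y in zip(xs, ys))
    return correct / len(ys)


def learn(xs, ys):
    """Search: try every observed value as a threshold, keep the best-scoring one."""
    candidates = sorted(set(xs))
    return max(candidates, key=lambda t: accuracy(t, xs, ys))


# Data mining system: the pieces above used in combination on made-up data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
best = learn(xs, ys)
```

Here the search space is just the six observed values, so exhaustive search is trivial; richer knowledge representations make the search space far larger and exhaustive search impractical.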

Complexities

Data size: vastly larger or changing rapidly
Data representation: can affect ability to learn and interpret models
Knowledge representation: needs to capture more subtle forms of probabilistic dependence

Search space: vastly larger
Evaluation functions: difficult to assess confidence in model utility
