+ Machine Learning Exercises for High School Students
Joshua B. Gordon
July 7th, 2011

+ Outline

n Recommendation systems

n Intuition for algorithms that find patterns in data

n Clustering using Euclidian distance

n Classroom exercises

+ Amazon

n Amazon doesn't know what it's like to read a book, or what

you feel like when you read a particular book

n Amazon does know that people who bought a certain book

also bought other books

n Patterns in the data can used to make recommendations

n If you’ve built up a long purchase history you'll often see

pretty sophisticated recommendations
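
The "people who bought this also bought that" idea can be sketched in a few lines of Python. This is only a minimal illustration, not Amazon's actual method: the purchase histories, the book titles, and the `also_bought` helper are all made up for this example.

```python
from collections import Counter

# Hypothetical purchase histories: each set is one customer's books.
baskets = [
    {"Dune", "Neuromancer", "Foundation"},
    {"Dune", "Foundation"},
    {"Dune", "Emma"},
    {"Emma", "Persuasion"},
]

def also_bought(book, baskets):
    """Count how often other books appear in baskets that contain `book`."""
    counts = Counter()
    for basket in baskets:
        if book in basket:
            # Count every other book bought alongside `book`.
            counts.update(basket - {book})
    # Most frequently co-purchased books come first.
    return counts.most_common()

recommendations = also_bought("Dune", baskets)
print(recommendations[0])  # ('Foundation', 2)
```

Nothing in the code knows what any book is about. The recommendation comes entirely from the pattern of purchases.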

+ Netflix prize

n Netflix is an online DVD rental company that recommends

movies to subscribers

n 2006: Netflix announce $1 million to the first person who can

improve the accuracy of its recommendation algorithm by

n How can an algorithm recommend movies?

n By leveraging patterns in data (and lots of it)

+ Critics with similar tastes

n Preference space

Star Wars 5 Sam 4 Sandy 3 2 Matt 1 Julia 1 2 3 4 5 Raiders of the Lost Arc

+ Measuring distance

n Measure similarity with Euclidian distance

Star Wars 5 Sam 4 Sandy 3 2 Matt 1 Julia 1 2 3 4 5 Raiders
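
The Euclidean distance between two critics is the straight-line distance between their points in preference space: the square root of (x1 - x2)^2 + (y1 - y2)^2. A minimal sketch follows. The exact positions of the critics on the slide are not recoverable, so the ratings below are hypothetical values chosen for illustration, and `euclidean` is a helper name introduced here.

```python
import math

# Hypothetical ratings as (Raiders, Star Wars) pairs; made up for this
# example because the slide's exact coordinates are not available.
ratings = {
    "Sam": (4, 5),
    "Sandy": (4, 4),
    "Matt": (1, 2),
    "Julia": (1, 1),
}

def euclidean(p, q):
    """Straight-line distance between two points with the same dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean(ratings["Sam"], ratings["Sandy"]))  # 1.0 (similar tastes)
print(euclidean(ratings["Sam"], ratings["Julia"]))  # 5.0 (different tastes)
```

A small distance means similar tastes. Because the function loops over coordinate pairs, it works unchanged when each critic has rated more than two movies.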

+ Making a recommendation

n Sarah hasn’t seen Raiders, but gave Star Wars five stars

n It’s a good bet she’ll like Raiders too

Star Wars 5 Sarah Sam 4 Sandy 3 2 Matt 1 Julia 1 2 3 4 5 Raiders
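
One way to turn this intuition into a procedure is to find the critics closest to Sarah on the movies she has rated, then average their ratings for the movie she hasn't seen. The sketch below assumes hypothetical ratings (the slide's exact values are not recoverable), and `predict` and its parameter `k` are names introduced only for this example.

```python
# Hypothetical ratings, made up for illustration.
ratings = {
    "Sam":   {"Star Wars": 5, "Raiders": 4},
    "Sandy": {"Star Wars": 4, "Raiders": 4},
    "Matt":  {"Star Wars": 2, "Raiders": 1},
    "Julia": {"Star Wars": 1, "Raiders": 1},
}
sarah = {"Star Wars": 5}  # Sarah has not rated Raiders

def predict(user, movie, ratings, k=2):
    """Average the `movie` ratings of the k critics nearest to `user`."""
    def distance(critic):
        # Euclidean distance over the movies both have rated.
        shared = [m for m in user if m in critic]
        return sum((user[m] - critic[m]) ** 2 for m in shared) ** 0.5

    # Only critics who have rated the target movie can contribute.
    candidates = [c for c in ratings.values() if movie in c]
    nearest = sorted(candidates, key=distance)[:k]
    return sum(c[movie] for c in nearest) / len(nearest)

print(predict(sarah, "Raiders", ratings))  # 4.0
```

With these made-up numbers, Sarah's nearest neighbors are Sam and Sandy, who both gave Raiders four stars, so the predicted rating is 4.0.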

+ Features

n We used features to compare critics

n Feature: a data attribute used to make a comparison

n Quantify attributes of an object (size, weight, color, shape,

density) in a way a computer can understand

n Quality is important

+ Features to compare movies

Feature           Star Wars  Raiders of the Lost Ark  Casablanca  Singin' in the Rain
Action (1 to 5)   5          4                        2           1
Romance (1 to 5)  1          2                        4           3
Length (min)      121        115                      102         103
Harrison Ford     Y          Y                        N           N
Year              1977       1981                     1942        1952
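
To compare movies by their features, a program first has to turn every feature into a number. The sketch below uses values from the slide's table, encodes the Harrison Ford column as 1 or 0, and finds the most similar movie by Euclidean distance. The choice of three features is an assumption made for this example: length and year are left out because their much larger numeric ranges would dominate the distance unless rescaled. The helper names `encode`, `distance`, and `most_similar` are introduced here.

```python
import math

# Feature values copied from the slide's table.
movies = {
    "Star Wars": {"action": 5, "romance": 1, "ford": "Y"},
    "Raiders of the Lost Ark": {"action": 4, "romance": 2, "ford": "Y"},
    "Casablanca": {"action": 2, "romance": 4, "ford": "N"},
    "Singin' in the Rain": {"action": 1, "romance": 3, "ford": "N"},
}

def encode(value):
    """Map Y/N to 1/0 so every feature is a number; numbers pass through."""
    return {"Y": 1, "N": 0}.get(value, value)

def distance(a, b):
    """Euclidean distance between two movies over all listed features."""
    return math.sqrt(sum((encode(a[k]) - encode(b[k])) ** 2 for k in a))

def most_similar(title):
    """Return the other movie whose feature vector is closest to `title`."""
    others = [t for t in movies if t != title]
    return min(others, key=lambda t: distance(movies[title], movies[t]))

print(most_similar("Star Wars"))   # Raiders of the Lost Ark
print(most_similar("Casablanca"))  # Singin' in the Rain
```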

+ Clustering

n Cluster: group of related objects

n We did OK at eyeballing clusters, but what if we had lots of

data? Or wanted to use more than two dimensions?

n Today we’ll learn a Machine learning method called K-

means that finds clusters automatically

n Machine learning is a field of computer

science that studies algorithms

that learn from patterns in data.

n 3 class exercises

+ Sorting coins without machine learning

n Suppose you wish to separate quarters, nickels, dimes

n What information would the computer need to distinguish

between these three types of coins?

n Think about how you would do the task

yourself

Exercises from Steve Essinger and Gail Rosen’s excellent article: “An Introduction to Machine Learning for Students in Secondary Education”
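
Without machine learning, the programmer writes the sorting rules by hand. The sketch below is one possible answer to the exercise, not the one from the article: it assumes diameter is the only feature measured, uses the nominal U.S. coin diameters (dime 17.91 mm, nickel 21.21 mm, quarter 24.26 mm), and picks cutoffs roughly midway between them. The function name `classify_coin` and the cutoff values are assumptions made for illustration.

```python
# Hand-picked cutoffs (an assumption), roughly midway between the nominal
# diameters: dime 17.91 mm, nickel 21.21 mm, quarter 24.26 mm.
DIME_NICKEL_CUTOFF_MM = 19.5
NICKEL_QUARTER_CUTOFF_MM = 22.7

def classify_coin(diameter_mm):
    """Label a coin using fixed, hand-written rules on its diameter."""
    if diameter_mm < DIME_NICKEL_CUTOFF_MM:
        return "dime"
    if diameter_mm < NICKEL_QUARTER_CUTOFF_MM:
        return "nickel"
    return "quarter"

for measured in (17.9, 21.2, 24.3):
    print(measured, classify_coin(measured))
```

This works only because the coin types and their sizes are known ahead of time. For coins whose types are unknown, no such rules can be written in advance.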

+ Problem solving

n Problem statement: automatically sort a

large bag of ancient coins.

n The K-means algorithm will be used to find

clusters

n First it needs features to compare the

similarity of data points

n If poor features are chosen, the algorithm

will be unable to solve the task.

[Diagram: Problem statement, Extract features, Implement, Evaluate]

+ The K-means algorithm

- To start, K-means needs to know k, the number of types of coins, in advance.

  1. Choose k starting points randomly. These are called centroids.