



















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An introduction to instance-based learning, focusing on the k-nearest neighbors (k-nn) algorithm and k-means clustering. Instance-based learning is an effective approach for dealing with large numeric data sets. K-nn can be used in both supervised and unsupervised settings, while k-means is commonly used for unsupervised clustering. The basics of these methods, including how they work, their advantages, and challenges.
Typology: Exams
1 / 27
This page cannot be seen from the preview
Don't miss anything!




















Chris Brooks Department of Computer ScienceUniversity of San Francisco
Department of Computer Science — University of San Francisco – p.1/
k^ closest points and collect their classifications. Use majority rule to classify the unseen point.
Department of Computer Science — University of San Francisco – p.3/
Class 4 3
We see the following data point: x1=3, x2 = 1. Howshould we classify it?
Department of Computer Science — University of San Francisco – p.4/
Department of Computer Science — University of San Francisco – p.6/
k?
Search using cross-validation Distance is computed globally. Recall the data we used for decision tree training. Part of the goal was eliminate irrelevant attributes. All neighbors get an equal vote.
Department of Computer Science — University of San Francisco – p.7/
curse of dimensionality
Department of Computer Science — University of San Francisco – p.9/
√) = (^2) ∑^ (w[ i](p[i^1
]^ −^ p[^2
(^2) i]))where
w^ is a vector of
weights. This has the effect of transforming or stretching theinstance space. More useful features have larger weights
Department of Computer Science — University of San Francisco – p.10/
Department of Computer Science — University of San Francisco – p.12/
K
clusters.^ For the moment, assume
K^ given.
Approach 1: Choose K items at random. We will call these the centers
Each center gets its own cluster. For each other item, assign it to the cluster thatminimizes distance between it and the center. This is called
K-means clustering.
Department of Computer Science — University of San Francisco – p.13/
Department of Computer Science — University of San Francisco – p.15/
Department of Computer Science — University of San Francisco – p.16/
=^ random
items while^ not
done^ : foreach^ item^
: assign^
to^ closest
center
foreach
center
: find^ mean
of^ its
cluster.
Department of Computer Science — University of San Francisco – p.18/
hierarchical clustering
Department of Computer Science — University of San Francisco – p.19/