Machine Learning a Quick Introduction - Lecture Slides | CSE 591, Study Guides, Projects, Research of Computer Science

Arizona State University (ASU) - Tempe Computer Science

Prof. Joerg Hakenberg

Material Type: Project; Professor: Hakenberg; Class: Introduction to Image Processing and Analysis; Subject: Computer Science and Engineering; University: Arizona State University - Tempe; Term: Fall 2008;

Uploaded on 09/02/2009

CSE 591

Machine learning

-a quick introduction-

Fall 2008

http://www.public.asu.edu/~jhakenbe/591/

Class format

• class project: 3 building blocks

- named entity recognition

• protein, drug, disease, organ/tissue, biol. process, cell. location

- sentence classification

• discusses certain type of relation?

- relation mining

• find partners in relation

• 4 groups (2 students)

- genetic implications in disease

- gene-drug associations

- cellular locations of proteins

- protein-protein interactions

• 15 min presentations per group

- dictionary-based NER

- naive Bayes for sentence classification

- four types of relation mining: pairwise classification, pattern-based (POS), pattern-based (parse tree), tree kernel

POS ambiguity

• Are there ambiguities other than NN/VB?

• JJ or NN?

• JJ or RB?

• NN or RB?

• IN or WDT?

NN, noun; VB, verb; JJ, adjective; RB, adverb; IN, preposition; WDT, which-determiner

The Japanese_JJ system for the classification of gastric cancer

Five haplotypes were identified in the Japanese_NNP population

rapid_JJ growth_NN

rapid_RB growing_VBG organisms

4 patients were returned back home_RB today.

These deaths took place at home_NN.

The fact that_IN Marimastat reduced in vitro invasion ...

Propolis (PP) is a sticky substance that_WDT is collected from plants by honeybees.

bank, store, home, call, … might be nouns or verbs

Machine learning

• make computers “learn”

• learn rules that explain a given data set

• model (a small snippet of) the world

• makes sense when we have massive data sets to analyze, especially for repeated tasks:

- predict the weather (rain or not, humidity, temperature)

- stock market analysis

- credit card fraud detection

- time series analysis

- handwriting recognition

- filter emails (ham or spam)

- sort a text into a category (sport, politics)

Supervised learning

• starts with known data

- given a set of observations

- we know the outcome for each

- training (= learning from labeled examples) on these data points

• predicts the outcome of unknown data

- will it rain at 70 degrees and 30% humidity?

• common form: classification

yes no

We call an observation (or list of observations) with an outcome a “labeled example”.

Temperature   Humidity   Rain
55            10%        no
40            40%        yes
95            20%        no
75            45%        yes

A set of labeled examples is a “training set”.
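
The slides do not say which classifier answers the question "will it rain at 70 degrees and 30% humidity?". As a minimal sketch, the code below uses a 1-nearest-neighbor rule (a stand-in technique chosen only for illustration) on the four labeled examples from the table. The names `TRAINING_SET`, `squared_distance` and `predict_nearest_neighbor` are hypothetical, and comparing degrees and percent on the same scale is a simplifying assumption.

```python
# Training set from the table: (temperature, humidity in %) -> rain?
TRAINING_SET = [
    ((55.0, 10.0), "no"),
    ((40.0, 40.0), "yes"),
    ((95.0, 20.0), "no"),
    ((75.0, 45.0), "yes"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_nearest_neighbor(observation, training_set=TRAINING_SET):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    _, label = min(
        training_set,
        key=lambda example: squared_distance(example[0], observation),
    )
    return label


# The closest labeled example to (70, 30%) is (75, 45%), which is labeled "yes".
print(predict_nearest_neighbor((70.0, 30.0)))
```

With only four examples the prediction is not meaningful as a forecast; the sketch only shows the training-set-to-prediction flow.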

Unsupervised learning

• starts with observations

• but we don’t know their labels

• typically used to find a structure in a data set

- group texts by similarity ➠ groups of texts that share a similar “topic” ➠ but we don’t know what that “topic” might be

• common form: clustering

We call an observation (or list of observations) without an outcome an “unlabeled example”.

Vector space representation

[Figure: vector space representation; panels: supervised, unsupervised; legend entry: “= result”]

Immediate applications

• supervised learning

- classify new data (blue) using the learned rules

- e.g., by checking on which side of the hyperplane they are

- hypothesis: new data will correspond to the examples on the same side ➠ same label

• unsupervised learning

- similar data are already clustered together

- we could check a few examples per cluster and label them

- then assign the label to all examples in the same cluster

Support vector machine

(overall idea only)

• supervised ML

• learns a separation of data points ➠ high-dimensional vector space (one dimension per feature) ➱ learns a hyperplane

• iteratively adapts a hyperplane until all or most training examples lie on the correct side

• hyperplane is represented by support vectors ➱ SVM

• classification: project a new vector onto the normal of the hyperplane ➠ check sign ➠ predict class

• details in three weeks
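
The classification step (check on which side of the hyperplane a new vector lies) can be sketched as follows. This is not an SVM trainer: the weight vector `W` and offset `B` are hand-picked, hypothetical values that happen to separate the four weather examples from the supervised-learning slide, and the function names are assumptions made for illustration.

```python
def decision_value(w, b, x):
    """Signed value w . x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x, positive="yes", negative="no"):
    """Predict the class from the sign of the decision value."""
    return positive if decision_value(w, b, x) >= 0 else negative


# Hypothetical hyperplane "humidity = 30%" over (temperature, humidity in %).
# A real SVM would learn w and b from the training set.
W = (0.0, 1.0)
B = -30.0

for observation in [(55.0, 10.0), (40.0, 40.0), (95.0, 20.0), (75.0, 45.0)]:
    print(observation, classify(W, B, observation))
```

Dividing the decision value by the length of `w` would give the signed distance to the hyperplane; for predicting the class only the sign matters.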

k-means clustering

• unsupervised ML

• decide on number of clusters, k

• decide on similarity measure (e.g., cosine coefficient)

1. randomized initialization with k centroids

2. assign remaining points by similarity to these centroids

3. compute actual centroid per cluster

4. re-assign all points to the most similar centroid

5. repeat steps 3 and 4 until the centroids no longer change
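
The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: it uses squared Euclidean distance instead of the cosine coefficient mentioned on the slide, it stops when the assignments no longer change (at which point the centroids no longer change either), and the names `k_means`, `mean`, `seed` and `max_iterations` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance (stand-in for the similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iterations=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Step 1: randomized initialization, here k distinct data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Steps 2 and 4: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 5: stop once the assignments are stable.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute the centroid of each non-empty cluster.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Which cluster gets index 0 depends on the random initialization, so only the grouping is meaningful, not the label numbers.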

Summary

• ML helps in explaining a (small, virtual) world

• world consists of observations, e.g., features and their values

- a weather observation (temperature, humidity, overcast)

- a text (tokens and their TF*IDF score)

• sometimes we can make use of previously known labels ➠ supervised vs. unsupervised learning

- given a particular weather situation, was it raining or not?

- a text is on sports, or business, or politics (or a combination)

• vector space model used in most techniques (at least implicitly)

• ML learns rules that explain a data set ➠ a model

- hyperplanes, decisions, cluster boundaries, centroids

• we can apply the model to new data

- classification: find the label of a new example given some “old” labeled examples

- clustering: group examples by their similarity
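
The summary mentions representing a text by its tokens and their TF*IDF scores. A minimal sketch of one common variant (raw term count times the natural log of the inverse document frequency) is shown below; the slides do not state which weighting variant the course uses, so the formula and the name `tf_idf_vectors` are assumptions.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Map each tokenized document to a {token: TF*IDF score} dictionary."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each token occur?
    document_frequency = Counter()
    for tokens in documents:
        document_frequency.update(set(tokens))
    vectors = []
    for tokens in documents:
        term_frequency = Counter(tokens)
        vectors.append({
            token: count * math.log(n_docs / document_frequency[token])
            for token, count in term_frequency.items()
        })
    return vectors


documents = [["rain", "rain", "sun"], ["sun", "wind"]]
for vector in tf_idf_vectors(documents):
    print(vector)
```

A token that occurs in every document (here "sun") gets a score of 0, so it does not help to tell the texts apart.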

What we’ll do next time

• Machine learning

- sequence learning: Hidden Markov Models, Conditional Random Fields

• Evaluation

- predictions: true positive, false positive, false negative, …

- metrics: precision, recall, f-measure, accuracy

• Named entity recognition

- dictionary-based

- CRF-based
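
As a preview of the evaluation metrics listed above, the sketch below computes them from counts of true positives (tp), false positives (fp), false negatives (fn) and true negatives (tn) using the standard definitions. The function names and the example counts are hypothetical, and the F-measure shown is the balanced F1 variant.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of actual positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts, chosen only for illustration.
tp, fp, fn, tn = 8, 2, 4, 6
print(precision(tp, fp), recall(tp, fn), f_measure(tp, fp, fn), accuracy(tp, fp, fn, tn))
```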