Unsupervised Learning: Agglomerative Clustering & Microarray Analysis, Assignments of Programming Languages

University of New Mexico (UNM) - Gallup Programming Languages

An overview of unsupervised learning, focusing on clustering and model fitting. Topics include the goals of presentations, unsupervised problems, typical tasks, and bioinformatics applications using microarrays. It also covers agglomerative clustering, distance measures between clusters, and dendrograms.

Typology: Assignments

Pre 2010

Uploaded on 07/22/2009

koofers-user-kxl 🇺🇸


Unsupervised Learning:

Clustering & Model Fitting


Administrivia

Reminder: office hours truncated tomorrow

“whenever I get in” until noon

HW3 due: Dec 2

Have an excellent Turkey Day!

The art of presentations

Do NOT tell us:

Every detail of every experiment

Choose the parts to show us carefully

Each thing you show us should be informative about your conclusions

Place for excruciating detail is the paper

Every step of all the math

Every background reference

Focus on the “big picture” and the “take home message”

Listeners will take home ~10 bytes. Make sure they’re the right 10 bytes!

The unsupervised problem

Given:

Set of data points

Find:

Good description of the data

General unsup tasks

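Given: data matrix

$$
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1N} \\
x_{21} & x_{22} & \cdots & x_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
x_{d1} & x_{d2} & \cdots & x_{dN}
\end{bmatrix}
$$

Which points are similar?

How do points cluster together?

How many groups are there?

Statistical description of distribution of data?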

5 minutes of bioinformatics

Gene microarray (a.k.a., genechip, DNA chip, etc.)

Measure thousands (10s or 100s of thousands) of genes simultaneously

Critical tool in bioinformatics

Understand function of genes, networks of gene activity, response to stimuli, etc.

Leads to some very nasty analysis problems...

Only mRNA can be (easily) measured

When gene is “activated”, mRNA is produced

Can be “upregulated” or “downregulated” to produce different concentrations of mRNA

Can be active or inactive under different conditions:

External stimuli (food, pH, temperature, viral infection, etc.)

Internal metabolic processes (cell cycles, pathways, etc.)

mRNA measurements correlated with cell activity

5 minutes of bioinformatics

Measuring many mRNA...

[Figure: diagram with labels “Population A of cells”, “Population B of cells”, “mRNA pool A”, “mRNA pool B”, and “Irradiation”]

5 minutes of bioinformatics

Imaging

Data vector: $[x_1, x_2, \ldots, x_d]$

5 minutes of bioinformatics

Measure populations over time

Monitor development of cell, metabolic processes, response to introduction of stimulus, etc.

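Time series of data (axes labeled “genes” and “timepoints” in the original figure):

$$
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1N} \\
x_{21} & x_{22} & \cdots & x_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
x_{d1} & x_{d2} & \cdots & x_{dN}
\end{bmatrix}
$$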

Can consider either rows or columns to be “points”, depending on what you want to know
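A small sketch of the rows-versus-columns choice, in plain Python. The toy values are made up for illustration, and the layout (rows are genes, columns are timepoints) is an assumption, not something the slides state explicitly.

```python
# Hypothetical toy expression matrix (values invented for illustration).
# Assumed layout: each row is one gene, each column is one timepoint.
X = [
    [0.1, 0.4, 0.9],  # gene 0 measured at 3 timepoints
    [0.2, 0.5, 1.0],  # gene 1
    [1.5, 1.4, 0.2],  # gene 2
    [1.6, 1.2, 0.1],  # gene 3
]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


# Rows as points: "which genes behave alike over time?"
gene_points = X

# Columns as points: "which timepoints have similar expression snapshots?"
time_points = transpose(X)

print(len(gene_points), "gene points of dimension", len(gene_points[0]))
print(len(time_points), "timepoint points of dimension", len(time_points[0]))
```

Whichever view is chosen, the clustering code downstream only sees a list of points, so the question being asked is decided by this one transposition.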

Similarity & distance

Most clustering algorithms are based on distances between points

Recall: distance (metric) function $d(x_i, x_j)$

Symmetry: $d(x_i, x_j) = d(x_j, x_i)$

Identity: $d(x_i, x_j) = 0 \iff x_i = x_j$

Triangle inequality: $d(x_i, x_k) \le d(x_i, x_j) + d(x_j, x_k)$

E.g., Euclidean distance, kernel distance, etc.

Sometimes have a natural similarity function instead

Can usually convert to a metric or semi-metric
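The properties above can be checked numerically on a finite set of points. The sketch below is illustrative only: `check_metric` is a hypothetical helper name, and squared Euclidean distance is used as an example of a function that is symmetric and satisfies identity but fails the triangle inequality, which is the usual sense of "semi-metric".

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def squared_euclidean(a, b):
    """Squared Euclidean distance: symmetric, but not a full metric."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def check_metric(dist, points, tol=1e-12):
    """Return True if dist satisfies all three properties on these points."""
    for a, b in product(points, repeat=2):
        if abs(dist(a, b) - dist(b, a)) > tol:  # symmetry
            return False
        if (dist(a, b) <= tol) != (a == b):  # identity
            return False
    for a, b, c in product(points, repeat=3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:  # triangle inequality
            return False
    return True


plane_points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (-2.0, 0.5)]
line_points = [(0.0,), (1.0,), (2.0,)]

print(check_metric(euclidean, plane_points))
print(check_metric(squared_euclidean, line_points))
```

On the three collinear points, squared Euclidean gives a distance of 4 between the endpoints but only 1 + 1 via the midpoint, so the triangle inequality fails there. A check like this can only refute the properties on a sample; passing it does not prove a function is a metric.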

Agglomerative clustering

Group clusters by mutual distance

“Bottom-up” method: start w/ points and combine into groups, combine groups, etc.
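A minimal sketch of the bottom-up procedure, assuming Euclidean distance between points and a closest-pair rule for the distance between clusters (one of several possible choices, which the notes discuss next). The function names and the toy points are hypothetical, and the naive all-pairs scan is written for clarity rather than speed.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def closest_pair_distance(c1, c2):
    """Assumed cluster distance: smallest distance between any two members."""
    return min(euclidean(x, y) for x in c1 for y in c2)


def agglomerate(points, n_clusters, cluster_dist=closest_pair_distance):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = []  # history of (left cluster, right cluster, distance)
    while len(clusters) > n_clusters:
        best = None
        # Naive scan over all pairs of current clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((list(clusters[i]), list(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerate(points, n_clusters=3)
for cluster in clusters:
    print(cluster)
```

The `merges` list records which clusters were joined and at what distance, which is the information a merge tree would be drawn from.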

Dist between clusters?

Problem: We have distance between pairs of points

Agglomerative clustering requires distance between pairs of clusters

A number of measures are possible:

$$
d_{\min}(c_1, c_2) = \min_{x \in c_1,\; x' \in c_2} d(x, x')
$$


$$
d_{\max}(c_1, c_2) = \max_{x \in c_1,\; x' \in c_2} d(x, x')
$$
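To make the two cluster distances concrete, here is a short sketch that evaluates both on the same pair of clusters. Euclidean distance is assumed for the point-level $d$, and the two toy clusters are invented for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def d_min(c1, c2, dist=euclidean):
    """Smallest distance between a member of c1 and a member of c2."""
    return min(dist(x, xp) for x in c1 for xp in c2)


def d_max(c1, c2, dist=euclidean):
    """Largest distance between a member of c1 and a member of c2."""
    return max(dist(x, xp) for x in c1 for xp in c2)


# Two hypothetical clusters lying on the x-axis.
c1 = [(0.0, 0.0), (1.0, 0.0)]
c2 = [(4.0, 0.0), (6.0, 0.0)]

print(d_min(c1, c2))  # 3.0, from (1, 0) to (4, 0)
print(d_max(c1, c2))  # 6.0, from (0, 0) to (6, 0)
```

The same pair of clusters is 3 apart under the minimum and 6 apart under the maximum, so the choice of measure can change which clusters get merged first.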
