Introduction to Data Mining-Data Mining-Lecture 01 Slides-Computer Science, Slides of Data Mining

Stanford University Data Mining

In this subject you will learn high-correlation mining, min-hashing, locality-sensitive hashing, mining data streams, and clustering for large-scale data.

Keywords: Data Mining, Anand Rajaraman, Jeff Ullman, Data Cleansing, Visualization, Warehousing, Decision Trees, Clusters, Bayes, Hidden-Markov, Applications, Cultures, Models vs. Analytic Processing, Rhine Paradox

Type: Slides

2011/2012

Uploaded on 01/31/2012


CS345 --- Data Mining

Introductions

What Is It?

Cultures of Data Mining


Course Staff

Instructors: Anand Rajaraman, Jeff Ullman

TA: Robbie Yan

Project

Software implementation related to course subject matter.

Should involve an original component or experiment.

We will provide some databases to mine; others are OK.

Team Projects

Working in pairs OK, but …

We will expect more from a pair than from an individual.

The effort should be roughly evenly distributed.

Typical Kinds of Patterns

Decision trees: succinct ways to classify by testing properties.

Clusters: another succinct classification by similarity of properties.

Bayes, hidden-Markov, and other statistical models, frequent-itemsets: expose important associations within data.
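As a rough illustration of the frequent-itemsets idea, here is a minimal Python sketch that counts item pairs co-occurring in "baskets" and keeps those that meet a support threshold. The function name `frequent_pairs`, the threshold, and the toy baskets are hypothetical; this is brute-force pair counting chosen for clarity, not an algorithm from the course, and it is only practical for small data.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, support):
    """Return {pair: count} for item pairs found together in >= `support` baskets."""
    counts = Counter()
    for basket in baskets:
        # Count each unordered pair once per basket; sorting gives a canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= support}


# Hypothetical toy data: each basket is the set of items in one purchase.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
]
print(frequent_pairs(baskets, support=3))
```

Counting every pair in every basket is quadratic in basket size, which is why large-scale frequent-itemset mining needs more careful algorithms.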

Example: Clusters

[Figure: scatter of points, drawn as x's, falling into visibly separate clusters]
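The slides do not name a clustering algorithm; k-means (Lloyd's algorithm) is one common choice, swapped in here only to make "classification by similarity" concrete. A minimal sketch for 2-D points, assuming Euclidean distance, a fixed number of iterations, and caller-supplied starting centroids; `kmeans` and the sample points are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

`math.dist` requires Python 3.8 or later. The result depends on the starting centroids, so in practice k-means is usually run from several starting points.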

Applications (Among Many)

Intelligence-gathering

Total Information Awareness.

Web Analysis

PageRank.

Marketing

Run a sale on diapers; raise the price of beer.

Cultures

Databases: concentrate on large-scale (non-main-memory) data.

AI (machine-learning): concentrate on complex methods, small data.

Statistics: concentrate on inferring models.

(Way too Simple) Example

Given a billion numbers, a DB person might compute their average.

A statistician might fit the billion points to the best Gaussian distribution and report the mean and standard deviation.
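Fitting the "best" Gaussian in the maximum-likelihood sense comes down to computing the mean and standard deviation. A minimal sketch, assuming the numbers arrive as a stream too large to hold in memory: Welford's one-pass update (a standard technique, not taken from the slides) produces both the DB person's average and the statistician's parameters in a single scan. `gaussian_fit` is a hypothetical name; it returns the population (divide-by-n) standard deviation, which is the maximum-likelihood estimate.

```python
import math


def gaussian_fit(numbers):
    """One-pass mean and population standard deviation (Welford's method)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in numbers:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean
    if n == 0:
        raise ValueError("need at least one number")
    return mean, math.sqrt(m2 / n)


print(gaussian_fit([2, 4, 4, 4, 5, 5, 7, 9]))
```

The running update avoids the cancellation error of the naive "sum of squares minus square of sum" formula, which matters when summing a billion values.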

Meaningfulness of Answers

A big risk when data mining is that you will "discover" patterns that are meaningless.

Statisticians call it Bonferroni's principle: (roughly) if you look in more places for interesting patterns than your amount of data will support, you are bound to find crap.

Rhine Paradox --- (1)

Joseph Rhine was a parapsychologist in the 1950s who hypothesized that some people had Extra-Sensory Perception.

He devised an experiment where subjects were asked to guess 10 hidden cards, each red or blue.

He discovered that almost 1 in 1000 had ESP: they were able to get all 10 right!

Rhine Paradox --- (2)

He told these people they had ESP and called them in for another test of the same type.

Alas, he discovered that almost all of them had lost their ESP.

What did he conclude?

Answer on next slide.
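The "almost 1 in 1000" figure is what pure chance predicts: a subject guessing 10 red/blue cards at random gets all of them right with probability (1/2)^10 = 1/1024. A minimal simulation, assuming every subject guesses each card independently and uniformly at random (no one has ESP); the function names, subject count, and seed are hypothetical. The retest applies the same chance filter again, so only about 1 in 1024 of the first-round "psychics" should pass a second time.

```python
import random
from fractions import Fraction

CARDS = 10  # cards per test, each red (R) or blue (B)

# Exact chance that a pure guesser gets all cards right: (1/2) ** 10.
P_ALL_RIGHT = Fraction(1, 2) ** CARDS


def passes_test(rng):
    """One subject guesses every hidden card at random; True if all guesses match."""
    return all(rng.choice("RB") == rng.choice("RB") for _ in range(CARDS))


def rhine_experiment(subjects, seed=0):
    rng = random.Random(seed)
    # First round: anyone who gets all cards right is labeled as having "ESP".
    esp = [s for s in range(subjects) if passes_test(rng)]
    # Second round: retest only those subjects, with fresh random guesses.
    still_esp = [s for s in esp if passes_test(rng)]
    return len(esp), len(still_esp)


print(P_ALL_RIGHT)
print(rhine_experiment(100_000))
```

With 100,000 subjects, about 98 are expected to pass the first round and only about 0.1 the second, with nobody having ESP at all.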

A Concrete Example

This example illustrates a problem with intelligence-gathering.

Suppose we believe that certain groups of evil-doers are meeting occasionally in hotels to plot doing evil.

We want to find people who at least twice have stayed at the same hotel on the same day.

The Details

$10^9$ people being tracked.

1000 days.

Each person stays in a hotel 1% of the time (10 days out of 1000).

Hotels hold 100 people (so $10^5$ hotels).

If everyone behaves randomly (i.e., no evil-doers), will the data mining detect anything suspicious?
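Under these assumptions the question can be answered with back-of-the-envelope arithmetic. A minimal sketch, taking $10^9$ people, 1000 days, 1% hotel occupancy, and $10^5$ hotels of 100 guests each, and treating people, hotels, and days as independent: it estimates how many (pair of people, pair of days) coincidences pure randomness would produce. The constant names are illustrative; `math.comb` needs Python 3.8 or later.

```python
from math import comb

PEOPLE = 10**9     # people being tracked
DAYS = 1000        # days observed
P_IN_HOTEL = 0.01  # chance a given person is in some hotel on a given day
HOTELS = 10**5     # 10**7 hotel guests per day / 100 guests per hotel

# Chance two specific people are in the same hotel on one specific day:
# both must be in a hotel, and the second must be in the first one's hotel.
p_same_day = P_IN_HOTEL * P_IN_HOTEL / HOTELS

# Chance they coincide on two specific days (days treated as independent).
p_two_days = p_same_day**2

# Expected number of (pair of people, pair of days) coincidences by chance alone.
expected = comb(PEOPLE, 2) * comb(DAYS, 2) * p_two_days
print(f"{expected:,.0f}")
```

The estimate comes to roughly 250,000 suspicious-looking coincidences with no evil-doers at all, which is Bonferroni's principle at work.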
