Machine Learning and Data Mining
Fundamentals, robotics, recognition
Machine Learning, Data Mining, Knowledge Discovery in Data Bases
Their mutual relations
Knowledge Discovery in Databases (KDD)
- Identifying non-trivial, valid, useful patterns in data.
- What is a “pattern”?
  - Says something about the probability distribution of variables
  - For example, credit card fraud:
    - Variables may be: amount spent in one day, items purchased, where items were purchased, etc.
    - Prob(fraud | lives in Newark and 5 color TVs bought in NYC in 2 hours) = HIGH!
- What is data (what are data?):
  - For the purposes of data mining, data is a database, like in constructive induction, decision trees, rough sets, etc.
    - rows represent a record or instance
    - columns describe attributes of that instance.
- Knowledge discovery usually involves a “domain expert” and a data mining analyst.
- Other steps include gathering, preparing, storing, and accessing data. Also, ultimately understanding the patterns discovered.
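
The ideas above (rows as instances, columns as attributes, a pattern as a statement about a probability distribution) can be sketched in a few lines of Python. This is only an illustration: the records, the attribute names (`home`, `city`, `tvs_bought`, `fraud`), and the helper `conditional_probability` are invented here and do not come from the lecture.

```python
# Toy "database": each row (a dict) is an instance, each key is an attribute.
# All values are made up for illustration.
records = [
    {"home": "Newark", "city": "NYC",    "tvs_bought": 5, "fraud": True},
    {"home": "Newark", "city": "NYC",    "tvs_bought": 5, "fraud": True},
    {"home": "Newark", "city": "NYC",    "tvs_bought": 5, "fraud": False},
    {"home": "Newark", "city": "Newark", "tvs_bought": 1, "fraud": False},
    {"home": "NYC",    "city": "NYC",    "tvs_bought": 1, "fraud": False},
    {"home": "NYC",    "city": "NYC",    "tvs_bought": 0, "fraud": False},
]


def conditional_probability(rows, target, condition):
    """Estimate Prob(target is True | condition) by counting matching rows."""
    matching = [r for r in rows if condition(r)]
    if not matching:
        return None  # no rows satisfy the condition, so there is no estimate
    return sum(1 for r in matching if r[target]) / len(matching)


def suspicious(r):
    """Hypothetical pattern: lives in Newark, bought 5+ TVs in NYC."""
    return r["home"] == "Newark" and r["city"] == "NYC" and r["tvs_bought"] >= 5


p_fraud_given_pattern = conditional_probability(records, "fraud", suspicious)
p_fraud_overall = conditional_probability(records, "fraud", lambda r: True)
print(p_fraud_given_pattern, p_fraud_overall)
```

With this toy data the conditional estimate is higher than the overall fraud rate, which is the sense in which the pattern "says something" about the distribution. Six rows are far too few for a real estimate.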
- Steps in knowledge discovery (from George Johns):
- Understand and define problem
  - So that you don’t mindlessly apply inappropriate algorithms and get worthless (or misleading!) results
- Extract data
  - Data usually already exists in a database, but not in the form you need!
  - Also, you need to decide what data you need; this is where the expert comes in handy
- Understand and clean data
  - Make sure you have consistent data
  - Make sure data is not too consistent. (Is what you are trying to predict one of the variables you know?)
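
The two cleaning checks above can be sketched as follows. This is a minimal illustration under assumptions: the records, the attribute names, and the helpers `conflicting_groups` and `perfect_predictors` are hypothetical, and reading "too consistent" as "one attribute alone always determines the target" is only one possible interpretation.

```python
from collections import defaultdict

# Hypothetical records; "default" is the variable we want to predict.
rows = [
    {"income": "high", "debt": "low",  "account_closed": "no",  "default": "no"},
    {"income": "high", "debt": "low",  "account_closed": "no",  "default": "no"},
    {"income": "low",  "debt": "high", "account_closed": "yes", "default": "yes"},
    {"income": "low",  "debt": "high", "account_closed": "no",  "default": "no"},
    {"income": "low",  "debt": "low",  "account_closed": "yes", "default": "yes"},
]
TARGET = "default"


def conflicting_groups(rows, target):
    """Return attribute combinations that occur with more than one target value."""
    seen = defaultdict(set)
    for r in rows:
        key = tuple(sorted((k, v) for k, v in r.items() if k != target))
        seen[key].add(r[target])
    return [key for key, labels in seen.items() if len(labels) > 1]


def perfect_predictors(rows, target):
    """Return attributes whose value alone always determines the target."""
    suspects = []
    for attr in rows[0]:
        if attr == target:
            continue
        labels_by_value = defaultdict(set)
        for r in rows:
            labels_by_value[r[attr]].add(r[target])
        if all(len(labels) == 1 for labels in labels_by_value.values()):
            suspects.append(attr)
    return suspects


conflicts = conflicting_groups(rows, TARGET)  # inconsistent data
leaks = perfect_predictors(rows, TARGET)      # "too consistent" data
print(conflicts, leaks)
```

Here `account_closed` is flagged because it tracks the outcome exactly. In a real data set that usually means the attribute was recorded after the fact, so it restates the thing being predicted.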
- Machine Learning:
  - This is the algorithm part of the data mining process
  - But machine learning means more:
    - “Computer program that improves its performance at some task through experience.” - Tom Mitchell (1997 - Machine Learning)
    - “A learning system uses sample data to generate an updated basis for improved [performance] on subsequent data from the same source and expresses the new basis in intelligible symbolic form.” - Donald Michie (1991 - Computer Journal)
    - “Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more effectively the next time.” - Herbert Simon (1983 - Machine Learning I)
  - Last two are from David W. Aha’s tutorial on Machine Learning: http://www.aic.nrl.navy.mil/~aha/
What is learning according to Webster?
- To gain knowledge or understanding of or skill in by study, instruction, or experience
- To come to be able, to come to realize
- Robot learning - getting a robot to do all this good stuff
How does this relate to data mining?
- What we will discuss is really machine learning algorithms
- Learning task is data driven.
  - For example, in the checkers learning problem, many board positions are generated. Data is the position of pieces and the final state of the game (win, lose, draw)
- Learning reduces to searching a large space of hypotheses to find one that best fits observed data.
- Find patterns or relationships that can be generalized
- Really the same problem as data mining.
- Machine learning algorithms deal with how this search is done.
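
The view of learning as a search over hypotheses can be shown with a deliberately tiny hypothesis space. This is a sketch, not the checkers learner mentioned above: the data, the candidate thresholds, and the functions `errors` and `best_hypothesis` are all invented for illustration, and the search is a plain exhaustive scan.

```python
# Observed data: (feature value, label). Invented for illustration.
data = [(1.0, False), (2.0, False), (3.0, False),
        (4.0, True), (5.0, True), (6.0, True)]

# Hypothesis space: rules of the form "label is True when x >= threshold".
thresholds = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]


def errors(threshold, examples):
    """Count examples that the rule 'x >= threshold' labels incorrectly."""
    return sum(1 for x, label in examples if (x >= threshold) != label)


def best_hypothesis(candidates, examples):
    """Exhaustively search the candidates for the rule with the fewest errors."""
    return min(candidates, key=lambda t: errors(t, examples))


best = best_hypothesis(thresholds, data)
print(best, errors(best, data))
```

Real hypothesis spaces are far too large to enumerate like this, which is why the choice of search strategy is the core of a machine learning algorithm.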
How does this differ from what people have been doing for years?
- Scientific method:
  - Hypothesis -> Design Experiment -> Gather Data -> Test hypothesis
- Data mining is more “data centric”
  - Data may have been gathered as part of an overall process or may be just an “accidental” by-product.
- Examples:
  - Data from discount cards at supermarket
  - Credit card information
  - Historical chemical process control data
  - Information on patients gathered by hospitals
  - Visual and chemical information collected by a robot in the field
- Let’s get more specific
- Some common examples of data mining:
- Credit worthiness: Should a bank make a loan to someone?
  - Data consists of financial attributes:
    - Income
    - Debt
    - Credit history
    - Length of time in job
    - Residence
  - Each attribute has a set of values (discrete or continuous)
  - Instances (cases): these may be used for training (supervised learning). Sometimes you look for patterns with no training (unsupervised learning).
  - Ultimate goal is to predict the likelihood of default based on a new customer’s attributes.
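
As a rough sketch of the supervised case, the likelihood of default for a new customer can be estimated from the most similar past instances. The technique used here is k-nearest neighbours, which is not one of the methods named in this lecture; it was chosen only because it fits in a few lines. The training instances, their units, and the function `default_likelihood` are hypothetical.

```python
import math

# Hypothetical training instances:
# (income in $1000s, debt in $1000s, years in current job, defaulted?)
training = [
    (80, 10, 10, False),
    (75, 5, 8, False),
    (60, 20, 5, False),
    (30, 40, 1, True),
    (25, 35, 0.5, True),
    (40, 45, 2, True),
    (55, 15, 6, False),
    (35, 30, 1, True),
]


def distance(a, b):
    """Euclidean distance between two attribute vectors (no feature scaling)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def default_likelihood(customer, examples, k=3):
    """Fraction of the k most similar past customers who defaulted."""
    nearest = sorted(examples, key=lambda e: distance(customer, e[:3]))[:k]
    return sum(1 for e in nearest if e[3]) / k


risky = default_likelihood((28, 38, 1), training)
safe = default_likelihood((78, 8, 9), training)
print(risky, safe)
```

A practical version would at least rescale the attributes so that no single one dominates the distance, and would handle discrete attributes such as residence or credit history.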
- Marketing: Use past buying habits to predict likelihood of customer purchasing some new product
- Health care: Look for causal relationships between environment and disease
- Investment: Predict the stock market?
- Textual data mining
- Internet
- Bioinformatics
- Image processing
- Astronomy
- Chemistry
- Speech recognition
Machine learning methods applied to signal and image analysis
- Robotics
  - speech, sensor arrays, images, sensor integration, prediction, recognition
- Methods:
  - Regression (linear, non-linear, multiple variables)
  - Decision Trees
  - Neural networks
  - Bayesian classifiers
  - Bayesian networks
  - Constructive Induction (last lecture)
  - Genetic algorithms
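
To make the first method on the list concrete, here is a minimal sketch of linear regression with a single variable, using the standard closed-form least-squares solution. The function name `fit_line` and the data points are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The same idea extends to multiple variables and to non-linear models, where the fit is usually found numerically instead of with a closed formula.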
- Many resources available on Internet
- A few to start with
  - http://www.kdnuggets.com/
  - http://www.ai.univie.ac.at/oefai/ml/ml-resources.html
  - http://www.cs.cmu.edu/~tom/
- Assignment:
- Download “Enhancements to the Data Mining Process” George Johns’ thesis (now a book as well)
- Read Chapter 1 “What is Data Mining?”
Questions and Problems
- What is your understanding of the relations between Machine Learning, Data Mining and Knowledge Discovery in Data Bases?
- Think about an interesting database that can be mined and propose a complete data mining method based on the methods that you learned in this class.
- How can data mining be used in Intelligent Robotics?
- Propose a robot that will use Knowledge Discovery methods.
- Propose an Internet Robot with Data Mining abilities.