Machine Learning “Summer” School: Introduction
Pradeep Ravikumar, Peter Stone
Department of Computer Science, The University of Texas at Austin

What is Machine Learning?

Artificial Intelligence and Computer Science
• Artificial Intelligence as a field predates that of Computer Science itself
  "an ancient wish to forge the gods" (Pamela McCorduck)
• Pioneering researchers in CS were also researchers in AI, e.g. Alan Turing
• Main Hypothesis: If we can build machines that can compute, we can build machines that can think!
• We can now do the former, computing machines, but not the latter, human-level intelligent machines (for general tasks). Why?
  ‣ What to compute (for general human-level intelligence) is not clear, and even when it is, it is (computationally) intractable

Machine Learning in Recent Years
• Machine Learning in recent years:
  ‣ Data-driven science, as a new fourth paradigm of scientific discovery
    (first three science paradigms: experimental, theoretical, computational)
  ‣ Data-driven engineering
  ‣ Philosophy of Data
    “If you asked me to describe the rising philosophy of the day, I’d say it is data-ism.... that data will help us do remarkable things - like foretell the future.” (David Brooks)
• Modern Machine Learning: Mathematical models learnt from data that characterize the relationships amongst variables in the system

Machine Learning: Models
• A key infrastructural element in machine learning is a mathematical model
  ‣ A mathematical characterization of system(s) of interest (typically via random variables, in which case it is called a statistical model)

Graphical Models
[Figure: graph over Diseases D1, D2, D3 and Symptoms S1, S2, S3, S4]
Symptoms S1, S2: Disease D1?
• With many symptoms and diseases, deduction is difficult
• Prob(D1 | S1, S2)

Machine Learning: Models
• A key infrastructural element in machine learning is a mathematical model
  ‣ The model chosen typically depends on the task at hand
Prediction: Estimate output given input
  Called classification when output is categorical (stock price up or down)
  Called regression when output is real-valued (actual stock price value)
  Models used: Linear Regression, Logistic Regression, Support Vector Machines, …

Machine Learning
• A key infrastructural element in machine learning is a mathematical model
  ‣ A mathematical characterization of system(s) of interest, typically via random variables
• Learning: estimating a statistical model from data
  ‣ From “data” to “model”
• Inference: using a statistical model to infer properties of system(s)

Machine Learning: Learning
[Diagram: Train: Training Data → Model. Test: Test Data → Model → Evaluation]
Is the model good enough? Is there enough data?
[Figure: Underfitting vs. Overfitting (source: scikit-learn.org)]
What models allow us to do is generalize from data
Different models generalize in different ways
Simple model: “memorize” all data points, and report nearest neighbor

Machine Learning: Inference
• Inference: using a statistical model to infer properties of system(s)
  ‣ From “model” to “answers”
  ‣ Predictive: e.g. most probable response value given covariates/inputs
    ‣ What is the (most probable) disease given symptoms?
  ‣ Descriptive: e.g. groupings of variables
    ‣ Inferring communities of users in a social network

Computational Aspects of ML
• Mathematical part of ML: Specifying Models and resulting learning/inference Algorithms, understanding and analyzing properties of Models and Algorithms
• Computational part of ML: Performing learning/inference in a computationally tractable way
• Model Learning and Inference can be cast as an optimization problem:
  ‣ minimize/maximize an objective function with respect to some parameters

Computational Aspects of ML: Optimization
• Mathematical part of ML: Specifying Models and resulting Estimators, understanding and analyzing properties of Models and Estimators
• Computational part of ML: estimating the models from data, inference using the models in a computationally tractable way
• Model Estimation and Inference can be cast as an optimization problem:
  ‣ minimize/maximize an objective function with respect to some
    parameters
  ‣ Almost any inference task could be recast as solving an optimization problem; this is sometimes called the “variational viewpoint” of inference
[Figure: objective surface (hill); the maximum is the red dot at the top of the hill]

Modern Machine Learning
• (Parametric/Nonparametric, Bayesian/Frequentist) Statistical Models
• Models under high-dimensional “Big-p” data settings
• Models for Complex Data Types
• Models with the additional facet of spatio-temporality
• Models with the additional facet of actions which feed back to the model
• Models with multiple interacting systems/agents

Mathematical Models: Parametric vs Nonparametric
• Parametric Models: “fixed-size” models that do not “grow” with the data
  ‣ More data just means you learn/fit the model better
  Fitting a simple line (2 params) to a bunch of one-dim. samples
• Nonparametric Models: Models that grow with the data
  ‣ More data means a more complex model

Statistical Models: Bayesian vs Frequentist
• Mathematical models in ML are typically described via random variables, in which case they are also called statistical models
• Statistical models are typically specified by unknown parameters (to be learnt from data)
• Frequentist: there exists a “ground-truth” set of unknown parameters that are constant (i.e. not random)
• Bayesian: model parameters are themselves random, and typically specified by their own distribution/statistical model, with their own unknown “hyper-parameters”

Complex Data Types
• Standard Machine Learning models work with inputs and outputs that are scalars or vectors in standard Euclidean space
• But increasingly, the data types of the inputs and outputs are not so simple anymore

Complex Data Types
[Figures: Phylogenetic Trees [wikipedia]; Parse Trees [wikipedia]; Graph Structure [one-mind]]
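On plain scalar inputs and outputs, the parametric versus nonparametric distinction from the earlier slides can be sketched roughly as follows. This is a minimal illustration, not the lecturers' code: the data-generating line y = 2x + 1, the noise level, the sample sizes, and all function names are assumptions made up for the example. A least-squares line stands in for a parametric model and a 1-nearest-neighbor rule for the "memorize all data points" model, each fitted on training data and evaluated on separate test data.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (a parametric model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_line(params, x):
    slope, intercept = params
    return slope * x + intercept


def fit_nearest_neighbor(xs, ys):
    """'Fitting' only memorizes the data, so the model grows with it."""
    return list(zip(xs, ys))


def predict_nearest_neighbor(memory, x):
    """Report the output stored with the closest memorized input."""
    _, nearest_y = min(memory, key=lambda pair: abs(pair[0] - x))
    return nearest_y


def mean_squared_error(predict, model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(n, rng):
    """Hypothetical data: y = 2x + 1 plus Gaussian noise (an assumption)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_data(50, rng)
test_x, test_y = make_data(50, rng)

line = fit_line(train_x, train_y)
memory = fit_nearest_neighbor(train_x, train_y)

print("line parameters:", len(line))          # always 2
print("nearest-neighbor size:", len(memory))  # one entry per training point
print("line test MSE:", mean_squared_error(predict_line, line, test_x, test_y))
print("1-NN train MSE:", mean_squared_error(predict_nearest_neighbor, memory, train_x, train_y))
print("1-NN test MSE:", mean_squared_error(predict_nearest_neighbor, memory, test_x, test_y))
```

The line has two parameters however many samples it sees, while the nearest-neighbor model stores every training point. Its training error is zero whenever the training inputs are distinct, so only the error on the test data says anything about how well it generalizes.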
Complex Data Types
Permutations/Rankings
[Figure: screenshot of a Google web search results page for the query “search engines”, shown as an example of a ranked list]
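To make the permutations/rankings data type concrete, here is a small sketch of a ranking represented as a permutation of item indices. The page names and relevance scores are hypothetical values invented for the example, and sorting by score is just one simple way to produce a ranking, not a method taken from the slides.

```python
from math import factorial


def ranking_from_scores(scores):
    """Return item indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def is_permutation(ranking, n_items):
    """A valid ranking uses every index 0..n_items-1 exactly once."""
    return sorted(ranking) == list(range(n_items))


# Hypothetical page names and relevance scores, invented for this example.
pages = ["page_a", "page_b", "page_c", "page_d"]
scores = [0.2, 0.9, 0.5, 0.7]

ranking = ranking_from_scores(scores)
print(ranking)                              # [1, 3, 2, 0]
print([pages[i] for i in ranking])          # ['page_b', 'page_d', 'page_c', 'page_a']
print(is_permutation(ranking, len(pages)))  # True
print(factorial(len(pages)))                # 24 possible rankings of 4 items
```

The output here is one of n! discrete orderings rather than a point in Euclidean space, which is why such data falls outside the standard scalar/vector setting.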
Statistical Models for spatio-temporal data
• Random vectors characterizing how a system varies as a function of time (and potentially space)
  ‣ Y(t), X(s,t)
• Also called random fields
  Field in Physics: Some physical quantity associated with space-time

Models: With Actions
• Models for agents that take actions depending on current state
  ‣ These actions incur rewards, and affect future states (“feedback”)
• Forms the subfield of Reinforcement Learning

Reinforcement Learning
Supervised learning mature [WEKA]
For agents, reinforcement learning most appropriate
[Diagram: Agent and Environment loop. Policy π : S → A. The agent sends action (a[t]); the environment returns state (s[t]) and reward (r[t+1])]
Peter Stone (UT Austin), PRISM

Parametric Models: Frequentist
• Ensemble Methods: Joydeep Ghosh, UT Austin

Parametric Models, Optimization
• Evolving Neural Networks: Risto Miikkulainen, UT Austin

Parametric Models
• Deep Learning: Yoshua Bengio, University of Montreal

Parametric Models: High-dimensional
• Low-rank Matrices: Sujay Sanghavi, UT Austin

Parametric Models: Complex Data
• Ranking: Ambuj Tewari, University of Michigan

Non-parametric Models: Bayesian
• Nonparametric Bayesian Models: Peter Mueller, UT Austin

Models: With Actions
• Multi-step Prediction: Richard Sutton, University of Alberta

Models: Multiple Agents
• Multi-Agent Systems: Peter Stone, UT Austin

Optimization
• Convex Optimization: Constantine Caramanis, UT Austin

Applications
• Natural Language Processing: Raymond Mooney, UT Austin
• Computer Vision: Kristen Grauman, UT Austin
• Psychology: Tal Yarkoni, UT Austin
• Robotics: Peter Stone, UT Austin

MLSS Austin - by the numbers
• 84 from the US
• 26 International
• 9 Academics
• 15 Industry
• 86 Students
• 110* total registrants

Some Logistics
• Lectures each day in two sessions: morning session from 9:00 - 12:00, and afternoon session from 2:00 - 5:00
• Lunch each day from 12:00 - 2:00 on your own, except Friday Jan 16 when we will be providing a boxed lunch
• Poster Session (optional) from 2:00 - 5:00 on January 12th
• Evenings on your own; our volunteers will lead a night out on Saturday January 10 (optional)