Statistical Learning and Modeling: Supervised Learning Techniques

An overview of statistical learning and modeling, focusing on supervised learning techniques in artificial intelligence. It covers linear models for classification, including parameter optimization via maximum likelihood and least squares, as well as Fisher's linear discriminant and the perceptron algorithm. A significant portion is dedicated to boosting, particularly AdaBoost: its concept, ensemble examples, and its position among the top 10 algorithms in data mining. The material also touches on probabilistic generative and discriminative models, logistic regression, and their applications in classification problems. It is structured as lecture or study material and is useful for understanding the theoretical underpinnings and practical applications of these machine learning methods.

2024/2025

Available from 08/23/2025

Artificial Intelligence
Statistical Learning and Modeling: Supervised Learning
Fei Wu
College of Computer Science Zhejiang University
http://person.zju.edu.cn/wufei/


Outline

Linear model for classification

AdaBoost

Learning the parameters of Linear Discriminant Functions

  • Three approaches:
    • Least-squares approach:
      • making the model predictions as close as possible to a set of target values
    • Fisher's linear discriminant:
      • maximum class separation in the output space
    • The perceptron algorithm of Rosenblatt:
      • generalized linear model

Linear Basis Function Models

Parameter optimization via Maximum likelihood

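  • Assume: $t = y(\mathbf{x}, \mathbf{w}) + \epsilon$, where $y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$ and $\epsilon$ is zero-mean Gaussian noise with precision $\beta$
  • Thus: $p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1})$
  • For data set $X = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ and target vector $\mathbf{t} = (t_1, \dots, t_N)^T$, the likelihood function: $p(\mathbf{t} \mid X, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$

SSE: sum-of-squares error function, $E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n)\}^2$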

Parameter optimization via Maximum likelihood

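  • Solving w by Maximum likelihood: $\mathbf{w}_{ML} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t} = \Phi^{\dagger} \mathbf{t}$

$\Phi$: the $N \times M$ design matrix, with elements $\Phi_{nj} = \phi_j(\mathbf{x}_n)$ for basis functions $\phi_j$

Moore-Penrose pseudo-inverse: $\Phi^{\dagger} = (\Phi^T \Phi)^{-1} \Phi^T$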

  • Problem:
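The closed-form maximum likelihood solution on this slide, $\mathbf{w}_{ML} = \Phi^{\dagger} \mathbf{t}$, can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: the polynomial basis $\phi_j(x) = x^j$, the synthetic data, and the helper names `design_matrix` and `fit_ml` are all assumptions made for the example.

```python
import numpy as np

def design_matrix(x, M):
    """Build the N x M design matrix Phi with Phi[n, j] = x_n ** j.

    The polynomial basis is an assumed choice for illustration.
    """
    return np.vander(x, M, increasing=True)

def fit_ml(Phi, t):
    """Return w_ML = pinv(Phi) @ t (Moore-Penrose pseudo-inverse solution)."""
    return np.linalg.pinv(Phi) @ t

# Synthetic data (hypothetical): t = 1 + 2x + Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
t = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.shape)

Phi = design_matrix(x, M=2)   # columns: x**0, x**1
w_ml = fit_ml(Phi, t)         # estimates of intercept and slope
```

`np.linalg.pinv` is used rather than forming $(\Phi^T \Phi)^{-1}$ explicitly, since it is computed through an SVD and still returns a solution when $\Phi^T \Phi$ is close to singular.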

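Parameter optimization via Least Squares

  • Each class $C_k$ has its own linear model: $y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}$, $k = 1, \dots, K$; group together: $\mathbf{y}(\mathbf{x}) = \widetilde{\mathbf{W}}^T \tilde{\mathbf{x}}$

  • Learning with training data set: $\{\mathbf{x}_n, \mathbf{t}_n\}$, $n = 1, \dots, N$, with 1-of-K coded target vectors $\mathbf{t}_n$
  • minimizing a sum-of-squares error function: $E_D(\widetilde{\mathbf{W}}) = \frac{1}{2} \mathrm{Tr}\{(\widetilde{\mathbf{X}} \widetilde{\mathbf{W}} - \mathbf{T})^T (\widetilde{\mathbf{X}} \widetilde{\mathbf{W}} - \mathbf{T})\}$
  • Discriminant function: $\mathbf{y}(\mathbf{x}) = \widetilde{\mathbf{W}}^T \tilde{\mathbf{x}} = \mathbf{T}^T (\widetilde{\mathbf{X}}^{\dagger})^T \tilde{\mathbf{x}}$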
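The least-squares approach to classification can be sketched as follows, assuming 1-of-K (one-hot) target vectors and an input vector augmented with a constant 1 for the bias. The helper names, the two Gaussian clusters, and the `argmax` decision rule are assumptions made for this illustration.

```python
import numpy as np

def augment(X):
    """Prepend a column of ones so the bias is absorbed into the weights."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_least_squares_classifier(X, labels, K):
    """Minimize the sum-of-squares error against one-hot targets.

    Returns the (D+1) x K weight matrix pinv(X_tilde) @ T.
    """
    T = np.eye(K)[labels]                 # N x K one-hot target matrix
    return np.linalg.pinv(augment(X)) @ T

def predict(W_tilde, X):
    """Assign each input to the class with the largest linear output."""
    return np.argmax(augment(X) @ W_tilde, axis=1)

# Two well-separated synthetic clusters (hypothetical data).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(30, 2))
X1 = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(30, 2))
X = np.vstack([X0, X1])
labels = np.array([0] * 30 + [1] * 30)

W_tilde = fit_least_squares_classifier(X, labels, K=2)
pred = predict(W_tilde, X)
```

Because the targets of each training point sum to one and a bias term is included, the K outputs for any input also sum to one, although they are not constrained to lie in [0, 1] and so are not probabilities.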

Maximum likelihood and least squares for linear regression and classification

  • Maximum likelihood estimation method (MLE): The likelihood function indicates how likely the observed sample is as a function of the possible parameter values. Maximizing the likelihood function therefore selects the parameters that are most likely to have produced the observed data. From a statistical point of view, MLE is usually recommended for large samples because it is versatile, applies to most models and types of data, and yields estimates that are asymptotically the most precise.

  • Least squares estimation method (LSE): Least squares estimates are calculated by fitting a regression line to the data points so that the sum of squared deviations (the least-squares error) is minimal. In reliability analysis, the line and the data are plotted on a probability plot.

In a linear model, if the errors follow a normal distribution, the least squares estimators are also the maximum likelihood estimators.
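The equivalence stated above follows from a short derivation. The sketch below assumes independent Gaussian noise with precision $\beta$ and a linear model $\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$; the symbols $\beta$, $\boldsymbol{\phi}$, and $E_D$ are notation assumed here.

```latex
\begin{align}
\ln p(\mathbf{t} \mid X, \mathbf{w}, \beta)
  &= \sum_{n=1}^{N} \ln \mathcal{N}\!\left(t_n \mid \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1}\right) \\
  &= \frac{N}{2} \ln \beta - \frac{N}{2} \ln (2\pi) - \beta E_D(\mathbf{w}), \\
E_D(\mathbf{w})
  &= \frac{1}{2} \sum_{n=1}^{N} \left\{ t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \right\}^2 .
\end{align}
```

Only the last term of the log likelihood depends on $\mathbf{w}$, and $\beta > 0$, so maximizing the likelihood with respect to $\mathbf{w}$ is the same as minimizing the sum-of-squares error $E_D(\mathbf{w})$.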

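Fisher's linear discriminant

  • Fisher's criterion: maximize the separation between the projected class means while minimizing the total within-class variance of the projected data:

$J(\mathbf{w}) = \dfrac{\mathbf{w}^T \mathbf{S}_B \mathbf{w}}{\mathbf{w}^T \mathbf{S}_W \mathbf{w}}$ (a generalized Rayleigh quotient)

Between-class covariance matrix: $\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$

Within-class covariance matrix: $\mathbf{S}_W = \sum_{n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T$

  • Fisher's linear discriminant: $\mathbf{w} \propto \mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$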

Fisher's linear discriminant

$\mathbf{S}_W^{-1} \mathbf{S}_B \mathbf{w} = \lambda \mathbf{w}$
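For two classes, the eigenvalue problem above has the well-known solution $\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$, where $\mathbf{m}_1, \mathbf{m}_2$ are the class means and $\mathbf{S}_W$ is the within-class covariance matrix. A minimal NumPy sketch, with hypothetical helper names and synthetic correlated data chosen for illustration:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Unit vector proportional to S_W^{-1} (m2 - m1)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """Squared distance of projected means over total projected within-class scatter."""
    y1, y2 = X1 @ w, X2 @ w
    between = (y2.mean() - y1.mean()) ** 2
    within = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    return between / within

# Two classes with strongly correlated features (hypothetical data).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=100)
X2 = rng.multivariate_normal([2.0, 0.0], cov, size=100)

w = fisher_direction(X1, X2)
mean_diff = X2.mean(axis=0) - X1.mean(axis=0)
J_fisher = fisher_criterion(w, X1, X2)
J_naive = fisher_criterion(mean_diff, X1, X2)  # projecting onto the mean difference
```

Comparing `J_fisher` with `J_naive` illustrates why the within-class covariance matters: with correlated features, projecting onto the line joining the class means gives a lower value of the criterion than the Fisher direction.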

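Fisher's discriminant for multiple classes

  • Assume input space dimensionality $D > K$ ($K$ is the number of classes, $K > 2$):

covariance matrices defined in the original x-space

  • Total covariance matrix: $\mathbf{S}_T = \sum_{n=1}^{N} (\mathbf{x}_n - \mathbf{m})(\mathbf{x}_n - \mathbf{m})^T = \mathbf{S}_W + \mathbf{S}_B$, where $\mathbf{m}$ is the mean of all $N$ data points
  • The generalization of the within-class covariance matrix: $\mathbf{S}_W = \sum_{k=1}^{K} \sum_{n \in C_k} (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^T$, where $\mathbf{m}_k$ is the mean of class $C_k$
  • The generalization of the between-class covariance matrix: $\mathbf{S}_B = \sum_{k=1}^{K} N_k (\mathbf{m}_k - \mathbf{m})(\mathbf{m}_k - \mathbf{m})^T$, where $N_k$ is the number of points in class $C_k$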

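Fisher's discriminant for multiple classes

  • Assume input space dimensionality $D > K$ ($K$ is the number of classes, $K > 2$):

covariance matrices defined in the projected y-space

  • The generalization of the within-class and between-class covariance matrices, with $\mathbf{y}_n = \mathbf{W}^T \mathbf{x}_n$, projected class means $\boldsymbol{\mu}_k$ and projected overall mean $\boldsymbol{\mu}$: $\mathbf{s}_W = \sum_{k=1}^{K} \sum_{n \in C_k} (\mathbf{y}_n - \boldsymbol{\mu}_k)(\mathbf{y}_n - \boldsymbol{\mu}_k)^T$, $\mathbf{s}_B = \sum_{k=1}^{K} N_k (\boldsymbol{\mu}_k - \boldsymbol{\mu})(\boldsymbol{\mu}_k - \boldsymbol{\mu})^T$
  • Fisher's criterion: $J(\mathbf{W}) = \mathrm{Tr}\{\mathbf{s}_W^{-1} \mathbf{s}_B\} = \mathrm{Tr}\{(\mathbf{W}^T \mathbf{S}_W \mathbf{W})^{-1} (\mathbf{W}^T \mathbf{S}_B \mathbf{W})\}$, where $\mathbf{S}_W$ and $\mathbf{S}_B$ are the within-class and between-class covariance matrices in the original x-space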
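A common way to maximize this kind of criterion is to take the eigenvectors of $\mathbf{S}_W^{-1} \mathbf{S}_B$ with the largest eigenvalues as the columns of the projection matrix. The sketch below assumes that approach and solves the equivalent symmetric problem through a Cholesky factor of $\mathbf{S}_W$ for numerical stability. The function names, the synthetic data, and the choice of $D = 4$, $K = 3$ are assumptions made for this example.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class (S_W) and between-class (S_B) matrices in the original x-space."""
    m = X.mean(axis=0)
    D = X.shape[1]
    S_W, S_B = np.zeros((D, D)), np.zeros((D, D))
    for k in np.unique(labels):
        X_k = X[labels == k]
        m_k = X_k.mean(axis=0)
        S_W += (X_k - m_k).T @ (X_k - m_k)
        S_B += len(X_k) * np.outer(m_k - m, m_k - m)
    return S_W, S_B

def fisher_projection(S_W, S_B, n_components):
    """Solve S_B w = lambda S_W w and keep the leading eigenvectors.

    With S_W = L L^T, the problem becomes a symmetric eigenproblem for
    L^{-1} S_B L^{-T}, whose eigenvectors v map back via w = L^{-T} v.
    """
    L_inv = np.linalg.inv(np.linalg.cholesky(S_W))
    eigvals, V = np.linalg.eigh(L_inv @ S_B @ L_inv.T)   # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return L_inv.T @ V[:, order], eigvals[order]

# Three classes in four dimensions (hypothetical data).
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0, 0.0, 0.0],
                  [3.0, 0.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0, 0.0]])
X = np.vstack([rng.normal(loc=mu, scale=1.0, size=(50, 4)) for mu in means])
labels = np.repeat(np.arange(3), 50)

S_W, S_B = scatter_matrices(X, labels)
W, lam = fisher_projection(S_W, S_B, n_components=2)
Y = X @ W   # data projected into the 2-dimensional y-space
```

Since $\mathbf{S}_B$ is a sum of $K$ rank-one terms constrained by the overall mean, it has rank at most $K - 1$, so at most $K - 1$ useful projection directions exist; that is why `n_components=2` is used for three classes.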

The perceptron algorithm

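  • Stochastic gradient descent algorithm: $\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_P(\mathbf{w}) = \mathbf{w}^{(\tau)} + \eta \boldsymbol{\phi}_n t_n$, applied to each misclassified pattern $\boldsymbol{\phi}_n$ with target $t_n \in \{-1, +1\}$
  • Perceptron convergence theorem:
    • If there exists an exact solution (in other words, if the training data set is linearly separable), then the perceptron learning algorithm is guaranteed to find an exact solution in a finite number of steps.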
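The perceptron learning rule can be sketched as below, assuming targets coded as $t_n \in \{-1, +1\}$ and the standard update that adds $\eta\, t_n \boldsymbol{\phi}_n$ to the weights for each misclassified pattern. The function name, the epoch limit, and the synthetic data (constructed to be linearly separable with a margin) are assumptions made for this example.

```python
import numpy as np

def perceptron_train(Phi, t, eta=1.0, max_epochs=1000):
    """Cycle through the patterns, updating w on each misclassified one.

    Returns (w, converged); converged is True once a full pass over the
    data makes no update, i.e. every pattern satisfies t_n * w.phi_n > 0.
    """
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        n_updates = 0
        for phi_n, t_n in zip(Phi, t):
            if t_n * (w @ phi_n) <= 0:        # misclassified (or on the boundary)
                w = w + eta * t_n * phi_n     # stochastic gradient step
                n_updates += 1
        if n_updates == 0:
            return w, True
    return w, False

# Linearly separable synthetic data (hypothetical): label = sign(x1 + x2),
# with points too close to the boundary removed to leave a margin.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
s = X[:, 0] + X[:, 1]
keep = np.abs(s) > 0.2
X, t = X[keep], np.sign(s[keep])

Phi = np.hstack([np.ones((len(X), 1)), X])    # feature vector with a bias component
w, converged = perceptron_train(Phi, t)
```

The convergence theorem only applies to separable data. On a non-separable set the loop above would never complete an update-free pass, which is why the sketch includes an epoch limit and reports whether it converged.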

The perceptron algorithm