Linear Decision Boundaries-Computing and Statistical Data Analysis-Lecture 06 Slides-Physics, Slides of Computational and Statistical Data Analysis

Feed-forward net: values of a node depend only on earlier layers,usually only on previous layer (“network architecture”). Linear Decision Boundaries, Nonlinear Transformation, Inputs, Nonlinear Test Statistics, Neural Networks, Multi Layer Perceptron, Probability Density Estimation, PDE, Kernel Based, Correlation, Independence, Decorrelation, Naive Bayes, Decision Trees, Boosting, AdaBoost, Overtraining, Statistical Data Analysis, Lecture Slides, Glen Cowan, Physics Department, University of Lon

Typology: Slides

2011/2012

Uploaded on 03/08/2012

leyllin
leyllin 🇬🇧

4.3

(15)

241 documents

1 / 35

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
G. Cowan Lectures on Statistical Data Analysis 1
Statistical Data Analysis: Lecture 6
1 Probability, Bayes’ theorem, random variables, pdfs
2 Functions of r.v.s, expectation values, error propagation
3 Catalogue of pdfs
4 The Monte Carlo method
5 Statistical tests: general concepts
6 Test statistics, multivariate methods
7 Goodness-of-fit tests
8 Parameter estimation, maximum likelihood
9 More maximum likelihood
10 Method of least squares
11 Interval estimation, setting limits
12 Nuisance parameters, systematic uncertainties
13 Examples of Bayesian approach
14 tba
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23

Partial preview of the text

Download Linear Decision Boundaries-Computing and Statistical Data Analysis-Lecture 06 Slides-Physics and more Slides Computational and Statistical Data Analysis in PDF only on Docsity!

Statistical Data Analysis: Lecture 6

1 Probability, Bayes’ theorem, random variables, pdfs 2 Functions of r.v.s, expectation values, error propagation 3 Catalogue of pdfs 4 The Monte Carlo method 5 Statistical tests: general concepts 6 Test statistics, multivariate methods 7 Goodness-of-fit tests 8 Parameter estimation, maximum likelihood 9 More maximum likelihood 10 Method of least squares 11 Interval estimation, setting limits 12 Nuisance parameters, systematic uncertainties 13 Examples of Bayesian approach 14 tba

Nonlinear test statistics

The optimal decision boundary may not be a hyperplane, → nonlinear test statistic accept

H

0

H

Multivariate statistical methods 1 are a Big Industry: Particle Physics can benefit from progress in Machine Learning. Neural Networks, Support Vector Machines, Kernel density methods, ...

Introduction to neural networks

Used in neurobiology, pattern recognition, financial forecasting, ... Here, neural nets are just a type of test statistic. Suppose we take t ( x ) to have the form logistic sigmoid This is called the single-layer perceptron. s (·) is monotonic → equivalent to linear t ( x )

Neural network discussion

Easy to generalize to arbitrary number of layers. Feed-forward net: values of a node depend only on earlier layers, usually only on previous layer (“network architecture”). More nodes → neural net gets closer to optimal t ( x ), but more parameters need to be determined. Parameters usually determined by minimizing an error function, where t (0) , t (1) are target values, e.g., 0 and 1 for logistic sigmoid. Expectation values replaced by averages of training data (e.g. MC). In general training can be difficult; standard software available.

Neural network example from LEP II

Signal: e

e → W

W

  • (often 4 well separated hadron jets) Background: e

e → qqgg (4 less well separated hadron jets) ← input variables based on jet structure, event shape, ... none by itself gives much separation. Neural network output does better... (Garrido, Juste and Martinez, ALEPH 96-144)

Probability Density Estimation (PDE) techniques

See e.g. K. Cranmer, Kernel Estimation in High Energy Physics , CPC 136 (2001) 198; hep-ex/0011057; T. Carli and B. Koblitz, A multi-variate discrimination technique based on range-searching , NIM A 501 (2003) 576; hep-ex/ Construct non-parametric estimators of the pdfs and use these to construct the likelihood ratio ( n -dimensional histogram is a brute force example of this.) More clever estimation techniques can get this to work for (somewhat) higher dimension.

Kernel-based PDE (KDE, Parzen window)

Consider d dimensions, N training events, x 1 , ..., x N

estimate f ( x ) with Use e.g. Gaussian kernel: kernel bandwidth (smoothing parameter) Need to sum N terms to evaluate function (slow); faster algorithms only count events in vicinity of x ( k -nearest neighbor, range search).

Decision trees

Out of all the input variables, find the one for which with a single cut gives best improvement in signal purity: Example by MiniBooNE experiment, B. Roe et al., NIM 543 (2005) 577 where w i

. is the weight of the i th event. Resulting nodes classified as either signal/ background. Iterate until stop criterion reached based on e.g. purity or minimum number of events in a node. The set of cuts defines the decision boundary.

Finding the best single cut

The level of separation within a node can, e.g., be quantified by the Gini coefficient , calculated from the (s or b) purity as: For a cut that splits a set of events a into subsets b and c, one can quantify the improvement in separation by the change in weighted Gini coefficients: where, e.g.,

Choose e.g. the cut to the maximize Δ; a variant of this

scheme can use instead of Gini e.g. the misclassification rate: