



























Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Feed-forward net: values of a node depend only on earlier layers,usually only on previous layer (“network architecture”). Linear Decision Boundaries, Nonlinear Transformation, Inputs, Nonlinear Test Statistics, Neural Networks, Multi Layer Perceptron, Probability Density Estimation, PDE, Kernel Based, Correlation, Independence, Decorrelation, Naive Bayes, Decision Trees, Boosting, AdaBoost, Overtraining, Statistical Data Analysis, Lecture Slides, Glen Cowan, Physics Department, University of Lon
Typology: Slides
1 / 35
This page cannot be seen from the preview
Don't miss anything!




























1 Probability, Bayes’ theorem, random variables, pdfs 2 Functions of r.v.s, expectation values, error propagation 3 Catalogue of pdfs 4 The Monte Carlo method 5 Statistical tests: general concepts 6 Test statistics, multivariate methods 7 Goodness-of-fit tests 8 Parameter estimation, maximum likelihood 9 More maximum likelihood 10 Method of least squares 11 Interval estimation, setting limits 12 Nuisance parameters, systematic uncertainties 13 Examples of Bayesian approach 14 tba
The optimal decision boundary may not be a hyperplane, → nonlinear test statistic accept
0
Multivariate statistical methods 1 are a Big Industry: Particle Physics can benefit from progress in Machine Learning. Neural Networks, Support Vector Machines, Kernel density methods, ...
Used in neurobiology, pattern recognition, financial forecasting, ... Here, neural nets are just a type of test statistic. Suppose we take t ( x ) to have the form logistic sigmoid This is called the single-layer perceptron. s (·) is monotonic → equivalent to linear t ( x )
Easy to generalize to arbitrary number of layers. Feed-forward net: values of a node depend only on earlier layers, usually only on previous layer (“network architecture”). More nodes → neural net gets closer to optimal t ( x ), but more parameters need to be determined. Parameters usually determined by minimizing an error function, where t (0) , t (1) are target values, e.g., 0 and 1 for logistic sigmoid. Expectation values replaced by averages of training data (e.g. MC). In general training can be difficult; standard software available.
Signal: e
e → W
W
e → qqgg (4 less well separated hadron jets) ← input variables based on jet structure, event shape, ... none by itself gives much separation. Neural network output does better... (Garrido, Juste and Martinez, ALEPH 96-144)
See e.g. K. Cranmer, Kernel Estimation in High Energy Physics , CPC 136 (2001) 198; hep-ex/0011057; T. Carli and B. Koblitz, A multi-variate discrimination technique based on range-searching , NIM A 501 (2003) 576; hep-ex/ Construct non-parametric estimators of the pdfs and use these to construct the likelihood ratio ( n -dimensional histogram is a brute force example of this.) More clever estimation techniques can get this to work for (somewhat) higher dimension.
Consider d dimensions, N training events, x 1 , ..., x N
estimate f ( x ) with Use e.g. Gaussian kernel: kernel bandwidth (smoothing parameter) Need to sum N terms to evaluate function (slow); faster algorithms only count events in vicinity of x ( k -nearest neighbor, range search).
Out of all the input variables, find the one for which with a single cut gives best improvement in signal purity: Example by MiniBooNE experiment, B. Roe et al., NIM 543 (2005) 577 where w i
. is the weight of the i th event. Resulting nodes classified as either signal/ background. Iterate until stop criterion reached based on e.g. purity or minimum number of events in a node. The set of cuts defines the decision boundary.
The level of separation within a node can, e.g., be quantified by the Gini coefficient , calculated from the (s or b) purity as: For a cut that splits a set of events a into subsets b and c, one can quantify the improvement in separation by the change in weighted Gini coefficients: where, e.g.,
scheme can use instead of Gini e.g. the misclassification rate: