Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Nonparametric Classification: Modeling Probabilities & Density for Categorical Variables, Study notes of Engineering

University of Kansas (KU)Engineering

Nonparametric classification techniques for modeling categorical variables and predicting a categorical response variable, g, from one or more predictor variables, x. Two approaches: modeling probability density functions for each group and directly modeling probabilities. It also provides examples using the rhomaa-umaa dataset and techniques such as nearest-neighbor classification and kernel density estimation.

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009

koofers-user-7ug-1 🇺🇸

9 documents

1 / 24

This page cannot be seen from the preview

Don't miss anything!

NONPARAMETRIC CLASSIFICATION

TECHNIQUES

C&PE 940, 30 November 2005

Geoff Bohling

Assistant Scientist

Kansas Geological Survey

[email protected]

864-2093

Overheads and other resources available at:

http://people.ku.edu/~gbohling/cpe940

Discover Study notes of Engineering University of Kansas (KU)

Partial preview of the text

Download Nonparametric Classification: Modeling Probabilities & Density for Categorical Variables and more Study notes Engineering in PDF only on Docsity!

NONPARAMETRIC CLASSIFICATION

TECHNIQUES

C&PE 940, 30 November 2005

Geoff Bohling Assistant Scientist Kansas Geological Survey [email protected] 864-

Overheads and other resources available at:

http://people.ku.edu/~gbohling/cpe

Modeling Categorical Variables: Classification

In classification applications we are trying to develop a model for predicting a categorical response variable, G , from one or more predictor variables, X. That is, if we know that an observation arises from one of K different mutually exclusive classes or groups, Gk , then we are trying to estimate the probability of occurrence of each group at each point in the predictor space

P ˆ k ( x ) = Prob( G (^) k X = x )

We could then assign each estimation point to the class with the highest probability at that point, segmenting the predictor space into regions assigned to the different classes.

The fact that the probabilities are continuous variables immediately suggests the possibility of using the nonparametric regression techniques we discussed last time to model them. However, we

will have to apply transforms to ensure that the estimated Pk ( x )

values represent a legitimate set of probabilities for mutually exclusive events, meaning

0 ≤ Pk ( x ) ≤ 1 , k = 1 ,K, K

( ) 1 1

∑^ =

Pk x.

To use this approach, the group memberships in the training dataset are coded as a set of indicator variables, one for each

group, with yk (^) , i = 1 for the group to which data point i belongs and

0 for all other groups – the set of probabilities corresponding to our certain knowledge of the group membership for each training data point.

Example Data

We will look at predicting facies from logs for a Cretaceous section in north central Kansas:

Facies assignments from core are available from the Jones well, along with a suite of logs including neutron and density porosity, photoelectric factor, and thorium, uranium and potassium components of the spectral gamma ray log. We will recast the density porosity and apparent matrix density (Rhomaa) and the photoelectric factor as apparent matrix volumetric photoelectric absorption (Umaa), so that the six logs employed for discrimination are: Th, U, K, Rhomaa, Umaa, and φN. The six facies picked from core are marine, paralic, floodplain, channel, splay, and paleosol.

So, we will train on this data from the Jones well and look at predictions both in the Jones well and the Kenyon well.

Nearest-Neighbor Classification

This is basically the same as nearest neighbor averaging, except that instead of taking the average of y values at neighboring points, we are assigning the class at each point based on a majority vote of the class memberships at neighboring points. Alternatively, one can compute a vector of group membership probabilities by dividing the count for each group by the total number of neighboring points. As usual, we would want to scale the predictor variables to comparable ranges before computing distances to neighboring points. Applying this approach to the two-facies Rhomaa-Umaa example and predicting over a grid of Rhomaa- Umaa values using 15 nearest neighbors, we get the following map of probability of membership in the marine facies:

Applying nearest-neighbor classification with 20 neighbors to the entire Jones well dataset, using all six logs and predicting all six facies, we get the following table of predicted (columns) versus actual (rows) facies:

Core Predicted Facies Facies (^) Marine Paralic Floodplain Channel Splay Paleosol Marine 112 2 8 0 0 6 Paralic 2 110 19 19 3 3 Floodplain 5 7 206 7 4 3 Channel 0 1 5 254 2 0 Splay 0 0 21 2 15 0 Paleosol 2 5 25 0 0 34

Overall, 82% of the intervals are classified correctly. The predicted facies versus depth look a little noisy:

Kernel Density Estimation

This basically uses the same procedure as kernel regression, using kernel basis functions to produce smooth estimates of the group-

specific density functions, f^ ˆ k^ ( X ). Here the kernel function are not

serving to smooth out the y values, but are in a sense spreading out the location of each training data point in predictor space, turning a delta (spike) function at its exact location into a Gaussian (or similar) curve centered at that location. The kernel density estimate at each location is basically a sum of the surrounding kernels rescaled to represent a density estimate. In general, we would have to estimate fully multivariate density functions, using high-dimensional kernel functions. However, if we make the simplifying assumption of independence among the variables, then the multivariate density function for each group is given by the product of the density functions of the individual variables for that group:

f (^) k ( X ) = fk ( X 1 ) ⋅ fk ( X 2 ) L fk ( Xd )

This simplification means that we only need to develop one- dimensional density function estimates. Plugging these density estimates into Bayes’ theorem leads to a method called “naïve Bayes”.

Applying the naïve Bayes approach to the Rhomaa-Umaa example, we get the following density estimates for each group – multiplying the one-dimensional kernel density estimates along each axis to get the combined density estimates.

Note that the Rhomaa axis in these 3D plots is pointed in the opposite direction than the Rhomaa axis in the 2D plots, so the marine cluster is plotting towards the back rather than towards the front like it should.

Combining the density estimates for the two groups in Bayes theorem give the following probability for the Marine facies:

Or, as a contour plot (oriented the right way):

Applying the naïve Bayes procedure to the full Jones dataset we get the following table of results, with a 72% correct classification rate overall:

Core Predicted Facies Facies Marine Paralic Floodplain Channel Splay Paleosol Marine 116 0 6 0 0 6 Paralic 4 113 18 8 3 10 Floodplain 20 13 156 0 16 27 Channel 0 84 1 174 3 0 Splay 0 1 9 0 28 0 Paleosol 4 5 10 0 0 47

The predicted facies versus depth look like:

Here is a schematic representation of the classification network for our example (leaving the bias nodes out of the picture):

The exponential curves in the output nodes represent the softmax transform.

For d input variables, M hidden-layer nodes ( M +1 including the bias node), and K output classes, the number of weights in this network is M ⋅ ( d + 1 ) + K ⋅( M + 1 ). Again, we can use any of a

number of optimization algorithms to adjust the weights. And, just as for the regression problem, we can include a weight decay term in the objective function, forcing a smoother representation of the boundaries between classes than we would obtain otherwise.

Using a single hidden-layer node for the Rhomaa-Umaa example, the sigmoid basis simply forms a step or boundary between the paralic and marine data points. I’ve included those points on the plot in their indicator form – 0’s for paralic and 1’s for marine:

Since the classification problem is about drawing boundaries, the sigmoid basis functions are in a sense more natural to this problem than to the regression problem, since the step in the basis function forms a boundary.

Fitting a network with 20 hidden-layer nodes to the entire Jones dataset using a decay constant of 0.1 (just guessing at reasonable values for the tuning parameters), yields the following classification table, with 93% of the facies predicted correctly:

Core Predicted Facies Facies (^) Marine Paralic Floodplain Channel Splay Paleosol Marine 124 1 1 0 0 2 Paralic 0 140 5 9 0 2 Floodplain 0 9 216 2 2 3 Channel 0 4 2 256 0 0 Splay 0 2 8 0 28 0 Paleosol 0 2 7 0 0 57

And the predictions versus depth look like:

The predictions in the Kenyon are still quite messy:

Nonparametric Classification: Modeling Probabilities & Density for Categorical Variables, Study notes of Engineering

Related documents

Partial preview of the text

Download Nonparametric Classification: Modeling Probabilities & Density for Categorical Variables and more Study notes Engineering in PDF only on Docsity!

NONPARAMETRIC CLASSIFICATION

TECHNIQUES

C&PE 940, 30 November 2005

∑^ =