Classification Model Application in Real-World Scenarios: A Case Study of House Selection , Assignments of Mathematical Methods

Georgia Institute of Technology - Main Campus Mathematical Methods

Prof. Joel Sokol

A practical application of classification models in a real-world scenario. The author uses the example of house selection to illustrate the process of identifying relevant predictors, building a classification model using support vector machines (svm) and k-nearest neighbors (knn), and evaluating model performance. A clear and concise explanation of the concepts and techniques involved, making it a valuable resource for students learning about classification models.

Typology: Assignments

2023/2024

Uploaded on 02/14/2025

wanglihui04 🇺🇸


Classification

Homework 1

ISYE 6501

Fall 2024

Contents

2.1
2.2
    2.2.1
    2.2.2
    2.2.3
Appendix
    2.2.1
    2.2.2
    2.2.3
Bibliography


Describe a situation or problem from your job, everyday life, current events, etc., for which a

classification model would be appropriate. List some (up to 5) predictors that you might use.

A situation from my everyday life where a classification model would have been appropriate came up a couple of years ago, when my family was looking to buy a house. We based our decision on the following criteria:

Predictor          Definition
Bathroom           >2.5 bathrooms to accommodate guests
Bedroom            >3 bedrooms to accommodate guests
Backyard           Nice view and to accommodate dog
HOA                <$300 because it can get expensive
Safe Neighborhood  Parents being older is a priority

Table 1: Predictors for dream house

We found a house that met all our criteria. It has 4 bedrooms, 3 bathrooms, a view of the golf course, and a spacious backyard where my poodle can run around. The HOA fees are in the $100s. The neighborhood is also very safe, with biweekly patrols and a neighborhood watch. It was a fantastic purchase, but I now wonder whether I could have used Trulia data and a classification model to shortlist more candidate houses, instead of visiting every open house we came across.
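As a rough sketch of how the criteria in Table 1 could be turned into something a classifier can use, the Python snippet below encodes each listing as a small record and applies the table's thresholds as a hand-written rule. The listings, field names, and the `meets_criteria` function are hypothetical illustrations, not Trulia data or a fitted model; a real classification model would learn its decision rule from labeled examples instead.

```python
from dataclasses import dataclass

@dataclass
class Listing:
    # Hypothetical fields mirroring the predictors in Table 1
    bathrooms: float
    bedrooms: int
    has_backyard: bool
    hoa_fee: float          # HOA fee in dollars
    safe_neighborhood: bool

def meets_criteria(listing: Listing) -> bool:
    """Hand-written rule using the thresholds from Table 1."""
    return (
        listing.bathrooms > 2.5
        and listing.bedrooms > 3
        and listing.has_backyard
        and listing.hoa_fee < 300
        and listing.safe_neighborhood
    )

# Made-up listings for illustration only
listings = {
    "house_a": Listing(3, 4, True, 150, True),
    "house_b": Listing(2, 3, True, 90, True),
    "house_c": Listing(3, 5, False, 450, True),
}

shortlist = [name for name, home in listings.items() if meets_criteria(home)]
print(shortlist)
```

The same five fields could serve as the predictor columns of a fitted model if each past listing were also labeled as "worth visiting" or not.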

2.2.

Using the support vector machine function ksvm contained in the R package kernlab, find a

good classifier for this data. Show the equation of your classifier, and how well it classifies the

data points in the full data set. (Don’t worry about test/validation data yet; we’ll cover that

topic soon.)

Using the ksvm function from the kernlab library, I experimented with various C values to find the best one for my model. I identified two candidate values, C = 0.1 and C = 100, which gave the same accuracy despite differing by three orders of magnitude. I was initially confused about the meaning of C because I originally thought C corresponded to the λ in the SVM formula:

$$
\min_{a_0,\ldots,a_n} \; \sum_{i=1}^{n} \max\left(0,\; 1 - \left(\sum_{j=1}^{n} a_j x_{ij} + a_0\right) y_i\right) + \lambda \sum_{j=1}^{n} (a_j)^2
$$

I like to simplify the formula to the following for ease:

SVM = Min(Error) + λ(Margin)

However, the reality is the opposite: C in ksvm plays the inverse role of λ. A higher λ puts more weight on a wide margin than on the classification error, which can help when predicting new data points, while a lower λ accepts a smaller margin and focuses on minimizing the errors. A large C behaves like a small λ, so it is important to choose C carefully, because too large a value can lead to overfitting the data.
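The trade-off between error and margin can be made concrete with a small Python sketch that evaluates the objective above (total hinge loss plus λ times the sum of squared coefficients) for two candidate classifiers. The toy data, the two candidates, and the `svm_objective` helper are made up for illustration; this is not how ksvm fits a model, only a way to see which candidate the objective favors as λ changes.

```python
def svm_objective(a0, a, points, labels, lam):
    """Total hinge loss plus lam times the sum of squared coefficients."""
    hinge = 0.0
    for x, y in zip(points, labels):
        score = sum(aj * xj for aj, xj in zip(a, x)) + a0
        hinge += max(0.0, 1.0 - score * y)
    penalty = lam * sum(aj ** 2 for aj in a)
    return hinge + penalty

# Toy one-dimensional data (made up); labels are +1 or -1
points = [(2.0,), (1.0,), (-1.0,), (-2.0,), (0.25,)]
labels = [1, 1, -1, -1, -1]

# Two hypothetical candidates, each given as (a0, coefficients)
wide_margin = (0.0, (1.0,))     # small coefficient: wide margin, one error
narrow_margin = (-2.0, (4.0,))  # large coefficient: narrow margin, no errors

for lam in (0.01, 1.0):
    wide = svm_objective(*wide_margin, points, labels, lam)
    narrow = svm_objective(*narrow_margin, points, labels, lam)
    preferred = "wide margin" if wide < narrow else "narrow margin"
    print(f"lambda={lam}: wide={wide:.2f}, narrow={narrow:.2f} -> {preferred}")
# lambda=0.01: wide=1.26, narrow=0.16 -> narrow margin
# lambda=1.0: wide=2.25, narrow=16.00 -> wide margin
```

With a small λ the error term dominates and the narrow-margin candidate wins; with a large λ the penalty on the coefficients dominates and the wide-margin candidate wins. Raising C in ksvm pushes in the same direction as lowering λ here.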

2.2.

Using the k-nearest-neighbors classification function kknn contained in the R kknn package,

suggest a good value of k, and show how well it classifies the data points in the full data set.

Don’t forget to scale the data (scale=TRUE in kknn)

Using the kknn function from the kknn package, I began by splitting my credit card score data into training and test datasets. A common recommendation is to use 80% of the data for training and the remaining 20% for testing. I used approximately 500 data points for training and the rest for testing.

Next, I ran a for loop over a random sample of K values between 1 and 500, drawing 100 values at a time to iterate through. Accuracy tended to be higher for smaller K, so I narrowed the search to K values of 10 or less, which led me to an approximate optimal K of 7. This value achieved a 90.08% prediction accuracy.
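The search described above (split the data, then loop over candidate K values and record the test accuracy of each) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the kknn implementation: the data are synthetic, the helper names are hypothetical, and the vote is the unweighted majority. The synthetic features already share a scale, so no scaling step is shown.

```python
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Unweighted majority vote among the k nearest training points."""
    by_distance = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_x, train_y, test_x, test_y, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)

# Synthetic two-class data (made up), standing in for the real data set
random.seed(0)
data = []
for _ in range(200):
    label = random.choice([0, 1])
    center = 2.0 if label == 1 else 0.0
    data.append(((random.gauss(center, 1.0), random.gauss(center, 1.0)), label))

# 80/20 split into training and test sets
random.shuffle(data)
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]
train_x, train_y = [p for p, _ in train], [l for _, l in train]
test_x, test_y = [p for p, _ in test], [l for _, l in test]

# Try several odd K values (odd avoids ties with two classes)
results = {k: accuracy(train_x, train_y, test_x, test_y, k)
           for k in (1, 3, 5, 7, 9, 13)}
best_k = max(results, key=results.get)
for k, acc in results.items():
    print(f"k={k:2d}  accuracy={acc:.2%}")
print("best k:", best_k)
```

Because several K values often score within a fraction of a percent of each other, the single "best" K from one split is only an approximation; repeating the loop with different random splits gives a sense of how stable it is.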

While experimenting with the function, I noticed that changing the kernel to “optimal” gave a different optimal K value, 13. Upon reviewing the documentation, I learned that the “optimal” kernel weights each neighbor's vote by its distance, whereas the “rectangular” kernel, which was recommended by the TA, uses a simple unweighted fraction of the neighbors. Given that several K values produced similar predictions, I opted for the lowest of them, K = 7. Note that a smaller K makes the decision boundary more flexible rather than simpler, so the justification for this choice is its accuracy on the test data, not reduced model complexity. I saved all the outputs from my for loop and presented them in the following table and graph:

Table 4: KNN Accuracy % (columns: KNN, Prediction %)

[Figure: Fit % versus KNN; KNN axis ticks from 0 to 500]
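The difference between an unweighted ("rectangular") vote and a distance-weighted vote, discussed in the kernel comparison above, can be sketched as follows. The inverse-distance weight used here is an illustrative assumption and is not the formula behind kknn's “optimal” kernel; the neighbor list and the `vote` helper are made up.

```python
from collections import defaultdict

def vote(neighbors, weighted):
    """neighbors: list of (distance, label) pairs for the k nearest points."""
    totals = defaultdict(float)
    for distance, label in neighbors:
        # Inverse-distance weighting is an illustrative choice only
        totals[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(totals, key=totals.get)

# Made-up neighbors: one very close point of class 1, two farther points of class 0
neighbors = [(0.1, 1), (0.9, 0), (1.0, 0)]

print(vote(neighbors, weighted=False))  # 0: simple majority of labels
print(vote(neighbors, weighted=True))   # 1: the close neighbor outweighs the other two
```

Since the two schemes can disagree on individual points, it is unsurprising that they also lead to different best K values.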

Bibliography

(1) “Kknn: Weighted k-Nearest Neighbor Classifier.” RDocumentation, www.rdocumentation.org/packages/kknn/versions/1.3.1/topics/kknn. Accessed 26 Aug. 2024.

(2) “Ksvm: Support Vector Machines.” RDocumentation, https://www.rdocumentation.org/packages/kernlab/versions/0.9-33/topics/ksvm.

(3) “Piazza: Ask. Answer. Explore. Whenever.” Private questions, http://www.piazza.com. Accessed 27 Aug. 2024.
