Classification Model Application in Real-World Scenarios: A Case Study of House Selection , Assignments of Mathematical Methods

A practical application of classification models in a real-world scenario. The author uses the example of house selection to illustrate the process of identifying relevant predictors, building a classification model using support vector machines (svm) and k-nearest neighbors (knn), and evaluating model performance. A clear and concise explanation of the concepts and techniques involved, making it a valuable resource for students learning about classification models.

Typology: Assignments

2023/2024

Uploaded on 02/14/2025

wanglihui04
wanglihui04 🇺🇸

15 documents

1 / 7

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Classification
Homework 1
ISYE 6501
Fall 2024
Contents
2.1 2
2 2
2.2.1 ..................................................... 2
2.2.2 ..................................................... 3
2.2.3 ..................................................... 4
Appendix 6
2.2.1 ..................................................... 6
2.2.2 ..................................................... 6
2.2.3 ..................................................... 6
Bibliography 7
1
pf3
pf4
pf5

Partial preview of the text

Download Classification Model Application in Real-World Scenarios: A Case Study of House Selection and more Assignments Mathematical Methods in PDF only on Docsity!

Classification

 - Homework - ISYE - Fall 
  • 2.1 Contents
    • 2.2.1
    • 2.2.2
    • 2.2.3
  • Appendix
    • 2.2.1
    • 2.2.2
    • 2.2.3
  • Bibliography

Describe a situation or problem from your job, everyday life, current events, etc., for which a

classification model would be appropriate. List some (up to 5) predictors that you might use.

A situation from my job where a classification model would be appropriate would be actually a couple years

ago when my family was looking to buy a house. We actually based it on a couple of things such as the

following:

Predictors Definition

Bathroom >2.5 bathrooms to accommodate guests

Bedroom >3 bedrooms to accommodate guests

Backyard Nice view and to accommodate dog

HOA <$300 because it can get expensive

Safe Neighborhood Parents being older is a priority

Table 1: Predictors for dream house

We found a house that met all our criteria. It has 4 bedrooms, 3 bathrooms, a view of the golf course, and a

spacious backyard where my poodle can run around. The HOA fees are also in the $100s. The neighborhood

is also very safe, with patrols biweekly and had neighborhood watch. It was a fantastic purchase, but now I

wonder if I could of used Trulia data to find more potential houses using classification we could of checked

out instead of just going to every single house we saw open.

2.2.

Using the support vector machine function ksvm contained in the R package kernlab, find a

good classifier for this data. Show the equation of your classifier, and how well it classifies the

data points in the full data set. (Don’t worry about test/validation data yet; we’ll cover that

topic soon.)

Using the ksvm function from the kernlab library, I experimented with various C values to find the optimal

one for my model. I identified two potential values, C = 0.1 and C = 100, both showing the same accuracy

yet ranging in C. I was initially really confused in the understanding of C because I originally thought C

would represent the λ in the formula of SVM being:

min

a 0 ,...,an

∑^ n

i =

max (0 , 1 − (

∑^ n

j =

aj xij + a 0 ) yi ) + λ

∑^ n

j =

( aj )

I like to simplify the formula to the following for ease:

SV M = M in ( Error ) + λ ( M argin )

However, the actuality is the opposite. A higher λ in the formula yields an importance to prioritizing a

higher margin over the error which can be good for predicting new variables while a lower λ prioritizes a

smaller margin and thus focuses on minimizing the errors but its important to pick the optimal C because

it can lead to over fitting the data.

2.2.

Using the k-nearest-neighbors classification function kknn contained in the R kknn package,

suggest a good value of k, and show how well it classifies that data points in the full data set.

Don’t forget to scale the data (scale=TRUE in kknn)

Using the kknn function from the kknn package, I began by splitting my credit card score data into training

and test datasets. It’s generally recommended to split the data to 80% for the training data and to use the

remaining for the testing. I used approximately 500 data points for training and the rest for testing.

Next, I ran a for loop that selected a random sample of K values ranging from 1 to 500, choosing 100 values

at a time to iterate through. I noticed that the accuracy seemed to be higher as the number of K decreased,so

I narrowed the range to 0-10, which helped me approximate the optimal K value to be 7. This value achieved

a 90.08% prediction accuracy.

While experimenting with the function, I noticed that changing the kernel to “optimal” indicated a different

optimal K value which was 13. Upon reviewing the documentation, I learned that the “optimal” kernel

weighs the fraction by distance, whereas the “rectangular” kernel, which was recommended by the TA, uses

a simple fraction. Given that several K values produced similar predictions, I opted for the lowest K value

because I wanted to have the simplest model and to prevent model complexity and potential over fitting of

the data. I saved all the outputs from my for loop and presented them in the following table and graph:

KNN Prediction %

Table 4: KNN Accuracy %

0 100 200 300 400 500

65

70

75

80

85

90

KNN

Fit %

Bibliography

(1) “Kknn: Weighted k-Nearest Neighbor Classifier.” RDocumentation, www.rdocumentation.org/packages/kknn/versions/1.3.1/topics/kknn. Accessed 26 Aug. 2024.

(2) "KSVM: Support Vector Machines. RDocumentation." RDocumentation, https://www.rdocumentation.org/packages/kernlab/versions/0.9- 33/topics/ksvm

(3) Piazza • Ask. Answer. Explore. Whenever. Private questions http://www.piazza.com. Accessed 27 Aug. 2024.