










Car price prediction using machine learning











KEYFİYYƏTİN TƏMİNATI ŞÖBƏSİ (Quality Assurance Department)
2. Workflow of project We need data to feed our machine learning model. Once we have a car dataset, we need to preprocess the data. We cannot feed this raw data to our machine learning algorithm so for understanding the data we need to do some processing. This step is called the data preprocessing. We will split this data into training data and testing data. This step is called as train test split. Once we have split this data, we will train our machine learning algorithm with training data and test our machine learning model with this test data. We will feed this to our regression model. In this case we are going to use two regression models: First, we will train our machine learning model with the linear regression and then we will use the lasso regression model. Then we will compare the accuracy score or the error percentage of these models and see which model performs better for this dataset. This is a regression problem because we are predicting a particular value, so we are not just predicting classes, we are predicting particular value on which the car can be sold. This is known as regression problem, so we will be using linear regression and then lasso regression and compare them. Once we have trained our model with this training data, we will evaluate our model. After that we will have a trained linear regression or lasso regression model. Then we will feed new data to our machine learning model. So, operating on this data machine learning can predict the car price on which it can be sold.
Coding Part in Python
3. Creating files and installing libraries

First, we need to create a new Python file in which we will write our code. We also need a dataset, as discussed earlier. Finally, we should install the libraries we will need while writing the code:
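The report's exact import list is not visible in this preview, so the block below is an assumption: a plausible set of imports covering the steps described in this report (loading data, splitting, the two regression models, metrics and plotting). The packages can be installed with `pip install pandas matplotlib scikit-learn`.

```python
import pandas as pd                                   # loading and inspecting the CSV
import matplotlib
matplotlib.use("Agg")                                 # off-screen backend; omit in a notebook
import matplotlib.pyplot as plt                       # actual vs predicted price plots
from sklearn.model_selection import train_test_split  # train-test split
from sklearn.linear_model import LinearRegression, Lasso  # the two regression models
from sklearn import metrics                           # R-squared and error metrics
```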
This table contains information on nearly 300 cars, and we will use this information for our ML model. In line 13 we print the first 5 rows, and the function in line 15 gives us the number of rows and columns. These functions produce the following output: As we can see in the output, we have 301 rows and 9 columns of data about cars. What we are trying to do is train our machine learning model on these selling prices together with the other features of each car. Once it has learned from this data, when we give it new information such as the kilometers driven, fuel type, etc., the model can predict the price at which the car can be sold. Let's inspect the dataset further:
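Loading and inspecting the table with pandas might look like the sketch below. The column names are an assumption (they follow the commonly shared 9-column car-price CSV), the variable name `car_dataset` is hypothetical, and the six rows are made up for illustration, so the printed shape here is `(6, 9)` rather than the report's `(301, 9)`.

```python
import io

import pandas as pd

# Made-up rows in the assumed 9-column layout of the car CSV.
csv_text = """Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
car_a,2014,3.4,5.6,27000,Petrol,Dealer,Manual,0
car_b,2013,4.8,9.5,43000,Diesel,Dealer,Manual,0
car_c,2017,7.2,9.9,6900,Petrol,Dealer,Automatic,0
car_d,2011,2.9,4.2,5200,Petrol,Individual,Manual,1
car_e,2015,4.6,6.9,42000,CNG,Individual,Manual,0
car_f,2016,5.5,8.0,30000,Diesel,Individual,Manual,0
"""

# With the real dataset you would pass the CSV file's path to pd.read_csv instead.
car_dataset = pd.read_csv(io.StringIO(csv_text))

print(car_dataset.head())   # first 5 rows
print(car_dataset.shape)    # (number of rows, number of columns)
```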
This code produces the following output: Using the function above, we can also get information about missing values, that is, how many missing values each column contains. As we can see, none of our columns have missing values. Now let's check how the values are distributed in the categorical columns. For example, the fuel type column contains petrol, diesel and compressed natural gas (CNG), so we count how many entries fall into each of these three categories: how many cars are petrol cars, how many are diesel cars, and so on. Similarly, we count how many cars have a dealer as their seller type and how many have an individual.
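The inspection steps above (column information, missing-value counts and category counts) map onto standard pandas calls, sketched below. The column names `Fuel_Type` and `Seller_Type` are assumed, and the six rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows using the assumed column names.
df = pd.DataFrame({
    "Selling_Price": [3.4, 4.8, 7.2, 2.9, 4.6, 5.5],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Petrol", "CNG", "Diesel"],
    "Seller_Type": ["Dealer", "Dealer", "Dealer", "Individual", "Individual", "Individual"],
})

df.info()                       # column types and non-null counts
missing = df.isnull().sum()     # number of missing values per column
print(missing)

# How many cars fall into each category of the categorical columns
fuel_counts = df["Fuel_Type"].value_counts()
seller_counts = df["Seller_Type"].value_counts()
print(fuel_counts, seller_counts, sep="\n\n")
```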
This encodes our data as numerical values: categorical values such as petrol, diesel, individual or dealer are replaced by their respective numerical codes.
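One simple way to do this encoding is a per-column dictionary passed to `Series.map`, shown below. The integer codes and column names are assumptions for illustration; the report's own mapping is not visible in this preview.

```python
import pandas as pd

df = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "CNG"],
    "Seller_Type": ["Dealer", "Dealer", "Individual", "Individual"],
})

# Assumed integer codes; any consistent mapping works for these columns.
encodings = {
    "Fuel_Type": {"Petrol": 0, "Diesel": 1, "CNG": 2},
    "Seller_Type": {"Dealer": 0, "Individual": 1},
}
for column, mapping in encodings.items():
    # .map() turns any category missing from the mapping into NaN,
    # which makes unexpected values easy to spot.
    df[column] = df[column].map(mapping)

print(df)
```

Integer codes imply an order (here CNG > Diesel > Petrol) that a linear model treats as meaningful; one-hot encoding with `pd.get_dummies` avoids that at the cost of extra columns.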
7. Splitting the data and Target

The next step is to separate the features from the target; after that, we will split the data into training data and test data. Let's create two variables, X and Y. The selling price is our target, because it is the value we want to predict. We remove the selling price from the data and store it in Y, and all the remaining features go into X. We also remove the car name, because it is not useful for our prediction. So, for our prediction we will use the year in which the car was bought, the kilometers it has been driven, its present price, its fuel type, and similar features.
So, X contains all the columns except the selling price (and the car name), while the selling price is removed from the data and stored in the Y variable.
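In pandas, the separation described above amounts to one `drop` and one column selection. The column names below are assumed and the three rows are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Car_Name": ["car_a", "car_b", "car_c"],
    "Year": [2014, 2013, 2017],
    "Selling_Price": [3.4, 4.8, 7.2],
    "Present_Price": [5.6, 9.5, 9.9],
    "Kms_Driven": [27000, 43000, 6900],
    "Fuel_Type": [0, 1, 0],  # already label-encoded
})

# Features: everything except the car name (not useful here) and the target.
X = df.drop(columns=["Car_Name", "Selling_Price"])
# Target: the selling price we want to predict.
Y = df["Selling_Price"]

print(X)
print(Y)
```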
8. Splitting Training and Test data

Now we need to split the data into training data and test data. The features go into X_train and X_test, and the corresponding prices go into Y_train and Y_test: the price of every row in X_train is stored in Y_train, and the price of every row in X_test is stored in Y_test. To do this, we call the train_test_split function, which we imported from sklearn.model_selection.
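A call to `train_test_split` might look like the following. The `test_size=0.1` and `random_state=2` values are assumptions (the report's arguments are not visible here), and the small arrays are made up so that each price is 0.75 times the first feature, which makes it easy to check that rows stay aligned after shuffling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 20 hypothetical rows with 2 features; row i is [2i, 2i + 1].
X = np.arange(40).reshape(20, 2)
# Hypothetical prices: 1.5 * i, i.e. 0.75 times the first feature.
Y = np.arange(20) * 1.5

# Shuffle and hold out 10% of the rows; X and Y are split consistently.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=2)
print(X.shape, X_train.shape, X_test.shape)
```

`random_state` fixes the shuffle, so the same split is produced on every run.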
Now we need to compare the values predicted by our model with the original values, which here are the Y_train values. Let's compare them and see how our model performs. For regression we use metrics such as R-squared, mean absolute error, root mean squared error, etc. We get the following output for the R-squared score:

9.3. Visualize the actual prices and predicted prices

Another important way to understand the performance of the model is to plot the values it predicts. Now we are going to visualize the actual prices against the predicted prices; the predicted prices should be as close as possible to the actual prices. Let's run this:
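The comparison and the plot described above can be sketched as follows, assuming a `LinearRegression` model fitted on hypothetical training data (the report's fitting step is on a page missing from this preview). The block also computes R-squared by hand, as 1 minus the residual sum of squares divided by the total sum of squares, to show what `r2_score` measures; the output file name is an arbitrary choice.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; call plt.show() instead in an interactive session
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.linear_model import LinearRegression

# Hypothetical training data: 50 cars, 3 numeric features, roughly linear prices.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(50, 3))
Y_train = 1.0 + X_train @ np.array([0.5, -0.2, 0.8]) + rng.normal(0, 0.1, size=50)

model = LinearRegression()
model.fit(X_train, Y_train)
train_pred = model.predict(X_train)

# Metrics comparing predictions with the original training prices
r2 = metrics.r2_score(Y_train, train_pred)
mae = metrics.mean_absolute_error(Y_train, train_pred)
rmse = np.sqrt(metrics.mean_squared_error(Y_train, train_pred))

# R-squared by hand: 1 - residual sum of squares / total sum of squares
ss_res = np.sum((Y_train - train_pred) ** 2)
ss_tot = np.sum((Y_train - Y_train.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.4f}, MAE = {mae:.4f}, RMSE = {rmse:.4f}")

# Actual vs predicted prices: points near the dashed diagonal are good predictions.
fig, ax = plt.subplots()
ax.scatter(Y_train, train_pred)
lims = [Y_train.min(), Y_train.max()]
ax.plot(lims, lims, "r--")
ax.set_xlabel("Actual price")
ax.set_ylabel("Predicted price")
ax.set_title("Actual prices vs predicted prices")
fig.savefig("actual_vs_predicted.png")
```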
Output: This plot shows how close the predicted values are to the actual ones. As you can see, there is not much distance between them: most of the predicted values lie close to the actual values. As the price increases there is some gap, because most of the cars are priced below 10 and there are very few values greater than 10. Overall, the values predicted by our machine learning model are very close to the original selling prices. This is how we can estimate a model's performance: by plotting the predicted values against the original values in a scatter plot. So far we have made predictions for the training data; now let's make predictions for the test data.
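Predicting for the test data reuses the fitted model on rows it has not seen during training. The sketch below, again on made-up data with assumed split settings, scores the training and test predictions with the same metric so the two can be compared.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 cars, 2 numeric features, roughly linear prices.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 2))
Y = 0.5 + X @ np.array([0.6, 0.3]) + rng.normal(0, 0.3, size=100)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=2)

model = LinearRegression().fit(X_train, Y_train)

# Same metric on both sets: a much lower test score would suggest overfitting.
train_r2 = metrics.r2_score(Y_train, model.predict(X_train))
test_pred = model.predict(X_test)
test_r2 = metrics.r2_score(Y_test, test_pred)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```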
10. Lasso Regression

10.1. What is Lasso Regression?

Lasso regression is a type of linear regression that uses shrinkage: the estimated coefficients are pulled towards a central point, here zero. The lasso procedure encourages simple, sparse models (i.e., models with fewer parameters), because some coefficients are shrunk all the way to zero. This type of regression is often used for models showing high levels of multicollinearity, or when you want to automate parts of model selection, such as variable selection or parameter elimination. The acronym “LASSO” stands for Least Absolute Shrinkage and Selection Operator.

Let's use lasso regression and see how it performs. Lasso regression often does better than plain linear regression when there are many or highly correlated features, because its penalty suppresses unhelpful coefficients, but this is not guaranteed for every dataset. Plain linear regression performs well when the target depends roughly linearly on the features, for example when one value increases, the other increases (or decreases) proportionally. Now let's repeat the same steps with lasso regression.

10.2. Model evaluation

Now let's evaluate our model:
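The sparsity property described above is easy to observe. In scikit-learn, `Lasso` minimizes (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1, so larger `alpha` values push more coefficients to exactly zero. The sketch below uses synthetic data in which only 3 of 10 features influence the target; `alpha=1.0` and the data settings are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression

# 10 features, only 3 of which actually influence the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("linear:", np.round(ols.coef_, 2))
print("lasso: ", np.round(lasso.coef_, 2))
print("coefficients set to zero by lasso:", int(np.sum(lasso.coef_ == 0)))
```

Plain linear regression keeps a small non-zero weight on every irrelevant feature, while lasso drops most of them, which is the "selection" in its name.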
The output for the R-squared score will be:

10.3. Visualize the actual prices and predicted prices

Let's visualize this as well:

Output:
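The R-squared score of a lasso model depends on its regularization strength `alpha`, which the report does not state. The sketch below, on synthetic data, sweeps a few assumed `alpha` values and reports training R-squared, test R-squared and the number of non-zero coefficients for each. In practice `alpha` is better chosen by cross-validation (for example with scikit-learn's `LassoCV`) than on the test set.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 8 features, 4 of them informative (assumption).
X, y = make_regression(n_samples=300, n_features=8, n_informative=4, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=2)

results = []
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha).fit(X_train, y_train)
    train_r2 = metrics.r2_score(y_train, model.predict(X_train))
    test_r2 = metrics.r2_score(y_test, model.predict(X_test))
    nonzero = int(np.sum(model.coef_ != 0))  # features the model still uses
    results.append((alpha, train_r2, test_r2, nonzero))
    print(f"alpha={alpha:<5} train R^2={train_r2:.3f} test R^2={test_r2:.3f} non-zero coefs={nonzero}")
```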