Advanced data analysis, Papers of Data Analysis & Statistical Methods

These documents show how to prepare linear regression, logistic regression, factor analysis and other statistical models using the jamovi software.

Typology: Papers

2020/2021

Uploaded on 03/25/2023

ruzena-smrckova

Růžena Smrčková, Logistic regression – HW1, FSV UK 2022/2023
Data file and preparation
The Social_Network_Ads dataset contains each respondent’s salary, age, gender and ID number, together with information on whether he or she purchased an SUV.
The goal of this paper is to develop a logistic regression model that predicts whether a person of a given age, gender and salary will buy an SUV.
A logistic regression model needs a dichotomous dependent variable, which in our case is the variable purchased, with values 0 (didn’t buy an SUV) and 1 (did buy an SUV). As predictors, we will work with the variables age and salary in $ (scale variables) and gender (nominal variable).
The variable age was originally defined as nominal, which is incorrect, so it was changed into a scale variable.

The Descriptives table above shows that 257 respondents did not buy the SUV and 143 did. The data set contains 400 cases in total, 204 women and 196 men (as can be seen above in the Contingency tables). As mentioned above, the purchased variable takes the value 0 (didn’t purchase) or 1 (did purchase). The coding of these values needs to be reversed, because otherwise the odds ratios fall below 1 (Model Coefficients table, Odds ratio column), which makes them harder to interpret.
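Why reversing the coding moves the odds ratios from below 1 to above 1 can be seen on a small 2×2 table. The sketch below uses made-up counts (not taken from Social_Network_Ads) and a hypothetical helper `odds_ratio`. It only illustrates that swapping the 0/1 outcome labels turns an odds ratio into its reciprocal; the data and the fit of the model stay the same.

```python
def odds_ratio(group_a_event, group_a_no_event, group_b_event, group_b_no_event):
    """Odds of the event in group A divided by the odds of the event in group B."""
    return (group_a_event / group_a_no_event) / (group_b_event / group_b_no_event)

# Hypothetical counts, for illustration only (not from the dataset).
older_bought, older_not = 30, 10
younger_bought, younger_not = 20, 40

# Outcome coded so that "bought" is the event of interest.
or_bought = odds_ratio(older_bought, older_not, younger_bought, younger_not)

# Same data with the outcome coding reversed ("did not buy" is the event).
or_not_bought = odds_ratio(older_not, older_bought, younger_not, younger_bought)

print(or_bought)       # above 1
print(or_not_bought)   # the reciprocal, below 1
```

Whichever coding is used, the two values describe the same relationship, so the choice is about readability of the table rather than about the quality of the model.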

So, to sum up: the older you are, the higher the odds of purchasing the SUV. For example, an 18-year-old person has 30.02408822 times higher odds of purchasing the SUV, and a 60-year-old person has 84120.03114 times higher odds.

BLR – Age, gender and salary
After adding age as a predictor and establishing its importance, it is time to use more predictors in our model. It is convenient to try entering gender (as a nominal variable) and salary (as a scale variable) into this logistic regression model. From the Model Fit table we can see that pseudo R^2 equals 0.630, which I dare to interpret as a strong relationship between age, salary, gender and whether the given person would purchase an SUV. However, according to the Omnibus Likelihood Ratio table, we need to remove gender from our model, because it is not statistically significant.

After removing gender, we can still see a strong relationship between purchasing an SUV, age and salary: pseudo R^2 equals 0.628. According to the Omnibus Likelihood Ratio table, age and salary are statistically significant, which means that we can continue with the development of our model.
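The drop in pseudo R^2 from 0.630 to 0.628 can be turned into a rough likelihood-ratio check for gender. The sketch below is not the jamovi output. It assumes that the reported pseudo R^2 is McFadden's (an assumption, the text does not say which one it is) and takes the 400 cases and 257 non-buyers reported above. Because the R^2 values are rounded to three decimals, the result is only approximate.

```python
import math

n_total = 400          # cases reported in the text
n_not_bought = 257     # respondents who did not buy, as reported
n_bought = n_total - n_not_bought

# Deviance of the intercept-only (null) model for a binary outcome.
p = n_bought / n_total
null_deviance = -2 * (n_bought * math.log(p) + n_not_bought * math.log(1 - p))

# ASSUMPTION: the reported pseudo R^2 values are McFadden's,
# i.e. R^2 = 1 - model_deviance / null_deviance.
r2_with_gender = 0.630
r2_without_gender = 0.628

# Likelihood-ratio statistic for dropping gender (1 degree of freedom).
lr_stat = null_deviance * (r2_with_gender - r2_without_gender)

# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(round(lr_stat, 2), round(p_value, 2))
```

Under these assumptions the statistic stays well below the 5% critical value of 3.84, which agrees with the decision to drop gender.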

From the Odds ratio column in the Model Coefficients table, we can see that age is a bit more important than salary: the odds ratio for salary equals 1 and the odds ratio for age equals 1.26.

To interpret salary and the odds of buying an SUV: from the CI it can be seen that there is no impact on purchasing, because the reported odds ratio equals exactly 1 and any power of 1 is again 1. To illustrate this claim, consider the following computation. The mean salary in the given dataset is $69743, and to compute whether salary has an impact on purchasing an SUV, we raise the odds ratio (1 in our case) to the power of the chosen salary. So, in our case, it would be 1^69743, which equals 1.

To compare age as a variable in this and the previous BLR model, we can use the odds ratio: 1.21 in the previous model and 1.26 in this model. So adding salary slightly increased the estimated effect of age on purchasing SUVs in our sample.

To interpret the results, we can rely on exponentiation. The result for the youngest respondent is 66.2874, and for the oldest respondent it is 1178791.124. So, to sum up: the older you are, the higher the odds of purchasing the SUV. For example, an 18-year-old person has 66.2874 times higher odds of purchasing the SUV, and a 60-year-old person has 1178791.124 times higher odds.

To apply the results of this model to the population we need to use the CI, and we can see that its values range from 1.2 to 1.33.
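The salary computation can be reproduced in a few lines. One caveat is worth illustrating: the odds ratio for salary is expressed per one dollar and shown rounded, and since salary was found statistically significant, the unrounded value may differ slightly from 1. The sketch below uses a hypothetical helper `odds_multiplier` and a made-up unrounded odds ratio of 1.00004 (not taken from the jamovi output) to show how a value that displays as 1 can still matter over a difference of several thousand dollars.

```python
def odds_multiplier(odds_ratio_per_unit, units):
    """Factor by which the odds change when the predictor rises by `units`."""
    return odds_ratio_per_unit ** units

# With an odds ratio of exactly 1, any salary gives a factor of 1.
exact_one = odds_multiplier(1, 69743)

# HYPOTHETICAL unrounded odds ratio per dollar (not from the jamovi output).
hypothetical_or = 1.00004
shown_in_table = round(hypothetical_or, 2)            # what two decimals display
per_10k = odds_multiplier(hypothetical_or, 10_000)    # a $10,000 difference

print(exact_one, shown_in_table, round(per_10k, 2))
```

Rescaling salary, for example to thousands of dollars, before fitting the model is one way to make such an effect visible in the Odds ratio column.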

Improvements for next time?
To improve this logistic regression model, I would try to gather more data so that the variable gender could enter the model, mainly because gender could be an important factor for purchasing a car from a common-sense point of view. To fit the model better to the EU environment, I would use the euro as the currency and add a variable called Urbanization to see whether the size of the city matters for purchasing an SUV. Last but not least, I would add the variables Family and Number of children, because an SUV is, in my opinion, the type of car that is more often purchased by families.

I am also not sure whether I made the computations correctly, given the high numbers that came out as a result. However, I computed them with a formula in Excel, and the result is consistent with the Probability curve.
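The reported age figures can be checked for internal consistency. The Excel formula itself is not shown in the text, so the sketch below assumes the odds factors were obtained as exp(b × age), where b is the log-odds coefficient for age. The helper names are hypothetical. It recovers b from the 18-year-old figures and recomputes the 60-year-old ones.

```python
import math

def implied_coefficient(odds_factor_value, age):
    """Log-odds coefficient b per year such that exp(b * age) == odds_factor_value."""
    return math.log(odds_factor_value) / age

def odds_factor(coefficient, age):
    """Odds multiplier exp(b * age), measured relative to age 0."""
    return math.exp(coefficient * age)

# Figures reported in the text for the 18-year-old respondent.
b_age_only = implied_coefficient(30.02408822, 18)    # age-only model
b_age_salary = implied_coefficient(66.2874, 18)      # age + salary model

# Recompute the 60-year-old figures and compare with the reported ones.
check_age_only = odds_factor(b_age_only, 60)         # reported: 84120.03114
check_age_salary = odds_factor(b_age_salary, 60)     # reported: 1178791.124

# A more modest comparison: a 60-year-old relative to an 18-year-old.
ratio_60_vs_18 = odds_factor(b_age_salary, 60 - 18)

# Odds ratios per year implied by the recovered coefficients.
print(round(math.exp(b_age_only), 2), round(math.exp(b_age_salary), 2))
```

The large values arise because each factor compares a respondent with a hypothetical person aged 0. Comparing two ages that occur in the sample, as in `ratio_60_vs_18`, gives numbers that are easier to read.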