Advanced data analysis, Papers of Data Analysis & Statistical Methods

Charles University Prague Data Analysis & Statistical Methods

In these documents, you can find how to prepare models of linear regression, logistical regression, factor analysis and more statistical methods. Using jamovi software.

Typology: Papers

2020/2021

Uploaded on 03/25/2023

ruzena-smrckova 🇨🇿

Růžena Smrčková, Logistic regression – HW1, FSV UK 2022/2023

Data file and preparation

The Social_Network_Ads dataset contains each respondent's salary, age, gender and ID number, together with information on whether he or she purchased an SUV.

The goal of this paper is to develop a logistic regression model that predicts whether a person of a given age, gender and salary will buy an SUV. Logistic regression requires a dichotomous dependent variable, which in our case is the variable purchased, with values 0 (did not buy an SUV) and 1 (did buy an SUV). As predictors, the model uses age and salary in $ (scale variables) and gender (a nominal variable).

The variable age was originally defined as nominal, which is incorrect, so it was changed to a scale variable.

In the Descriptives table above we can see that 257 respondents did not buy an SUV and 143 did. The data set contains 400 cases: 204 women and 196 men (see the Contingency Tables above). As noted above, the purchased variable takes the value 0 for did not purchase and 1 for did purchase. The reference level of this variable had to be reversed so that the model predicts a purchase; otherwise the odds ratios come out below 1 (Model Coefficients table, Odds ratio column). Reversing the coding does not change how well the model fits; it only inverts the odds ratios.
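The effect of reversing the outcome coding can be illustrated with a minimal sketch. In logistic regression, swapping which outcome is modelled negates every coefficient, so each odds ratio becomes its reciprocal. The coefficient below is a hypothetical value chosen for illustration, not a number taken from the jamovi output.

```python
import math


def odds_ratio(coef: float) -> float:
    """Odds ratio for a one-unit increase in a predictor."""
    return math.exp(coef)


# Hypothetical coefficient for age when the model predicts "did not purchase" (0).
coef_not_purchased = -0.19

# Reversing the outcome coding negates the coefficient.
coef_purchased = -coef_not_purchased

or_not_purchased = odds_ratio(coef_not_purchased)  # below 1
or_purchased = odds_ratio(coef_purchased)          # above 1, the reciprocal

print(f"OR when predicting 0: {or_not_purchased:.3f}")
print(f"OR when predicting 1: {or_purchased:.3f}")
print(f"Product of the two:   {or_not_purchased * or_purchased:.3f}")
```

The two odds ratios describe the same relationship from opposite directions, which is why their product is 1.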

So, to sum up: the older you are, the higher the odds of purchasing an SUV. For example, relative to the model's baseline at age 0, the odds for an 18-year-old are multiplied by 30.02408822, and the odds for a 60-year-old by 84120.03114.

BLR – Age, gender and salary

After adding age as a predictor and establishing its importance, it is time to add more predictors to the model. Gender (a nominal variable) and salary (a scale variable) were therefore entered into the logistic regression model. The Model Fit table shows a pseudo R^2 of 0.630, which I interpret as a strong relationship between age, salary, gender and whether a given person purchases an SUV. However, according to the Omnibus Likelihood Ratio table, gender is not statistically significant, so it has to be removed from the model. After removing gender, the relationship between purchasing an SUV, age and salary remains strong: the pseudo R^2 equals 0.628. According to the Omnibus Likelihood Ratio table, age and salary are both statistically significant, so we can continue with the development of the model.
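As a rough sketch of the two quantities used here, the snippet below computes a pseudo R^2 and a likelihood ratio test for dropping one predictor. It assumes McFadden's definition of pseudo R^2, which the text does not specify, and the log-likelihood values are hypothetical placeholders, not values from the jamovi output.

```python
import math


def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R^2: 1 - LL(model) / LL(intercept-only model)."""
    return 1.0 - ll_model / ll_null


def lr_test_1df(ll_reduced: float, ll_full: float):
    """Likelihood ratio statistic and p-value for one dropped parameter."""
    stat = 2.0 * (ll_full - ll_reduced)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only.
LL_NULL = -260.0     # intercept-only model
LL_FULL = -96.0      # age + salary + gender
LL_REDUCED = -96.5   # age + salary (gender dropped)

print(f"pseudo R^2, full model:    {mcfadden_r2(LL_FULL, LL_NULL):.3f}")
print(f"pseudo R^2, reduced model: {mcfadden_r2(LL_REDUCED, LL_NULL):.3f}")

stat, p_value = lr_test_1df(LL_REDUCED, LL_FULL)
print(f"LR statistic for gender: {stat:.2f}, p = {p_value:.3f}")
```

A large p-value for the dropped predictor, together with an almost unchanged pseudo R^2, is the pattern that justifies removing it.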

From the Odds ratio column in the Model Coefficients table, age appears somewhat more important than salary: the odds ratio for salary equals 1 and the odds ratio for age equals 1.26. These values are per unit, though (one dollar versus one year), so they are not directly comparable.

To interpret salary and the odds of buying an SUV: taken at face value, the odds ratio and its CI suggest no impact on purchasing, because 1 raised to any power is 1. To illustrate, the mean salary in the dataset is $69,743, and the odds ratio raised to the power of the chosen salary gives 1^69743 = 1. However, the displayed value is rounded. Since salary is statistically significant in the Omnibus Likelihood Ratio table, its unrounded per-dollar odds ratio cannot be exactly 1, and a tiny per-dollar effect can add up over thousands of dollars.

To compare age across this and the previous BLR model, we can use its odds ratio: 1.21 in the previous model and 1.26 in this one. Adding salary thus shows how this variable changes the estimated effect of age on purchasing SUVs in our sample.

To interpret the results, we can again rely on exponentiation, raising the odds ratio to the power of the respondent's age. The result for the youngest respondent (18 years) is 66.2874, and for the oldest respondent (60 years) it is 1178791.124. So, to sum up: the older you are, the higher the odds of purchasing an SUV. Relative to the model's baseline at age 0, the odds for an 18-year-old are multiplied by 66.2874 and the odds for a 60-year-old by 1178791.124.

To generalize the results of this model to the population we need to use the CI, which for the odds ratio of age ranges from 1.2 to 1.33.
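The exponentiation step can be sketched as follows. The age odds ratio of 1.26 is the rounded value quoted in the text, so the results differ slightly from the figures above, which presumably use more decimals. The per-dollar salary odds ratio of 1.00004 is purely hypothetical and only shows how a value that displays as 1 could still matter.

```python
def odds_multiplier(odds_ratio: float, units: float) -> float:
    """Factor by which the odds change over `units` units of a predictor."""
    return odds_ratio ** units


AGE_OR = 1.26  # rounded odds ratio for age quoted in the text

youngest = odds_multiplier(AGE_OR, 18)
oldest = odds_multiplier(AGE_OR, 60)
print(f"Age 18: odds multiplied by {youngest:.2f}")
print(f"Age 60: odds multiplied by {oldest:.0f}")

# An odds ratio of exactly 1 never changes the odds, whatever the salary.
salary_exact_one = odds_multiplier(1.0, 69743)
print(f"Salary, OR exactly 1: {salary_exact_one}")

# Hypothetical unrounded per-dollar odds ratio that would be displayed as 1.
HYPOTHETICAL_SALARY_OR = 1.00004
salary_hypothetical = odds_multiplier(HYPOTHETICAL_SALARY_OR, 69743)
print(f"Salary, hypothetical OR: odds multiplied by {salary_hypothetical:.2f}")
```

Rescaling salary, for example to thousands of dollars, before fitting would make its odds ratio readable without this kind of compounding.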

Improvements for next time?

To improve this logistic regression model, I would try to gather more data so that the variable gender could enter the model, mainly because gender could, from a common-sense point of view, be an important factor in purchasing a car. To fit the model to the EU environment, I would use the euro as the currency and add a variable called Urbanization to see whether the size of the city matters for purchasing an SUV. Last but not least, I would add the variables Family and Number of children, because an SUV is, in my opinion, the type of car more often purchased by families.

I am also not sure whether I made the computations correctly, given the high numbers that resulted. However, I computed them with a formula in Excel, and the result is consistent with the Probability curve.
