VIF analysis using Python, Exercises of Data Analysis & Statistical Methods

Academy of Art University Data Analysis & Statistical Methods

Performing data analysis using Python on qualitative data

Typology: Exercises

2019/2020

Uploaded on 09/02/2020

vikram_jaswal 🇺🇸

ASSIGNMENT NO: 10

Submitted To: Dr. RAMPRRASADH GOARTHY

Group 2

PGDM 2019-2021

Name Roll No.

Rahul Patibandla A032

Vikram Jaswal B054

Parag Jamdade C030

Python Code:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    from statsmodels.sandbox.regression.predstd import wls_prediction_std
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import LogisticRegression
    from sklearn import metrics

    data = pd.read_csv('HeartDisease.csv')
    print(data.shape)
    print(data)
    print(data.describe())
    data_top = data.head()

Output:

From the graph below, we see the diagnosis of heart disease (angiographic disease status): around 140 people have less than 50% diameter narrowing, and the rest have more than 50%.

From the graph below, the chest pain types are distributed as follows:

Value 0: typical angina, 140 people
Value 1: atypical angina, 50 people
Value 2: non-anginal pain, 80 people
Value 3: asymptomatic, 20 people

From the graph below, 150 people have fasting blood sugar > 120 mg/dl.

From the graph below, 150 people have exercise-induced angina.

Code: To check multicollinearity

    X = data[['age', 'sex', 'cp', 'fbs', 'chol', 'target', 'thalach', 'exang', 'thal', 'restecg', 'oldpeak', 'slope', 'ca']]
    y = data['trestbps']
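The code above stops after defining X and y; the step that actually computes the variance inflation factor (VIF) is not part of this preview. As a minimal sketch of how such a check could be done, the block below regresses each column on the remaining columns (with an intercept) and reports VIF = 1 / (1 - R^2). The function name `vif_scores` and the synthetic columns are hypothetical and are not taken from HeartDisease.csv. statsmodels also ships `variance_inflation_factor` in `statsmodels.stats.outliers_influence`; plain NumPy is used here only to keep the sketch self-contained.

```python
import numpy as np

def vif_scores(X):
    """Return the VIF of every column of X (rows = observations).

    Hypothetical helper: each column is regressed on all the other
    columns plus an intercept, and VIF = 1 / (1 - R^2) = SS_tot / SS_res.
    """
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    scores = []
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = np.sum(residuals ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        scores.append(ss_tot / ss_res)
    return np.array(scores)

# Synthetic data for illustration only (not the heart disease data set).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)  # nearly collinear with a
demo = np.column_stack([a, b, c])

vifs = vif_scores(demo)
for name, value in zip(["a", "b", "c"], vifs):
    print(f"VIF({name}) = {value:.2f}")
```

In the assignment's setting, the same function could be called on the numeric columns of `data`. A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity, which would be one reason to drop a variable such as trestbps.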

Output:

Code: Training and Test Sets: Splitting Data | Normalization of the Dataset

    X = np.asarray(data[['age', 'sex', 'cp', 'fbs', 'trestbps', 'chol', 'exang', 'thal', 'restecg', 'thalach', 'oldpeak', 'slope', 'ca']])
    y = np.asarray(data['target'])

    from sklearn.model_selection import train_test_split
    # Hold out 30% of the rows as the test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)
    print('Train set:', X_train.shape, y_train.shape)
    print('Test set:', X_test.shape, y_test.shape)

Output:

Code: Modeling of the Dataset | Evaluation and Accuracy

    from sklearn.linear_model import LogisticRegression
    logreg = LogisticRegression()
    logreg.fit(X_train, y_train)
    y_pred = logreg.predict(X_test)

    # jaccard_similarity_score was removed in scikit-learn 0.23; for binary
    # labels it was equivalent to accuracy_score, which is used here instead.
    from sklearn.metrics import accuracy_score
    print('Accuracy of the model is = ', accuracy_score(y_test, y_pred))

Output:

Code: Details of the confusion matrix along with the plot

    from sklearn.metrics import confusion_matrix, classification_report
    cm = confusion_matrix(y_test, y_pred)
    conf_matrix = pd.DataFrame(data=cm, columns=['Predicted:0', 'Predicted:1'], index=['Actual:0', 'Actual:1'])
    plt.figure(figsize=(8, 5))
    sns.heatmap(conf_matrix, annot=True, fmt='d', cmap="Greens")
    plt.show()

    print('The details for confusion matrix is =')
    print(classification_report(y_test, y_pred))

Output:

Below is the confusion matrix of the test data (30% of the data).

The output below shows the figures before removing the trestbps variable:

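After removing trestbps:

True Positive: 49
True Negative: 27
False Positive: 11
False Negative: 4

Code: Calculating Sensitivity and Specificity

    # scikit-learn layout: cm[0,0] = TN, cm[0,1] = FP, cm[1,0] = FN, cm[1,1] = TP
    total = cm.sum()
    accuracy = (cm[0,0] + cm[1,1]) / total
    print('Accuracy : ', accuracy)

    sensitivity = cm[1,1] / (cm[1,0] + cm[1,1])   # TP / (TP + FN)
    print('Sensitivity : ', sensitivity)

    specificity1 = cm[0,0] / (cm[0,0] + cm[0,1])  # TN / (TN + FP)
    print('Specificity : ', specificity1)

Output:

After removing trestbps: In this case we consider the specificity of the model, as this is a screening test. We do not rely on the sensitivity, as high prevalence will automatically increase the number of true positives. Specificity checks whether a patient without the disease is wrongly diagnosed as positive. From the counts above, the specificity is 27 / (27 + 11) ≈ 0.711, so about 71% of patients without the disease are correctly classified as negative. The value 0.924 is the sensitivity, 49 / (49 + 4): about 92% of patients with the disease are correctly detected. The negative predictive value (NPV) is a separate quantity, TN / (TN + FN) = 27 / (27 + 4) ≈ 0.871, so about 87% of the negative test results are genuinely negative.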
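The metrics discussed here can be recomputed directly from the four counts reported after removing trestbps (TP = 49, TN = 27, FP = 11, FN = 4). The sketch below is only a worked check of the arithmetic, assuming that class 1 (disease present) is the positive class; the variable names are illustrative.

```python
# Counts reported in the text for the model after removing trestbps.
tp, tn, fp, fn = 49, 27, 11, 4

total = tp + tn + fp + fn

accuracy = (tp + tn) / total       # share of all predictions that are correct
sensitivity = tp / (tp + fn)       # true positive rate (recall for class 1)
specificity = tn / (tn + fp)       # true negative rate
ppv = tp / (tp + fp)               # positive predictive value (precision)
npv = tn / (tn + fn)               # negative predictive value

metrics_table = [
    ("Accuracy", accuracy),
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("PPV", ppv),
    ("NPV", npv),
]
for name, value in metrics_table:
    print(f"{name:<12}: {value:.3f}")
```

Sensitivity and specificity are computed within the rows of the confusion matrix (actual classes), while PPV and NPV are computed within its columns (predicted classes), which is why a high sensitivity does not by itself imply a high NPV.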
