Applied Statistics using Python MCQ and 2 marks, Quizzes of Applied Statistics

PSG College of Technology Applied Statistics

Struggling with Applied Statistics using Python? This comprehensive study material is designed specifically for students who want clear, exam-ready preparation without wasting time on unnecessary theory. This document includes a carefully curated collection of multiple-choice questions (MCQs) and concise 2-mark answers covering all essential topics in Applied Statistics using Python. Each concept is explained in a simple and practical way, making it easier to understand statistical methods and their implementation using Python. Topics Covered: * Descriptive Statistics (Mean, Median, Mode, Variance, Standard Deviation) * Probability Distributions * Hypothesis Testing * Correlation & Regression * ANOVA * Python Libraries (NumPy, Pandas, Matplotlib, SciPy) * Data Analysis Techniques What You Get: * High-quality MCQs for quick practice * **Perfect for last-minute revision, quick practice, and understanding key concepts without confusion.**

Typology: Quizzes

2025/2026

Uploaded on 04/01/2026

ash-de-cruze 🇮🇳

5 documents


ASP LAB

Applied Statistical Programming

Complete Learning Guide

Experiments 1 – 9 | Python | NumPy | SciPy | Scikit-learn

Beginner-Friendly • Dark Mode • Viva Q&A Included


Table of Contents

Experiment 1 Matrix Operations using NumPy

Experiment 2 Mean, Variance and Standard Deviation

Experiment 3 Data Visualization – Univariate & Bivariate

Experiment 4 Correlation and Regression Analysis

Experiment 5 Hypothesis Testing – t-Test & Chi-Square

Experiment 6 Analysis of Variance (ANOVA)

Experiment 7 Principal Component Analysis (PCA)

Experiment 8 K-Means Clustering

Experiment 9 Time-Series Trend Analysis

B = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])
print('Matrix A:', A)
print('Matrix B:', B)

# Basic operations
print('Addition (A+B):', A + B)
print('Subtraction (A-B):', A - B)
print('Element-wise Multiplication:', A * B)
print('Element-wise Division:', A / B)
print('Matrix Multiplication (Dot):', np.dot(A, B))
print('Transpose of A:', A.T)

# Statistics
print('Mean of A:', np.mean(A))
print('Median of A:', np.median(A))
print('Std Dev of A:', np.std(A))
print('Variance of A:', np.var(A))
print('Row-wise Mean:', np.mean(A, axis=1))
print('Column-wise Mean:', np.mean(A, axis=0))
print('Max:', np.max(A), ' Min:', np.min(A))

# Normalization (min-max)
norm_A = (A - np.min(A)) / (np.max(A) - np.min(A))
print('Normalized A:', norm_A)

# Linear Algebra
det = np.linalg.det(A)
print('Determinant:', det)
# Compare with a tolerance: a floating-point determinant is rarely exactly 0
if not np.isclose(det, 0):
    print('Inverse:', np.linalg.inv(A))
else:
    print('Singular matrix – no inverse')
eigenvalues, eigenvectors = np.linalg.eig(A)
print('Eigenvalues:', eigenvalues)
print('Eigenvectors:', eigenvectors)

Step-by-Step Procedure

Step 1: Open your Python IDE (Spyder/Jupyter).
Step 2: Import numpy as np at the top of your file.
Step 3: Create Matrix A and Matrix B using np.array([[...]]).
Step 4: Perform addition (A+B), subtraction (A-B), element-wise multiplication (A*B) and division (A/B).

Step 5: Use np.dot(A, B) for true matrix multiplication.
Step 6: Use A.T to find the transpose.
Step 7: Compute statistical values: np.mean(), np.median(), np.std(), np.var().
Step 8: Normalize the matrix using the min-max formula.
Step 9: Compute the determinant with np.linalg.det() and inverse with np.linalg.inv().

Step 10: Compute eigenvalues and eigenvectors with np.linalg.eig().
Step 11: Run the program and verify the output.
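
The normalization, determinant, inverse and eigenvalue steps above can be sketched on a small matrix. The 2×2 matrix below is a hypothetical example chosen so the results are easy to check by hand (it is not the matrix from the lab sheet), and the tolerance check with np.isclose is one reasonable way to test for a singular matrix rather than the only one.

```python
import numpy as np

# Hypothetical 2x2 matrix chosen for illustration (not from the lab sheet).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Min-max normalization rescales every entry into the range [0, 1].
norm_A = (A - A.min()) / (A.max() - A.min())

# Determinant: floating-point results are compared with a tolerance.
det = np.linalg.det(A)
if np.isclose(det, 0.0):
    inverse = None  # singular matrix, no inverse exists
else:
    inverse = np.linalg.inv(A)

# Eigen-decomposition: each column v of eigenvectors satisfies A @ v = lambda * v.
eigenvalues, eigenvectors = np.linalg.eig(A)

print('Normalized A:\n', norm_A)
print('Determinant:', det)
print('Inverse:\n', inverse)
print('Eigenvalues:', eigenvalues)
```

For this matrix the determinant is 4×3 - 1×2 = 10 and the eigenvalues are 2 and 5, which gives a quick hand check of the NumPy output.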

Observations

Operation | Result Shape | Key Function
Addition A+B | 3×3 | A + B
Subtraction A-B | 3×3 | A - B
Element-wise Mult | 3×3 | A * B
Matrix Mult | 3×3 | np.dot(A,B)
Transpose | 3×3 | A.T
Mean | Scalar | np.mean(A)
Determinant | Scalar | np.linalg.det(A)

Result

The program successfully performs all matrix operations including addition, subtraction, element-wise and dot-product multiplication, transpose, statistical calculations, normalization, determinant, inverse, and eigenvalue decomposition using NumPy.

Conclusion

NumPy provides extremely efficient and concise tools for matrix operations that would otherwise require many lines of manual code. These operations form the foundation of machine learning, image processing, and engineering data analysis.

Real-World Applications

Image processing (each image is a matrix of pixel values)
Machine learning (weight matrices in neural networks)
Computer graphics (transformation matrices for 3D rendering)
Structural engineering (finite element analysis)
Google PageRank algorithm (eigenvalue computation)

Viva Questions & Answers

Q: What is NumPy and why is it used?

A: NumPy (Numerical Python) is a library for numerical computing. It provides the ndarray object, which supports fast vectorized operations, making it much faster than Python lists for mathematical computations.

Q: Difference between a NumPy array and a Python list?

A: Python lists can hold mixed types and are slow for math. NumPy arrays hold a single data type, are stored in contiguous memory, and support vectorized operations, which often makes them 50–100x faster on large arrays (the exact speedup depends on the operation and the array size).
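
A short sketch of the list-versus-array difference described in this answer. The variable names are illustrative only. It shows that the same operator behaves differently on the two types and that an array settles on a single dtype.

```python
import numpy as np

values = [1, 2, 3]

# On a list, * repeats the sequence rather than multiplying its elements.
list_result = values * 2

# On an ndarray, * is applied element by element (vectorized).
array_result = np.array(values) * 2

# A list may mix types; an ndarray stores one dtype for all elements,
# so the integers here are promoted to float.
mixed_list = [1, 2.5, 3]
as_array = np.array(mixed_list)

print(list_result)      # [1, 2, 3, 1, 2, 3]
print(array_result)     # [2 4 6]
print(as_array.dtype)   # float64
```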

Experiment 2 – Mean, Variance and Standard Deviation

Aim

To write a Python program to compute mean, variance, and standard deviation of user-entered data using NumPy.

Theory

These three measures are the most fundamental tools in descriptive statistics. They help summarize an entire dataset with just a few numbers.

Mean (Average)

The mean is the sum of all values divided by the number of values. It represents the central value of the dataset.

Mean ( μ ) = (x1 + x2 + ... + xn) / n

Variance

Variance measures how spread out the data is from the mean. A high variance means data points are widely scattered; a low variance means they are clustered near the mean.

Variance (σ²) = Σ (xi - μ)² / n

Standard Deviation

Standard deviation is the square root of variance. It is in the same unit as the original data, making it more interpretable.

Std Dev (σ) = √Variance = √(Σ (xi - μ)² / n)

Note: Standard deviation can NEVER be negative. It is always ≥ 0.

Tip: If all values are identical, variance and std dev are both 0.
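
The three formulas above can be checked by computing them by hand and comparing against NumPy. This is a minimal sketch using the values 10 to 50 from the sample run. The sample-variance line (dividing by n - 1) is an addition for comparison and is not part of the formulas in this experiment, which divide by n.

```python
import numpy as np

data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
n = data.size

# Mean: sum of the values divided by how many there are.
mean = data.sum() / n

# Population variance: average squared distance from the mean (divide by n).
variance = ((data - mean) ** 2).sum() / n

# Standard deviation: square root of the variance.
std_dev = np.sqrt(variance)

# For comparison only: the sample variance divides by n - 1 instead of n.
sample_variance = ((data - mean) ** 2).sum() / (n - 1)

print('Mean:', mean)
print('Variance:', variance, '| np.var:', np.var(data))
print('Std Dev:', std_dev, '| np.std:', np.std(data))
print('Sample variance:', sample_variance, '| np.var(ddof=1):', np.var(data, ddof=1))
```

np.var() and np.std() divide by n by default, which matches the formulas in this section; passing ddof=1 switches to the n - 1 divisor.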

Materials Required

Python 3.x
NumPy library

Program

import numpy as np

# Get number of elements from user
n = int(input('Enter number of elements: '))

# Collect data values
data = []
for i in range(n):
    value = float(input(f'Enter element {i+1}: '))
    data.append(value)

# Convert to NumPy array
data = np.array(data)

# Compute statistics
mean = np.mean(data)
variance = np.var(data)
std_dev = np.std(data)

# Display results
print('\nData:', data)
print('Mean:', mean)
print('Variance:', variance)
print('Standard Deviation:', std_dev)

Step-by-Step Procedure

Step 1: Import NumPy.
Step 2: Ask the user how many elements they want to enter.
Step 3: Loop n times and collect each value into a list.
Step 4: Convert the list to a NumPy array using np.array().
Step 5: Call np.mean(), np.var(), np.std() on the array.
Step 6: Print all results clearly.

Sample Output

Enter number of elements: 5
Enter element 1: 10
Enter element 2: 20
Enter element 3: 30
Enter element 4: 40
Enter element 5: 50

Data: [10. 20. 30. 40. 50.]
Mean: 30.0
Variance: 200.0
Standard Deviation: 14.142135623730951

Observations

Input Data | Mean | Variance | Std Dev
10, 20, 30, 40, 50 | 30.0 | 200.0 | 14.14
5, 5, 5, 5, 5 | 5.0 | 0.0 | 0.0
1, 100 | 50.5 | 2450.25 | 49.5

Result

Experiment 3 – Data Visualization (Univariate & Bivariate)

Aim

To write a Python program for data visualization for univariate and bivariate datasets using Matplotlib.

Theory

Data Visualization converts raw numbers into graphical representations, making patterns, trends, and outliers instantly visible.

Univariate Data

Univariate data has only ONE variable (e.g., heights of students). We visualize its distribution using histograms, box plots, or bar charts.

Bivariate Data

Bivariate data has TWO variables (e.g., study hours vs marks). We look at the relationship between them using scatter plots and line plots.

Plot Types

Histogram — shows frequency distribution of a single variable
Scatter Plot — shows relationship between two variables as dots
Line Plot — shows trend over ordered data (e.g., time)

Note: Always label your axes and give a title. A plot without labels is incomplete.

Materials Required

Python 3.x
NumPy — for data arrays
Matplotlib — for plotting (pip install matplotlib)

Program

import numpy as np
import matplotlib.pyplot as plt

# --- Univariate Dataset ---
data = np.array([10, 15, 20, 25, 30, 35, 40])

# Histogram
plt.figure()
plt.hist(data, bins=5, color='steelblue', edgecolor='white')
plt.title('Univariate Data - Histogram')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()

# --- Bivariate Dataset ---
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Scatter Plot
plt.figure()
plt.scatter(x, y, color='orange')
plt.title('Bivariate Data - Scatter Plot')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()

# Line Plot
plt.figure()
plt.plot(x, y, color='green', marker='o')
plt.title('Bivariate Data - Line Plot')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()

Step-by-Step Procedure

Step 1: Import numpy and matplotlib.pyplot.
Step 2: Create univariate dataset as a NumPy array.
Step 3: Plot histogram using plt.hist() with appropriate bins.
Step 4: Add title and axis labels; show the plot.
Step 5: Create bivariate dataset (x and y arrays).
Step 6: Plot scatter chart using plt.scatter().
Step 7: Plot line chart using plt.plot().
Step 8: Add titles and labels to all plots.
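
To see what the histogram in Step 3 actually counts, the binning can be reproduced without drawing anything. This sketch assumes the same univariate data and bins=5 as the program, and uses np.histogram, which is the routine Matplotlib's plt.hist() relies on for binning.

```python
import numpy as np

# Same univariate data and bin count as the lab program.
data = np.array([10, 15, 20, 25, 30, 35, 40])

# counts[i] is the number of values falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(data, bins=5)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f'{left:5.1f} to {right:5.1f}: {count}')
```

The five bins are each 6 units wide (10 to 40 split into five equal parts), and every bin includes its left edge; only the last bin also includes its right edge.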

Observations

Plot Type | Data Type | Function Used | Purpose
Histogram | Univariate | plt.hist() | Show frequency distribution
Scatter | Bivariate | plt.scatter() | Show relationship between variables
Line | Bivariate | plt.plot() | Show trends over ordered data

Result

The program successfully visualizes univariate data using a histogram and bivariate data using scatter and line plots.

Conclusion

Experiment 4 – Correlation and Regression Analysis

Aim

To write a program to calculate correlation coefficient, generate a regression equation, and graph to visualize and predict data trends.

Theory

Correlation

Correlation measures the strength and direction of the linear relationship between two variables. The Pearson Correlation Coefficient (r) ranges from -1 to +1.

Value of r | Meaning
+1.0 | Perfect positive correlation
+0.7 to +0.9 | Strong positive
+0.4 to +0.6 | Moderate positive
0 | No correlation
-0.7 to -0.9 | Strong negative
-1.0 | Perfect negative correlation

Linear Regression

Regression finds the best-fit straight line through the data points. The line equation is y = mx + c where m is the slope and c is the intercept.

Regression Line: y = mx + c

Slope: m = Σ (xi - x̄)(yi - ȳ) / Σ (xi - x̄)²

Note: Correlation tells HOW STRONG the relationship is. Regression tells the EXACT EQUATION to predict y from x.
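
The slope formula can be worked through by hand and compared with np.polyfit. This is a minimal sketch using the sample data from this experiment. The intercept formula c = ȳ - m·x̄ and the Pearson r formula are standard results added here for completeness, and x_new is a hypothetical input used only to show a prediction.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50], dtype=float)
y = np.array([15, 25, 35, 45, 55], dtype=float)

# Deviations of each point from the means.
dx = x - x.mean()
dy = y - y.mean()

# Slope from the formula in the text; intercept so the line passes
# through the point of means (x-bar, y-bar).
m = (dx * dy).sum() / (dx ** 2).sum()
c = y.mean() - m * x.mean()

# Pearson correlation coefficient computed from the same deviations.
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Cross-check against NumPy's least-squares fit of a degree-1 polynomial.
m_fit, c_fit = np.polyfit(x, y, 1)

# Predict y for a new, hypothetical x value.
x_new = 60.0
y_new = m * x_new + c

print(f'Manual:  y = {m:.2f}x + {c:.2f}, r = {r:.2f}')
print(f'polyfit: y = {m_fit:.2f}x + {c_fit:.2f}')
print(f'Prediction at x = {x_new}: {y_new:.2f}')
```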

Program

import numpy as np
import matplotlib.pyplot as plt

# Sample data
x = np.array([10, 20, 30, 40, 50])
y = np.array([15, 25, 35, 45, 55])

# Correlation
correlation = np.corrcoef(x, y)
print('Correlation Matrix:\n', correlation)

# Regression coefficients y = mx + c
m, c = np.polyfit(x, y, 1)
print(f'\nRegression Equation: y = {m:.2f}x + {c:.2f}')

# Predicted values
y_pred = m * x + c

# Plot
plt.figure()
plt.scatter(x, y, color='orange', label='Actual Data')
plt.plot(x, y_pred, color='cyan', label='Regression Line')
plt.title('Regression Analysis')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.legend()
plt.show()

Sample Output

Correlation Matrix:
 [[1. 1.]
 [1. 1.]]

Regression Equation: y = 1.00x + 5.00

Observations

X Values | Y Values | Correlation (r) | Slope (m) | Intercept (c)
10,20,30,40,50 | 15,25,35,45,55 | 1.0 | 1.0 | 5.0

Result

The program calculates the correlation coefficient and regression equation and plots the best-fit line through the data points.

Viva Questions & Answers

Q: What is correlation?

A: A statistical measure that expresses the extent to which two variables are linearly related. It ranges from -1 to +1.

Q: What is Pearson correlation coefficient?

A: A number between -1 and +1 indicating the strength and direction of the linear relationship. +1 = perfect positive, -1 = perfect negative, 0 = no linear relationship.

Q: What is regression?

A: A statistical method that models the relationship between variables and finds the best-fit line to predict one variable from another.

Q: Difference between correlation and regression?

Experiment 5 – Hypothesis Testing (t-Test & Chi-Square)

Aim

To conduct hypothesis testing using the T-Test (for numerical data) and the Chi-Square Test (for categorical data) using Python.

Theory

Hypothesis Testing is a statistical method to make decisions about populations based on sample data. We test two competing claims:

Null Hypothesis (H₀): No significant difference exists (default assumption)
Alternative Hypothesis (H₁): A significant difference does exist

We use the p-value to decide. If p-value < 0.05 (significance level α), we reject H₀.

t-Test

Used to compare the means of two groups. If the p-value is below 0.05, the groups are significantly different.

t = (x̄₁ - x̄₂) / √(s²/n₁ + s²/n₂)
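
As a rough check of the t-test formula, the statistic can be computed by hand and compared with scipy.stats.ttest_ind. The two groups below are hypothetical values made up for illustration. The sketch assumes s² is the pooled sample variance, which is what ttest_ind uses with its default equal_var=True.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data for two independent groups.
group1 = np.array([12.0, 15.0, 11.0, 14.0, 13.0])
group2 = np.array([16.0, 18.0, 15.0, 17.0, 19.0])

n1, n2 = group1.size, group2.size

# Sample variances (ddof=1 divides by n - 1).
var1 = group1.var(ddof=1)
var2 = group2.var(ddof=1)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# t = difference of means divided by its standard error.
t_manual = (group1.mean() - group2.mean()) / np.sqrt(pooled_var / n1 + pooled_var / n2)

# SciPy's independent two-sample t-test (equal variances assumed by default).
t_scipy, p_value = stats.ttest_ind(group1, group2)

print('Manual t:', t_manual)
print('SciPy t: ', t_scipy)
print('p-value: ', p_value)
print('Reject H0' if p_value < 0.05 else 'Fail to reject H0')
```

For these numbers both groups have a sample variance of 2.5 and the means differ by 4, so the hand calculation gives t = -4.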

Chi-Square Test

Used for categorical data to check if two variables are independent. We compare observed frequencies with expected frequencies.

χ ² = Σ (Observed - Expected)² / Expected
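
The expected frequencies in this formula come from the row and column totals: expected = (row total × column total) / grand total. The sketch below applies that to a hypothetical 2×3 table invented for illustration and compares the result with chi2_contingency. A 2×3 table is used deliberately, since by default SciPy applies Yates' continuity correction only when there is 1 degree of freedom (a 2×2 table), and that would make the hand calculation differ.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts: 2 rows (groups) x 3 columns (categories).
observed = np.array([[20, 30, 50],
                     [30, 30, 40]])

# Expected count for each cell under independence.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
total = observed.sum()
expected = row_totals * col_totals / total

# Chi-square statistic: sum of (Observed - Expected)^2 / Expected over all cells.
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Cross-check with SciPy.
chi2, p, dof, expected_scipy = chi2_contingency(observed)

print('Expected:\n', expected)
print('Manual chi-square:', chi2_manual, '| SciPy:', chi2)
print('Degrees of freedom:', dof_manual)
print('p-value:', p)
```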

Note: p-value < 0.05 → Reject H₀ (significant difference exists)

Note: p-value ≥ 0.05 → Fail to reject H₀ (no significant difference)

Program — t-Test

from scipy import stats

# Read two groups of equal size from the user
n = int(input('Enter number of elements in each group: '))
group1 = [float(input(f'Group1 value {i+1}: ')) for i in range(n)]
group2 = [float(input(f'Group2 value {i+1}: ')) for i in range(n)]

# Independent two-sample t-test
t_stat, p_value = stats.ttest_ind(group1, group2)
print('T-statistic:', t_stat)
print('P-value:', p_value)

if p_value < 0.05:
    print('Reject H0: Significant difference exists.')
else:
    print('Fail to reject H0: No significant difference.')

Program — Chi-Square Test

import numpy as np
from scipy.stats import chi2_contingency

# Read the contingency table from the user, row by row
rows = int(input('Enter number of rows: '))
cols = int(input('Enter number of columns: '))
observed = []
for i in range(rows):
    row = [int(input(f'Value at ({i+1},{j+1}): ')) for j in range(cols)]
    observed.append(row)
observed_array = np.array(observed)

# Chi-square test of independence
chi2, p, dof, expected = chi2_contingency(observed_array)
print('Chi-Square Value:', chi2)
print('P-value:', p)
print('Degrees of Freedom:', dof)
print('Expected Frequencies:\n', expected)

if p < 0.05:
    print('Reject H0: Variables are dependent.')
else:
    print('Fail to reject H0: Variables are independent.')

Observations

Test | Statistic | p-value | Decision
t-Test | -0.54 | 0.60 | Fail to reject H₀
Chi-Square | 7.64 | 0.105 | Fail to reject H₀

Result

Hypothesis testing was successfully conducted using Python for both the T-Test and Chi-Square Test with correct interpretation of p-values.

Viva Questions & Answers

Q: What is hypothesis testing?

A: A statistical procedure to test claims about a population using sample data. We decide whether to reject the null hypothesis based on the p-value.

Q: What is a null hypothesis?

A: H₀ is the default claim, typically 'no difference exists'. We only reject it if the data provides strong evidence against it.

Q: What is a p-value?

A: The probability of getting results at least as extreme as those observed, assuming H₀ is true. A small p-value (< 0.05) means the result is unlikely under H₀.

Q: What is a t-test?

Experiment 6 – Analysis of Variance (ANOVA)

Aim

To perform Analysis of Variance (ANOVA) using Python to compare multiple datasets and determine whether a statistically significant difference exists among their means.

Theory

ANOVA (Analysis of Variance) extends the t-test to compare three or more groups simultaneously. Instead of performing multiple t-tests (which inflates error), ANOVA does it in one go.

How ANOVA Works

It computes the F-statistic = Between-group variance / Within-group variance
A large F-statistic suggests means are significantly different
p-value < 0.05 means at least one group mean is different

F = (Between-Group Variance) / (Within-Group Variance)

Types of ANOVA

One-Way ANOVA — one independent variable
Two-Way ANOVA — two independent variables

Note: ANOVA tells you THAT a difference exists, not WHICH groups differ. Use post-hoc tests (Tukey HSD) for that.
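
The F-statistic described above can be built up from its two variance components and compared with scipy.stats.f_oneway. This is a minimal sketch of one-way ANOVA using the three score groups from this experiment; names such as ss_between and ms_within are illustrative.

```python
import numpy as np
from scipy import stats

# The three score groups used in this experiment.
groups = [
    np.array([85, 88, 90, 92, 86], dtype=float),
    np.array([78, 80, 82, 79, 81], dtype=float),
    np.array([90, 92, 94, 91, 93], dtype=float),
]

k = len(groups)                          # number of groups
n_total = sum(g.size for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of the values around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares divide by the degrees of freedom (k - 1 and n_total - k).
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within

# Cross-check with SciPy's one-way ANOVA.
f_scipy, p_value = stats.f_oneway(*groups)

print('Manual F:', f_manual)
print('SciPy F: ', f_scipy)
print('p-value: ', p_value)
```

The group means are 88.2, 80.0 and 92.0, which is why the between-group variance is large compared with the within-group variance.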

Program

import numpy as np
from scipy import stats

# Sample datasets (scores from three groups)
group1 = np.array([85, 88, 90, 92, 86])
group2 = np.array([78, 80, 82, 79, 81])
group3 = np.array([90, 92, 94, 91, 93])

# One-Way ANOVA
f_statistic, p_value = stats.f_oneway(group1, group2, group3)
print('Group 1:', group1)
print('Group 2:', group2)
print('Group 3:', group3)
print('\nF-Statistic:', f_statistic)
print('P-Value:', p_value)

if p_value < 0.05:
    print('\nResult: Significant difference exists between groups.')
else:
    print('\nResult: No significant difference between groups.')

Sample Output

F-Statistic: 42.74 (rounded)
P-Value: 3.479e-06 (rounded)

Result: Significant difference exists between groups.

Observations

Group | Values | Mean
Group 1 | 85, 88, 90, 92, 86 | 88.2
Group 2 | 78, 80, 82, 79, 81 | 80.0
Group 3 | 90, 92, 94, 91, 93 | 92.0

Result

The program successfully computes the F-statistic and p-value. Since p-value ≈ 3.48×10⁻⁶ < 0.05, we conclude that a significant difference exists between the group means.

Viva Questions & Answers
