Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Statistics Summary notes, Schemes and Mind Maps of Statistics

Parul University Statistics

Summary notes for Statistics and a reference guide.

Typology: Schemes and Mind Maps

2025/2026

Available from 06/24/2026

akshat-srivastava-6 🇮🇳

2 documents

1 / 108

This page cannot be seen from the preview

Don't miss anything!

3/04/25, 7:38 PM Statistics ( Dipankar Raman Notes) - Jupyter Notebook

What is Statistics

Statistics is a branch of mathematics that involves collecting, analysing, interpreting, and

presenting data. It provides tools and methods to understand and make sense of large amounts

of data and to draw conclusions and make decisions based on the data.

Types of Statistics

Descriptive

Inferential

Descriptive Statistics :Descriptive statisticsdeals with the collection, organization, analysis,

interpretation, and presentation of data. It focuses on summarizing and describing the main

features of a set of data, without making inferences or predictions about the larger population

Inferential statistics: Inferential statistics deals with making conclusions and predictions about

a population based on a sample. It involves the use of probability theory to estimatethe

likelihood of certain events occurring, hypothesis testing to determine if a certain claim about a

population is supported by the data, and regression analysis to examine the relationships

between variables

Population Vs Sample

Population :Population refers to the entire group of individuals or objects that we are interested

in studying. It is the complete set of observations that we want to make inferences about. For

example, the population might be all the students in a particular school or all the cars in a

particular city.

Sample : A sample, on the other hand, is a subset of the population. It is a smaller group of

individuals or objects that we select from the population to study. Samples are used to estimate

characteristics of the population, such as the mean or the proportionwith a certain attribute. For

example, we might randomly select 100 students.

localhost:8888/notebooks/Statistics ( Dipankar Raman Notes).ipynb 1/108

Discover Schemes and Mind Maps of Statistics Parul University

Partial preview of the text

Download Statistics Summary notes and more Schemes and Mind Maps Statistics in PDF only on Docsity!

What is Statistics

Statistics is a branch of mathematics that involves collecting, analysing, interpreting, and presenting data. It provides tools and methods to understand and make sense of large amounts of data and to draw conclusions and make decisions based on the data.

Types of Statistics

Descriptive Inferential Descriptive Statistics :Descriptive statistics deals with the collection, organization, analysis, interpretation, and presentation of data. It focuses on summarizing and describing the main features of a set of data, without making inferences or predictions about the larger population Inferential statistics : Inferential statistics deals with making conclusions and predictions about a population based on a sample. It involves the use of probability theory to estimate the likelihood of certain events occurring, hypothesis testing to determine if a certain claim about a population is supported by the data, and regression analysis to examine the relationships between variables

Population Vs Sample

Population :Population refers to the entire group of individuals or objects that we are interested in studying. It is the complete set of observations that we want to make inferences about. For example, the population might be all the students in a particular school or all the cars in a particular city. Sample : A sample, on the other hand, is a subset of the population. It is a smaller group of individuals or objects that we select from the population to study. Samples are used to estimate characteristics of the population, such as the mean or the proportion with a certain attribute. For example, we might randomly select 100 students.

What is Descriptive Stastistics

Measure of Central Tendency A measure of central tendency is a statistical measure that represents a typical or central value for a dataset. It provides a summary of the data by identifying a single value that is most representative of the dataset as a whole. Mean : The mean is the sum of all values in the dataset divided by the number of values

Measure of Dispersion : A measure of dispersion is a statistical measure that describes the spread or variability of a dataset. It provides information about how the data is distributed around the central tendency (mean, median or mode) of the dataset. Range : The range is the difference between the maximum and minimum values in the dataset. It is a simple measure of dispersion that is easy to calculate but can be affected by outliers. Variance :The variance is the average of the squared differences between each data point and the mean. It measures the average distance of each data point from the mean and is useful in comparing the dispersion of datasets with different means.

Standard Deviation :The standard deviation is the square root of the variance. It is a widely used measure of dispersion that is useful in describing the shape of a distribution Coefficient of Variation (CV) : The CV is the ratio of the standard deviation to the mean expressed as a percentage. It is used to compare the variability of datasets with different means and is commonly used in fields such as biology, chemistry, and engineering. The coefficient of variation (CV) is a statistical measure that expresses the amount of variability in a dataset relative to the mean. It is a dimensionless quantity that is expressed as a percentage. The formula for calculating the coefficient of variation is: CV = (standard deviation / mean) x 100 %

Quantiles and Percentiles

Quantiles are statistical measures used to divide a set of numerical data into equal-sized groups, with each group containing an equal number of observations. Quantiles are important measures of variability and can be used to: understand distribution of data, summarize and compare different datasets. They can also be used to identify outliers. There are several types of quantiles used in statistical analysis, including: Quartiles : Divide the data into four equal parts, Q 1 ( 25 th percentile), Q 2 ( 50 th percentile or median), and Q 3 ( 75 th percentile). Deciles : Divide the data into ten equal parts, D 1 ( 10 th percentile), D 2 ( 20 th percentile), ..., D 9 ( 90 th percentile).

interquartile range :The interquartile range (IQR) is a measure of variability that is based on the five-number summary of a dataset. Specifically, the IQR is defined as the difference between the third quartile (Q 3 ) and the first quartile (Q 1 ) of a dataset.

1. What is a boxplot

A box plot, also known as a box-and-whisker plot, is a graphical representation of a dataset that shows the distribution of the data. The box plot displays a summary of the data, including the minimum and maximum values, the first quartile (Q1), the median (Q2), and the third quartile (Q3).

Covariance

Covariance measures the direction of the relationship between two variables. A positive covariance means that both variables tend to be high or low at the same time. A negative covariance means that when one variable is high, the other tends to be low

In [1]: In [ 2 ]: In [3]: In [ 4 ]: import pandas as pd import random import seaborn as sns import pandas as pd import matplotlib.pyplot as plt import numpy as np df = pd.DataFrame() from re import X x = pd.Series([ 12 , 25 , 68 , 42 , 113 ]) y = pd.Series([ 11 , 29 , 58 , 121 , 100 ]) df['x'] = x df['y'] = y

Correlation

Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). It's a common tool for describing simple relationships without making a statement about cause and effect.

In [8]: In [ 9 ]:

Probability distribution

A probability distribution is an idealized frequency distribution. A frequency distribution describes a specific sample or dataset. It's the number of times each possible value of a variable occurs in the dataset. The number of times a value occurs in a sample is determined by its probability of occurrence. Out[ 8 ]: (^) Text(0.5, 1.0, 'Correlation - 0.6185423626205997') Out[ 9 ]: (^) Text(0.5, 1.0, 'Correlation - 0.6185423626205997') fig, (ax 1 , ax 2 ) = plt.subplots( 1 , 2 , figsize = ( 10 , 3 )) # Plot scatterplots on each axes ax 1 .scatter(df['x'], df['x']) ax 2 .scatter(df['x'], df['y'],color = 'orange') ax 1 .set_title("Correlation - " + str(df['x'].corr(df['x']))) ax 2 .set_title("Correlation - " + str((df['x']).corr(df['y']))) fig, (ax1, ax2) = plt.subplots( 1 , 2 , figsize = ( 10 , 3 )) # Plot scatterplots on each axes ax1.scatter(df['x'], df['y'] , color = 'red') ax2.scatter(df['x'] ***** 2 , df['y']) ax1.set_title("Correlation - " + str(df['x'].corr(df['y']))) ax2.set_title("Correlation - " + str((df['x'] ***** 2 ).corr(df['y'] ***** 2 )))

1. Probability Mass Function (PMF)

PMF stands for Probability Mass Function. It is a mathematical function that describes the probability distribution of categorical a discrete random variable. examples of discrete data Nominal (e.g., gender, ethnic background, religious or political affiliation) Ordinal (e.g., extent of agreement, school letter grades) Quantitative variables with relatively few values (e.g., number of times married) The PMF of a discrete random variable assigns a probability to each possible value of the random variable. The probabilities assigned by the PMF must satisfy two conditions: a.The probability assigned to each value must be non-negative (i.e., greater than or equal to zero). b. The sum of the probabilities assigned to all possible values must equal 1. In [ 10 ]: In [ 11 ]: In [ 12 ]: In [13]: Out[ 12 ]: (^10000) Out[ 13 ]: (^) [2, 4, 5, 1, 6] import pandas as pd import random l = [] for i in range( 10000 ): l.append(random.randint( 1 , 6 )) # Here we take values from 1 - 6 len(l) l[: 5 ]

In [14]: In [15]: In [16]: In [17]: In [18]: Out[ 14 ]: (^2 ) 5 1682 3 1677 4 1670 6 1657 1 1586 dtype: int Out[15]: (^2) 0. 5 0. 3 0. 4 0. 6 0. 1 0. dtype: float Out[17]: (^) pd.Series(l).value_counts() pd.Series(l).value_counts() / pd.Series(l).value_counts().sum() s = pd.Series(l).value_counts() / pd.Series(l).value_counts().sum() s.plot(kind = 'bar') # Now rollig two dices and Add the both a and b l = [] for i in range( 10000 ): a = (random.randint( 1 , 6 )) b = (random.randint( 1 , 6 )) l.append(a + b)

In [24]: In [25]:

Cumulative Distribution Function(CDF) of PMF

The cumulative distribution function (CDF) F(x) describes the probability that a random variable X with a given probability distribution will be found at a value less than or equal to x F(x) = P(X <= x) In [26]: In [27]: Out[25]: (^) Out[27]: (^2) 0. 3 0. 4 0. 5 0. 6 0. 7 0. 8 0. 9 0. 10 0. 11 0. 12 1. dtype: float s = (pd.Series(l).value_counts() / pd.Series(l).value_counts().sum()).sort_index s.plot(kind = 'bar') import numpy as np np.cumsum(s) # Total = 1

In [28]: What is the difference between PMF AND CDF PMF : Gives the probability of the particular point (X) CDF : Gives the probability of the All the points up to (x)

Probability Density Function (PDF)

PDF stands for Probability Density Function. It is a mathematical function that describes the probability distribution of a continuous random variable. Out[28]: np.cumsum(s).plot(kind = 'bar')

In [30]: In [31]: In [32]: In [33]: Out[30]: (^) array([61.23026396, 48.88218776, 53.37395358, 48.23859345, 38.82775714, 50.71547702, 43.55170549, 59.59037734, 44.07646131, 51.2291823 , 49.50129918, 41.87482911, 42.89516605, 49.83430018, 50.42984119, 48.02929317, 43.66796728, 47.26094984, 48.14397298, 45.30814864, 51.62220736, 44.8594848 , 41.84895438, 47.33320696, 53.85271785, 41.98071327, 51.7260448 , 52.96282035, 50.05249076, 50.19805956, 49.07434682, 53.07117544, 53.86269107, 56.04279745, 40.90454341, 51.77673487, 43.2966207 , 51.75137924, 40.90094009, 54.19609498, 55.26453117, 55.4609019 , 54.4989116 , 54.82385172, 43.10395288, 41.8218877 , 46.58957669, 51.41111365, 52.52098661, 49.19831397, 48.99275682, 62.61191008, 43.65576003, 55.21987601, 44.79056897, 51.78870203, 46.37564734, 51.40565093, 45.28410534, 45.75391457, 49.09427446, 53.86831982, 52.71599378, 44.36834434, 50.74665809, 53.72739067, 49.49842696, 47.32866333, 43.56592226, 46.80359987, 49.25437035, 56.58358384, 60.41122227, 48.2555872 , 52.34446159, 58.39865208, 49.96656899, 60.76236236, 47.51901699, 44.65567849, 49.36043701, 43.655939 , 56.18308073, 47.10868464, 47.8524103 , 44.90322859, 48.05274478, 46.16966608, 49.40622809, 39.38872195, 53.57422306, 50.30220825, 50.92156859, 45.35959403, 44.0693628 , Out[31]: (^) 50. Out[32]: (^) 4. Out[33]: (^) (array([ 3., 28., 92., 180., 282., 205., 139., 52., 15., 4.]), array([35.20388838, 38.31826811, 41.43264784, 44.54702757, 47.6614073 , 50.77578703, 53.89016676, 57.0045465 , 60.11892623, 63.23330596, 66.34768569]), ) sample sample.mean() # average sample.std() _# scale -- standard deviation

plot histogram to understand the distribution of data_

plt.hist(sample,bins = 10 )

In [34]: In [35]: In [36]: In [37]: In [38]: In [39]: In [40]: Out[37]: (^) 66. Out[38]: (^) 35. Out[40]: (^) [] # calculate sample mean and sample std dev sample_mean = sample.mean() sample_std = sample.std() # fit the distribution with the above parameters from scipy.stats import norm dist = norm(sample_mean, sample_std) values = np.linspace(sample.min(),sample.max(), 100 ) sample.max() sample.min() probability_density = [dist.pdf(value) for value in values] # plot the histogram and pdf plt.hist(sample,bins = 10 ,density =True ) plt.plot(values,probability_density)

Statistics Summary notes, Schemes and Mind Maps of Statistics

Related documents

Partial preview of the text

Download Statistics Summary notes and more Schemes and Mind Maps Statistics in PDF only on Docsity!

What is Statistics

Types of Statistics

Population Vs Sample

What is Descriptive Stastistics

Quantiles and Percentiles

1. What is a boxplot

Covariance

Correlation

Probability distribution

1. Probability Mass Function (PMF)

Cumulative Distribution Function(CDF) of PMF

Probability Density Function (PDF)

plot histogram to understand the distribution of data_