Probability and Statistics Practice Exam with Python, Exams of Technology

A practice exam on probability and statistics for data science using Python. It includes multiple-choice questions covering continuous variables, sampling methods, measures of central tendency, variance, interquartile range, z-scores, skewness, the empirical rule, outlier detection, contingency tables, histograms, probability rules, Bayes' theorem, the naive Bayes classifier, expected value, random variables, and several distributions (Bernoulli, binomial, Poisson, normal, exponential). Each question comes with an explanation of the correct answer, which makes the exam useful for students and professionals who are preparing for certification or reinforcing their understanding of statistical concepts and their implementation in Python. The exam covers both theory and practical application with libraries such as pandas and scipy.stats.

Typology: Exams

2025/2026

Available from 12/20/2025

shilpi-jain-1 🇮🇳

Probability and Statistics in Data Science using Python
Certificate Practice Exam
**Question 1.** Which of the following best describes a quantitative continuous variable?
A) Number of students in a class
B) Temperature measured to two decimal places
C) Gender of respondents
D) Rating on a 5-point Likert scale
Answer: B
Explanation: Continuous variables can take any value within an interval; temperature measured
precisely is continuous.
**Question 2.** In sampling, stratified sampling is primarily used to:
A) Reduce cost by clustering similar units
B) Ensure each subgroup is represented proportionally
C) Select units completely at random
D) Sample only the largest clusters
Answer: B
Explanation: Stratified sampling divides the population into strata and samples from each
proportionally, improving representation.
**Question 3.** The mean is most sensitive to:
A) Skewed distributions
B) Outliers
C) Sample size
D) Number of categories
Answer: B
Explanation: Extreme values (outliers) pull the arithmetic mean toward them more than they
pull the median or mode.
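The effect described in Question 3 can be checked numerically. The following is a minimal sketch using only the standard library; the data values are hypothetical and chosen purely for illustration.

```python
from statistics import mean, median

# Hypothetical data: five typical values, then the same values plus one outlier.
values = [10, 12, 11, 13, 12]
with_outlier = values + [100]

# How far each statistic moves when the outlier is added.
mean_shift = mean(with_outlier) - mean(values)
median_shift = median(with_outlier) - median(values)

print(f"mean:   {mean(values):.2f} -> {mean(with_outlier):.2f}")
print(f"median: {median(values):.2f} -> {median(with_outlier):.2f}")
```

In this particular data set the median does not move at all, while the mean more than doubles.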

**Question 4.** Which pandas function returns the mode(s) of a Series?
A) df.mean()
B) df.median()
C) df.mode()
D) df.var()
Answer: C
Explanation: mode() returns the most frequent value(s) in the Series.
**Question 5.** The variance of a sample differs from the population variance because:
A) It uses N-1 in the denominator
B) It uses N in the denominator
C) It squares the mean instead of the deviations
D) It does not square the deviations
Answer: A
Explanation: Sample variance uses Bessel's correction (N-1) to produce an unbiased estimator.
**Question 6.** The Interquartile Range (IQR) is calculated as:
A) Q3 – Q1
B) Q1 – Q
C) Median – Q
D) Q3 – Median
Answer: A
Explanation: IQR measures the spread of the middle 50% of the data: third quartile minus first quartile.
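The three quantities in Questions 4 to 6 can be computed on a pandas Series as in the sketch below. The data are hypothetical; `Series.mode`, `Series.var` (with its `ddof` parameter) and `Series.quantile` are standard pandas methods, and the quartiles here use pandas' default linear interpolation.

```python
import pandas as pd

# Hypothetical data with two equally frequent values (4 and 9).
s = pd.Series([2, 4, 4, 5, 7, 9, 9, 10])

modes = s.mode().tolist()        # every most-frequent value, in sorted order
sample_var = s.var()             # ddof=1 by default: divides by N-1
population_var = s.var(ddof=0)   # divides by N
iqr = s.quantile(0.75) - s.quantile(0.25)  # Q3 minus Q1

print(modes, sample_var, population_var, iqr)
```

Because the two variances share the same sum of squared deviations, `sample_var * (N-1)` equals `population_var * N`.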


**Question 10.** Which plot is most useful for detecting outliers in a univariate numeric variable?
A) Histogram
B) Box plot
C) Scatter plot
D) Bar chart
Answer: B
Explanation: Box plots display whiskers and points beyond them, highlighting outliers.
**Question 11.** A contingency table is used to:
A) Summarize continuous data distribution
B) Show joint frequencies of two categorical variables
C) Plot time series data
D) Compute correlation coefficients
Answer: B
Explanation: It cross-tabs counts of categories from two variables.
**Question 12.** In matplotlib, which function creates a histogram?
A) plt.plot()
B) plt.bar()
C) plt.hist()
D) plt.scatter()
Answer: C
Explanation: hist() bins data and draws the histogram.
**Question 13.** The probability of the union of two mutually exclusive events A and B is:
A) P(A) × P(B)
B) P(A) + P(B)
C) P(A) − P(B)
D) P(A) / P(B)
Answer: B
Explanation: Mutually exclusive events cannot occur together, so P(A∪B) = P(A) + P(B).
**Question 14.** If two events are independent, then P(A ∩ B) equals:
A) P(A) + P(B)
B) P(A) − P(B)
C) P(A) × P(B)
D) P(A) / P(B)
Answer: C
Explanation: Independence implies multiplication rule for intersection.
**Question 15.** Bayes' theorem updates the prior probability to obtain the posterior probability using:
A) Likelihood only
B) Prior only
C) Likelihood and evidence
D) Evidence only
Answer: C
Explanation: Posterior ∝ Prior × Likelihood; evidence (normalizing constant) ensures probabilities sum to 1.
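The addition rule, the multiplication rule and Bayes' theorem from Questions 13 to 15 can be verified with exact fractions. This is an illustrative sketch: the die events and the diagnostic-test numbers (1% prior, 95% sensitivity, 5% false-positive rate) are assumptions made up for the example, and `prob` is a hypothetical helper.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (hypothetical example).
outcomes = set(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(len(event & outcomes), len(outcomes))

# Mutually exclusive events: P(A ∪ B) = P(A) + P(B).
a, b = {1, 2}, {5, 6}
union_rule_holds = prob(a | b) == prob(a) + prob(b)

# Independent events: P(E ∩ F) = P(E) × P(F).
even, at_most_four = {2, 4, 6}, {1, 2, 3, 4}
product_rule_holds = prob(even & at_most_four) == prob(even) * prob(at_most_four)

# Bayes' theorem: posterior = likelihood × prior / evidence.
prior = Fraction(1, 100)                 # P(condition), assumed
likelihood = Fraction(95, 100)           # P(positive | condition), assumed
false_positive = Fraction(5, 100)        # P(positive | no condition), assumed
evidence = likelihood * prior + false_positive * (1 - prior)
posterior = likelihood * prior / evidence

print(union_rule_holds, product_rule_holds, posterior, float(posterior))
```

The evidence term is what rescales prior × likelihood into a proper probability.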


**Question 19.** The variance of a Bernoulli(p) random variable is:
A) p
B) p(1-p)
C) p²
D) (1-p)²
Answer: B
Explanation: Var = p(1-p) for a binary outcome.
**Question 20.** Which distribution models the number of successes in 10 independent trials with success probability 0.2?
A) Poisson(λ=2)
B) Binomial(n=10, p=0.2)
C) Geometric(p=0.2)
D) Negative Binomial(r=10, p=0.2)
Answer: B
Explanation: Fixed number of Bernoulli trials → Binomial.
**Question 21.** The probability mass function (PMF) of a Poisson(λ) distribution at k=3 is:
A) (e^{-λ} λ³)/3!
B) λ³/3!
C) e^{-λ} λ³
D) (λ³)/e^{-λ}
Answer: A
Explanation: Poisson PMF: P(k) = e^{-λ} λ^{k}/k!
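The formulas in Questions 19 to 21 can be cross-checked against `scipy.stats`. The sketch below assumes p = 0.2, n = 10 and λ = 2 purely as example values, and compares the library results with the closed-form expressions.

```python
import math
from scipy.stats import bernoulli, binom, poisson

p, n, lam = 0.2, 10, 2.0  # example parameters, chosen for illustration

# Bernoulli variance: p(1-p).
bern_var = bernoulli.var(p)

# Binomial(n, p): mean n*p, and the PMF at k=2 compared with the formula.
binom_mean = binom.mean(n, p)
binom_pmf_2 = binom.pmf(2, n, p)
binom_manual = math.comb(n, 2) * p**2 * (1 - p) ** (n - 2)

# Poisson(lam) PMF at k=3 compared with e^{-lam} * lam^3 / 3!.
pois_pmf_3 = poisson.pmf(3, lam)
pois_manual = math.exp(-lam) * lam**3 / math.factorial(3)

print(bern_var, binom_mean, binom_pmf_2, pois_pmf_3)
```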


**Question 22.** Which scipy.stats function generates random variates from a normal distribution with mean 0 and std 1?
A) norm.rvs(loc=0, scale=1)
B) norm.pdf(loc=0, scale=1)
C) norm.cdf(loc=0, scale=1)
D) norm.ppf(loc=0, scale=1)
Answer: A
Explanation: rvs produces random samples; pdf, cdf, ppf compute densities, probabilities, quantiles.
**Question 23.** The uniform distribution on interval [a, b] has variance:
A) (b-a)² / 12
B) (b-a) /
C) (b-a)² /
D) (b-a) /
Answer: A
Explanation: Variance of continuous uniform = (b-a)²/12.
**Question 24.** A standard normal variable Z has mean:
A) 0
B) 1
C) μ
D) σ
Answer: A
Explanation: By definition, standard normal N(0,1) has mean 0.
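The sketch below illustrates Questions 22 to 24 with `scipy.stats`. The sample size, the seed and the interval [2, 5] are arbitrary choices for the example. Note that scipy parameterizes the uniform distribution by `loc=a` and `scale=b-a` rather than by the endpoints directly.

```python
from scipy.stats import norm, uniform

# Draw standard normal samples; random_state fixes the seed for repeatability.
samples = norm.rvs(loc=0, scale=1, size=10_000, random_state=0)
sample_mean = samples.mean()
sample_std = samples.std(ddof=1)

# The other norm methods answer different questions.
density_at_zero = norm.pdf(0)       # height of the density curve at 0
prob_below_zero = norm.cdf(0)       # P(Z <= 0)
median_quantile = norm.ppf(0.5)     # value z with P(Z <= z) = 0.5

# Uniform on [a, b]: variance (b-a)^2 / 12.
a, b = 2.0, 5.0
uniform_var = uniform.var(loc=a, scale=b - a)

print(sample_mean, sample_std, uniform_var)
```

With 10,000 draws the sample mean and standard deviation should land close to 0 and 1, though not exactly on them.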


**Question 28.** The standard error of the mean for a sample of size n with sample standard deviation s is:
A) s / √n
B) s × √n
C) s / n
D) s × n
Answer: A
Explanation: SE = s / sqrt(n).
**Question 29.** In a 95% confidence interval for a population mean, the margin of error is:
A) t_{α/2, df} · SE
B) z_{α/2} · SE
C) Either A or B, depending on whether σ is known
D) None of the above
Answer: C
Explanation: Use z when σ is known, t when σ is unknown.
**Question 30.** The law of large numbers guarantees that as sample size increases:
A) Sample variance equals population variance
B) Sample mean converges to the population mean
C) Sample median converges to the mode
D) Sample distribution becomes uniform
Answer: B
Explanation: LLN states convergence of sample average to expected value.
**Question 31.** The null hypothesis in a one-sample t-test typically states that:
A) Population mean equals a specified value
B) Population mean differs from a specified value
C) Population variance is zero
D) Data are not normally distributed
Answer: A
Explanation: H₀: μ = μ₀; the test assesses evidence against this equality.
**Question 32.** When the population standard deviation is known and n ≥ 30, which test is appropriate for a mean hypothesis?
A) One-sample Z-test
B) One-sample t-test
C) Paired t-test
D) Wilcoxon signed-rank test
Answer: A
Explanation: Known σ and large n justify Z-test.
**Question 33.** A two-sample independent t-test assumes:
A) Equal variances only
B) Independent samples and (often) equal variances
C) Paired observations
D) Categorical outcome
Answer: B
Explanation: Independence of groups; equal variances may be assumed (pooled) or not (Welch).
**Question 34.** In ANOVA, the F-statistic is the ratio of:
A) Between-group variance to within-group variance
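The standard error, the critical values and the one-sample test statistic from Questions 28 to 32 can be sketched as follows. The data and the hypothesized mean of 5.0 are invented for illustration; `stats.t.ppf`, `stats.norm.ppf` and `stats.ttest_1samp` are existing scipy functions.

```python
import math
from statistics import mean, stdev
from scipy import stats

# Hypothetical sample; sigma is treated as unknown, so t applies.
data = [5.1, 4.9, 5.6, 5.8, 5.0, 5.3, 5.5, 4.7]
n = len(data)
x_bar = mean(data)

# Standard error of the mean: s / sqrt(n).
se = stdev(data) / math.sqrt(n)

# Critical values for a 95% interval (alpha/2 = 0.025 in each tail).
t_crit = stats.t.ppf(0.975, df=n - 1)
z_crit = stats.norm.ppf(0.975)

# t-based confidence interval: mean ± t_crit * SE.
ci_t = (x_bar - t_crit * se, x_bar + t_crit * se)

# One-sample t-test of H0: mu = 5.0, checked against the formula.
t_stat, p_value = stats.ttest_1samp(data, popmean=5.0)
t_manual = (x_bar - 5.0) / se

print(se, t_crit, z_crit, ci_t, t_stat, p_value)
```

The t critical value exceeds the z critical value for small samples, which widens the interval to reflect the extra uncertainty from estimating σ.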

Certificate Practice Exam

C) Sample size is less than 30
D) Data are skewed
Answer: A
Explanation: Correlation involves division by product of standard deviations; zero variance leads to division by zero.
**Question 38.** Spearman's rank correlation is appropriate when:
A) Both variables are nominal
B) Relationship is monotonic but not necessarily linear
C) Data are normally distributed
D) Variables have equal variances
Answer: B
Explanation: It measures monotonic association using ranks.
**Question 39.** In simple linear regression, the slope coefficient β₁ represents:
A) Change in Y for a one-unit change in X
B) Intercept of the regression line
C) Correlation between X and Y
D) Standard error of estimate
Answer: A
Explanation: β₁ quantifies the expected change in response per unit increase in predictor.
**Question 40.** The coefficient of determination (R²) indicates:
A) Proportion of variance in Y explained by X
B) Correlation between X and Y
C) Standard error of the slope
D) Probability of Type I error
Answer: A
Explanation: R² = 1 – (SS_res/SS_tot), measuring explained variance.
**Question 41.** Multicollinearity in multiple regression inflates:
A) Standard errors of coefficients
B) R² value
C) Intercept estimate
D) Sample size
Answer: A
Explanation: Correlated predictors make coefficient estimates unstable, increasing SEs.
**Question 42.** In logistic regression, the link function is:
A) Identity
B) Logit (log-odds)
C) Probit
D) Exponential
Answer: B
Explanation: Logistic model uses logit link: log(p/(1-p)) = β₀+β₁X.
**Question 43.** The AUC (Area Under the ROC Curve) measures:
A) Model's ability to discriminate between classes
B) Proportion of variance explained
C) Calibration of predicted probabilities
D) Number of predictors used
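The slope, R² and logit link from Questions 39, 40 and 42 can be computed by hand with the standard library. This is a minimal sketch on hypothetical data using ordinary least squares; `logit` and `inverse_logit` are illustrative helper names, not library functions.

```python
import math
from statistics import mean

# Hypothetical predictor and response values.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

slope = sxy / sxx                    # expected change in Y per unit of X
intercept = y_bar - slope * x_bar

# R^2 = 1 - SS_res / SS_tot.
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-z))

print(slope, intercept, r_squared, logit(0.5))
```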

Certificate Practice Exam

D) Increase model complexity
Answer: B
Explanation: KNN relies on distances between points (commonly Euclidean); scaling puts features on comparable units.
**Question 47.** In PCA, the principal components are ordered by:
A) Increasing variance explained
B) Decreasing variance explained
C) Alphabetical order of variable names
D) Random selection
Answer: B
Explanation: The first component captures the most variance; subsequent components capture decreasing amounts.
**Question 48.** The silhouette score is used to evaluate:
A) Regression model accuracy
B) Clustering cohesion and separation
C) Classification recall
D) Time-series stationarity
Answer: B
Explanation: Silhouette measures how similar an object is to its own cluster vs. other clusters.
**Question 49.** A time series that shows a consistent upward trend but no seasonality is said to be:
A) Stationary
B) Seasonal
C) Trend-stationary
D) Non-stationary
Answer: D
Explanation: Presence of a deterministic trend violates stationarity.
**Question 50.** In ARIMA(p,d,q), the parameter d represents:
A) Number of autoregressive terms
B) Number of differencing operations to achieve stationarity
C) Number of moving-average terms
D) Seasonal period
Answer: B
Explanation: d is the order of integration (differences).
**Question 51.** The p-value of a hypothesis test is:
A) Probability that H₀ is true
B) Probability of observing data at least as extreme as observed, assuming H₀ is true
C) Significance level α
D) Probability of a Type II error
Answer: B
Explanation: The p-value is computed under the assumption that H₀ is true; a small value is evidence against H₀.
**Question 52.** Which of the following is a correct interpretation of a 99% confidence interval?
A) There is a 99% probability that the true parameter lies inside the interval
B) 99% of future samples will produce intervals that contain the true parameter
C) The interval will contain 99% of the data points
D) The interval width is 99% of the sample mean
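Questions 49 and 50 both turn on the idea that differencing removes a trend. The sketch below uses two noise-free, made-up series to show the mechanics: one difference flattens a linear trend, and two differences flatten a quadratic one. The helper name `difference` is hypothetical, and real series would also contain noise.

```python
def difference(series):
    """First difference: each value minus the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Linear trend: the level rises by 2 each step, so the mean is not constant.
linear = [3 + 2 * t for t in range(10)]
linear_d1 = difference(linear)          # d = 1 leaves a constant series

# Quadratic trend: one difference still trends, a second one removes it.
quadratic = [t * t for t in range(10)]
quadratic_d1 = difference(quadratic)
quadratic_d2 = difference(quadratic_d1)  # d = 2 leaves a constant series

print(linear_d1)
print(quadratic_d1)
print(quadratic_d2)
```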

Certificate Practice Exam

Explanation: t-test is used when σ is unknown; known σ would lead to Z-test.
**Question 56.** In a paired t-test, the test statistic is computed on:
A) Raw scores of each group
B) Differences between paired observations
C) Sum of squares of each group
D) Ratio of variances
Answer: B
Explanation: Paired test analyzes the mean of the differences.
**Question 57.** The chi-square goodness-of-fit test compares observed frequencies to:
A) Expected frequencies under a specified distribution
B) Frequencies of another variable
C) Median values
D) Means of continuous data
Answer: A
Explanation: It evaluates how well data follow a hypothesized distribution.
**Question 58.** The F-distribution is used in ANOVA because:
A) It models the ratio of two independent chi-square variables divided by their degrees of freedom
B) It is symmetric about zero
C) It has only one parameter
D) It is discrete
Answer: A
Explanation: F = (SS_between/df1)/(SS_within/df2) follows an F-distribution.
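Questions 56 and 57 can be illustrated with `scipy.stats`. In this sketch the before/after scores and the die-roll counts are hypothetical. It shows that a paired t-test gives the same statistic as a one-sample t-test on the differences, and that the chi-square statistic matches the sum of (observed − expected)² / expected.

```python
from scipy import stats

# Hypothetical paired measurements on the same six subjects.
before = [72, 75, 68, 80, 77, 74]
after = [70, 71, 69, 75, 73, 72]
diffs = [b - a for b, a in zip(before, after)]

# Paired t-test versus a one-sample t-test on the differences.
paired = stats.ttest_rel(before, after)
one_sample = stats.ttest_1samp(diffs, popmean=0)

# Goodness of fit: 120 hypothetical die rolls against a fair-die expectation.
observed = [18, 22, 20, 25, 15, 20]
expected = [20] * 6
chi2_stat, chi2_p = stats.chisquare(observed, f_exp=expected)
chi2_manual = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(paired.statistic, one_sample.statistic, chi2_stat, chi2_p)
```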


**Question 59.** Which metric is appropriate for evaluating a regression model's predictive accuracy?
A) Accuracy
B) Mean Squared Error (MSE)
C) ROC AUC
D) Confusion matrix
Answer: B
Explanation: MSE measures average squared difference between predicted and actual values.
**Question 60.** In the context of hypothesis testing, a "two-tailed" test is used when:
A) The alternative hypothesis specifies a direction
B) The alternative hypothesis only states inequality (≠)
C) Only one side of the distribution is of interest
D) Sample size is small
Answer: B
Explanation: Two-tailed tests consider deviations in both directions from H₀.
**Question 61.** The bootstrap method is primarily used to:
A) Increase sample size by duplicating data
B) Estimate the sampling distribution of a statistic by resampling with replacement
C) Perform parametric tests on non-normal data
D) Reduce dimensionality
Answer: B
Explanation: Bootstrap creates many resamples to approximate the distribution of an estimator.
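The MSE from Question 59 and the bootstrap from Question 61 can be sketched with the standard library alone. The function names `mse` and `bootstrap_means`, the data, the seed and the number of resamples are all assumptions made for this example.

```python
import random
from statistics import mean, stdev

def mse(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))

def bootstrap_means(data, n_resamples=2000, seed=0):
    """Means of resamples drawn with replacement, each the size of data."""
    rng = random.Random(seed)  # local generator so results are repeatable
    return [mean(rng.choices(data, k=len(data))) for _ in range(n_resamples)]

# Hypothetical regression predictions.
error = mse([3, 5, 2], [2, 5, 4])

# Hypothetical sample; the bootstrap approximates the sampling
# distribution of its mean without any distributional assumption.
sample = [4.1, 5.3, 6.0, 5.5, 4.8, 5.9, 6.3, 5.1, 4.6, 5.7]
boot = bootstrap_means(sample)
boot_se = stdev(boot)  # bootstrap estimate of the standard error

print(error, mean(boot), boot_se)
```

The spread of the resampled means serves as an estimate of the standard error of the sample mean.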