




























































































This exam measures foundational to intermediate data analytics skills including data wrangling, visualization, dashboards, statistical modeling, and interpretation of insights. Candidates practice analysis using real-life business datasets, evaluating trends, forecasting, building KPIs, and recommending data-guided solutions. The exam includes analytics workflow design, ETL basics, and ethical data use scenarios.
Typology: Exams

Question 1. Which of the following best describes qualitative data?
A) Data that can be measured on a numeric scale
B) Data that represent categories or attributes
C) Data that follow a normal distribution
D) Data that are always stored in relational tables
Answer: B
Explanation: Qualitative data consist of non-numeric categories such as gender, color, or product type, unlike quantitative data which are numeric.

Question 2. In a relational database, what is the purpose of a primary key?
A) To enforce referential integrity between tables
B) To uniquely identify each row in a table
C) To store large binary objects
D) To speed up data encryption processes
Answer: B
Explanation: A primary key uniquely identifies each record, ensuring no duplicate rows and enabling efficient indexing.

Question 3. Which data format is best suited for hierarchical data exchange over the web?
A) CSV
B) JSON
C) Parquet
D) SQL dump
Answer: B
Explanation: JSON (JavaScript Object Notation) naturally represents nested structures, making it ideal for hierarchical data interchange.

Question 4. Which NoSQL database type stores data as key-value pairs?
A) Document store
B) Graph database
C) Column-family store
D) Key-value store
Answer: D
Explanation: Key-value stores (e.g., Redis, DynamoDB) map a unique key to an opaque value, enabling fast lookups.

Question 5. In a star schema, the central table is called a:
A) Fact table
B) Dimension table
C) Bridge table
D) Lookup table
Answer: A
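
As a rough illustration of Questions 2 and 5, the sketch below builds a tiny star schema in an in-memory SQLite database using Python's standard `sqlite3` module. The table names (`dim_product`, `fact_sales`), columns, and values are hypothetical and chosen only for this example: a central fact table is joined to one dimension table, and inserting a duplicate primary key is rejected.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,   -- uniquely identifies each dimension row
    category   TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,   -- uniquely identifies each fact row
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    amount     REAL NOT NULL          -- the numeric measure being analyzed
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO fact_sales VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);
""")

# Typical star-schema query: aggregate the fact table, sliced by a dimension.
totals = conn.execute("""
SELECT d.category, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_product AS d ON d.product_id = f.product_id
GROUP BY d.category
ORDER BY d.category
""").fetchall()

# A second row with product_id = 1 violates the primary key constraint.
try:
    conn.execute("INSERT INTO dim_product VALUES (1, 'Games')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(totals)
print(duplicate_rejected)
```

Real warehouses usually have several dimension tables around the fact table; one is enough here to show the shape of the join.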
Question 8. Data lineage is important because it helps:
A) Increase storage capacity
B) Track the origin and transformations of data
C) Reduce network latency
D) Generate random numbers for simulations
Answer: B
Explanation: Data lineage documents where data came from and how it was transformed, aiding transparency and debugging.

Question 9. Which missing-data mechanism assumes the probability of missingness is unrelated to any observed or unobserved values?
A) MCAR
B) MAR
C) MNAR
D) NMAR
Answer: A
Explanation: MCAR (Missing Completely at Random) means missingness is independent of all data, so analyses of the complete cases remain unbiased, though less precise.

Question 10. When using mean imputation for a numeric variable, which issue may arise?
A) Increased variance
B) Introduction of outliers
C) Underestimation of variability
D) Violation of primary key constraints
Answer: C
Explanation: Replacing missing values with the mean reduces the variable’s variance, potentially biasing downstream analyses.

Question 11. Min-Max scaling transforms a feature to which range?
A) 0 to 1
B) -1 to 1
C) 0 to 100
D) -∞ to +∞
Answer: A
Explanation: Min-Max scaling subtracts the minimum and divides by the range, yielding values between 0 and 1.

Question 12. One-hot encoding is appropriate for:
A) Ordinal variables with natural ordering
B) Nominal categorical variables without order
C) Continuous numeric variables
D) Binary variables already coded as 0/1
Answer: B
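
The claims in Questions 10 and 11 can be checked numerically. The following is a minimal sketch using only Python's standard library; the sample values and the helper name `min_max_scale` are made up for illustration.

```python
import statistics

# Hypothetical observed values; three further values are assumed to be missing.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
n_missing = 3

# Mean imputation: every missing entry is replaced by the observed mean.
mean_value = statistics.mean(observed)
imputed = observed + [mean_value] * n_missing

# The imputed points sit exactly at the mean, so they add no spread
# while increasing the sample size: the sample variance shrinks.
var_before = statistics.variance(observed)
var_after = statistics.variance(imputed)


def min_max_scale(values):
    """Rescale values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale(observed)

print(var_before, var_after)
print(scaled)
```

The smallest value maps to 0 and the largest to 1, which is why a single extreme outlier can compress all other scaled values into a narrow band.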
A) Only rows with matching keys in both tables
B) All rows from the left table and matching rows from the right table
C) All rows from the right table and matching rows from the left table
D) Rows that do not match in either table
Answer: B
Explanation: LEFT JOIN preserves all records from the left table and adds data from the right table when keys match.

Question 16. Which Python library provides the groupby operation for data frames?
A) NumPy
B) Matplotlib
C) Pandas
D) Scikit-learn
Answer: C
Explanation: Pandas’ groupby method groups rows based on column values and enables aggregation.

Question 17. In hypothesis testing, a Type II error occurs when:
A) The null hypothesis is true but rejected
B) The null hypothesis is false but not rejected
C) The alternative hypothesis is true and accepted
D) The p-value is exactly 0.
Answer: B
Explanation: A Type II error (false negative) fails to reject a false null hypothesis.

Question 18. Which test is appropriate for comparing the means of three independent groups?
A) Paired t-test
B) One-way ANOVA
C) Chi-square test of independence
D) Mann-Whitney U test
Answer: B
Explanation: One-way ANOVA assesses whether at least one group mean differs among three or more independent groups.

Question 19. The p-value represents:
A) The probability that the null hypothesis is true
B) The probability of observing data as extreme as the sample, assuming the null is true
C) The confidence level of the test
D) The effect size of the treatment
Answer: B
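
To make the definition in Question 19 concrete, here is a small sketch of an exact two-sided permutation test written with the standard library only. The two groups and their values are invented for the example; the p-value is the share of all possible regroupings whose difference in means is at least as extreme as the observed one, assuming the null hypothesis that group labels do not matter.

```python
from itertools import combinations

# Hypothetical measurements for two independent groups.
group_a = [12, 14, 15, 17]
group_b = [8, 9, 11, 13]


def mean(values):
    return sum(values) / len(values)


observed_diff = abs(mean(group_a) - mean(group_b))

# Under the null hypothesis every split of the pooled data into groups
# of the same sizes is equally likely, so enumerate all of them.
pooled = group_a + group_b
indices = range(len(pooled))

extreme = 0
total = 0
for chosen in combinations(indices, len(group_a)):
    chosen_set = set(chosen)
    a = [pooled[i] for i in chosen_set]
    b = [pooled[i] for i in indices if i not in chosen_set]
    total += 1
    # Small tolerance guards against floating-point rounding.
    if abs(mean(a) - mean(b)) >= observed_diff - 1e-12:
        extreme += 1

p_value = extreme / total
print(total, extreme, p_value)
```

Note that this is a probability about the data given the null hypothesis, not the probability that the null hypothesis is true (option A).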
Question 22. Which metric is most appropriate for evaluating a highly imbalanced binary classifier?
A) Accuracy
B) Precision
C) Recall
D) F1-score
Answer: D
Explanation: F1-score is the harmonic mean of precision and recall, providing a single measure that is more informative than accuracy when classes are imbalanced.

Question 23. In K-means clustering, the algorithm minimizes:
A) The sum of absolute deviations from the median
B) The total within-cluster sum of squares (inertia)
C) The distance between cluster centroids and the origin
D) The number of clusters
Answer: B
Explanation: K-means iteratively assigns points to clusters to minimize within-cluster variance (sum of squared distances).

Question 24. Which of the following is a characteristic of a stationary time series?
A) Increasing mean over time
B) Constant variance and mean over time
C) Seasonal patterns that change amplitude
D) Trend component that grows linearly
Answer: B
Explanation: Stationarity implies that statistical properties (mean, variance, autocorrelation) are constant over time.

Question 25. A box plot visualizes all the following EXCEPT:
A) Median
B) Interquartile range
C) Mean
D) Outliers
Answer: C
Explanation: Traditional box plots display median, quartiles, and outliers, but not the mean (unless explicitly added).

Question 26. Which chart type is best for showing the proportion of categories that sum to 100 %?
A) Bar chart
B) Line chart
C) Pie chart
D) Scatter plot
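
Returning to Question 22, the sketch below shows why accuracy can mislead on imbalanced data. The labels are synthetic (5 positives out of 100), and the "classifier" is a hypothetical baseline that always predicts the negative class. The helper `binary_metrics` is written for this example and treats undefined ratios as 0, which is one common convention rather than the only one.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Convention assumed here: a ratio with a zero denominator counts as 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Synthetic, highly imbalanced labels: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# Baseline that never predicts the positive class.
y_pred = [0] * 100

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

The baseline scores high accuracy while finding none of the positive cases, which the F1-score exposes.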
Explanation: Quick filters can be displayed as sliders, enabling interactive range selection on dashboards.

Question 29. When creating a heat map, what does color intensity typically represent?
A) Geographic location
B) Frequency or magnitude of a variable
C) Time sequence
D) Data type (numeric vs. categorical)
Answer: B
Explanation: In heat maps, darker or more saturated colors indicate higher values or counts.

Question 30. Which of the following best describes a data story’s “challenge” component?
A) The final recommendation to stakeholders
B) The background context and business problem being addressed
C) The technical steps taken to clean the data
D) The visual design of the dashboard
Answer: B
Explanation: The “challenge” defines the problem or question that motivates the analysis, setting the stage for the story.

Question 31. In Git, which command records changes to the repository?
A) git clone
B) git pull
C) git commit
D) git merge
Answer: C
Explanation: git commit creates a new snapshot of staged changes, preserving version history.

Question 32. Which DDL statement is used to remove an existing table?
A) DROP TABLE
B) DELETE FROM
C) TRUNCATE TABLE
D) ALTER TABLE
Answer: A
Explanation: DROP TABLE permanently deletes the table definition and its data.

Question 33. In Excel, which function retrieves a value from a table based on a matching key in the first column?
A) SUMIF
B) VLOOKUP
C) CONCATENATE
Explanation: Visualization is an analytical step, not part of the ETL (Extract-Transform-Load) pipeline.

Question 36. Which statistical test would you use to examine the relationship between two categorical variables?
A) Pearson correlation
B) Independent t-test
C) Chi-square test of independence
D) Paired t-test
Answer: C
Explanation: The chi-square test assesses whether the distribution of one categorical variable differs across levels of another.

Question 37. In a regression model, multicollinearity refers to:
A) High correlation between the dependent variable and residuals
B) Strong correlation among independent variables
C) Non-linear relationship between predictors and outcome
D) Heteroscedasticity of residuals
Answer: B
Explanation: Multicollinearity occurs when predictors are highly correlated, inflating coefficient variance and destabilizing estimates.

Question 38. Which Python library provides the train_test_split function for creating validation sets?
A) NumPy
B) pandas
C) scikit-learn
D) matplotlib
Answer: C
Explanation: train_test_split is part of scikit-learn’s model_selection module, facilitating data partitioning.

Question 39. Which window function assigns a sequential integer to rows ordered by a specified column?
A) LAG()
B) ROW_NUMBER()
C) SUM() OVER()
D) FIRST_VALUE()
Answer: B
Explanation: ROW_NUMBER() generates a unique, consecutive integer based on the defined ordering.

Question 40. In a data pipeline, which component ensures that transformed data meets quality standards before loading?
A) Extractor
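
The chi-square test of independence from Question 36 can be worked through by hand. The sketch below uses a made-up 2x2 contingency table and only the standard library. It computes the Pearson chi-square statistic without a continuity correction, and the closed-form p-value used here is valid only for one degree of freedom, as in this 2x2 case.

```python
import math

# Hypothetical 2x2 contingency table of observed counts:
# rows = levels of one categorical variable, columns = levels of the other.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# For dof == 1 the chi-square variable is a squared standard normal,
# so the upper-tail probability equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(chi2, dof, p_value)
```

A small p-value here suggests the two variables are associated; for larger tables a statistics library would normally supply the tail probability.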
Answer: C
Explanation: AVG() returns the arithmetic mean of the specified column’s values.

Question 43. Which of the following is a common method for handling high-cardinality categorical variables?
A) One-hot encoding all categories
B) Dropping the variable entirely
C) Target encoding (mean encoding)
D) Converting to binary using ASCII values
Answer: C
Explanation: Target encoding replaces categories with the mean of the target variable, reducing dimensionality while preserving predictive power.

Question 44. In time-series forecasting, the “seasonal decomposition” technique separates a series into which components?
A) Trend, noise, and outliers
B) Trend, seasonality, and residual (error)
C) Level, slope, and curvature
D) Autocorrelation, partial autocorrelation, and lag
Answer: B
Explanation: Seasonal decomposition (e.g., STL) extracts trend, seasonal pattern, and residual error components.

Question 45. Which visualization best shows the relationship between two continuous variables?
A) Bar chart
B) Scatter plot
C) Stacked area chart
D) Histogram
Answer: B
Explanation: Scatter plots display paired observations, making it easy to assess correlation or patterns between two numeric variables.

Question 46. In Power BI, which feature allows you to create a reusable calculation across multiple reports?
A) Data source
B) Measure (DAX)
C) Query editor step
D) Custom visual
Answer: B
Explanation: Measures, defined using DAX, are reusable calculations that can be referenced in any visual within the report.

Question 47. Which of the following is NOT a typical characteristic of a “big data” environment?
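
The target encoding described in Question 43 can be sketched in a few lines of plain Python. The city names and binary target values below are invented for illustration. In practice the category means are usually computed on training data only (often with smoothing) to limit target leakage; this sketch skips that step.

```python
from collections import defaultdict

# Hypothetical (category, target) pairs; target is a 0/1 outcome.
rows = [
    ("London", 1),
    ("London", 0),
    ("Paris", 1),
    ("Paris", 1),
    ("Oslo", 0),
]

# Accumulate the sum and count of the target for each category.
sums = defaultdict(float)
counts = defaultdict(int)
for city, target in rows:
    sums[city] += target
    counts[city] += 1

# Each category is replaced by the mean target observed for it.
encoding = {city: sums[city] / counts[city] for city in sums}
encoded = [encoding[city] for city, _ in rows]

print(encoding)
print(encoded)
```

The single numeric column replaces what would otherwise be one one-hot column per category, which is the dimensionality saving the explanation refers to.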