This exam tests data acquisition, statistical analysis, data visualization, predictive modeling, and data-driven decision-making. Topics include SQL, Excel analytics, Python/R fundamentals, machine learning basics, dashboard creation, and KPI development. Candidates interpret datasets, build analytical models, design visual reports, and explain insights aligned with business objectives. Real-world case scenarios require optimizing decisions using data mining, forecasting, and exploratory analysis.
Typology: Exams
Question 1. In the CRISP‑DM framework, which phase focuses on translating the business problem into a data‑mining goal?
A) Data Understanding
B) Business Understanding
C) Data Preparation
D) Deployment
Answer: B
Explanation: Business Understanding is the first CRISP‑DM step where project objectives and data‑mining goals are defined.

Question 2. Which of the following activities belongs to the Data Understanding phase?
A) Building predictive models
B) Collecting initial data sets
C) Deploying the model in production
D) Cleaning and transforming data
Answer: B
Explanation: Data Understanding involves gathering initial data and exploring its characteristics.

Question 3. During Data Preparation, which process is primarily responsible for handling missing values?
A) Feature selection
B) Data cleaning
C) Model evaluation
D) Business case definition
Answer: B
Explanation: Data cleaning addresses missing, inconsistent, or erroneous data before modeling.

Question 4. In CRISP‑DM, the Modeling phase typically includes which activity?
A) Defining key performance indicators (KPIs)
B) Selecting appropriate modeling techniques
C) Conducting stakeholder interviews
D) Archiving raw data
Answer: B
Explanation: Modeling is the phase where analysts choose algorithms and build models.

Question 5. Which CRISP‑DM phase assesses whether the model meets the business objectives?
A) Evaluation
B) Data Preparation
C) Data Understanding
D) Deployment
Answer: A
Explanation: The Evaluation phase compares model results against business goals.

Question 6. The final CRISP‑DM phase, Deployment, most commonly includes which task?
A) Splitting data into training and test sets
Question 9. Which data type is best described as semi‑structured?
A) Relational tables in SQL
B) Plain text documents
C) JSON files
D) Binary image files
Answer: C
Explanation: JSON organizes data as keys and nested hierarchies but does not conform to a rigid schema, making it semi‑structured.

Question 10. Structured data is typically stored in:
A) Hadoop Distributed File System (HDFS)
B) Relational databases
C) Email archives
D) Video streaming platforms
Answer: B
Explanation: Relational databases enforce a schema, making them ideal for structured data.

Question 11. Which source would be considered an external data source for a retail company?
A) Point‑of‑sale transaction logs
B) Employee payroll system
C) Social media sentiment feeds
D) Internal inventory database
Answer: C
Explanation: Social media feeds originate outside the organization.

Question 12. An ERP system primarily provides which type of data?
A) External market trends
B) Internal operational data
C) Public demographic data
D) Weather forecasts
Answer: B
Explanation: ERP (Enterprise Resource Planning) captures internal business processes.

Question 13. Which method is most appropriate for collecting real‑time stock price data?
A) Manual entry
B) API integration
C) Printed newspaper scanning
D) Email attachment
Answer: B
Explanation: APIs allow automated, real‑time data retrieval.

Question 14. Web scraping is best suited for obtaining:
A) Structured data from relational databases
B) Unstructured text from web pages
C) Sensor data from IoT devices
B) Heat‑map visualization
C) Data cleansing
D) Business requirement gathering
Answer: A
Explanation: Predictive analytics employs models like ARIMA for forecasting.

Question 18. Prescriptive analytics differs from predictive analytics by:
A) Using only descriptive statistics
B) Suggesting optimal actions based on predictions
C) Ignoring business constraints
D) Focusing solely on data collection
Answer: B
Explanation: Prescriptive analytics provides recommendations on what to do next.

Question 19. Which language is most widely used for statistical modeling in finance?
A) HTML
B) Python
C) SQL
D) R
Answer: D
Explanation: R has extensive packages for statistical analysis and is popular among finance analysts.
Question 20. Which tool is best suited for ad‑hoc data manipulation by non‑technical users?
A) Hadoop
B) Excel
C) TensorFlow
D) SAS
Answer: B
Explanation: Excel offers a familiar interface for quick data tasks.

Question 21. In SQL, which clause is used to filter rows after aggregation?
A) WHERE
B) GROUP BY
C) HAVING
D) ORDER BY
Answer: C
Explanation: HAVING filters grouped results, whereas WHERE applies before aggregation.

Question 22. Which Python library is primarily used for data manipulation and analysis?
A) Matplotlib
B) NumPy
C) Pandas
D) Scikit‑learn
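
Worked example for Question 21: the sketch below may help show how WHERE and HAVING act at different stages of the same query. It uses Python's built-in sqlite3 module with an in-memory database; the `sales` table, its columns, and every value in it are hypothetical and invented purely for illustration.

```python
import sqlite3

# Hypothetical sales table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [
        ("East", 100.0),
        ("East", 300.0),
        ("West", 50.0),
        ("West", 70.0),
        ("North", 500.0),
    ],
)

# WHERE drops individual rows before grouping;
# HAVING drops whole groups after SUM() has been computed.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount >= 60
    GROUP BY region
    HAVING SUM(amount) > 200
    ORDER BY region
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)  # [('East', 400.0), ('North', 500.0)]
```

Here the West row of 50 is removed by WHERE before any totals exist, and the remaining West group (total 70) is then removed by HAVING because its aggregate does not exceed 200.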
C) Time‑series trends
D) Hierarchical data
Answer: B
Explanation: Scatter plots show the relationship between two numeric variables.

Question 26. Which chart type can be misleading if the y‑axis does not start at zero?
A) Pie chart
B) Bar chart
C) Line chart
D) Scatter plot
Answer: B
Explanation: Bar charts encode values as bar length measured from a zero baseline; truncating the axis can exaggerate differences.

Question 27. In data visualization, “chart junk” refers to:
A) Missing data points
B) Unnecessary decorative elements that obscure insight
C) Inconsistent color palettes
D) Overly large datasets
Answer: B
Explanation: Chart junk adds visual clutter without adding informational value.
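
Worked example for Question 26: a small arithmetic sketch of why a truncated y‑axis exaggerates differences between bars. The function name `drawn_height_ratio` and the values 100, 105, and 95 are hypothetical, chosen only to make the effect visible.

```python
def drawn_height_ratio(value_a, value_b, baseline=0.0):
    """Return how many times taller bar B is drawn than bar A
    when the y-axis starts at `baseline` instead of zero."""
    if baseline >= min(value_a, value_b):
        raise ValueError("baseline must lie below both values")
    return (value_b - baseline) / (value_a - baseline)


# Hypothetical figures: two bars worth 100 and 105 (a 5% real difference).
true_ratio = drawn_height_ratio(100, 105)               # axis starts at 0
truncated_ratio = drawn_height_ratio(100, 105, 95)      # axis starts at 95

print(true_ratio)       # about 1.05: bar B looks 5% taller
print(truncated_ratio)  # 2.0: bar B looks twice as tall
```

With the axis starting at 95, a 5% difference in the data is drawn as a 100% difference in bar height, which is the distortion the explanation describes.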
Question 28. Which visualization would best depict market share percentages among five companies?
A) Stacked bar chart
B) Pie chart
C) Heat map
D) Box plot
Answer: B
Explanation: Pie charts effectively show parts of a whole for a limited number of categories.

Question 29. What is the primary purpose of a box plot?
A) Show distribution quartiles and outliers
B) Display cumulative totals over time
C) Compare categorical frequencies
D) Illustrate geographic data
Answer: A
Explanation: Box plots summarize median, quartiles, and potential outliers.

Question 30. Which of the following best describes data veracity?
A) The speed of data generation
B) The trustworthiness and quality of data
C) The size of the dataset
D) The variety of data formats
C) Network latency
D) File format compatibility
Answer: A
Explanation: AI decisions must be explainable to avoid unfair discrimination.

Question 34. Which security measure is essential when transmitting financial data over the internet?
A) CSV formatting
B) SSL/TLS encryption
C) Data compression
D) Color‑coded charts
Answer: B
Explanation: SSL/TLS secures data in transit against interception.

Question 35. Which of the following best defines “data value”?
A) The monetary cost of storing data
B) The insight and ROI derived from analyzing data
C) The number of rows in a dataset
D) The bandwidth required to transfer data
Answer: B
Explanation: Data value measures the business benefit obtained from data insights.
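
Worked example for Question 34: one way a client might prepare an encrypted connection, sketched with Python's standard ssl module. The helper name `make_tls_context` and the choice of TLS 1.2 as the minimum version are assumptions made for illustration, not requirements stated in the exam material.

```python
import ssl


def make_tls_context():
    """Build a client-side TLS context with certificate checks enabled."""
    # create_default_context() turns on certificate verification and
    # hostname checking for client connections.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

A socket would then be secured with `context.wrap_socket(sock, server_hostname=...)`, passing the real host name so that the server certificate can be checked against it.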
Question 36. An example of unstructured data is:
A) A relational table of sales transactions
B) A JSON file containing product attributes
C) An email body with free‑text comments
D) A CSV file of inventory counts
Answer: C
Explanation: Free‑text emails lack predefined schema, classifying them as unstructured.

Question 37. Which of the following is a key advantage of using APIs for data collection?
A) Manual verification of each record
B) Real‑time or near‑real‑time data retrieval
C) Unlimited storage capacity
D) Automatic data visualization
Answer: B
Explanation: APIs enable programmatic, timely access to external data sources.

Question 38. In the context of data preparation, “ETL” stands for:
A) Extract, Transform, Load
B) Evaluate, Test, Learn
C) Encode, Transfer, Link
D) Estimate, Track, Log
Answer: A
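
Worked example for Question 38: a minimal sketch of the three ETL stages using only Python's standard library. The CSV text, the `inventory` table, and the rule of skipping rows with a missing quantity are all hypothetical choices made for illustration; a real pipeline would read from actual source systems and apply its own cleaning rules.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray spaces and one missing quantity.
RAW_CSV = "sku,qty\nA1, 3\nB2,\nC3,5\n"


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: trim whitespace, cast types, skip rows with no quantity."""
    cleaned = []
    for record in records:
        qty = record["qty"].strip()
        if qty:
            cleaned.append((record["sku"].strip(), int(qty)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a target table."""
    conn.execute("CREATE TABLE inventory (sku TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT sku, qty FROM inventory ORDER BY sku").fetchall()
conn.close()

print(loaded)  # [('A1', 3), ('C3', 5)]
```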
C) A visualization dashboard for executives
D) An encrypted file system for backups
Answer: B
Explanation: Data lakes hold large volumes of raw, often unprocessed data.

Question 42. When performing feature engineering, creating a “month‑of‑year” variable from a timestamp is an example of:
A) Dimensionality reduction
B) Data encoding
C) Data aggregation
D) Variable transformation
Answer: D
Explanation: Extracting month from a date transforms the raw timestamp into a useful feature.

Question 43. Which of the following is a primary purpose of cross‑validation in model building?
A) To increase the size of the training set
B) To assess model performance on unseen data
C) To visualize model coefficients
D) To speed up model training
Answer: B
Explanation: Cross‑validation tests generalization by rotating training and validation subsets.
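
Worked example for Question 43: the rotation of training and validation subsets can be sketched as a plain k‑fold split. The function name `k_fold_indices` is hypothetical, and the sketch assumes the rows are already in random order; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (training, validation) index lists for k-fold cross-validation.

    Every sample lands in the validation subset exactly once.
    No shuffling is done here; that is left to the caller.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


folds = list(k_fold_indices(10, 3))
for training, validation in folds:
    print(validation)
# [0, 1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
```

A model would be fitted on each `training` list and scored on the matching `validation` list, with the k scores averaged to estimate performance on unseen data.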
Question 44. In a regression model, a high Variance Inflation Factor (VIF) indicates:
A) Strong predictive power
B) Multicollinearity among predictors
C) Overfitting due to too many observations
D) Underfitting due to insufficient variables
Answer: B
Explanation: VIF measures how much a predictor is linearly related to other predictors.

Question 45. Which of the following is an example of a KPI for a marketing analytics project?
A) Number of rows in the dataset
B) Click‑through rate (CTR)
C) Size of the database server
D) Frequency of data backups
Answer: B
Explanation: CTR directly reflects marketing performance and is a common KPI.

Question 46. Which visualization technique is most effective for showing a time‑series trend with seasonal patterns?
A) Scatter plot
B) Histogram
C) Line chart with multiple series
D) Radar chart
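
Worked example for Question 44: for a predictor j, VIF is 1 / (1 - R²), where R² comes from regressing predictor j on the other predictors. The sketch below covers only the simplest case of exactly two predictors, where that R² equals the squared Pearson correlation between them. The function names and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(xs, ys):
    """VIF for either predictor when the model has exactly two predictors.

    Perfectly collinear inputs (r squared equal to 1) would divide by zero.
    """
    r_squared = pearson_r(xs, ys) ** 2
    return 1.0 / (1.0 - r_squared)


# Hypothetical predictor values, invented for illustration.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2))  # 2.5
```

Here the squared correlation is 0.6, so VIF = 1 / 0.4 = 2.5. The closer the predictors move toward a perfect linear relationship, the larger the VIF becomes.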
B) It may not meet the specific needs of different audiences
C) It reduces data security risks
D) It improves data latency
Answer: B
Explanation: Different roles require tailored views; a one‑size‑fits‑all dashboard can be ineffective.

Question 50. In the context of RPA, “bot” refers to:
A) A statistical model
B) A software robot that automates repetitive tasks
C) A data visualization component
D) A hardware device for data capture
Answer: B
Explanation: RPA bots mimic human actions to perform rule‑based processes.

Question 51. Which of the following best illustrates “data bias” in a predictive model?
A) The model runs faster on a GPU
B) Training data over‑represents a particular demographic, leading to skewed predictions
C) The model uses a linear algorithm
D) The dataset contains missing values
Answer: B
Explanation: Over‑representation creates bias, affecting fairness and accuracy.
Question 52. Which SQL function is used to calculate the average of a numeric column?
A) SUM()
B) AVG()
C) COUNT()
D) MAX()
Answer: B
Explanation: AVG() returns the mean value of the specified column.

Question 53. Which of the following is a key benefit of using cloud‑based data warehouses?
A) Fixed hardware capacity
B) Unlimited on‑premises storage
C) Scalability and pay‑as‑you‑go pricing
D) Inability to integrate with APIs
Answer: C
Explanation: Cloud warehouses can scale resources dynamically and charge based on usage.

Question 54. In data visualization, the term “small multiples” refers to:
A) Using tiny fonts to fit more information
B) Displaying several similar charts side‑by‑side for comparison
C) Aggregating data into a single bar
D) Combining multiple data sources into one plot