
Kizen Big Data Analytics Practice Exam, Exams of Technology

Technology

This practice exam evaluates analytical skills with large datasets using statistical techniques, predictive modeling, data mining, visualization tools, and business intelligence concepts. Topics include regression, clustering, classification, data preprocessing, ETL workflows, dashboards, and decision-making analytics. Case-based questions simulate real-world data challenges requiring interpretation of complex datasets and actionable insight generation.

Typology: Exams

2025/2026

Available from 01/07/2026

shilpi-jain-1 🇮🇳


Kizen Big Data Analytics Practice Exam

Question 1. **Which of the following best describes the “Volume” characteristic of Big Data?**

A) Speed at which data is generated

B) The amount of data generated

C) Variety of data formats

D) Accuracy of data

Answer: B

Explanation: Volume refers to the massive amount of data produced, often measured in terabytes or petabytes, distinguishing Big Data from traditional datasets.

---

Question 2. **What does “Velocity” refer to in the context of the 5 V’s of Big Data?**

A) The diversity of data sources

B) The rate at which data is created, collected, and processed

C) The reliability of data

D) The monetary value derived from data

Answer: B

Explanation: Velocity describes the speed of data flow, requiring real‑time or near‑real‑time processing capabilities.

---

Question 3. **Which V of Big Data addresses the trustworthiness and quality of data?**


A) Volume

B) Variety

C) Veracity

D) Value

Answer: C

Explanation: Veracity deals with data accuracy, consistency, and reliability, which are crucial for sound analytics.

---

Question 4. **In Big Data, “Value” is primarily concerned with:**

A) The size of data sets

B) The speed of data ingestion

C) The usefulness of data for decision making

D) The number of data sources

Answer: C

Explanation: Value focuses on extracting actionable insights that provide business benefits from raw data.

---

Question 5. **Which of the following is a structured data format?**

A) JPEG image

B) JSON document

[…]

Answer: C

Explanation: Relational database rows are structured; the others lack a fixed schema.

---

Question 8. **A primary business driver for adopting Big Data in healthcare is:**

A) Reducing website load times

B) Enhancing patient outcome predictions through predictive analytics

C) Increasing physical store footfall

D) Automating payroll processing

Answer: B

Explanation: Healthcare leverages large patient datasets to predict disease risk and improve outcomes.

---

Question 9. **In the Data & Analytics Maturity Framework, an organization that only reacts to incidents is at which level?**

A) Reactive

B) Proactive

C) Strategic

D) Optimized

Answer: A

Explanation: Reactive maturity means analytics are used only after problems occur, lacking foresight.

---

Question 10. **Which maturity level focuses on using analytics to shape long‑term business strategy?**

A) Reactive

B) Proactive

C) Strategic

D) Tactical

Answer: C

Explanation: Strategic maturity integrates analytics into planning and competitive positioning.

---

Question 11. **KAIZEN™ “Muda” refers to:**

A) Continuous improvement cycles

B) Waste or non‑value‑adding activities

C) Standard operating procedures

D) Customer feedback loops

Answer: B

Explanation: Muda is a Japanese term for waste, which KAIZEN seeks to eliminate.

---

[…]

B) Bottlenecks and waste in data flow

C) New data sources to add

D) Ways to encrypt data

Answer: B

Explanation: Value‑stream mapping visualizes the flow from source to insight, revealing inefficiencies.
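The value-stream idea in the explanation above can be made concrete with a toy calculation. The sketch below assumes a hypothetical four-stage analytics pipeline with made-up processing and waiting times (in minutes). It computes the total lead time, the flow efficiency (value-adding time divided by lead time, one common lean metric), and the stage with the longest queue in front of it, which is usually where a value-stream map points first. `Stage` and `analyze` are illustrative names, not part of any tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    process_min: float  # value-adding work time for one batch of data
    wait_min: float     # time the data waits in a queue before this stage starts

def analyze(stages):
    """Return (lead time, flow efficiency, name of the stage with the longest wait)."""
    lead_time = sum(s.process_min + s.wait_min for s in stages)
    value_added = sum(s.process_min for s in stages)
    efficiency = value_added / lead_time if lead_time else 0.0
    bottleneck = max(stages, key=lambda s: s.wait_min)
    return lead_time, efficiency, bottleneck.name

# Hypothetical pipeline; all timings are made up for illustration
pipeline = [
    Stage("ingest", 10, 5),
    Stage("clean", 20, 120),
    Stage("model", 30, 15),
    Stage("report", 5, 60),
]
lead, eff, worst = analyze(pipeline)
print(f"lead time {lead} min, flow efficiency {eff:.0%}, longest wait before '{worst}'")
```

Here most of the lead time is spent waiting rather than working, which is the kind of waste (Muda) a value-stream map is meant to expose.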

---

Question 15. **In modern data architecture, the Lambda model combines:**

A) Batch processing and real‑time streaming

B) Relational and NoSQL databases only

C) Data lake and data warehouse in a single layer

D) On‑premise and cloud storage simultaneously

Answer: A

Explanation: Lambda architecture uses a batch layer for historical data and a speed layer for real‑time data.
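A minimal sketch of the Lambda idea, assuming page-view counting as the workload. A batch layer recomputes counts from the full history, a speed layer counts only events that arrived since the last batch run, and a serving function merges the two at query time. The function and class names are illustrative, not part of any framework; in a real system the batch layer might be Hadoop or Spark jobs and the speed layer a stream processor.

```python
from collections import Counter

def batch_view(history):
    """Batch layer: recompute page-view counts over the complete historical dataset."""
    return Counter(event["page"] for event in history)

class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["page"]] += 1

def serve(batch, speed, page):
    """Serving layer: answer queries by merging the batch view with real-time increments."""
    return batch[page] + speed.counts[page]

history = [{"page": "home"}, {"page": "home"}, {"page": "cart"}]
batch = batch_view(history)

speed = SpeedLayer()
for event in [{"page": "home"}, {"page": "checkout"}]:  # events newer than the batch run
    speed.on_event(event)

print(serve(batch, speed, "home"), serve(batch, speed, "checkout"))  # 3 1
```

The cost of this design is that the same counting logic exists twice (batch and speed), which is the duplication the Kappa architecture in the next question tries to remove.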

---

Question 16. **Which architecture eliminates the separate batch layer by processing all data as streams?**

A) Lambda

B) Kappa

C) Three‑tier

D) Microservices

Answer: B

Explanation: Kappa architecture treats all data as a stream, simplifying design by removing the batch layer.
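By contrast with Lambda, a Kappa design keeps one processing path and handles "batch" work by replaying a retained event log through it (Apache Kafka is a common choice for that log). The sketch below assumes the same page-view workload and uses a plain Python list as a stand-in for a durable, replayable log; `process_stream` is a hypothetical name.

```python
from collections import Counter

def process_stream(events, state=None):
    """The single processing path: every event, historical or new, is handled here."""
    state = Counter() if state is None else state
    for event in events:
        state[event["page"]] += 1
    return state

# An append-only list stands in for a durable, replayable event log (assumption)
log = [{"page": "home"}, {"page": "cart"}, {"page": "home"}]
live = process_stream(log)

# New events go through the same function, updating the running state
log.append({"page": "checkout"})
live = process_stream(log[-1:], live)

# Reprocessing (e.g., after changing the logic) means replaying the log from the start
replayed = process_stream(log)
print(live == replayed, live["home"], live["checkout"])
```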

---

Question 17. **A Data Lake is primarily designed for:**

A) Storing raw, unprocessed data of any type

B) Hosting pre‑aggregated reports only

C) Enforcing strict schema on write

D) Replacing all relational databases

Answer: A

Explanation: Data Lakes accept raw data in its native format, enabling flexible future processing.

---

Question 18. **Which statement best describes a Data Warehouse?**

A) Stores raw logs without transformation

B) Holds curated, structured data optimized for analytical queries

C) Is a file system for image storage

D) Provides real‑time streaming capabilities

---

[…]

Answer: B

Explanation: YARN (Yet Another Resource Negotiator) allocates CPU, memory, and schedules tasks.

---

Question 21. **In MapReduce, the “Map” phase is responsible for:**

A) Aggregating final results

B) Sorting data across nodes

C) Transforming input key/value pairs into intermediate key/value pairs

D) Managing cluster resources

Answer: C

Explanation: The Map function processes raw input and emits intermediate key/value pairs for reduction.
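To illustrate the Map phase described above, here is a single-process word-count sketch in plain Python. It is not Hadoop's API: `map_phase`, `shuffle`, and `reduce_phase` are hypothetical names, and the shuffle step, which Hadoop performs between the Map and Reduce phases across nodes, is simulated with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: turn one input record (a line of text) into intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle/sort: group intermediate values by key (the framework does this in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))

def reduce_phase(key, values):
    """Reduce: aggregate every value emitted for one key into a final result."""
    return key, sum(values)

lines = ["Big data big insight", "data lakes hold raw data"]
intermediate = chain.from_iterable(map_phase(line) for line in lines)
result = dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
print(result)
```

Because each `map_phase` call only looks at its own record, the map work can be split across machines; only the grouped output of the shuffle needs to meet in one place per key.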

---

Question 22. **Which of the following is NOT a component of Apache Spark?**

A) Spark SQL

B) Spark Streaming

C) Spark MLlib

D) Spark Hadoop

Answer: D

Explanation: Spark Hadoop is not a Spark component; Spark integrates with Hadoop but does not have a module named “Spark Hadoop”.

---

Question 23. **Resilient Distributed Datasets (RDDs) in Spark are:**

A) Immutable, partitioned collections of objects that can be processed in parallel

B) Mutable tables stored in HDFS

C) Real‑time streaming windows only

D) Indexes for relational databases

Answer: A

Explanation: RDDs are the core abstraction in Spark, providing fault tolerance and parallelism.

---

Question 24. **Which Spark library is dedicated to graph processing?**

A) GraphX

B) MLlib

C) SparkR

D) SparkSQL

Answer: A

Explanation: GraphX enables graph-parallel computation on top of Spark’s core engine.

---

Question 27. **Which NoSQL database is best suited for wide‑column storage and high write throughput?**

A) MongoDB

B) Cassandra

C) Neo4j

D) Elasticsearch

Answer: B

Explanation: Cassandra uses a column‑family model optimized for massive write scalability.

---

Question 28. **Document-oriented databases such as MongoDB store data in:**

A) Tables with strict schemas

B) JSON‑like documents that can vary in structure

C) Key/value pairs only

D) Graph relationships

Answer: B

Explanation: MongoDB’s BSON documents allow flexible, semi‑structured data storage.

---

Question 29. **ETL stands for:**

A) Extract, Transform, Load

B) Encode, Transfer, Log

C) Evaluate, Test, Launch

D) Enrich, Tag, Link

Answer: A

Explanation: ETL is the classic pipeline for moving data from source to destination after transformation.

---

Question 30. **ELT differs from ETL mainly in that:**

A) Transformation occurs after loading data into the target system

B) Data is never transformed

C) It only works with structured data

D) It requires no data warehouse

Answer: A

Explanation: ELT loads raw data first and then transforms it within the target system, leveraging its processing power.
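The ETL/ELT distinction in Questions 29 and 30 comes down to where the transform step runs. The sketch below uses Python's built-in `sqlite3` as a stand-in target system (an assumption; a real ELT target would usually be a data warehouse). `etl` cleans rows in application code before loading, while `elt` loads the raw text as-is and transforms it with SQL inside the database. The order data, table names, and the deliberately crude numeric filter are hypothetical.

```python
import sqlite3

# Hypothetical source rows: (order id, amount as text, country code)
raw_orders = [("o1", " 20.50 ", "us"), ("o2", "5.00", "DE"), ("o3", "n/a", "fr")]

def etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    cleaned = []
    for order_id, amount, country in raw_orders:
        try:
            cleaned.append((order_id, float(amount), country.upper()))
        except ValueError:
            continue  # drop rows whose amount is not numeric
    conn.execute("CREATE TABLE orders_etl (id TEXT, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)

def elt(conn):
    """ELT: load the raw rows untouched, then transform with SQL inside the target."""
    conn.execute("CREATE TABLE orders_raw (id TEXT, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", raw_orders)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT id, CAST(TRIM(amount) AS REAL) AS amount, UPPER(country) AS country
        FROM orders_raw
        WHERE TRIM(amount) GLOB '[0-9]*'  -- crude check: value starts with a digit
        """
    )

conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY id").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY id").fetchall()
print(etl_rows)
print(elt_rows)
```

Both paths end with the same cleaned table, but the ELT path also keeps `orders_raw`, so the transformation can be rewritten and re-run later without going back to the source.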

---

Question 31. **During the Data Analytics Lifecycle, which phase follows “Data Identification”?**

A) Data Extraction

B) Data Filtering

---

[…]

Answer: B

Explanation: Winsorizing replaces extreme values with nearest acceptable thresholds, preserving data size.
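A minimal winsorizing sketch, assuming percentile cut-offs picked by a simple nearest-index rule on the sorted data (no interpolation; libraries such as SciPy's `scipy.stats.mstats.winsorize` use their own conventions). Values below the lower cut-off are raised to it and values above the upper cut-off are lowered to it, so the number of observations stays the same, as the explanation above notes. `winsorize` here is a hypothetical helper.

```python
def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip values to the given lower/upper percentiles of the data (nearest-index rule)."""
    if not values:
        return []
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[round(lower_pct * last)]   # value at the lower percentile position
    high = ordered[round(upper_pct * last)]  # value at the upper percentile position
    return [min(max(v, low), high) for v in values]

data = [-100, 4, 5, 5, 6, 6, 7, 8, 9, 250]  # illustrative data with two extreme values
print(winsorize(data, 0.1, 0.9))
```

Unlike trimming, which deletes the outliers, the extreme values -100 and 250 are replaced by 4 and 9, so later steps still see ten observations.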

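---

Question 34. **Feature scaling using Z‑score standardization transforms data to have a mean of:**

A) 0 and standard deviation of 1

B) 1 and standard deviation of 0

C) 0 and variance of 1

D) 1 and variance of 1

Answer: A

Explanation: Z‑score standardization subtracts the mean and divides by the standard deviation. Because variance is the square of the standard deviation, option C describes the same result; A is the conventional phrasing and the keyed answer.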
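Z‑score standardization is z = (x − μ) / σ. The sketch below applies it with the population standard deviation (some tools use the sample standard deviation instead, which gives slightly different values) and checks that the result has mean 0 and standard deviation 1. The input numbers are illustrative.

```python
from statistics import fmean, pstdev

def zscore(values):
    """Standardize: z = (x - mean) / standard deviation (population form)."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(x - mu) / sigma for x in values]

scaled = zscore([2, 4, 4, 4, 5, 5, 7, 9])
print(scaled)                         # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(fmean(scaled), pstdev(scaled))  # 0.0 1.0
```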

---

Question 35. **Descriptive statistics that measure the spread of data include:**

A) Mean and median

B) Mode and frequency

C) Variance and standard deviation

D) Skewness and kurtosis

Answer: C

Explanation: Variance and standard deviation quantify dispersion around the mean.
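The two spread measures in the answer can be computed directly: variance is the average squared deviation from the mean, and the standard deviation is its square root. The sketch below shows both the population form (divide by n) and the sample form (divide by n − 1, Bessel's correction); `spread` is a hypothetical helper and the data are illustrative.

```python
import math

def spread(values, sample=False):
    """Return (variance, standard deviation); sample=True divides by n - 1."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values")
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    var = ss / (n - 1 if sample else n)
    return var, math.sqrt(var)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(spread(data))               # (4.0, 2.0)
print(spread(data, sample=True))  # slightly larger: divides by 7 instead of 8
```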

---

Question 36. **Kurtosis describes:**

A) The symmetry of a distribution

B) The peakedness or tail heaviness of a distribution

C) The average distance between data points

D) The number of outliers

Answer: B

Explanation: Kurtosis indicates whether data have heavy tails (leptokurtic) or light tails (platykurtic).
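Kurtosis is usually computed from the fourth central moment. The sketch below uses the population formula m4 / m2² and subtracts 3 so that a normal distribution scores 0 ("excess" kurtosis, the convention under which positive means leptokurtic and negative means platykurtic). The two small datasets are made up to show each case; the sample-adjusted estimators used by statistics packages differ slightly.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (about 0 for normally distributed data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

light_tails = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # evenly spread, no extremes
heavy_tails = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]  # tight center plus rare extremes

print(round(excess_kurtosis(light_tails), 3))  # negative: platykurtic
print(round(excess_kurtosis(heavy_tails), 3))  # positive: leptokurtic
```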

---

Question 37. **A bivariate analysis examines:**

A) One variable at a time

B) The relationship between two variables

C) Three or more variables simultaneously

D) Only categorical variables

Answer: B

Explanation: Bivariate analysis focuses on the interaction or correlation between a pair of variables.
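Correlation is one of the most common bivariate measures. The sketch below computes Pearson's r by hand for two paired variables; `pearson_r` is a hypothetical helper and the ad-spend and sales numbers are invented for illustration. (Python 3.10+ also provides `statistics.correlation`, which computes the same coefficient.)

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # the 1/n factors cancel, so they are omitted

ad_spend = [10, 20, 30, 40, 50]  # illustrative data
sales = [12, 24, 33, 46, 55]
print(round(pearson_r(ad_spend, sales), 3))
```

A value near +1 means the two variables rise together almost linearly; near −1 means one falls as the other rises; near 0 means no linear relationship (a non-linear one may still exist).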

---

[…]

A) K‑Means clustering

B) Linear Regression

C) Logistic Regression

D) Decision Tree

Answer: C

Explanation: Logistic Regression outputs probabilities for two classes and models a linear boundary in feature space.
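Both claims in the explanation above (probability output and a linear boundary) show up directly in the prediction step. The weights and bias below are hypothetical stand-ins for trained parameters; fitting them (usually by maximizing the log-likelihood) is not shown. The decision boundary is the set of points where w · x + b = 0, i.e. where the predicted probability is exactly 0.5, which is a straight line in two dimensions (a hyperplane in general).

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    """P(y = 1 | x) for a logistic regression model with the given parameters."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias  # linear score w . x + b
    return sigmoid(z)

def predict(x, weights, bias, threshold=0.5):
    """Hard class label: 1 if P(y = 1 | x) >= threshold, else 0."""
    return int(predict_proba(x, weights, bias) >= threshold)

# Hypothetical parameters, as if produced by a training step that is not shown
w, b = [2.0, -1.0], -0.5

for point in ([1.0, 1.0], [0.0, 1.0], [0.25, 0.0]):
    print(point, round(predict_proba(point, w, b), 3), predict(point, w, b))
```

The point [0.25, 0.0] lies exactly on the boundary 2x₁ − x₂ − 0.5 = 0, so its probability is 0.5.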

---

Question 41. **Random Forest improves upon a single Decision Tree by:**

A) Using only one feature per split

B) Averaging predictions from multiple decorrelated trees to reduce overfitting

C) Performing gradient descent on tree parameters

D) Converting trees into linear models

Answer: B

Explanation: Random Forest builds many trees on bootstrapped samples and aggregates results, enhancing robustness.

---

Question 42. **Bagging (Bootstrap Aggregating) reduces variance by:**

A) Adding regularization to a single model

B) Training multiple models on different random subsets and averaging their predictions

C) Pruning decision trees aggressively

D) Using a single, deeper tree

Answer: B

Explanation: Bagging creates diverse models via bootstrapped data, then combines them to stabilize predictions.
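To make the bootstrap-and-average idea concrete, the sketch below bags a deliberately simple base learner, a one-split regression stump, on 1-D data. `fit_stump`, `bagging_fit`, and `bagging_predict` are hypothetical helpers. Each model is trained on a bootstrap sample (n draws with replacement) and predictions are averaged, which is the regression form of bagging; classification would typically use a majority vote. Random Forest (Question 41) adds a random feature subset at each split on top of this; with a single feature that step has nothing to choose from, so it is omitted here.

```python
import random
from statistics import fmean

def fit_stump(xs, ys):
    """Hypothetical base learner: one threshold on x, predicting the mean y on each side."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:
            continue  # a threshold at the max x leaves nothing on the right
        lm, rm = fmean(left), fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to a constant prediction
        mean_y = fmean(ys)
        return lambda x: mean_y
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Aggregate by averaging the individual models' predictions."""
    return fmean(m(x) for m in models)

xs = list(range(1, 11))
ys = [1.2, 0.8, 1.1, 0.9, 1.0, 5.1, 4.9, 5.2, 4.8, 5.0]  # illustrative step-shaped data
models = bagging_fit(xs, ys)
print(round(bagging_predict(models, 2), 2), round(bagging_predict(models, 9), 2))
```

Each stump sees a slightly different sample and lands on a slightly different split, and averaging smooths out those individual quirks, which is where the variance reduction comes from.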

---

Question 43. **Which unsupervised algorithm groups data points based on distance to centroids?**

A) Hierarchical clustering

B) K‑Means clustering

C) Principal Component Analysis

D) Naïve Bayes

Answer: B

Explanation: K‑Means iteratively assigns points to the nearest centroid and updates centroids.
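A compact sketch of the assign-then-update loop described above (Lloyd's algorithm), in plain Python with Euclidean distance. The initial centroids are k randomly chosen data points, which is one simple option; library implementations such as scikit-learn's `KMeans` default to smarter k-means++ seeding. `kmeans` is a hypothetical helper, not a library function, and the points are illustrative.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

If `max_iter` is reached before convergence, the returned clusters reflect the assignment made in the last iteration, one step behind the final centroids.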

---

Question 44. **Principal Component Analysis (PCA) is used to:**

A) Increase the number of features

B) Reduce dimensionality while preserving variance

C) Cluster data into groups
