Kizen Big Data Analytics Practice Exam

This practice exam evaluates analytical skills with large datasets using statistical techniques, predictive modeling, data mining, visualization tools, and business intelligence concepts. Topics include regression, clustering, classification, data preprocessing, ETL workflows, dashboards, and decision-making analytics. Case-based questions simulate real-world data challenges requiring interpretation of complex datasets and actionable insight generation.

Typology: Exams

2025/2026

Available from 01/07/2026

shilpi-jain-1
Question 1. **Which of the following best describes the “Volume” characteristic of Big Data?**
A) Speed at which data is generated
B) The amount of data generated
C) Variety of data formats
D) Accuracy of data
Answer: B
Explanation: Volume refers to the massive amount of data produced, often measured in terabytes or petabytes, distinguishing Big Data from traditional datasets.
---
Question 2. **What does “Velocity” refer to in the context of the 5 V’s of Big Data?**
A) The diversity of data sources
B) The rate at which data is created, collected, and processed
C) The reliability of data
D) The monetary value derived from data
Answer: B
Explanation: Velocity describes the speed of data flow, requiring real-time or near-real-time processing capabilities.
---
Question 3. **Which V of Big Data addresses the trustworthiness and quality of data?**
A) Volume
B) Variety
C) Veracity
D) Value
Answer: C
Explanation: Veracity deals with data accuracy, consistency, and reliability, crucial for sound analytics.


Question 4. In Big Data, “Value” is primarily concerned with: A) The size of data sets B) The speed of data ingestion C) The usefulness of data for decision making D) The number of data sources Answer: C Explanation: Value focuses on extracting actionable insights that provide business benefits from raw data.


Question 5. Which of the following is a structured data format? A) JPEG image B) JSON document

Answer: C Explanation: Relational database rows are structured; the others lack a fixed schema.


Question 8. A primary business driver for adopting Big Data in healthcare is: A) Reducing website load times B) Enhancing patient outcome predictions through predictive analytics C) Increasing physical store footfall D) Automating payroll processing Answer: B Explanation: Healthcare leverages large patient datasets to predict disease risk and improve outcomes.


Question 9. In the Data & Analytics Maturity Framework, an organization that only reacts to incidents is at which level? A) Reactive B) Proactive C) Strategic D) Optimized Answer: A Explanation: Reactive maturity means analytics are used only after problems occur, lacking foresight.


Question 10. Which maturity level focuses on using analytics to shape long‑term business strategy? A) Reactive B) Proactive C) Strategic D) Tactical Answer: C Explanation: Strategic maturity integrates analytics into planning and competitive positioning.


Question 11. KAIZEN™ “Muda” refers to: A) Continuous improvement cycles B) Waste or non‑value‑adding activities C) Standard operating procedures D) Customer feedback loops Answer: B Explanation: Muda is a Japanese term for waste, which KAIZEN seeks to eliminate.


B) Bottlenecks and waste in data flow C) New data sources to add D) Ways to encrypt data Answer: B Explanation: Value‑stream mapping visualizes the flow from source to insight, revealing inefficiencies.


Question 15. In modern data architecture, the Lambda model combines: A) Batch processing and real‑time streaming B) Relational and NoSQL databases only C) Data lake and data warehouse in a single layer D) On‑premise and cloud storage simultaneously Answer: A Explanation: Lambda architecture uses a batch layer for historical data and a speed layer for real‑time data.
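
The two layers in Question 15 can be pictured with a small sketch. This is only an illustration under assumed names: the click events, `build_view`, and `query` are hypothetical and do not come from any specific framework.

```python
from collections import Counter

# Hypothetical click events as (page, count) pairs; illustrative data only.
historical_events = [("home", 3), ("cart", 1), ("home", 2)]
recent_events = [("home", 1), ("checkout", 4)]


def build_view(events):
    """Aggregate events into a page -> total count view."""
    view = Counter()
    for page, count in events:
        view[page] += count
    return view


# Batch layer: recomputed periodically over the full historical dataset.
batch_view = build_view(historical_events)

# Speed layer: covers only events that arrived after the last batch run.
speed_view = build_view(recent_events)


def query(page):
    """Serving step: combine the batch view with the real-time increment."""
    return batch_view[page] + speed_view[page]


print(query("home"))      # 6
print(query("checkout"))  # 4
```

In a Kappa-style design the same `build_view` logic would run over a single replayable stream, so the separate batch view would not be needed.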


Question 16. Which architecture eliminates the separate batch layer by processing all data as streams? A) Lambda B) Kappa C) Three‑tier D) Microservices Answer: B Explanation: Kappa architecture treats all data as a stream, simplifying design by removing the batch layer.


Question 17. A Data Lake is primarily designed for: A) Storing raw, unprocessed data of any type B) Hosting pre‑aggregated reports only C) Enforcing strict schema on write D) Replacing all relational databases Answer: A Explanation: Data Lakes accept raw data in its native format, enabling flexible future processing.


Question 18. Which statement best describes a Data Warehouse? A) Stores raw logs without transformation B) Holds curated, structured data optimized for analytical queries C) Is a file system for image storage D) Provides real‑time streaming capabilities

Answer: B Explanation: YARN (Yet Another Resource Negotiator) allocates CPU and memory and schedules tasks.


Question 21. In MapReduce, the “Map” phase is responsible for: A) Aggregating final results B) Sorting data across nodes C) Transforming input key/value pairs into intermediate key/value pairs D) Managing cluster resources Answer: C Explanation: The Map function processes raw input and emits intermediate key/value pairs for reduction.
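
To make the Map phase in Question 21 concrete, here is a minimal single-process word-count sketch. It imitates the map, shuffle, and reduce steps in plain Python; the function names and sample lines are assumptions for illustration, not Hadoop's API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input record into intermediate (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (the framework does this on a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate all values that share a key."""
    return key, sum(values)


lines = ["big data big insight", "data flows"]
intermediate = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(word_counts)
```

The Map step only emits pairs; all aggregation is deferred to the Reduce step, which is why option C rather than option A describes it.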


Question 22. Which of the following is NOT a component of Apache Spark? A) Spark SQL B) Spark Streaming C) Spark MLlib D) Spark Hadoop Answer: D Explanation: Spark integrates with Hadoop but has no module named “Spark Hadoop”.


Question 23. Resilient Distributed Datasets (RDDs) in Spark are: A) Immutable, partitioned collections of objects that can be processed in parallel B) Mutable tables stored in HDFS C) Real‑time streaming windows only D) Indexes for relational databases Answer: A Explanation: RDDs are the core abstraction in Spark, providing fault tolerance and parallelism.


Question 24. Which Spark library is dedicated to graph processing? A) GraphX B) MLlib C) SparkR D) SparkSQL Answer: A Explanation: GraphX enables graph-parallel computation on top of Spark’s core engine.

Question 27. Which NoSQL database is best suited for wide‑column storage and high write throughput? A) MongoDB B) Cassandra C) Neo4j D) Elasticsearch Answer: B Explanation: Cassandra uses a column‑family model optimized for massive write scalability.


Question 28. Document-oriented databases such as MongoDB store data in: A) Tables with strict schemas B) JSON‑like documents that can vary in structure C) Key/value pairs only D) Graph relationships Answer: B Explanation: MongoDB’s BSON documents allow flexible, semi‑structured data storage.


Question 29. ETL stands for: A) Extract, Transform, Load B) Encode, Transfer, Log C) Evaluate, Test, Launch D) Enrich, Tag, Link Answer: A Explanation: ETL is the classic pipeline for moving data from source to destination after transformation.


Question 30. ELT differs from ETL mainly in that: A) Transformation occurs after loading data into the target system B) Data is never transformed C) It only works with structured data D) It requires no data warehouse Answer: A Explanation: ELT loads raw data first and then transforms it within the target system, leveraging its processing power.
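
The ELT ordering in Question 30 can be sketched with Python's built-in `sqlite3` module. An in-memory SQLite database stands in for the target system here, which is an assumption for illustration; the table names and sample rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the target warehouse (an assumption).
conn = sqlite3.connect(":memory:")

# Extract + Load: raw records land in the target untransformed, as text.
raw_rows = [("2024-01-05", " 19.99 "), ("2024-01-05", "5.01"), ("2024-01-06", "10")]
conn.execute("CREATE TABLE raw_sales (sale_date TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)

# Transform: cleaning and aggregation run inside the target, using its SQL engine.
conn.execute(
    """
    CREATE TABLE daily_sales AS
    SELECT sale_date, ROUND(SUM(CAST(TRIM(amount) AS REAL)), 2) AS total
    FROM raw_sales
    GROUP BY sale_date
    """
)

daily_totals = conn.execute(
    "SELECT sale_date, total FROM daily_sales ORDER BY sale_date"
).fetchall()
print(daily_totals)
```

In an ETL pipeline the trimming and casting would instead happen before the insert, so only cleaned rows would ever reach the target.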


Question 31. During the Data Analytics Lifecycle, which phase follows “Data Identification”? A) Data Extraction B) Data Filtering

Answer: B Explanation: Winsorizing replaces extreme values with nearest acceptable thresholds, preserving data size.
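
A minimal sketch of winsorizing, assuming fixed thresholds chosen by the analyst. The bounds of 18 and 90 and the sample ages are hypothetical; in practice the thresholds are often taken from percentiles of the data.

```python
def winsorize(values, lower, upper):
    """Clamp each value into [lower, upper]; no rows are dropped."""
    return [min(max(v, lower), upper) for v in values]


ages = [23, 25, 27, 31, 29, 240, -4]
capped = winsorize(ages, lower=18, upper=90)
print(capped)  # [23, 25, 27, 31, 29, 90, 18]
```

The list keeps its original length, which is the difference from trimming, where the extreme rows would be removed.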


Question 34. Feature scaling using Z‑score standardization transforms data to have a mean of: A) 0 and standard deviation of 1 B) 1 and standard deviation of 0 C) 0 and variance of 1 D) 1 and variance of 1 Answer: A Explanation: Z‑score standardization subtracts the mean and divides by the standard deviation.
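
The transformation in Question 34 can be checked numerically. The sketch below uses the population standard deviation (`statistics.pstdev`); the sample heights are hypothetical, and a column with zero variance is not handled.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


heights_cm = [150, 160, 170, 180, 190]
z_scores = standardize(heights_cm)

# After standardization the mean is 0 and the standard deviation is 1
# (up to floating-point rounding).
print(round(mean(z_scores), 6), round(pstdev(z_scores), 6))
```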


Question 35. Descriptive statistics that measure the spread of data include: A) Mean and median B) Mode and frequency C) Variance and standard deviation D) Skewness and kurtosis Answer: C Explanation: Variance and standard deviation quantify dispersion around the mean.

Question 36. Kurtosis describes: A) The symmetry of a distribution B) The peakedness or tail heaviness of a distribution C) The average distance between data points D) The number of outliers Answer: B Explanation: Kurtosis indicates whether data have heavy tails (leptokurtic) or light tails (platykurtic).
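
As a rough illustration of Question 36, the sketch below computes moment-based kurtosis (fourth central moment divided by the squared variance), for which a normal distribution scores 3. The two sample lists are made up to contrast light and heavy tails.

```python
def kurtosis(values):
    """Moment-based (non-excess) kurtosis: m4 / m2**2."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # variance (population)
    m4 = sum((v - mu) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2


light_tails = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # evenly spread values
heavy_tails = [5, 5, 5, 5, 5, 5, 5, -20, 30]     # mostly central, two extremes

print(kurtosis(light_tails))  # below 3: platykurtic
print(kurtosis(heavy_tails))  # above 3: leptokurtic
```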


Question 37. A bivariate analysis examines: A) One variable at a time B) The relationship between two variables C) Three or more variables simultaneously D) Only categorical variables Answer: B Explanation: Bivariate analysis focuses on the interaction or correlation between a pair of variables.
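
One common bivariate measure is the Pearson correlation coefficient. The sketch below computes it by hand; the variable names and the advertising and sales figures are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]    # rises in lockstep with ad_spend
returns = [20, 18, 16, 14, 12]  # falls in lockstep with ad_spend

r_sales = pearson(ad_spend, sales)
r_returns = pearson(ad_spend, returns)
print(r_sales, r_returns)
```

Values near 1 or -1 indicate a strong linear relationship between the pair; values near 0 indicate little linear relationship.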


A) K‑Means clustering B) Linear Regression C) Logistic Regression D) Decision Tree Answer: C Explanation: Logistic Regression outputs probabilities for two classes and models a linear boundary in feature space.
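
The explanation above can be sketched as a prediction step only. The weights and bias below are hypothetical and not fitted to any data; the point is that the probability is a sigmoid of a linear score, so the decision boundary (probability 0.5) is the set of points where the linear score is zero.

```python
from math import exp


def predict_proba(features, weights, bias):
    """Probability of the positive class: sigmoid of a linear score."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + exp(-score))


# Hypothetical, unfitted parameters for two features.
weights, bias = [0.8, -0.5], -1.0

p_inside = predict_proba([3.0, 1.0], weights, bias)     # score = 0.9
p_boundary = predict_proba([1.25, 0.0], weights, bias)  # score = 0

label = 1 if p_inside >= 0.5 else 0
print(p_inside, p_boundary, label)
```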


Question 41. Random Forest improves upon a single Decision Tree by: A) Using only one feature per split B) Averaging predictions from multiple decorrelated trees to reduce overfitting C) Performing gradient descent on tree parameters D) Converting trees into linear models Answer: B Explanation: Random Forest builds many trees on bootstrapped samples and aggregates results, enhancing robustness.


Question 42. Bagging (Bootstrap Aggregating) reduces variance by: A) Adding regularization to a single model B) Training multiple models on different random subsets and averaging their predictions C) Pruning decision trees aggressively D) Using a single, deeper tree Answer: B Explanation: Bagging creates diverse models via bootstrapped data, then combines them to stabilize predictions.
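
A minimal sketch of bagging, assuming a deliberately simple base learner. A 1-nearest-neighbour regressor is swapped in here instead of a decision tree to keep the example short; the data points, the seed, and the function names are hypothetical.

```python
import random


def fit_1nn(sample):
    """'Training' a 1-nearest-neighbour regressor just stores the sample."""
    return list(sample)


def predict_1nn(model, x):
    """Return the target of the stored point whose input is closest to x."""
    nearest = min(model, key=lambda point: abs(point[0] - x))
    return nearest[1]


def bagging_fit(data, n_models, rng):
    """Fit one base model per bootstrap sample (drawn with replacement)."""
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]  # same size as the data
        models.append(fit_1nn(bootstrap))
    return models


def bagging_predict(models, x):
    """Aggregate by averaging the individual predictions."""
    predictions = [predict_1nn(model, x) for model in models]
    return sum(predictions) / len(predictions)


data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
rng = random.Random(0)  # fixed seed so the run is repeatable
models = bagging_fit(data, n_models=25, rng=rng)
estimate = bagging_predict(models, 3.0)
print(estimate)
```

A Random Forest follows the same pattern with decision trees as base learners and adds random feature selection at each split to decorrelate the trees further.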


Question 43. Which unsupervised algorithm groups data points based on distance to centroids? A) Hierarchical clustering B) K‑Means clustering C) Principal Component Analysis D) Naïve Bayes Answer: B Explanation: K‑Means iteratively assigns points to the nearest centroid and updates centroids.
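
The assign-and-update loop described in Question 43 can be sketched in one dimension. The points, the starting centroids, and the fixed iteration count are assumptions for illustration; real implementations usually stop when assignments no longer change.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain K-Means on 1-D data with a fixed number of iterations."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 9.5]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```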


Question 44. Principal Component Analysis (PCA) is used to: A) Increase the number of features B) Reduce dimensionality while preserving variance C) Cluster data into groups