

















































































This practice exam evaluates a learner’s understanding of integrated DevOps pipelines combined with Big Data ecosystems. It covers CI/CD workflows, infrastructure automation, distributed computing, Hadoop, Spark, data ingestion techniques, cloud-native orchestration, monitoring, containerization, and security in automated environments. Students experience scenario-based questions aligned to real-world engineering challenges, including deployment automation, data lake architecture, streaming analytics, and performance optimization. This exam ensures readiness for professional DevOps and Big Data certification tracks by reinforcing analytical, operational, and technical decision-making skills.
Typology: Exams


















































































Question 1. Which of the following best describes the First Way in the DevOps Three Ways philosophy?
A) Emphasizing rapid feedback loops
B) Optimizing the flow of work from development to operations
C) Encouraging continual learning and experimentation
D) Implementing strict governance policies
Answer: B
Explanation: The First Way focuses on increasing the flow of work, ensuring that value moves smoothly from development to operations.

Question 2. In the CAMS model, which pillar primarily addresses the use of metrics and monitoring?
A) Culture
B) Automation
C) Measurement
D) Sharing
Answer: C
Explanation: Measurement deals with collecting data, defining metrics, and using them to drive improvement.

Question 3. How does DevOps differ fundamentally from traditional Agile?
A) DevOps eliminates the need for testing
B) DevOps includes operations and delivery aspects, not just development
C) Agile focuses on infrastructure automation
D) DevOps does not use iterative development cycles
Answer: B
Explanation: Agile primarily addresses development processes, while DevOps extends principles to include operations, delivery, and continuous improvement.

Question 4. Which organizational culture type is characterized by encouraging experimentation, tolerating failures, and learning from them?
A) Bureaucratic
B) Pathological
C) Generative
D) Hierarchical
Answer: C
Explanation: A generative culture promotes psychological safety, learning, and continuous improvement.

Question 5. What metric measures the total time taken from a code commit to its successful deployment in production?
A) Cycle time
B) Lead time
C) Deployment frequency
D) Mean time to recovery (MTTR)
Answer: B
Explanation: Lead time captures the elapsed time from request (e.g., commit) to delivery (deployment).

Question 6. Which Git branching strategy encourages developers to commit to a single long-lived branch and use feature flags for releases?
A) GitFlow
B) Trunk‑Based Development
C) Docker
D) Terraform
Answer: B
Explanation: SonarQube analyzes source code for quality, bugs, and security issues.

Question 10. What deployment pattern minimizes user impact by routing traffic to a new version while keeping the old version running?
A) Rolling deployment
B) Canary deployment
C) Blue/Green deployment
D) A/B testing
Answer: C
Explanation: Blue/Green creates two identical environments; traffic is switched from blue to green, allowing quick rollback.

Question 11. Which of the following is a key characteristic of a successful rollback strategy?
A) Manual database schema changes only
B) Automatic restoration of the previous version without data loss
C) Deploying new features before rollback
D) Disabling monitoring during rollback
Answer: B
Explanation: Effective rollbacks restore the previous stable state automatically, preserving data integrity.

Question 12. What does “drift detection” refer to in Infrastructure as Code (IaC)?
A) Detecting changes in application code
B) Identifying differences between declared infrastructure and actual state
C) Measuring latency in network traffic
D) Tracking user permissions changes
Answer: B
Explanation: Drift detection finds mismatches between the IaC definition and the real infrastructure.

Question 13. Which IaC tool uses a declarative language based on JSON/YAML to define AWS resources?
A) Terraform
B) Ansible
C) CloudFormation
D) Chef
Answer: C
Explanation: AWS CloudFormation uses JSON or YAML templates to declare resources.

Question 14. In Ansible, which term describes a reusable set of tasks that can be applied to multiple hosts?
A) Role
B) Playbook
C) Inventory
D) Module
Answer: A
Explanation: Roles encapsulate variables, tasks, handlers, and files for reuse across playbooks.

Question 15. Which command is used to list all Docker images stored locally?
Question 18. In Kubernetes, what does the Horizontal Pod Autoscaler (HPA) primarily use to scale pods?
A) Number of nodes in the cluster
B) CPU utilization or custom metrics
C) Disk space consumption
D) Number of services deployed
Answer: B
Explanation: HPA adjusts replica counts based on observed CPU usage or custom metrics.

Question 19. Which serverless compute service executes code in response to events without provisioning servers?
A) Amazon EC2
B) AWS Lambda
C) Amazon RDS
D) AWS Elastic Beanstalk
Answer: B
Explanation: AWS Lambda runs functions triggered by events, abstracting server management.

Question 20. Which microservice communication pattern is best suited for decoupled, asynchronous processing?
A) Synchronous REST calls
B) Direct database sharing
C) Message queue or event streaming
D) Shared memory
Answer: C
Explanation: Message queues (e.g., Kafka) enable asynchronous, loosely coupled communication.

Question 21. Which component of the ELK stack is responsible for indexing and searching log data?
A) Elasticsearch
B) Logstash
C) Kibana
D) Beats
Answer: A
Explanation: Elasticsearch stores and indexes logs, providing fast search capabilities.

Question 22. What are the “four golden signals” of monitoring?
A) CPU, Memory, Disk, Network
B) Latency, Traffic, Errors, Saturation
C) Availability, Throughput, Consistency, Partition tolerance
D) Uptime, Downtime, Response time, Load
Answer: B
Explanation: Latency, traffic, errors, and saturation are essential metrics for system health.

Question 23. In SRE, what does the error budget represent?
A) Total number of bugs in a release
B) The allowable amount of downtime or errors within a service level objective (SLO)
C) The cost of fixing security vulnerabilities
D) The maximum number of concurrent users allowed
D) Fuzz testing
Answer: C
Explanation: Static Application Security Testing (SAST) scans code for vulnerabilities statically.

Question 27. In IAM, what principle ensures users have only the permissions they need to perform their jobs?
A) Role‑based access control (RBAC)
B) Least privilege
C) Mandatory access control (MAC)
D) Discretionary access control (DAC)
Answer: B
Explanation: The principle of least privilege limits permissions to the minimum required.

Question 28. Which of the following is NOT one of the 5 V’s of Big Data?
A. Volume
B. Velocity
C. Variety
D. Verification
Answer: D
Explanation: Verification is not a standard V; the five are Volume, Velocity, Variety, Veracity, and Value.

Question 29. In a data lake, which type of data is typically stored in its raw, unprocessed form?
A. Structured relational tables only
B. Only CSV files
C. Both structured and unstructured data
D. Pre‑aggregated reports
Answer: C
Explanation: Data lakes accept raw data of any format, preserving its original state.

Question 30. Which NoSQL database is best suited for storing graph relationships such as social network connections?
A. MongoDB
B. Cassandra
C. Neo4j
D. Redis
Answer: C
Explanation: Neo4j is a graph database optimized for relationship queries.

Question 31. In Hadoop, what is the primary role of the NameNode?
A. Execute map tasks
B. Store block metadata and namespace information
C. Manage resource allocation across the cluster
D. Perform data replication between DataNodes
Answer: B
Explanation: The NameNode maintains the file system namespace and block locations.

Question 32. How does HDFS achieve fault tolerance?
A. By using RAID on each DataNode disk
B. By replicating each block across multiple DataNodes
C. Table inheritance
D. Partitioning
Answer: D
Explanation: Partitioning divides a table into subdirectories based on column values, so queries that filter on those columns read only the matching partitions (partition pruning).

Question 36. In Pig Latin, which statement loads data from HDFS into a relation?
A. STORE
B. LOAD
C. FILTER
D. GROUP
Answer: B
Explanation: LOAD reads data from HDFS into a Pig relation.

Question 37. What is the primary purpose of Apache Sqoop?
A. Real‑time streaming ingestion
B. Bulk import/export between relational databases and Hadoop
C. Data transformation within HDFS
D. Managing Hadoop security policies
Answer: B
Explanation: Sqoop efficiently transfers large data sets between RDBMS and Hadoop.

Question 38. Which component of Apache Flume is responsible for receiving data from external sources?
A. Sink
B. Channel
C. Source
D. Agent
Answer: C
Explanation: The Source captures data from external systems and forwards it to the channel.

Question 39. In Spark, what is an RDD?
A. A relational database driver
B. A resilient distributed dataset, the fundamental immutable data structure
C. A real‑time data stream
D. A resource deployment descriptor
Answer: B
Explanation: RDDs are fault‑tolerant, immutable collections of objects partitioned across the cluster.

Question 40. Which Spark transformation is lazy, meaning it does not execute until an action is called?
A. collect()
B. reduce()
C. map()
D. count()
Answer: C
Explanation: map is a transformation; Spark builds a DAG and evaluates it only when an action (e.g., collect) runs.
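
Study note: the difference between a lazy transformation and an action can be seen in a toy model written in plain Python. This is only a sketch and is not the PySpark API; the class name LazyDataset and the helper double are hypothetical names invented for illustration, and a real Spark job would build a DAG across partitions rather than a simple list of functions.

```python
# Toy model of lazy evaluation in plain Python (NOT the PySpark API).
# LazyDataset and double are hypothetical names used only for illustration.

class LazyDataset:
    def __init__(self, data, ops=None):
        self._data = list(data)
        self._ops = list(ops or [])  # recorded transformations, not yet run

    def map(self, fn):
        # Transformation: only records fn and returns a new dataset.
        return LazyDataset(self._data, self._ops + [fn])

    def collect(self):
        # Action: applies every recorded transformation now.
        result = []
        for item in self._data:
            for fn in self._ops:
                item = fn(item)
            result.append(item)
        return result


calls = []

def double(x):
    calls.append(x)  # side effect that reveals when the function really runs
    return x * 2

ds = LazyDataset([1, 2, 3]).map(double)
calls_before_action = len(calls)  # map() alone has not executed double
result = ds.collect()             # the action triggers execution
calls_after_action = len(calls)   # double has now run once per element
```

In this sketch, map() returns immediately without touching the data, and double runs only when collect() is called, which mirrors why map is the lazy option in Question 40 while collect, reduce, and count are actions.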
Question 44. Which Kafka component tracks the offset of each consumer group?
A. Broker
B. ZooKeeper (or the internal __consumer_offsets topic in newer versions)
C. Producer
D. Connect worker
Answer: B
Explanation: Offsets are stored in ZooKeeper (or the internal topic) to allow consumers to resume where they left off.

Question 45. What is a primary benefit of using Kafka Connect?
A. Real‑time data analytics within Kafka
B. Simplified integration between Kafka and external systems via source/sink connectors
C. Automatic schema enforcement
D. Built‑in machine learning capabilities
Answer: B
Explanation: Kafka Connect provides ready‑made connectors to move data in/out of Kafka without custom code.

Question 46. Which cloud service is a fully managed Hadoop and Spark platform on AWS?
A. Amazon Redshift
B. AWS Glue
C. Amazon EMR
D. Amazon Athena
Answer: C
Explanation: Amazon EMR provisions Hadoop, Spark, Hive, and related components as a managed service.

Question 47. Which Google Cloud service offers a serverless, interactive SQL analytics engine for big data?
A. BigQuery
B. Cloud Dataflow
C. Cloud Dataproc
D. Cloud Pub/Sub
Answer: A
Explanation: BigQuery is a fully managed, serverless data warehouse for analytics.

Question 48. In a data lake architecture, which storage layer provides durable, low‑cost object storage?
A. Amazon RDS
B. Amazon S3 (or Azure Blob Storage, GCS)
C. Amazon DynamoDB
D. Amazon Redshift
Answer: B
Explanation: Object storage services serve as the foundation for data lakes due to scalability and cost efficiency.

Question 49. Which Azure service is a fully managed, scalable data warehouse?
A. Azure Synapse Analytics (formerly SQL Data Warehouse)
B. Azure Data Lake Storage
C. Azure Cosmos DB
C. GraphX API
D. Streaming API
Answer: B
Explanation: DataFrames provide a schema‑aware abstraction and allow SQL‑style operations.

Question 53. Which of the following is a valid reason to choose a column‑family NoSQL database like Cassandra?
A. Need for complex joins across tables
B. High write throughput with horizontal scalability
C. Strong ACID transactions across multiple rows
D. Small, static datasets
Answer: B
Explanation: Cassandra excels at high‑volume writes and linear scalability across nodes.

Question 54. What does the “veracity” dimension of Big Data refer to?
A. The speed at which data is generated
B. The trustworthiness and quality of data
C. The monetary value derived from data
D. The volume of data stored
Answer: B
Explanation: Veracity addresses data accuracy, consistency, and reliability.

Question 55. Which Spark component is responsible for scheduling tasks on worker nodes?
A. Driver
B. Executor
C. Cluster Manager (e.g., YARN, Mesos, Standalone)
D. Spark UI
Answer: C
Explanation: The Cluster Manager allocates resources and launches executors on worker nodes; the driver then assigns individual tasks to those executors.

Question 56. In Kubernetes, which resource is used to store non‑confidential configuration data as key‑value pairs?
A. Secret
B. ConfigMap
C. PersistentVolume
D. ServiceAccount
Answer: B
Explanation: ConfigMaps hold configuration data that pods can consume as environment variables or files.

Question 57. Which CI tool integrates natively with GitHub and uses YAML files stored in the repository to define pipelines?
A. Jenkins
B. GitHub Actions
C. Travis CI
D. CircleCI
Answer: B
Explanation: GitHub Actions reads workflow definitions from .github/workflows/*.yml.

Question 58. What is the main advantage of using a “canary” deployment?
A. Deploying to all users simultaneously