




























































































A complete practice exam aligned with the Cloudera Data Platform (CDP) certification paths. It evaluates multi-domain knowledge: data engineering, administration, operational workflows, cloud-native analytics, security configurations, and SDX governance across public and private cloud deployments.





























































































Question 1. Which CDP deployment model provides a fully managed service on public cloud providers such as AWS, Azure, and GCP?
A) CDP Private Cloud Base
B) CDP Public Cloud
C) CDP On-Premises Data Hub
D) CDP Edge
Answer: B
Explanation: CDP Public Cloud delivers a fully managed, SaaS-style data platform on public cloud infrastructures, handling provisioning, scaling, and maintenance for the user.

Question 2. In CDP architecture, which component is responsible for unified security, governance, and metadata across all environments?
A) Cloudera Manager
B) Workload XM
C) Shared Data Experience (SDX)
D) Replication Manager
Answer: C
Explanation: SDX provides consistent security policies, data cataloging, and lineage across data lake, data warehouse, and other services, ensuring a unified governance layer.

Question 3. What is the primary role of the NameNode in HDFS?
A) Store actual block data
B) Manage metadata and namespace operations
C) Schedule MapReduce jobs
D) Perform data compression
Answer: B
Explanation: The NameNode maintains the file system namespace, storing metadata such as file-to-block mappings, permissions, and directory structures.

Question 4. Which YARN scheduler allows multiple queues with hierarchical capacity guarantees?
A) Fair Scheduler
B) Capacity Scheduler
C) FIFO Scheduler
D) Dominant Resource Fairness
Answer: B
Explanation: The Capacity Scheduler enables hierarchical queues, each with guaranteed capacity percentages, useful for multi-tenant environments.

Question 5. In Cloudera Manager, which wizard step configures external services such as the Hive Metastore database?
A) Host Discovery
B) Service Configuration
C) Add Service
D) Review
Answer: C

A) Windows Server 2019
B) Red Hat Enterprise Linux 8
C) macOS Catalina
D) Ubuntu Server 20.04 LTS
Answer: B
Explanation: RHEL 8 (or compatible CentOS) is the officially supported OS for CDP Private Cloud Base management nodes, ensuring stability and support.

Question 9. Which container orchestration platform does CDP leverage for service deployment in the public cloud?
A) Docker Swarm
B) Apache Mesos
C) Kubernetes
D) OpenShift
Answer: C
Explanation: CDP uses Kubernetes to orchestrate containerized services, enabling elasticity and cloud-native deployment models.

Question 10. In Spark, which abstraction provides the most optimized execution plan for SQL-like operations?
A) RDD
B) DataFrame
C) Broadcast Variable
D) Accumulator
Answer: B
Explanation: DataFrames are backed by the Catalyst optimizer and the Tungsten execution engine, which provide automatic query optimization and efficient memory usage.

Question 11. Which Spark API is primarily used to read a Parquet file from HDFS into a DataFrame?
A) spark.read.text()
B) spark.read.parquet()
C) spark.read.json()
D) spark.read.csv()
Answer: B
Explanation: spark.read.parquet() reads Parquet files directly, preserving the schema and the benefits of columnar storage.

Question 12. When running a Spark application on YARN, which mode launches the driver inside the ApplicationMaster?
A) client mode
B) cluster mode
C) local mode
D) standalone mode
Answer: B

B) Sensor
C) DAG
D) Hook
Answer: C
Explanation: A DAG (Directed Acyclic Graph) specifies task dependencies, execution order, and scheduling in Airflow.

Question 16. Which file format offers schema evolution and columnar storage, making it ideal for analytical workloads?
A) CSV
B) JSON
C) Parquet
D) Avro
Answer: C
Explanation: Parquet stores data column-wise, supports schema evolution, and provides efficient compression, suited for analytics.

Question 17. When partitioning a Hive table on a date column, which benefit is most directly realized?
A) Faster writes due to smaller files
B) Reduced storage cost via compression
C) Query pruning that scans only relevant partitions
D) Automatic indexing of the date column
Answer: C
Explanation: Partition pruning allows queries to read only the partitions matching the date predicate, dramatically reducing I/O.

Question 18. Which Iceberg feature enables time-travel queries to view data as of a previous snapshot?
A) Hidden partitions
B) Snapshot isolation
C) Row-level deletes
D) Schema enforcement
Answer: B
Explanation: Iceberg maintains immutable snapshots; queries can specify a snapshot ID to retrieve data as it existed at that point.

Question 19. In Spark, which configuration property controls the amount of memory allocated to the executor JVM?
A) spark.driver.memory
B) spark.executor.cores
C) spark.executor.memory
D) spark.memory.fraction
Answer: C
Explanation: spark.executor.memory sets the heap size for each executor process.
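
The partition pruning described in Question 17 can be illustrated with a small, engine-free sketch. It is not Hive code: the `partitions` dictionary and the `scan` helper are hypothetical stand-ins for a table laid out with one directory per date value, and the only point is that partitions whose key fails the predicate are skipped without being read.

```python
from datetime import date

# Toy "table": rows grouped by the value of the partition column, standing in
# for a layout with one directory per partition value (hypothetical data).
partitions = {
    date(2024, 1, 1): [{"user": "a", "amount": 10}, {"user": "b", "amount": 5}],
    date(2024, 1, 2): [{"user": "a", "amount": 7}],
    date(2024, 1, 3): [{"user": "c", "amount": 3}, {"user": "a", "amount": 1}],
}


def scan(table, wanted_dates):
    """Return the matching rows and the number of partitions actually read."""
    rows, partitions_read = [], 0
    for part_date, part_rows in table.items():
        if part_date not in wanted_dates:
            continue  # pruned: this partition's rows are never touched
        partitions_read += 1
        rows.extend(part_rows)
    return rows, partitions_read


# A query with a predicate on the partition column reads one partition of three.
rows, partitions_read = scan(partitions, {date(2024, 1, 2)})
print(rows, partitions_read)
```

A predicate on a non-partition column (for example `user`) would give no such shortcut in this sketch: every partition would have to be read and filtered row by row.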

C) hdfs dfs -snapCreate
D) hdfs dfs -snapshot
Answer: A
Explanation: The -mkdirSnapshot option creates a point-in-time snapshot of a directory, enabling fast rollback and backup.

Question 23. In Cloudera Manager, which health test checks for out-of-memory (OOM) events on a host?
A) Disk Usage
B) Service Status
C) JVM Heap Utilization
D) Process Memory
Answer: D
Explanation: The Process Memory health test monitors OS-level memory usage and flags OOM conditions.

Question 24. Which of the following is NOT a valid replication method in Replication Manager?
A) Snapshot replication
B) Incremental replication
C) Log-based replication
D) Full dump-and-load
Answer: A
Explanation: Replication Manager supports incremental (log-based) and full dump-and-load; snapshot replication is not a distinct method.

Question 25. Which Ranger component stores policy definitions and provides the authorization engine?
A) Policy Admin Server
B) Tag Sync Service
C) Plugin
D) KMS
Answer: A
Explanation: The Policy Admin Server is the central repository for policies and performs authorization checks for requests.

Question 26. In Apache Ranger, which audit format is commonly used to capture access events?
A) JSON logs
B) Avro files
C) CSV files
D) Parquet tables
Answer: A
Explanation: Ranger writes audit events as JSON records, for example to Solr for the Ranger UI and to HDFS for long-term storage, where they can be queried for analysis.

Question 27. Which Atlas feature tracks the lineage of a dataset as it moves through different services?

D) View
Answer: B
Explanation: Managed tables have their data and metadata fully controlled by Hive; dropping the table deletes the underlying files.

Question 30. In Impala, which command shows the current query execution plan?
A) EXPLAIN SELECT …
B) DESCRIBE FORMATTED …
C) SHOW PLAN …
D) PROFILE SELECT …
Answer: A
Explanation: EXPLAIN prints the logical and physical plan for a query, helping identify performance bottlenecks.

Question 31. Which Hive feature introduced ACID (Atomicity, Consistency, Isolation, Durability) support in version 3?
A) ORC file format
B) Transactional tables
C) Bucketing
D) Partition pruning
Answer: B
Explanation: Transactional tables enable INSERT, UPDATE, DELETE, and MERGE operations with full ACID guarantees.

Question 32. Which SQL function can be used in Impala to compute a running total over ordered rows?
A) SUM() OVER (ORDER BY …)
B) CUME_DIST()
C) RANK()
D) LAG()
Answer: A
Explanation: Window functions like SUM() OVER (ORDER BY …) calculate cumulative aggregates across a defined window.

Question 33. In CDW (Cloudera Data Warehouse), what is a “virtual warehouse”?
A) A logical grouping of compute resources for query execution
B) A replicated HDFS namespace
C) A container for storing raw data files
D) A metadata repository
Answer: A
Explanation: Virtual warehouses allocate dedicated compute clusters for isolated query workloads, enabling multi-tenant performance isolation.

Question 34. Which CDW feature allows users to discover tables and columns across the enterprise?

Answer: B
Explanation: SSL/TLS encrypts traffic between CDP services, ensuring confidentiality and integrity of data in motion.

Question 37. Which component provides transparent encryption for HDFS files at rest?
A) Ranger KMS
B) HDFS Transparent Encryption (TDE)
C) S3 SSE-KMS
D) Ozone encryption
Answer: B
Explanation: HDFS Transparent Encryption encrypts data blocks on disk, with keys managed by a KMS, providing at-rest protection.

Question 38. In CDP Public Cloud, which identity provider integration enables single sign-on (SSO) for users?
A) LDAP
B) SAML 2.0
C) Kerberos
D) OAuth 1.0
Answer: B
Explanation: SAML 2.0 is the standard protocol for SSO integration with cloud identity providers such as Okta or Azure AD.

Question 39. Which Ranger policy type allows permissions to be applied based on resource tags rather than explicit resource names?
A) Row-level policy
B) Column-level policy
C) Tag-based policy
D) Masking policy
Answer: C
Explanation: Tag-based policies grant access to any resource that carries a specific tag, simplifying permission management.

Question 40. Which of the following statements about Apache Ozone is true?
A) Ozone replaces HDFS for batch processing only.
B) Ozone provides object-store semantics with a hierarchical namespace.
C) Ozone is a fully managed service in CDP Public Cloud.
D) Ozone does not support erasure coding.
Answer: B
Explanation: Ozone offers an object-store-like architecture with a hierarchical namespace, supporting both file- and object-level operations.

Question 41. Which YARN scheduler setting controls the maximum number of containers a user can launch simultaneously?
A) yarn.scheduler.maximum-allocation-mb
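
The split that Question 37 describes, where data is encrypted on disk while the keys stay with a KMS, is commonly explained as envelope encryption: each file gets its own data key, and only an encrypted copy of that key is stored with the file's metadata. The sketch below models that idea in plain Python under loud assumptions: `ToyKMS`, `toy_cipher`, and the zone name `finance_zone` are hypothetical, and the XOR keystream is a placeholder that is not secure and not what HDFS uses.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (NOT secure). Applying it twice with the same key
    restores the input, so it serves as both encrypt and decrypt."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class ToyKMS:
    """Hypothetical stand-in for a key management service.
    Zone keys are created here and never handed out."""

    def __init__(self):
        self._zone_keys = {}

    def create_zone_key(self, zone: str) -> None:
        self._zone_keys[zone] = secrets.token_bytes(32)

    def generate_encrypted_key(self, zone: str) -> bytes:
        # New per-file data key, returned only in encrypted form.
        data_key = secrets.token_bytes(32)
        return toy_cipher(self._zone_keys[zone], data_key)

    def decrypt_key(self, zone: str, encrypted_key: bytes) -> bytes:
        return toy_cipher(self._zone_keys[zone], encrypted_key)


kms = ToyKMS()
kms.create_zone_key("finance_zone")

# Write path: the encrypted data key is stored with the file's metadata,
# and the file contents are encrypted with the decrypted data key.
file_metadata = {"encrypted_key": kms.generate_encrypted_key("finance_zone")}
data_key = kms.decrypt_key("finance_zone", file_metadata["encrypted_key"])
plaintext = b"account=42,balance=100"
stored_bytes = toy_cipher(data_key, plaintext)

# Read path: ask the KMS to decrypt the stored key, then decrypt the contents.
recovered_key = kms.decrypt_key("finance_zone", file_metadata["encrypted_key"])
recovered = toy_cipher(recovered_key, stored_bytes)
print(recovered == plaintext)
```

The design point is that whatever holds `stored_bytes` and `file_metadata` cannot read the data without a successful call to the KMS, which is where access control on keys is enforced.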

Answer: C
Explanation: NiFi processors typically define “Success” and “Failure” relationships to route FlowFiles based on processing outcome.

Question 44. Which Kafka consumer configuration enables automatic offset commits after each poll?
A) enable.auto.commit = false
B) enable.auto.commit = true
C) auto.offset.reset = earliest
D) max.poll.records = 500
Answer: B
Explanation: Setting enable.auto.commit to true makes the consumer commit offsets automatically at the interval set by auto.commit.interval.ms, with the commits issued during poll() calls.

Question 45. Which Airflow operator is specifically designed to submit a Spark job on YARN?
A) BashOperator
B) PythonOperator
C) SparkSubmitOperator
D) HiveOperator
Answer: C
Explanation: SparkSubmitOperator wraps the spark-submit command, allowing Spark jobs to be launched from Airflow DAGs.

Question 46. Which file format is best suited for streaming writes with schema evolution support?
A) Parquet
B) ORC
C) Avro
D) CSV
Answer: C
Explanation: Avro’s row-based storage and embedded schema make it ideal for streaming ingestion where schema may evolve.

Question 47. Which Hive setting enables vectorized query execution for faster processing?
A) hive.exec.dynamic.partition
B) hive.vectorized.execution.enabled
C) hive.mapred.mode
D) hive.exec.orc.default.compress
Answer: B
Explanation: hive.vectorized.execution.enabled toggles vectorized processing, allowing batch operations on column vectors.

Question 48. Which Impala configuration parameter controls the amount of memory allocated per query?
A) impala_memory_limit
B) impala_query_memory_limit
C) impala_server_memory_limit
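
To make the auto-commit behaviour from Question 44 concrete, here is a small in-memory model rather than real client code. `ToyConsumer` is hypothetical and talks to no broker. It assumes commits are triggered from inside `poll()` once a configurable interval has elapsed (in Kafka that interval is `auto.commit.interval.ms`), and that a commit covers only records returned by earlier polls.

```python
class ToyConsumer:
    """Hypothetical in-memory model of a consumer with enable.auto.commit=true."""

    def __init__(self, log, auto_commit_interval_ms=5000, max_poll_records=2):
        self.log = log                      # the partition's records, by offset
        self.position = 0                   # next offset to fetch
        self.committed = 0                  # offset a restarted consumer resumes from
        self.interval_ms = auto_commit_interval_ms
        self.max_poll_records = max_poll_records
        self.last_commit_ms = 0             # simplification: clock starts at 0

    def poll(self, now_ms):
        # The commit check runs inside poll(): once the interval has elapsed,
        # commit the position reached by the previous polls.
        if now_ms - self.last_commit_ms >= self.interval_ms:
            self.committed = self.position
            self.last_commit_ms = now_ms
        batch = self.log[self.position:self.position + self.max_poll_records]
        self.position += len(batch)
        return batch


consumer = ToyConsumer(["m0", "m1", "m2", "m3", "m4"])
first = consumer.poll(now_ms=1000)    # interval not yet elapsed: nothing committed
second = consumer.poll(now_ms=6000)   # interval elapsed: commits offsets from `first`
print(first, second, consumer.committed, consumer.position)
```

After the second poll the committed offset trails the position by one batch. If the process stopped at that moment, a restart would resume from the committed offset and receive the second batch again, which is why auto-commit is usually described as giving at-least-once delivery when processing completes before the next poll.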