Cloudera CDP Certification Program Practice Exam

A complete practice exam aligned with the Cloudera Data Platform (CDP) certification paths. It evaluates multi-domain knowledge: data engineering, administration, operational workflows, cloud-native analytics, security configurations, and SDX governance across public and private cloud deployments.

Typology: Exams

2025/2026

Available from 01/06/2026

shilpi-jain-1
Cloudera CDP Certification Program
Practice Exam
**Question 1. Which CDP deployment model provides a fully managed service on public cloud providers
such as AWS, Azure, and GCP?**
A) CDP Private Cloud Base
B) CDP Public Cloud
C) CDP On-Premises Data Hub
D) CDP Edge
Answer: B
Explanation: CDP Public Cloud delivers a fully managed, SaaS-style data platform on public cloud
infrastructures, handling provisioning, scaling, and maintenance for the user.
**Question 2. In CDP architecture, which component is responsible for unified security, governance, and
metadata across all environments?**
A) Cloudera Manager
B) Workload XM
C) Shared Data Experience (SDX)
D) Replication Manager
Answer: C
Explanation: SDX provides consistent security policies, data cataloging, and lineage across data lake,
data warehouse, and other services, ensuring a unified governance layer.
**Question 3. What is the primary role of the NameNode in HDFS?**
A) Store actual block data
B) Manage metadata and namespace operations
C) Schedule MapReduce jobs
D) Perform data compression
Answer: B
Explanation: The NameNode maintains the file system namespace, storing metadata such as file-to-block mappings, permissions, and directory structures.
**Question 4. Which YARN scheduler allows multiple queues with hierarchical capacity guarantees?**
A) Fair Scheduler
B) Capacity Scheduler
C) FIFO Scheduler
D) Dominant Resource Fairness
Answer: B
Explanation: The Capacity Scheduler enables hierarchical queues, each with guaranteed capacity percentages, useful for multi-tenant environments.
**Question 5. In Cloudera Manager, which wizard step configures external services such as the Hive Metastore database?**
A) Host Discovery
B) Service Configuration
C) Add Service
D) Review
Answer: C
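
To make the hierarchical capacity guarantees from Question 4 concrete, the sketch below computes each queue's share of the whole cluster by multiplying the capacity percentages along its path, since a Capacity Scheduler queue's capacity is expressed relative to its parent. The queue names and percentages are hypothetical, and this is plain Python arithmetic rather than a YARN API.

```python
# Hypothetical queue tree: each value is the queue's capacity as a
# percentage of its PARENT queue (the Capacity Scheduler convention).
QUEUE_CAPACITY = {
    "root": 100.0,
    "root.prod": 70.0,
    "root.dev": 30.0,
    "root.prod.etl": 50.0,
    "root.prod.reports": 50.0,
}


def absolute_capacity(queue):
    """Return the queue's guaranteed share of the whole cluster, in percent."""
    share = 1.0
    parts = queue.split(".")
    # Walk the path root -> ... -> queue, multiplying the relative shares.
    for depth in range(1, len(parts) + 1):
        share *= QUEUE_CAPACITY[".".join(parts[:depth])] / 100.0
    return share * 100.0


for name in QUEUE_CAPACITY:
    print(f"{name}: {absolute_capacity(name):.1f}% of cluster")
```

In this made-up tree, `root.prod.etl` is guaranteed half of the 70% given to `root.prod`, which is why sibling percentages at each level are expected to sum to 100.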

A) Windows Server 2019
B) Red Hat Enterprise Linux 8
C) macOS Catalina
D) Ubuntu Server 20.04 LTS
Answer: B
Explanation: RHEL 8 (or compatible CentOS) is the officially supported OS for CDP Private Cloud Base management nodes, ensuring stability and support.
**Question 9. Which container orchestration platform does CDP leverage for service deployment in the public cloud?**
A) Docker Swarm
B) Apache Mesos
C) Kubernetes
D) OpenShift
Answer: C
Explanation: CDP uses Kubernetes to orchestrate containerized services, enabling elasticity and cloud-native deployment models.
**Question 10. In Spark, which abstraction provides the most optimized execution plan for SQL-like operations?**
A) RDD
B) DataFrame
C) Broadcast Variable
D) Accumulator
Answer: B
Explanation: DataFrames are backed by the Catalyst optimizer and the Tungsten execution engine, which deliver automatic query optimization and efficient memory usage.
**Question 11. Which Spark API is primarily used to read a Parquet file from HDFS into a DataFrame?**
A) spark.read.text()
B) spark.read.parquet()
C) spark.read.json()
D) spark.read.csv()
Answer: B
Explanation: spark.read.parquet() directly reads Parquet files, preserving schema and columnar storage benefits.
**Question 12. When running a Spark application on YARN, which mode launches the driver inside the ApplicationMaster?**
A) client mode
B) cluster mode
C) local mode
D) standalone mode
Answer: B
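
As a rough illustration of Question 12, the sketch below assembles a `spark-submit` invocation that targets YARN in cluster mode. The `--master` and `--deploy-mode` flags are standard `spark-submit` options. The helper function and the script name `read_events.py` are hypothetical; such a script might, for instance, call `spark.read.parquet()` as in Question 11.

```python
import shlex


def spark_submit_command(app, deploy_mode="cluster"):
    """Build a spark-submit argument list for a YARN cluster."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",
        # cluster: the driver runs inside the YARN ApplicationMaster
        # client:  the driver runs in the process that called spark-submit
        "--deploy-mode", deploy_mode,
        app,
    ]


cmd = spark_submit_command("read_events.py")  # hypothetical script name
print(shlex.join(cmd))
# prints: spark-submit --master yarn --deploy-mode cluster read_events.py
```

Building the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues.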

B) Sensor
C) DAG
D) Hook
Answer: C
Explanation: A DAG (Directed Acyclic Graph) specifies task dependencies, execution order, and scheduling in Airflow.
**Question 16. Which file format offers schema evolution and columnar storage, making it ideal for analytical workloads?**
A) CSV
B) JSON
C) Parquet
D) Avro
Answer: C
Explanation: Parquet stores data column-wise, supports schema evolution, and provides efficient compression, suited for analytics.
**Question 17. When partitioning a Hive table on a date column, which benefit is most directly realized?**
A) Faster writes due to smaller files
B) Reduced storage cost via compression
C) Query pruning that scans only relevant partitions
D) Automatic indexing of the date column
Answer: C
Explanation: Partition pruning allows queries to read only the partitions matching the date predicate, dramatically reducing I/O.
**Question 18. Which Iceberg feature enables time-travel queries to view data as of a previous snapshot?**
A) Hidden partitions
B) Snapshot isolation
C) Row-level deletes
D) Schema enforcement
Answer: B
Explanation: Iceberg maintains immutable snapshots; queries can specify a snapshot ID to retrieve data as it existed at that point.
**Question 19. In Spark, which configuration property controls the amount of memory allocated to the executor JVM?**
A) spark.driver.memory
B) spark.executor.cores
C) spark.executor.memory
D) spark.memory.fraction
Answer: C
Explanation: spark.executor.memory sets the heap size for each executor process.
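
The heap size from Question 19 is not the whole memory request on YARN: Spark adds a memory overhead on top of `spark.executor.memory`. The sketch below estimates the per-executor request, assuming Spark's default JVM overhead rule (10% of executor memory, with a 384 MiB minimum) and no off-heap or PySpark memory settings. The function name is hypothetical, and YARN's rounding up to its allocation increment is not modeled.

```python
MIN_OVERHEAD_MIB = 384   # Spark's minimum executor memory overhead
OVERHEAD_FACTOR = 0.10   # default overhead factor for JVM executors


def executor_request_mib(executor_memory_mib):
    """Estimate the memory requested per executor: heap plus default overhead."""
    overhead = max(int(executor_memory_mib * OVERHEAD_FACTOR), MIN_OVERHEAD_MIB)
    return executor_memory_mib + overhead


for heap in (2048, 4096, 8192):
    print(f"spark.executor.memory={heap}m -> request ~{executor_request_mib(heap)} MiB")
```

With these defaults, a 2 GiB heap hits the 384 MiB floor, while larger heaps pay the proportional 10%.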

C) hdfs dfs -snapCreate
D) hdfs dfs -snapshot
Answer: A
Explanation: The -mkdirSnapshot option creates a point-in-time snapshot of a directory, enabling fast rollback and backup.
**Question 23. In Cloudera Manager, which health test checks for out-of-memory (OOM) events on a host?**
A) Disk Usage
B) Service Status
C) JVM Heap Utilization
D) Process Memory
Answer: D
Explanation: The Process Memory health test monitors OS-level memory usage and flags OOM conditions.
**Question 24. Which of the following is NOT a valid replication method in Replication Manager?**
A) Snapshot replication
B) Incremental replication
C) Log-based replication
D) Full dump-and-load
Answer: A
Explanation: Replication Manager supports incremental (log-based) and full dump-and-load; snapshot replication is not a distinct method.
**Question 25. Which Ranger component stores policy definitions and provides the authorization engine?**
A) Policy Admin Server
B) Tag Sync Service
C) Plugin
D) KMS
Answer: A
Explanation: The Policy Admin Server is the central repository for policies and performs authorization checks for requests.
**Question 26. In Apache Ranger, which audit format is commonly used to capture access events?**
A) JSON logs
B) Avro files
C) CSV files
D) Parquet tables
Answer: A
Explanation: Ranger writes audit events as JSON records, for example to Solr or HDFS, which makes them straightforward to index and query for analysis.
**Question 27. Which Atlas feature tracks the lineage of a dataset as it moves through different services?**

D) View
Answer: B
Explanation: Managed tables have their data and metadata fully controlled by Hive; dropping the table deletes the underlying files.
**Question 30. In Impala, which command shows the current query execution plan?**
A) EXPLAIN SELECT …
B) DESCRIBE FORMATTED …
C) SHOW PLAN …
D) PROFILE SELECT …
Answer: A
Explanation: EXPLAIN prints the logical and physical plan for a query, helping identify performance bottlenecks.
**Question 31. Which Hive feature introduced ACID (Atomicity, Consistency, Isolation, Durability) support in version 3?**
A) ORC file format
B) Transactional tables
C) Bucketing
D) Partition pruning
Answer: B
Explanation: Transactional tables enable INSERT, UPDATE, DELETE, and MERGE operations with full ACID guarantees.
**Question 32. Which SQL function can be used in Impala to compute a running total over ordered rows?**
A) SUM() OVER (ORDER BY …)
B) CUME_DIST()
C) RANK()
D) LAG()
Answer: A
Explanation: Window functions like SUM() OVER (ORDER BY …) calculate cumulative aggregates across a defined window.
**Question 33. In CDW (Cloudera Data Warehouse), what is a “virtual warehouse”?**
A) A logical grouping of compute resources for query execution
B) A replicated HDFS namespace
C) A container for storing raw data files
D) A metadata repository
Answer: A
Explanation: Virtual warehouses allocate dedicated compute clusters for isolated query workloads, enabling multi-tenant performance isolation.
**Question 34. Which CDW feature allows users to discover tables and columns across the enterprise?**

Answer: B
Explanation: SSL/TLS encrypts traffic between CDP services, ensuring confidentiality and integrity of data in motion.
**Question 37. Which component provides transparent encryption for HDFS files at rest?**
A) Ranger KMS
B) HDFS Transparent Encryption (TDE)
C) S3 SSE-KMS
D) Ozone encryption
Answer: B
Explanation: HDFS Transparent Encryption encrypts data blocks on disk, with keys managed by a KMS, providing at-rest protection.
**Question 38. In CDP Public Cloud, which identity provider integration enables single sign-on (SSO) for users?**
A) LDAP
B) SAML 2.0
C) Kerberos
D) OAuth 1.0
Answer: B
Explanation: SAML 2.0 is the standard protocol for SSO integration with cloud identity providers such as Okta or Azure AD.

**Question 39. Which Ranger policy type allows permissions to be applied based on resource tags rather than explicit resource names?**
A) Row-level policy
B) Column-level policy
C) Tag-based policy
D) Masking policy
Answer: C
Explanation: Tag-based policies grant access to any resource that carries a specific tag, simplifying permission management.
**Question 40. Which of the following statements about Apache Ozone is true?**
A) Ozone replaces HDFS for batch processing only.
B) Ozone provides object-store semantics with a hierarchical namespace.
C) Ozone is a fully managed service in CDP Public Cloud.
D) Ozone does not support erasure coding.
Answer: B
Explanation: Ozone offers an object-store-like architecture with a hierarchical namespace, supporting both file-level and object-level operations.
**Question 41. Which YARN scheduler setting controls the maximum number of containers a user can launch simultaneously?**
A) yarn.scheduler.maximum-allocation-mb

Answer: C
Explanation: NiFi processors typically define “Success” and “Failure” relationships to route FlowFiles based on processing outcome.
**Question 44. Which Kafka consumer configuration enables automatic offset commits after each poll?**
A) enable.auto.commit = false
B) enable.auto.commit = true
C) auto.offset.reset = earliest
D) max.poll.records = 500
Answer: B
Explanation: Setting enable.auto.commit to true makes the consumer commit offsets automatically during calls to poll(), at the interval set by auto.commit.interval.ms.
**Question 45. Which Airflow operator is specifically designed to submit a Spark job on YARN?**
A) BashOperator
B) PythonOperator
C) SparkSubmitOperator
D) HiveOperator
Answer: C
Explanation: SparkSubmitOperator wraps the spark-submit command, allowing Spark jobs to be launched from Airflow DAGs.
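
For Question 44, it may help to see when an auto-commit actually happens. The toy model below is not the Kafka client API: `ToyConsumer` and its methods are hypothetical, and it only mimics the documented behavior that, with `enable.auto.commit=true`, offsets are committed from inside `poll()` once `auto.commit.interval.ms` (default 5000) has elapsed.

```python
class ToyConsumer:
    """Toy stand-in for a consumer with enable.auto.commit=true (not a real client)."""

    def __init__(self, auto_commit_interval_ms=5000):
        self.interval_ms = auto_commit_interval_ms
        self.position = 0        # offset of the next record to hand out
        self.committed = 0       # last offset committed
        self._last_commit_ms = 0

    def poll(self, now_ms, n_records):
        # The commit check runs at the start of poll(), so it covers only
        # records returned by EARLIER polls, never the batch being fetched.
        if now_ms - self._last_commit_ms >= self.interval_ms:
            self.committed = self.position
            self._last_commit_ms = now_ms
        self.position += n_records
        return n_records


consumer = ToyConsumer()
for now_ms in (0, 2000, 5000):   # three polls, 10 records each
    consumer.poll(now_ms, 10)
    print(f"t={now_ms}ms position={consumer.position} committed={consumer.committed}")
```

In this model the third poll commits offset 20 while handing out records up to 30, which is why a crash between polls can lead to records being processed again.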

**Question 46. Which file format is best suited for streaming writes with schema evolution support?**
A) Parquet
B) ORC
C) Avro
D) CSV
Answer: C
Explanation: Avro’s row-based storage and embedded schema make it ideal for streaming ingestion where schema may evolve.
**Question 47. Which Hive setting enables vectorized query execution for faster processing?**
A) hive.exec.dynamic.partition
B) hive.vectorized.execution.enabled
C) hive.mapred.mode
D) hive.exec.orc.default.compress
Answer: B
Explanation: hive.vectorized.execution.enabled toggles vectorized processing, allowing batch operations on column vectors.
**Question 48. Which Impala configuration parameter controls the amount of memory allocated per query?**
A) impala_memory_limit
B) impala_query_memory_limit
C) impala_server_memory_limit