





Databricks Quiz for the Data Engineering Professional Exam
Typology: Quizzes






Domain 1: Data Processing & Optimization
Focus: Shuffle partitions, join strategies, and caching.

1. When setting spark.sql.shuffle.partitions, what is the primary risk of choosing a value that is too high for a small dataset?
A) Data skew.
B) Out of Memory (OOM) errors.
C) Excessive overhead from task scheduling and small file I/O.
D) Spill to disk.

2. In a Sort-Merge Join, if one table has a significant null-key count, what is the most likely performance bottleneck?
A) Network I/O.
B) Data skew on a single executor.
C) Analysis exception.
D) Disk serialization.

3. Which operation triggers a full shuffle in Spark?
A) filter()
B) map()
C) repartition()
D) coalesce() (when reducing partitions)

4. Which hint should be used when joining a 10MB table with a 10TB table to avoid a Shuffle Hash Join?
A) /*+ MERGE(t1) */
B) /*+ BROADCAST(t1) */
C) /*+ REPARTITION(10) */
D) /*+ SKEW(t1) */

5. What is the effect of the Adaptive Query Execution (AQE) setting spark.sql.adaptive.skewJoin.enabled?
A) It automatically filters out nulls.
B) It splits skewed partitions into smaller sub-partitions to balance the load.
C) It increases the executor memory dynamically.
D) It converts a Sort-Merge Join to a Broadcast Join at runtime.

Domain 2: Delta Lake Architecture
Focus: Transaction logs, VACUUM, and Z-Ordering.

6. How does Delta Lake ensure Atomicity?
A) By locking the entire storage account.
B) It allows multi-dimensional data clustering to improve data skipping.
C) It encrypts the data at rest.
D) It reduces the number of files to exactly one per partition.

10. In a streaming Delta Sink, what does trigger(availableNow=True) do?
A) Processes only the first file and stops.
B) Processes all available data in micro-batches and then shuts down.
C) Runs a continuous loop every 10 seconds.
D) It is an alias for trigger(once=True).

Domain 3: Data Modeling & Change Data Capture (CDC)
Focus: SCD Type 2, MERGE INTO, and Watermarking.

11. When implementing SCD Type 2, why is a MERGE statement preferred over a series of INSERT/UPDATE statements?
A) It is the only way to write to Delta.
B) It provides an atomic operation to handle matches and non-matches simultaneously.
C) It automatically manages the is_current flag without logic.
D) It doesn't require a join key.
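
To make the idea behind question 11 concrete, here is a minimal pure-Python sketch of SCD Type 2 merge logic. It is a toy model, not Delta Lake's MERGE INTO: the function name scd2_merge and the column names (key, value, start_date, end_date, is_current) are assumptions chosen for illustration. It shows matched-and-changed rows being expired, new versions being inserted, and everything being applied to a copy so the caller sees either all of the changes or none of them.

```python
from copy import deepcopy


def scd2_merge(dimension, updates, effective_date):
    """Return a new dimension table with the updates applied as SCD Type 2.

    Toy model only: rows are plain dicts and the column names are hypothetical.
    """
    # Work on a copy so the original table is untouched until we return.
    result = deepcopy(dimension)
    current = {row["key"]: row for row in result if row["is_current"]}

    for update in updates:
        existing = current.get(update["key"])
        if existing is not None and existing["value"] == update["value"]:
            continue  # matched and unchanged: nothing to do
        if existing is not None:
            # Matched and changed: expire the old version.
            existing["is_current"] = False
            existing["end_date"] = effective_date
        # Changed or not matched: insert a new current version.
        new_row = {
            "key": update["key"],
            "value": update["value"],
            "start_date": effective_date,
            "end_date": None,
            "is_current": True,
        }
        result.append(new_row)
        current[update["key"]] = new_row
    return result


dimension = [
    {"key": 1, "value": "NY", "start_date": "2024-01-01",
     "end_date": None, "is_current": True},
]
updates = [{"key": 1, "value": "LA"}, {"key": 2, "value": "SF"}]
merged = scd2_merge(dimension, updates, "2024-06-01")
```

With separate INSERT and UPDATE statements, a failure between the two steps could leave an expired row without its replacement. Handling both branches in one pass is what the single-statement approach buys you.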
12. You are using Change Data Feed (CDF). Which virtual column identifies the type of change (e.g., insert, update_preimage)?
A) _change_metadata
B) _change_type
C) _commit_version
D) _operation

13. What is the purpose of "Watermarking" in Structured Streaming?
A) To encrypt the stream.
B) To handle late-arriving data and clean up state store.
C) To limit the number of files read per batch.
D) To trigger a notification when data is missing.

14. When using foreachBatch, what is a critical consideration for maintaining idempotency?
A) Always use overwrite mode.
B) Use the batchId to ensure the same data isn't processed twice in a restart.
C) Avoid using SQL inside the function.
D) Set shuffle partitions to 1.
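
The batchId idea in question 14 can be sketched in plain Python. This is a toy model, not Spark code: the class IdempotentSink and its method write_batch are hypothetical names. In Structured Streaming, the function passed to foreachBatch receives the micro-batch DataFrame and a batch id, and after a restart the same batch id may be delivered again. The sketch shows a sink that records committed batch ids and ignores replays.

```python
class IdempotentSink:
    """Toy sink that remembers which batch ids it has already committed."""

    def __init__(self):
        self.rows = []
        self.committed_batch_ids = set()

    def write_batch(self, rows, batch_id):
        """Append rows unless this batch id was already written.

        Returns True if the batch was written, False if it was skipped.
        """
        if batch_id in self.committed_batch_ids:
            return False  # replayed batch: skip to avoid duplicates
        self.rows.extend(rows)
        self.committed_batch_ids.add(batch_id)
        return True


sink = IdempotentSink()
sink.write_batch(["a", "b"], batch_id=0)
sink.write_batch(["c"], batch_id=1)
# Simulate a restart that redelivers batch 1.
replayed = sink.write_batch(["c"], batch_id=1)
```

In a real pipeline, the record of committed batch ids would have to be stored durably and updated in the same transaction as the data. Here it lives in memory only to keep the sketch short.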
18. What is a "Dynamic View" used for in Databricks?
A) Speeding up queries using caching.
B) Column-level or Row-level security based on the current user's identity.
C) Automatically refreshing data every 5 minutes.
D) Converting Parquet to Delta.

19. How do you define a system-wide "External Location" in Unity Catalog?
A) Via a dbutils.fs.mount command.
B) By creating a Storage Credential and an External Location object.
C) By hardcoding S3 keys in the cluster config.
D) By using a .netrc file.

20. What is a "Managed Table" in Unity Catalog?
A) A table where the user manages the underlying file location.
B) A table where Databricks manages both the metadata and the physical data files.
C) A table that cannot be deleted.
D) A view that is cached in RAM.
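
As a rough illustration of the dynamic view concept in question 18, the following pure-Python sketch applies a row-level filter and a column-level mask based on the caller's group membership. It is not Databricks SQL: the function dynamic_view, the group name "auditors", and the columns region and email are all assumptions made for this example.

```python
def dynamic_view(rows, user_groups):
    """Return the rows a user may see, masking email unless they are an auditor.

    Toy model only: group names and column names are hypothetical.
    """
    is_auditor = "auditors" in user_groups
    visible = []
    for row in rows:
        # Row-level security: non-auditors only see their own regions.
        if not is_auditor and row["region"] not in user_groups:
            continue
        masked = dict(row)
        # Column-level security: hide the email from non-auditors.
        if not is_auditor:
            masked["email"] = "REDACTED"
        visible.append(masked)
    return visible


table = [
    {"region": "emea", "email": "ana@example.com"},
    {"region": "apac", "email": "li@example.com"},
]
emea_view = dynamic_view(table, user_groups={"emea"})
auditor_view = dynamic_view(table, user_groups={"auditors"})
```

The same underlying table yields different results for different callers, which is the behaviour a dynamic view provides without maintaining a separate copy of the data per audience.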