Databricks quiz for Data Engineering Professional exam, Quizzes of Computer science

Typology: Quizzes

2024/2025

Uploaded on 01/01/2026

bhaskar-namile

Domain 1: Data Processing & Optimization
Focus: Shuffle partitions, join strategies, and caching.
1 When using spark.sql.shuffle.partitions, what is the primary risk of setting this value too high for a small dataset?
A) Data skew.
B) Out of Memory (OOM) errors.
C) Excessive overhead from task scheduling and small file I/O.
D) Spill to disk.
2 In a Sort-Merge Join, if one table has a significant null-key count, what is the most likely performance bottleneck?
A) Network I/O.
B) Data Skew on a single executor.
C) Analysis exception.
D) Disk serialization.
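As a rough illustration of the scenario in question 2, the sketch below uses a toy hash partitioner in plain Python. It is not Spark's partitioner; the function names and row counts are made up for illustration. The only property it relies on is that equal keys, including null (None) keys, always map to the same partition.

```python
from collections import Counter


def assign_partitions(keys, num_partitions):
    """Toy hash partitioner: equal keys (including None) share a partition."""
    return [hash(k) % num_partitions for k in keys]


def partition_sizes(keys, num_partitions):
    """Count how many rows land in each partition."""
    counts = Counter(assign_partitions(keys, num_partitions))
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical data: 1,000 distinct integer keys plus 9,000 rows with a null key.
keys = list(range(1000)) + [None] * 9000
sizes = partition_sizes(keys, 8)

# One partition holds every null-key row; the others hold 125 rows each.
print(sizes)
```

Whether null keys cause this kind of imbalance in a real job depends on the join type and on optimizations the engine applies, so treat the numbers as a picture of the mechanism rather than a prediction.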
3 Which operation triggers a full shuffle in Spark?
A) filter()
B) map()
C) repartition()
D) coalesce() (when reducing partitions)
4 Which hint should be used when joining a 10MB table with a 10TB table to avoid a Shuffle Hash Join?
A) /*+ MERGE(t1) */
B) /*+ BROADCAST(t1) */
C) /*+ REPARTITION(10) */
D) /*+ SKEW(t1) */
5 What is the effect of the "Adaptive Query Execution" (AQE) feature skewJoin.enabled?
A) It automatically filters out nulls.
B) It splits skewed partitions into smaller sub-partitions to balance the load.
C) It increases the executor memory dynamically.
D) It converts a Sort-Merge join to a Broadcast Join at runtime.
Domain 2: Delta Lake Architecture
Focus: Transaction logs, VACUUM, and Z-Ordering.
6 How does Delta Lake ensure Atomicity?
A) By locking the entire storage account.
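Question 6 concerns how a write becomes all-or-nothing. A Delta table records each commit as a single numbered JSON file in its _delta_log directory, and a given version number can be claimed by only one writer. The sketch below imitates that idea on a local filesystem using exclusive file creation. The function name try_commit and the sample actions are hypothetical, and real storage backends use their own mechanisms to make the log entry appear atomically.

```python
import json
import os
import tempfile


def try_commit(log_dir, version, actions):
    """Claim a version by creating its log file exclusively.

    Returns True if this writer won the version, False if it already exists.
    Toy model only: a real implementation must also ensure readers never
    observe a partially written log entry.
    """
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_EXCL makes creation fail if another writer already took this version.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True


with tempfile.TemporaryDirectory() as log_dir:
    first = try_commit(log_dir, 0, [{"add": "part-0.parquet"}])
    second = try_commit(log_dir, 0, [{"add": "part-1.parquet"}])  # same version
    third = try_commit(log_dir, 1, [{"add": "part-1.parquet"}])   # next version
    log_files = sorted(os.listdir(log_dir))

print(first, second, third, log_files)
```

A writer that loses the race would normally re-read the log, check for conflicts, and retry with the next version number.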

B) It allows multi-dimensional data clustering to improve data skipping.
C) It encrypts the data at rest.
D) It reduces the number of files to exactly one per partition.
10 In a streaming Delta Sink, what does trigger(availableNow=True) do?
A) Processes only the first file and stops.
B) Processes all available data in micro-batches and then shuts down.
C) Runs a continuous loop every 10 seconds.
D) It is an alias for trigger(once=True).
Domain 3: Data Modeling & Change Data Capture (CDC)
Focus: SCD Type 2, MERGE INTO, and Watermarking.
11 When implementing SCD Type 2, why is a MERGE statement preferred over a series of INSERT/UPDATE statements?
A) It is the only way to write to Delta.
B) It provides an atomic operation to handle matches and non-matches simultaneously.
C) It automatically manages the is_current flag without logic.
D) It doesn't require a join key.
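To make the SCD Type 2 bookkeeping behind question 11 concrete, here is a minimal plain-Python sketch of the matched and not-matched branches. It is not Delta's MERGE implementation; the DimRow fields, the scd2_merge function, and the sample dates are assumptions chosen for illustration. The function returns a new table in one step and leaves the input untouched, which stands in for the single-operation behavior a MERGE provides.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    key: int
    value: str
    valid_from: str
    valid_to: Optional[str]
    is_current: bool


def scd2_merge(target: List[DimRow], updates: Dict[int, str], as_of: str) -> List[DimRow]:
    """Expire changed current rows, then append new versions and new keys."""
    current = {row.key: row for row in target if row.is_current}
    result = []
    for row in target:
        changed = row.is_current and row.key in updates and updates[row.key] != row.value
        if changed:
            # Matched and changed: close out the old version.
            result.append(replace(row, valid_to=as_of, is_current=False))
        else:
            result.append(row)
    for key, value in updates.items():
        old = current.get(key)
        if old is None or old.value != value:
            # New key, or new version of a changed key.
            result.append(DimRow(key, value, as_of, None, True))
    return result


target = [
    DimRow(1, "a", "2024-01-01", None, True),
    DimRow(2, "b", "2024-01-01", None, True),
]
updates = {1: "a2", 3: "c"}  # key 1 changes, key 3 is new, key 2 is untouched
merged = scd2_merge(target, updates, "2024-06-01")

for row in merged:
    print(row)
```

Doing the same work with separate UPDATE and INSERT statements would leave a window in which the old row is expired but its replacement is not yet visible.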

12 You are using Change Data Feed (CDF). Which virtual column identifies the type of change (e.g., insert, update_preimage)?
A) _change_metadata
B) _change_type
C) _commit_version
D) _operation
13 What is the purpose of "Watermarking" in Structured Streaming?
A) To encrypt the stream.
B) To handle late-arriving data and clean up state store.
C) To limit the number of files read per batch.
D) To trigger a notification when data is missing.
14 When using foreachBatch, what is a critical consideration for maintaining idempotency?
A) Always use overwrite mode.
B) Use the batchId to ensure the same data isn't processed twice in a restart.
C) Avoid using SQL inside the function.
D) Set shuffle partitions to 1.
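The idempotency concern in question 14 can be sketched without Spark. The toy sink below takes the same two arguments a foreachBatch callback receives (the batch contents and a batch id) and skips any batch id it has already committed. The class name IdempotentSink and its in-memory state are assumptions of the sketch; a real pipeline would need to persist the last committed id durably, ideally in the same transaction as the data.

```python
class IdempotentSink:
    """Toy sink that ignores replays of batches it has already committed."""

    def __init__(self):
        self.rows = []
        self.last_committed_batch = -1  # in-memory only; hypothetical stand-in

    def write_batch(self, batch_rows, batch_id):
        """Append the batch unless its id was already committed."""
        if batch_id <= self.last_committed_batch:
            return False  # replay after a restart: do nothing
        self.rows.extend(batch_rows)
        self.last_committed_batch = batch_id
        return True


sink = IdempotentSink()
wrote_first = sink.write_batch(["a", "b"], 0)
wrote_replay = sink.write_batch(["a", "b"], 0)  # same batch delivered again
wrote_next = sink.write_batch(["c"], 1)

print(wrote_first, wrote_replay, wrote_next, sink.rows)
```

Without the batch id check, the replayed batch would be appended a second time and the sink would contain duplicates.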

D) GRANT USAGE ON TABLE ...

18 What is a "Dynamic View" used for in Databricks?
A) Speeding up queries using caching.
B) Column-level or Row-level security based on the current user's identity.
C) Automatically refreshing data every 5 minutes.
D) Converting Parquet to Delta.
19 How do you define a system-wide "External Location" in Unity Catalog?
A) Via a dbutils.fs.mount command.
B) By creating a Storage Credential and an External Location object.
C) By hardcoding S3 keys in the cluster config.
D) By using a .netrc file.
20 What is a "Managed Table" in Unity Catalog?
A) A table where the user manages the underlying file location.
B) A table where Databricks manages both the metadata and the physical data files.
C) A table that cannot be deleted.
D) A view that is cached in RAM.
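Question 18 refers to dynamic views. As a loose plain-Python analogy (not Databricks SQL), the sketch below applies a column mask and a row filter that both depend on who is asking. The group names, column names, threshold, and the dynamic_view function are all hypothetical; in Databricks this kind of logic would live in the view's SQL definition and consult the identity of the querying user.

```python
def dynamic_view(rows, user_groups):
    """Return the rows a caller may see, with sensitive columns masked.

    Hypothetical policy for illustration:
      - only members of "auditors" see the email column unmasked
      - only members of "managers" see rows with total above 1000
    """
    can_see_email = "auditors" in user_groups
    can_see_large = "managers" in user_groups
    view = []
    for row in rows:
        if not can_see_large and row["total"] > 1000:
            continue  # row-level filter
        email = row["email"] if can_see_email else "REDACTED"  # column mask
        view.append({"email": email, "total": row["total"]})
    return view


rows = [
    {"email": "a@example.com", "total": 500},
    {"email": "b@example.com", "total": 5000},
]

as_auditor = dynamic_view(rows, {"auditors"})
as_manager = dynamic_view(rows, {"managers"})
as_other = dynamic_view(rows, set())

print(as_auditor)
print(as_manager)
print(as_other)
```

The same underlying rows produce three different results, which is the behavior the question is probing.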