databricks quiz for data engineering professional exam, Quizzes of Computer science

Typology: Quizzes

2024/2025

Uploaded on 01/01/2026

bhaskar-namile 🇮🇳

Domain 1: Data Processing & Optimization

Focus: Shuffle partitions, join strategies, and caching.

1 When using spark.sql.shuffle.partitions, what is the primary risk of setting this value too high for a small dataset?

A) Data skew.

B) Out of Memory (OOM) errors.

C) Excessive overhead from task scheduling and small file I/O.

D) Spill to disk.
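The following pure-Python sketch shows why a high shuffle partition count hurts a small dataset. It imitates hash partitioning of a few keys into N buckets and compares the number of tasks scheduled with the number of partitions that actually receive data. Python's built-in `hash()` on integers stands in for Spark's partitioner, and the helper name `partition_stats` is hypothetical. Spark's default for `spark.sql.shuffle.partitions` is 200. AQE partition coalescing (`spark.sql.adaptive.coalescePartitions.enabled`) can mitigate the problem, but the sketch ignores it.

```python
from collections import Counter


def partition_stats(keys, num_partitions):
    """Hash-partition keys as a simplified shuffle would and report how many
    partitions are scheduled versus how many receive any rows."""
    # hash() of a small int is the int itself, so the result is deterministic.
    counts = Counter(hash(k) % num_partitions for k in keys)
    non_empty = len(counts)
    return {
        "scheduled_tasks": num_partitions,  # one task per shuffle partition
        "non_empty_partitions": non_empty,
        "empty_partitions": num_partitions - non_empty,
    }


small_keys = list(range(50))  # a "small dataset" with 50 distinct keys
print(partition_stats(small_keys, 200))  # Spark's default partition count
print(partition_stats(small_keys, 8))
```

With 200 partitions, 150 of the scheduled tasks get no rows at all, yet each one still costs scheduling overhead. Each non-empty partition holds only a sliver of the data, which is where many small output files come from.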

2 In a Sort-Merge Join, if one table has a large number of rows with a null join key, what is the most likely performance bottleneck?

A) Network I/O.

B) Data Skew on a single executor.

C) Analysis exception.

D) Disk serialization.
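Here is a minimal sketch of why null join keys cause skew. Every null key hashes to the same value, so a hash shuffle sends all of those rows to a single partition, and one executor ends up doing most of the work. The constant 42 is used as the stand-in hash for null. The helper name `partition_sizes` is hypothetical, and real partitioning uses a different hash function.

```python
from collections import Counter

NULL_KEY_HASH = 42  # assumption: every null key hashes to one fixed value


def partition_sizes(keys, num_partitions):
    """Count rows per shuffle partition under simplified hash partitioning."""
    sizes = Counter()
    for k in keys:
        h = NULL_KEY_HASH if k is None else hash(k)
        sizes[h % num_partitions] += 1
    return [sizes.get(p, 0) for p in range(num_partitions)]


# 1,000 rows with distinct keys plus 9,000 rows whose join key is null.
keys = list(range(1000)) + [None] * 9000
sizes = partition_sizes(keys, 10)
print(sizes)
print("largest partition holds", max(sizes), "of", len(keys), "rows")
```

Null never equals null in an equi-join, so for an inner join these rows can never match. Filtering them out before the join, or handling them separately, removes the hot partition.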

3 Which operation triggers a full shuffle in Spark?

A) filter()

B) map()

C) repartition()

D) coalesce() (when reducing partitions)

4 Which hint should be used when joining a 10MB table with a 10TB table to avoid a Shuffle Hash Join?

A) /*+ MERGE(t1) */

B) /*+ BROADCAST(t1) */

C) /*+ REPARTITION(10) */

D) /*+ SKEW(t1) */

5 What is the effect of the Adaptive Query Execution (AQE) setting spark.sql.adaptive.skewJoin.enabled?

A) It automatically filters out nulls.

B) It splits skewed partitions into smaller sub-partitions to balance the load.

C) It increases the executor memory dynamically.

D) It converts a Sort-Merge Join to a Broadcast Join at runtime.

Domain 2: Delta Lake Architecture

Focus: Transaction logs, VACUUM, and Z-Ordering.

6 How does Delta Lake ensure Atomicity?

A) By locking the entire storage account.

B) It allows multi-dimensional data clustering to improve data skipping.

C) It encrypts the data at rest.

D) It reduces the number of files to exactly one per partition.

10 In a streaming Delta sink, what does trigger(availableNow=True) do?

A) Processes only the first file and stops.

B) Processes all available data in micro-batches and then shuts down.

C) Runs a continuous loop every 10 seconds.

D) It is an alias for trigger(once=True).

Domain 3: Data Modeling & Change Data Capture (CDC)

Focus: SCD Type 2, MERGE INTO, and Watermarking.

11 When implementing SCD Type 2, why is a MERGE statement preferred over a series of INSERT/UPDATE statements?

A) It is the only way to write to Delta.

B) It provides an atomic operation to handle matches and non-matches simultaneously.

C) It automatically manages the is_current flag without logic.

D) It doesn't require a join key.
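The pure-Python sketch below shows the SCD Type 2 logic that a single MERGE expresses. A matched row whose attributes changed is expired, with is_current set to false and an end date filled in. A new current version is then inserted, and unmatched keys are inserted as new rows. The column names (`customer_id`, `address`, `start_date`, `end_date`) and the helper `scd2_merge` are hypothetical. Atomicity is only imitated: the function builds a whole new table version, and the caller swaps it in with one assignment. In Delta SQL, a commonly documented pattern stages the updates twice, once keyed normally and once with a NULL merge key, so that a changed row can both expire the old version and insert the new one. The sketch skips that detail.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int
    address: str
    is_current: bool
    start_date: date
    end_date: Optional[date]


def scd2_merge(table: List[DimRow], updates: Dict[int, str], as_of: date) -> List[DimRow]:
    """Apply SCD Type 2 changes and return a NEW table version.

    The caller replaces the old list with the result in one assignment,
    loosely mimicking the single atomic commit a MERGE produces.
    """
    current = {r.customer_id: r for r in table if r.is_current}
    result = list(table)
    for key, new_address in updates.items():
        old = current.get(key)
        if old is not None and old.address == new_address:
            continue  # matched, attributes unchanged: nothing to do
        if old is not None:
            # Matched and changed: expire the current version.
            result[result.index(old)] = replace(old, is_current=False, end_date=as_of)
        # Changed or not matched: insert a new current version.
        result.append(DimRow(key, new_address, True, as_of, None))
    return result


dim = [DimRow(1, "Pune", True, date(2024, 1, 1), None)]
dim = scd2_merge(dim, {1: "Mumbai", 2: "Delhi"}, date(2025, 1, 1))
for row in dim:
    print(row)
```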

12 You are using Change Data Feed (CDF). Which virtual column identifies the type of change (e.g., insert, update_preimage)?

A) _change_metadata

B) _change_type

C) _commit_version

D) _operation

13 What is the purpose of "Watermarking" in Structured Streaming?

A) To encrypt the stream.

B) To handle late-arriving data and clean up the state store.

C) To limit the number of files read per batch.

D) To trigger a notification when data is missing.

14 When using foreachBatch, what is a critical consideration for maintaining idempotency?

A) Always use overwrite mode.

B) Use the batchId to ensure the same data isn't processed twice after a restart.

C) Avoid using SQL inside the function.

D) Set shuffle partitions to 1.
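This toy sketch shows how a batchId keeps a foreachBatch sink idempotent. The sink records the last batch id it committed for each application and turns a replayed batch into a no-op. The class `IdempotentSink` and the app id `"orders_stream"` are hypothetical. For real Delta tables, Delta Lake documents the writer options `txnAppId` and `txnVersion` for this purpose. The sketch models only the idea, not that API.

```python
class IdempotentSink:
    """Toy sink: remembers the highest batch_id committed per application id
    and ignores any batch it has already applied (e.g. replayed after restart)."""

    def __init__(self):
        self.rows = []
        self.committed = {}  # app_id -> last committed batch_id

    def write_batch(self, app_id, batch_id, batch_rows):
        last = self.committed.get(app_id, -1)
        if batch_id <= last:
            return False  # already applied: skip so the replay is a no-op
        # In a real table, the data and the new batch_id would commit together atomically.
        self.rows.extend(batch_rows)
        self.committed[app_id] = batch_id
        return True


sink = IdempotentSink()
sink.write_batch("orders_stream", 0, ["a", "b"])
sink.write_batch("orders_stream", 1, ["c"])
# Simulate a restart that replays batch 1 before the checkpoint recorded it.
sink.write_batch("orders_stream", 1, ["c"])
print(sink.rows)  # ['a', 'b', 'c']
```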

D) GRANT USAGE ON TABLE ...

18 What is a "Dynamic View" used for in Databricks?

A) Speeding up queries using caching.

B) Column-level or Row-level security based on the current user's identity.

C) Automatically refreshing data every 5 minutes.

D) Converting Parquet to Delta.

19 How do you define a system-wide "External Location" in Unity Catalog?

A) Via a dbutils.fs.mount command.

B) By creating a Storage Credential and an External Location object.

C) By hardcoding S3 keys in the cluster config.

D) By using a .netrc file.

20 What is a "Managed Table" in Unity Catalog?

A) A table where the user manages the underlying file location.

B) A table where Databricks manages both the metadata and the physical data files.

C) A table that cannot be deleted.

D) A view that is cached in RAM.
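The pure-Python sketch below shows the idea behind a dynamic view: the same query returns different rows and column values depending on who runs it. In Databricks, the view's SQL typically branches on functions such as `current_user()` and `is_account_group_member()` (or the older `is_member()`). In the sketch, the caller's group set stands in for those checks. The group name `"admins"`, the rule that a region is visible to a group of the same name, and the helper `dynamic_view` are all hypothetical.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def dynamic_view(rows: List[Row], user_groups: Set[str]) -> List[Row]:
    """Return what a given user would see through a row- and column-secured view."""
    is_admin = "admins" in user_groups
    result = []
    for row in rows:
        # Row-level security: non-admins only see rows for regions they belong to,
        # modeled here as membership in a group named after the region.
        if not is_admin and row["region"] not in user_groups:
            continue
        visible = dict(row)
        if not is_admin:
            visible["email"] = "REDACTED"  # column-level masking
        result.append(visible)
    return result


orders = [
    {"id": "1", "region": "EU", "email": "a@example.com"},
    {"id": "2", "region": "US", "email": "b@example.com"},
]
print(dynamic_view(orders, {"admins"}))
print(dynamic_view(orders, {"EU"}))
```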
