Data Engineering Associate with Databricks Questions with Correct Answers, Exams of Advanced Education

Data Engineering Associate with Databricks Questions with Correct Answers

Typology: Exams

2025/2026

Available from 06/01/2026

kamothojj
kamothojj 🇺🇸

5

(2)

16K documents

1 / 9

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Data Engineering Associate with
Databricks
What is done with the control plane? - answer web application
notebooks
jobs
cluster management
What is done with the data plane? - answer spinning up clusters
data processing
accessing data sources
SQL: Create table with specific columns - answer CREATE TABLE students (id INT,
name STRING)
SQL: Insert data - answerINSERT INTO students
VALUES (1, "Omar")
Does ACID guarantee inserting many records in a single transaction? - answerYes
SQL: Update record - answerUPDATE students
SET value = value + 1
WHERE name LIKE "T%" // starts with T
SQL: Delete record - answerDELETE FROM students
WHERE value > 6
// Boolean is lowercase
// equals is single =
What is upsert? - answerAllows updates, inserts and other manipulations in a single
command
SQL: Merge for CDC - answerMERGE INTO students b
USING updates u
ON b.id=u.id
WHEN MATCHED AND u.type = "update"
THEN UPDATE SET *
THEN INSERT *
pf3
pf4
pf5
pf8
pf9

Partial preview of the text

Download Data Engineering Associate with Databricks Questions with Correct Answers and more Exams Advanced Education in PDF only on Docsity!

Data Engineering Associate with

Databricks

What is done with the control plane? - answer web application notebooks jobs cluster management What is done with the data plane? - answer spinning up clusters data processing accessing data sources SQL: Create table with specific columns - answer CREATE TABLE students (id INT, name STRING) SQL: Insert data - answerINSERT INTO students VALUES (1, "Omar") Does ACID guarantee inserting many records in a single transaction? - answerYes SQL: Update record - answerUPDATE students SET value = value + 1 WHERE name LIKE "T%" // starts with T SQL: Delete record - answerDELETE FROM students WHERE value > 6 // Boolean is lowercase // equals is single = What is upsert? - answerAllows updates, inserts and other manipulations in a single command SQL: Merge for CDC - answerMERGE INTO students b USING updates u ON b.id=u.id WHEN MATCHED AND u.type = "update" THEN UPDATE SET * THEN INSERT *

WHEN NOT MATCHED AND

THEN DELETE

Which metastore does databricks use by default? - answerHive SQL: See details about Delta table including number of files - answerDESCRIBE DETAIL students What does OPTIMIZE do? - answerReplace existing data files by combining files and rewriting results SQL: OPTIMIZE - answerOPTIMIZE students ZORDER BY id What are the configs for vacuuming? - answerSET spark.databricks.delta.retentionDurationCheck.enabled = false; SET spark.databricks.delta.vacuum.logging.enabled = true; What does ZORDER do? - answerspeeds up data retrieval when filling on provided fields by colocating data SQL: View history - answerDESCRIBE HISTORY students SQL: Time travel - answerSELECT * FROM students VERSION AS OF 3 SQL: Restore table - answerRESTORE TABLE students TO VERSION AS OF 8 What is the default time to vacuum files? - answer7 days SQL: Vacuum - answerVACUUM students RETAIN 0 HOURS What does DRY RUN do with VACUUM? - answerDisplays all files that will be deleted What does the LOCATION tag do? - answerIf you set it, you can define a location for the data to reside, otherwise it saves in /user/hive/warehouse/ SQL: How do you specify a command to run in a database - answerUSE dbname; CREATE OR REPLACE TABLE SQL: How do you describe the table? - answerDESCRIBE TABLE EXTENDED DESCRIBE EXTENDED

USING JDBC

OPTIONS (

URL

Dbtable User Password ) When should you use CTAS? - answerAutomatically infer schema Useful for external data ingestion from sources with well-defined schemas SQL: Generated columns - answerCREATE TABLE purchase_dates ( Id STRING transcation_timestamp STRING, Price STRING, Date DATE GENERATED ALWAYS AS ( Cast(cast(transaction_timestamp/1e6 AS TIMESTAMP) AS DATE)) Comment "generated based on trans_to columns" How can constraints be used? - answerDelta Lake enforces schema on write, Databricks can support standard SQL constraint management clauses to ensure the quality and integrity of data added to the table. Shown in table properties of describe extended SQL: Constraints - answerALTER TABLE purchase_dates ADD CONSTRAINT valid_date CHECK (date > '2020-01-01') What's the difference between DEEP CLONE and SHALLOW CLONE? - answerDEEP CLONE fully copies the data and metadata from source to target. It occurs incrementally, so executing this command again can sync changes. SHALLOW CLONE copies the Delta transaction logs, meaning the data doesn't move SQL: Clone - answerCREATE TABLE name DEEP CLONE source CREATE TABLE name SHALLOW CLONE source What are the writing to Delta functions? - answerOverwrite - OVERWRITE, INSERT OVERWRITE

  • much faster, doesn't need to list recursively and delete
  • atomic, can still read table while deleting
  • ACID guarantees, can fallback
  • can use Delta time travel
  • INSERT OVERWRITE will fail if the schema changes

Append - INSERT INTO

  • incremental updates to existing tables, more efficient than full overwrite
  • no guarantees on inserting duplicate records Append, update and delete - MERGE INTO Merge - updates, inserts and delete are a single transaction, supports multiple conditionals
  • need to define when matched and when not matched
  • MERGE INTO only needs when not matched to avoid inserting duplicate records Ingest incrementally - COPY INTO
  • incrementally ingest data from other systems
  • potentially much cheaper SQL: MERGE INTO - answerMERGE INTO events a USING events1 b ON a.id = b.id and a.ts = b.ts WHEN NOT MATCHED AND b.traffic_source = 'email' THEN INSERT * SQL: Create json table - answerCREATE TABLE name (Key BINARY) USING JSON OPTIONS (path = "path") CREATE TABLE name AS SELECT * FROM json.path SQL: INSERT INTO - answerINSERT INTO events_raw Select * from events_json Does count skip NULL? - answerCount(col) skips NULL Count() counts NULL Count_if(user IS NULL) // counts nulls SQL: Distinct Rows and values - answerSELECT count(DISTINCT()) FROM table SELECT count(DISTINCT(id)) FROM table SQL: Dedup rows - answerCREATE VIEW name AS SELECT DISTINCT(*) FROM table; //does not omit row

PySpark: Writestream - answerSpark.table().writestream.option(checkpoint location).outputmode().trigger.table() What are the SQL functions for timestamp and file name? - answerCurrent_timestamp() Input_file_name() When should you use Auto Loader? - answerFor incremental ingestion SQL: What is the syntax for a streaming table? - answerCREATE OR REFRESH STREAMING LIVE TABLE name What's the difference between a data lake house, data warehouse and data lake? - answerStore data in an open data lake and get quality performance and security. ACID transactions, schema enforcement and evolution Enables BI, AI How is work processed across nodes in a cluster? - answerDriver node (teacher) assigns the work to executors (workers) Workers perform tasks. What is the limitation with Databricks Notebooks version vs Repos? - answerNotebooks version live with the notebook and are lost when the notebook is deleted or exported. What's the difference between a View and CTE? - answerView can be temporary or persist, can be accessed across multiple SQL statements. CTE can be accessed within a SQL statement What are the general concepts, triggers and watermarks for Structured Streaming? - answerWatermarks - dropping of old state data Triggers - time windows Express computations on streaming data What is the Auto Loader used for? - answerStreaming reads What are the benefits to Delta Live Tables? - answerSimple declarative approach for building data pipelines Manage structure and equality Simplify batch and streaming What are the benefits to Unity Catalog? - answerUnified governance across clouds Data and AI assets What are the Table ACL permissions? - answerCatalog, database, table, view, function, any file

What are the Table ACLs roles? - answerDatabricks admin, catalog owner, database owner, table owner What are the Table ACL privileges? - answerALL PRIVILEGES, SELECT, MODIFY, READ_METADATA, USAGE, CREATE How do you set permissions with Databricks SQL? - answerGRANT USAGE DATABASE name USER name What are examples of narrow transformations? - answerFilter, contains, map because they only operate on a single partition Examples of trigger parameter - answerSpark.table().writestream.optin(checkpoint).outputmode(complete).trigger(availa bleNow=True).table().awaittermination() What are the differences with the trigger parameter? - answerProcessingTime - user intervals Once AvailableNow - process all data available then stop What is auto loader preferred for? - answerReading data off of cloud storage What is the target field in DLT? - answerDatabase where the result tables will be published SQL: How to start a DLT? - answerCREATE INCREMENTAL LIVE TABLE Cloud_files() SQL: What do you add to tables when using a DLT? - answerLIVE LIVE.table FROM STREAM() Which of the following describes a benefit of a data lakehouse that is unavailable in a traditional data warehouse? - answerA data lakehouse enables both batch and streaming analytics. A data engineer has created a Delta table as part of a data pipeline. Downstream data analysts now need SELECT permission on the Delta table. Assuming the data engineer is the Delta table owner, which part of the Databricks Lakehouse Platform can the data engineer use to grant the data analysts the appropriate access? - answerdata explorer