Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Data Engineering Associate with Databricks Questions with Correct Answers, Exams of Advanced Education

Miami University - Oxford Advanced Education

Data Engineering Associate with Databricks Questions with Correct Answers

Typology: Exams

2025/2026

Available from 06/01/2026

kamothojj 🇺🇸

5

(2)

16K documents

1 / 9

This page cannot be seen from the preview

Don't miss anything!

Data Engineering Associate with

Databricks

What is done with the control plane? - answer web application

notebooks

jobs

cluster management

What is done with the data plane? - answer spinning up clusters

data processing

accessing data sources

SQL: Create table with specific columns - answer CREATE TABLE students (id INT,

name STRING)

SQL: Insert data - answerINSERT INTO students

VALUES (1, "Omar")

Does ACID guarantee inserting many records in a single transaction? - answerYes

SQL: Update record - answerUPDATE students

SET value = value + 1

WHERE name LIKE "T%" // starts with T

SQL: Delete record - answerDELETE FROM students

WHERE value > 6

// Boolean is lowercase

// equals is single =

What is upsert? - answerAllows updates, inserts and other manipulations in a single

command

SQL: Merge for CDC - answerMERGE INTO students b

USING updates u

ON b.id=u.id

WHEN MATCHED AND u.type = "update"

THEN UPDATE SET *

THEN INSERT *

Discover Exams of Advanced Education Miami University - Oxford

Partial preview of the text

Download Data Engineering Associate with Databricks Questions with Correct Answers and more Exams Advanced Education in PDF only on Docsity!

Data Engineering Associate with

Databricks

What is done with the control plane? - answer web application notebooks jobs cluster management What is done with the data plane? - answer spinning up clusters data processing accessing data sources SQL: Create table with specific columns - answer CREATE TABLE students (id INT, name STRING) SQL: Insert data - answerINSERT INTO students VALUES (1, "Omar") Does ACID guarantee inserting many records in a single transaction? - answerYes SQL: Update record - answerUPDATE students SET value = value + 1 WHERE name LIKE "T%" // starts with T SQL: Delete record - answerDELETE FROM students WHERE value > 6 // Boolean is lowercase // equals is single = What is upsert? - answerAllows updates, inserts and other manipulations in a single command SQL: Merge for CDC - answerMERGE INTO students b USING updates u ON b.id=u.id WHEN MATCHED AND u.type = "update" THEN UPDATE SET * THEN INSERT *

WHEN NOT MATCHED AND

THEN DELETE

Which metastore does databricks use by default? - answerHive SQL: See details about Delta table including number of files - answerDESCRIBE DETAIL students What does OPTIMIZE do? - answerReplace existing data files by combining files and rewriting results SQL: OPTIMIZE - answerOPTIMIZE students ZORDER BY id What are the configs for vacuuming? - answerSET spark.databricks.delta.retentionDurationCheck.enabled = false; SET spark.databricks.delta.vacuum.logging.enabled = true; What does ZORDER do? - answerspeeds up data retrieval when filling on provided fields by colocating data SQL: View history - answerDESCRIBE HISTORY students SQL: Time travel - answerSELECT * FROM students VERSION AS OF 3 SQL: Restore table - answerRESTORE TABLE students TO VERSION AS OF 8 What is the default time to vacuum files? - answer7 days SQL: Vacuum - answerVACUUM students RETAIN 0 HOURS What does DRY RUN do with VACUUM? - answerDisplays all files that will be deleted What does the LOCATION tag do? - answerIf you set it, you can define a location for the data to reside, otherwise it saves in /user/hive/warehouse/ SQL: How do you specify a command to run in a database - answerUSE dbname; CREATE OR REPLACE TABLE SQL: How do you describe the table? - answerDESCRIBE TABLE EXTENDED DESCRIBE EXTENDED

USING JDBC

OPTIONS (

URL

Dbtable User Password ) When should you use CTAS? - answerAutomatically infer schema Useful for external data ingestion from sources with well-defined schemas SQL: Generated columns - answerCREATE TABLE purchase_dates ( Id STRING transcation_timestamp STRING, Price STRING, Date DATE GENERATED ALWAYS AS ( Cast(cast(transaction_timestamp/1e6 AS TIMESTAMP) AS DATE)) Comment "generated based on trans_to columns" How can constraints be used? - answerDelta Lake enforces schema on write, Databricks can support standard SQL constraint management clauses to ensure the quality and integrity of data added to the table. Shown in table properties of describe extended SQL: Constraints - answerALTER TABLE purchase_dates ADD CONSTRAINT valid_date CHECK (date > '2020-01-01') What's the difference between DEEP CLONE and SHALLOW CLONE? - answerDEEP CLONE fully copies the data and metadata from source to target. It occurs incrementally, so executing this command again can sync changes. SHALLOW CLONE copies the Delta transaction logs, meaning the data doesn't move SQL: Clone - answerCREATE TABLE name DEEP CLONE source CREATE TABLE name SHALLOW CLONE source What are the writing to Delta functions? - answerOverwrite - OVERWRITE, INSERT OVERWRITE

much faster, doesn't need to list recursively and delete
atomic, can still read table while deleting
ACID guarantees, can fallback
can use Delta time travel
INSERT OVERWRITE will fail if the schema changes

Append - INSERT INTO

incremental updates to existing tables, more efficient than full overwrite
no guarantees on inserting duplicate records Append, update and delete - MERGE INTO Merge - updates, inserts and delete are a single transaction, supports multiple conditionals
need to define when matched and when not matched
MERGE INTO only needs when not matched to avoid inserting duplicate records Ingest incrementally - COPY INTO
incrementally ingest data from other systems
potentially much cheaper SQL: MERGE INTO - answerMERGE INTO events a USING events1 b ON a.id = b.id and a.ts = b.ts WHEN NOT MATCHED AND b.traffic_source = 'email' THEN INSERT * SQL: Create json table - answerCREATE TABLE name (Key BINARY) USING JSON OPTIONS (path = "path") CREATE TABLE name AS SELECT * FROM json.path SQL: INSERT INTO - answerINSERT INTO events_raw Select * from events_json Does count skip NULL? - answerCount(col) skips NULL Count() counts NULL Count_if(user IS NULL) // counts nulls SQL: Distinct Rows and values - answerSELECT count(DISTINCT()) FROM table SELECT count(DISTINCT(id)) FROM table SQL: Dedup rows - answerCREATE VIEW name AS SELECT DISTINCT(*) FROM table; //does not omit row

PySpark: Writestream - answerSpark.table().writestream.option(checkpoint location).outputmode().trigger.table() What are the SQL functions for timestamp and file name? - answerCurrent_timestamp() Input_file_name() When should you use Auto Loader? - answerFor incremental ingestion SQL: What is the syntax for a streaming table? - answerCREATE OR REFRESH STREAMING LIVE TABLE name What's the difference between a data lake house, data warehouse and data lake? - answerStore data in an open data lake and get quality performance and security. ACID transactions, schema enforcement and evolution Enables BI, AI How is work processed across nodes in a cluster? - answerDriver node (teacher) assigns the work to executors (workers) Workers perform tasks. What is the limitation with Databricks Notebooks version vs Repos? - answerNotebooks version live with the notebook and are lost when the notebook is deleted or exported. What's the difference between a View and CTE? - answerView can be temporary or persist, can be accessed across multiple SQL statements. CTE can be accessed within a SQL statement What are the general concepts, triggers and watermarks for Structured Streaming? - answerWatermarks - dropping of old state data Triggers - time windows Express computations on streaming data What is the Auto Loader used for? - answerStreaming reads What are the benefits to Delta Live Tables? - answerSimple declarative approach for building data pipelines Manage structure and equality Simplify batch and streaming What are the benefits to Unity Catalog? - answerUnified governance across clouds Data and AI assets What are the Table ACL permissions? - answerCatalog, database, table, view, function, any file

What are the Table ACLs roles? - answerDatabricks admin, catalog owner, database owner, table owner What are the Table ACL privileges? - answerALL PRIVILEGES, SELECT, MODIFY, READ_METADATA, USAGE, CREATE How do you set permissions with Databricks SQL? - answerGRANT USAGE DATABASE name USER name What are examples of narrow transformations? - answerFilter, contains, map because they only operate on a single partition Examples of trigger parameter - answerSpark.table().writestream.optin(checkpoint).outputmode(complete).trigger(availa bleNow=True).table().awaittermination() What are the differences with the trigger parameter? - answerProcessingTime - user intervals Once AvailableNow - process all data available then stop What is auto loader preferred for? - answerReading data off of cloud storage What is the target field in DLT? - answerDatabase where the result tables will be published SQL: How to start a DLT? - answerCREATE INCREMENTAL LIVE TABLE Cloud_files() SQL: What do you add to tables when using a DLT? - answerLIVE LIVE.table FROM STREAM() Which of the following describes a benefit of a data lakehouse that is unavailable in a traditional data warehouse? - answerA data lakehouse enables both batch and streaming analytics. A data engineer has created a Delta table as part of a data pipeline. Downstream data analysts now need SELECT permission on the Delta table. Assuming the data engineer is the Delta table owner, which part of the Databricks Lakehouse Platform can the data engineer use to grant the data analysts the appropriate access? - answerdata explorer

Data Engineering Associate with Databricks Questions with Correct Answers, Exams of Advanced Education

Related documents

Partial preview of the text

Download Data Engineering Associate with Databricks Questions with Correct Answers and more Exams Advanced Education in PDF only on Docsity!

Data Engineering Associate with

Databricks

WHEN NOT MATCHED AND

THEN DELETE

USING JDBC

OPTIONS (

URL