DATABRICKS - DATA ENGINEER ASSOCIATE EXAM 1 2024/2025, Exams of Nursing

University of Phoenix (UOPX)Nursing


Typology: Exams

2024/2025

Available from 09/03/2024

answerhub 🇺🇸


DATABRICKS - DATA ENGINEER

ASSOCIATE EXAM 1 2024/2025

You were asked to create a table to store the data below. <orderTime> is a timestamp, but the finance team usually prefers <orderTime> in date format when querying this data. You would like to create a calculated column that converts the <orderTime> timestamp to a date and stores it. Fill in the blank to complete the DDL.

CREATE TABLE orders (
  orderId int,
  orderTime timestamp,
  orderdate date _____________________________________________ ,
  units int)

A. AS DEFAULT (CAST(orderTime as DATE))
B. GENERATED ALWAYS AS (CAST(orderTime as DATE))
C. GENERATED DEFAULT AS (CAST(orderTime as DATE))
D. AS (CAST(orderTime as DATE))
E. Delta Lake does not support calculated columns; the value should be inserted into the table as part of the ingestion process

- Precise Answer ✔✔B. GENERATED ALWAYS AS (CAST(orderTime as DATE))

Explanation

The answer is GENERATED ALWAYS AS (CAST(orderTime as DATE)).

https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#--use-generated-columns

Delta Lake supports generated columns, a special type of column whose values are automatically generated from a user-specified function over other columns in the Delta table. When you write to a table with generated columns and do not explicitly provide values for them, Delta Lake computes the values automatically.
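As a rough illustration of the behaviour described above, the following Python sketch mimics a generated column outside Databricks. The function name `with_generated_orderdate` and the dictionary row layout are hypothetical, chosen only for this example. The mismatch check is a simplified stand-in for the constraint Delta Lake applies when a value is supplied explicitly. This is a toy model, not Delta Lake code.

```python
from datetime import date, datetime
from typing import Optional


def with_generated_orderdate(
    order_id: int,
    order_time: datetime,
    units: int,
    orderdate: Optional[date] = None,
) -> dict:
    """Build an 'orders' row whose orderdate is derived from order_time.

    Hypothetical helper: it imitates the effect of
    GENERATED ALWAYS AS (CAST(orderTime AS DATE)) on a single row.
    """
    # Plays the role of CAST(orderTime AS DATE).
    expected = order_time.date()

    if orderdate is None:
        # No value supplied by the writer: compute it automatically.
        orderdate = expected
    elif orderdate != expected:
        # A supplied value must agree with the generation expression.
        raise ValueError(
            f"orderdate {orderdate} does not match generated value {expected}"
        )

    return {
        "orderId": order_id,
        "orderTime": order_time,
        "orderdate": orderdate,
        "units": units,
    }


if __name__ == "__main__":
    row = with_generated_orderdate(1, datetime(2021, 1, 1, 9, 10, 24), 100)
    print(row["orderdate"])
```

The writer never has to pass `orderdate`; it is always consistent with `orderTime`, which is the property the finance team relies on when querying by date.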


Note: Databricks also supports partitioning using generated columns.

The data engineering team noticed that one of the jobs fails randomly as a result of using spot instances. What feature in Jobs/Tasks can be used to address this issue so the job is more stable when using spot instances?

A. Use the Databricks REST API to monitor and restart the job
B. Use the Jobs runs, active runs UI section to monitor and restart the job
C. Add a second task and add a check condition to rerun the first task if it fails
D. Restart the job cluster; the job automatically restarts
E. Add a retry policy to the task

- Precise Answer ✔✔E. Add a retry policy to the task

Explanation

The answer is, Add a retry policy to the task. Tasks in Jobs support a retry policy, which can be used to retry a failed task. When using spot instances, it is common to have failed executors or a failed driver.

What is the main difference between AUTO LOADER and COPY INTO?

A. COPY INTO supports schema evolution.
B. AUTO LOADER supports schema evolution.
C. COPY INTO supports file notification when performing incremental loads.
D. AUTO LOADER supports reading data from Apache Kafka.
E. AUTO LOADER supports file notification when performing incremental loads.

- Precise Answer ✔✔E. AUTO LOADER supports file notification when performing incremental loads.

Explanation

Auto Loader supports both directory listing and file notification, but COPY INTO only supports directory listing.
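For the retry-policy question above, the idea of retrying a failed task can be sketched in plain Python. This is only a generic illustration of the pattern, not how Databricks Jobs implements it. The names `run_with_retries`, `make_flaky_task`, `max_retries`, and `min_interval_s` are hypothetical, and the simulated failure stands in for a lost spot instance.

```python
import time
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_retries: int = 3,
    min_interval_s: float = 0.0,
) -> T:
    """Run task, retrying up to max_retries times after a failure."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                # Retries exhausted: surface the last failure.
                raise
            attempt += 1
            # Wait before the next attempt (zero by default).
            time.sleep(min_interval_s)


def make_flaky_task(failures_before_success: int) -> Tuple[Callable[[], str], Dict[str, int]]:
    """Return a task that fails a fixed number of times, plus a call counter."""
    calls = {"count": 0}

    def task() -> str:
        calls["count"] += 1
        if calls["count"] <= failures_before_success:
            raise RuntimeError("simulated spot instance loss")
        return "succeeded"

    return task, calls


if __name__ == "__main__":
    task, calls = make_flaky_task(failures_before_success=2)
    print(run_with_retries(task, max_retries=3), "after", calls["count"], "attempts")
```

Because the retry lives with the task itself, no external monitoring or manual restart is needed, which is why option E is preferred over options A through D.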

A. Schema location is used to store the user-provided schema
B. Schema location is used to identify the schema of the target table
C. AUTO LOADER does not require a schema location, because it supports schema evolution
D. Schema location is used to store the schema inferred by AUTO LOADER
E. Schema location is used to identify the schema of the target table and the source table

- Precise Answer ✔✔D. Schema location is used to store schema inferred by AUTO LOADER

Explanation

The answer is, Schema location is used to store the schema inferred by AUTO LOADER, so subsequent runs are faster: AUTO LOADER can reuse the last known schema instead of inferring it every time.

Auto Loader samples the first 50 GB or 1000 files that it discovers, whichever limit is crossed first. To avoid incurring this inference cost at every stream start-up, and to be able to provide a stable schema across stream restarts, you must set the option cloudFiles.schemaLocation. Auto Loader creates a hidden directory _schemas at this location to track schema changes to the input data over time.

Detailed documentation on the different options: Auto Loader options | Databricks on AWS

Which of the following statements is incorrect about the lakehouse?

A. Supports end-to-end streaming and batch workloads
B. Supports ACID
C. Supports diverse data types that can store both structured and unstructured data
D. Supports BI and machine learning
E. Storage is coupled with compute

- Precise Answer ✔✔E. Storage is coupled with Compute

Explanation

The answer is, Storage is coupled with Compute. The question asks for the incorrect option: in a lakehouse, storage is decoupled from compute so both can scale independently.

What Is a Lakehouse? - The Databricks Blog

You are designing a data model that works for both machine learning using images and batch ETL/ELT workloads. Which of the following features of a data lakehouse can help you meet the needs of both workloads?

A. Data lakehouse requires very little data modeling.
B. Data lakehouse combines compute and storage for simple governance.
C. Data lakehouse provides autoscaling for compute clusters.
D. Data lakehouse can store unstructured data and support ACID transactions.
E. Data lakehouse fully exists in the cloud.

- Precise Answer ✔✔D. Data lakehouse can store unstructured data and support ACID transactions.

Explanation

The answer is, A data lakehouse stores unstructured data and is ACID-compliant.

Which of the following locations in Databricks product architecture hosts jobs/pipelines and queries?

b. The job cluster is best suited for this purpose.
c. Use an Azure VM to read and write Delta tables in Python
d. Use a Delta Live Tables pipeline running in continuous mode

- Precise Answer ✔✔b. The job cluster is best suited for this purpose.

Explanation

The answer is, The job cluster is best suited for this purpose. Since you do not need to interact with the notebook during execution, especially when it is a scheduled job, a job cluster makes sense. Using an all-purpose cluster can be twice as expensive as a job cluster.

FYI, when you schedule a job with the option of creating a new cluster, the cluster is terminated when the job completes. You cannot restart a job cluster.

Which of the following developer operations in a CI/CD flow can be implemented in Databricks Repos?

a. Merge when code is committed
b. Pull request and review process
c. Trigger Databricks Repos API to pull the latest version of code into production folder
d. Resolve merge conflicts
e. Delete a branch

- Precise Answer ✔✔c. Trigger Databricks Repos API to pull the latest version of code into production folder

Explanation

See the below diagram to understand the roles Databricks Repos and the Git provider play when building a CI/CD workflow. All the steps highlighted in yellow can be done in Databricks Repos; all the steps highlighted in gray are done in a Git provider such as GitHub or Azure DevOps.

You are currently working with a second team, and both teams want to modify the same notebook. You noticed that a member of the second team is copying the notebook to a personal folder to edit it and then replacing the collaboration notebook. Which notebook feature do you recommend to make collaboration easier?

a. Databricks notebooks should be copied to a local machine, with source control set up locally to version the notebooks
b. Databricks notebooks support automatic change tracking and versioning
c. Databricks Notebooks support real-time coauthoring on a single notebook
d. Databricks notebooks can be exported into dbc archive files and stored in a data lake
e. Databricks notebooks can be exported as HTML and imported at a later time

- Precise Answer ✔✔c. Databricks Notebooks support real-time coauthoring on a single notebook

Explanation

The answer is, Databricks Notebooks support real-time coauthoring on a single notebook. Every change is saved, and a notebook can be changed by multiple users.

You are currently working on a project that requires the use of SQL and Python in a given notebook, what would be your approach

Explanation

Delta Lake is
· Open source
· Built on a standard data format
· Optimized for cloud object storage
· Built for scalable metadata handling

Delta Lake is not
· Proprietary technology
· Storage format
· Storage medium
· Database service or data warehouse

You were asked to create or overwrite an existing Delta table to store the below transaction data.

| transactionId | transactionDate | unitsSold |
| 1 | 01-01-2021 09:10:24 AM | 100 |
| 2 | 01-01-2021 10:20:24 PM | 10 |

a. CREATE OR REPLACE DELTA TABLE transactions (
transactionId int,
transactionDate timestamp,
unitsSold int)

b. CREATE OR REPLACE TABLE IF EXISTS transactions (
transactionId int,
transactionDate timestamp,
unitsSold int)
FORMAT DELTA

c. CREATE IF EXISTS REPLACE TABLE transactions (
transactionId int,
transactionDate timestamp,
unitsSold int)

d. CREATE OR REPLACE TABLE transactions (
transactionId int,
transactionDate timestamp,
unitsSold int)

- Precise Answer ✔✔d. CREATE OR REPLACE TABLE transactions ( transactionId int, transactionDate timestamp, unitsSold int)

Explanation

The answer is

CREATE OR REPLACE TABLE transactions (
transactionId int,
transactionDate timestamp,
unitsSold int)

When creating a table in Databricks, by default the table is stored in DELTA format.

You noticed a colleague is manually copying the data to a backup folder before running an update command, so that the backup copy can replace the table in case the update does not produce the expected outcome. Which Delta Lake feature would you recommend to simplify the process?

a. Use time travel feature to refer old data instead of manually copying
b. Use DEEP CLONE to clone the table prior to update to make a backup copy
c. Use SHADOW copy of the table as preferred backup choice
d. Cloud object storage retains previous version of the file
e. Cloud object storage automatically backups the data

- Precise Answer ✔✔a. Use time travel feature to refer old data instead of manually copying

Explanation

The answer is, Use time travel feature to refer old data instead of manually copying.

https://databricks.com/blog/2019/02/04/introducing-delta-time-travel-for-large-scale-data-lakes.html

SELECT count(*) FROM my_table TIMESTAMP AS OF "2019-01-01"
SELECT count(*) FROM my_table TIMESTAMP AS OF date_sub(current_date(), 1)
SELECT count(*) FROM my_table TIMESTAMP AS OF "2019-01-01 01:30:00.000"

Which one of the following is not a Databricks lakehouse object?

a. Tables
b. Views
c. Database/Schemas
d. Catalog
e. Functions
f. Stored Procedures

- Precise Answer ✔✔f. Stored Procedures

Explanation

The answer is, Stored Procedures. Databricks lakehouse does not support stored procedures.

What type of table is created when you create a Delta table with the below command?

CREATE TABLE transactions USING DELTA LOCATION "DBFS:/mnt/bronze/transactions"

a. Managed delta table
b. External table
c. Managed table
d. Temp table
e. Delta Lake table

- Precise Answer ✔✔b. External table

Explanation

Any time a table is created using the LOCATION keyword, it is considered an external table. Below is the current syntax.

Syntax

CREATE TABLE table_name ( column column_data_type...) USING format LOCATION "dbfs:/"

format -> DELTA, JSON, CSV, PARQUET, TEXT

I ran the CREATE TABLE command from the above question; you can see it created an external table,

e. Temporary views are created in local_temp database

- Precise Answer ✔✔a. Temporary views are lost once the notebook is detached and re-attached

Explanation

The answer is, Temporary views are lost once the notebook is detached and re-attached.

There are two types of temporary views that can be created: session-scoped and global.

A local (session-scoped) temporary view is only available within a Spark session, so another notebook on the same cluster cannot access it. If a notebook is detached and reattached, the local temporary view is lost.

A global temporary view is available to all the notebooks on the cluster. If the cluster restarts, the global temporary view is lost.

Which of the following is correct for the global temporary view?

a. global temporary views cannot be accessed once the notebook is detached and attached
b. global temporary views can be accessed across many clusters
c. global temporary views can be still accessed even if the notebook is detached and attached
d. global temporary views can be still accessed even if the cluster is restarted
e. global temporary views are created in a database called temp database

- Precise Answer ✔✔c. global temporary views can be still accessed even if the notebook is detached and attached

Explanation

The answer is, global temporary views can be still accessed even if the notebook is detached and attached.

There are two types of temporary views that can be created: local and global.

· A local temporary view is only available within a Spark session, so another notebook on the same cluster cannot access it. If a notebook is detached and reattached, the local temporary view is lost.
· A global temporary view is available to all the notebooks on the cluster. It remains accessible even if the notebook is detached and reattached, but if the cluster is restarted the global temporary view is lost.

You are currently working on reloading the customer_sales table using the below query

INSERT OVERWRITE customer_sales
SELECT * FROM customers c
INNER JOIN sales_monthly s on s.customer_id = c.customer_id

After you ran the above command, the Marketing team quickly wanted to review the old data that was in the table. How does INSERT OVERWRITE impact the data in the <customer_sales> table if you want to see the previous version of the data prior to running the above statement?

a. Overwrites the data in the table and all historical versions of the data; you cannot time travel to previous versions
b. Overwrites the data in the table but preserves all historical versions of the data; you can time travel to previous versions
c. Overwrites the current version of the data but clears all historical versions of the data, so you cannot time travel to previous versions.

Any DML/DDL operation (except DROP TABLE) on the Delta table preserves the historical version of the data.

Which of the following SQL statements can be used to query a table while eliminating duplicate rows from the query results?

a. SELECT DISTINCT * FROM table_name
b. SELECT DISTINCT * FROM table_name HAVING COUNT(*) > 1
c. SELECT DISTINCT_ROWS (*) FROM table_name
d. SELECT * FROM table_name GROUP BY * HAVING COUNT(*) < 1
e. SELECT * FROM table_name GROUP BY * HAVING COUNT(*) > 1

- Precise Answer ✔✔a. SELECT DISTINCT * FROM table_name

Which of the below SQL statements can be used to create a SQL UDF that converts Celsius to Fahrenheit and vice versa? You need to pass two parameters to this function: the actual temperature, and a one-letter code, F or C, that identifies whether it needs to be converted to Fahrenheit or Celsius.

select udf_convert(60,'C') will result in about 15.56.
select udf_convert(10,'F') will result in 50.

a. CREATE UDF FUNCTION udf_convert(temp DOUBLE, measure STRING)
RETURNS DOUBLE
RETURN CASE WHEN measure == 'F' then (temp * 9/5) + 32
ELSE (temp - 32) * 5/9
END

b. CREATE UDF FUNCTION udf_convert(temp DOUBLE, measure STRING)
RETURN CASE WHEN measure == 'F' then (temp * 9/5) + 32
ELSE (temp - 32) * 5/9
END

c. CREATE FUNCTION udf_convert(temp DOUBLE, measure STRING)
RETURN CASE WHEN measure == 'F' then (temp * 9/5) + 32
ELSE (temp - 32) * 5/9
END
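The conversion logic that the UDF question describes can be checked with a small Python sketch. It reuses the name `udf_convert` from the question purely for readability; it is not the SQL UDF itself. It assumes, as the question's examples suggest, that the second argument names the target unit. It uses the standard conversion (offset 32, factor 5/9), so 60 °F maps to roughly 15.56 °C.

```python
def udf_convert(temp: float, measure: str) -> float:
    """Convert a temperature to the unit named by measure.

    measure == 'F': temp is in Celsius and is converted to Fahrenheit.
    Any other value: temp is in Fahrenheit and is converted to Celsius.
    """
    if measure == "F":
        return temp * 9 / 5 + 32
    return (temp - 32) * 5 / 9


if __name__ == "__main__":
    print(udf_convert(10, "F"))            # Celsius to Fahrenheit
    print(round(udf_convert(60, "C"), 2))  # Fahrenheit to Celsius
```

The branch structure mirrors the CASE WHEN ... ELSE ... END expression in the answer options, which makes it easy to compare each option's arithmetic against known reference points such as 0 °C = 32 °F.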
