Databricks Data Analyst Associate Exam Prep: Practice Questions, Exams of Cybercrime, Cybersecurity and Data Privacy

Practice questions and answers for the Databricks Data Analyst Associate exam. The material covers topics such as the medallion architecture, Databricks SQL, connecting to Fivetran, Delta Lake, and data governance. The questions are designed to test knowledge of Databricks services and capabilities, including SQL queries, data visualizations, and data ingestion techniques. It also addresses considerations for working with personally identifiable information (PII) and the benefits of a Delta Lake-based data lakehouse. This resource is intended for data analysts preparing for the Databricks certification exam.

Typology: Exams

2024/2025

Available from 11/10/2025

zabibu-nassoro

Databricks Data Analyst - Associate Exam Prep

Which of the following layers of the medallion architecture is most commonly used by data analysts?
A. None of these layers are used by data analysts
B. Gold
C. All of these layers are used equally by data analysts
D. Silver
E. Bronze
Answer: B. Gold

A data analyst has recently joined a new team that uses Databricks SQL, but the analyst has never used Databricks before. The analyst wants to know where in Databricks SQL they can write and execute SQL queries. On which of the following pages can the analyst write and execute SQL queries?
A. Data page
B. Dashboards page
C. Queries page
D. Alerts page
E. SQL Editor page
Answer: E. SQL Editor page

A data analyst has set up a SQL query to run every four hours on a SQL endpoint, but the SQL endpoint is taking too long to start up with each run. Which of the following changes can the data analyst make to reduce the start-up time for the endpoint while managing costs?
A. Reduce the SQL endpoint cluster size
B. Increase the SQL endpoint cluster size
C. Turn off the Auto stop feature
D. Increase the minimum scaling value
E. Use a Serverless SQL endpoint
Answer: E. Use a Serverless SQL endpoint

A data engineering team has created a Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables. The micro-batches are triggered every minute. A data analyst has created a dashboard based on this gold-level data. The project stakeholders want to see the results in the dashboard updated within one minute or less of new data becoming available within the gold-level tables. Which of the following cautions should the data analyst share prior to setting up the dashboard to complete this task?
A. The required compute resources could be costly
B. The gold-level tables are not appropriately clean for business reporting
C. The streaming data is not an appropriate data source for a dashboard
D. The streaming cluster is not fault tolerant
E. The dashboard cannot be refreshed that quickly
Answer: A. The required compute resources could be costly

Which of the following approaches can be used to ingest data directly from cloud-based object storage?
A. Create an external table while specifying the DBFS storage path to FROM
B. Create an external table while specifying the DBFS storage path to PATH
C. It is not possible to directly ingest data from cloud-based object storage
D. Create an external table while specifying the object storage path to FROM
E. Create an external table while specifying the object storage path to LOCATION
Answer: E. Create an external table while specifying the object storage path to LOCATION

A data analyst wants to create a dashboard with three main sections: Development, Testing, and Production. They want all three sections on the same dashboard, but they want to clearly designate the sections using text on the dashboard. Which of the following tools can the data analyst use to designate the Development, Testing, and Production sections using text?
A. Separate endpoints for each section
B. Separate queries for each section
C. Markdown-based text boxes
D. Direct text written into the dashboard in editing mode
E. Separate color palettes for each section
Answer: C. Markdown-based text boxes

A data analyst needs to use the Databricks Lakehouse Platform to quickly create SQL queries and data visualizations. It is a requirement that the compute resources in the platform can be made serverless, and it is expected that data visualizations can be placed within a dashboard. Which of the following Databricks Lakehouse Platform services/capabilities meets all of these requirements?
A. Delta Lake
B. Databricks Notebooks
C. Tableau
D. Databricks Machine Learning
E. Databricks SQL
Answer: E. Databricks SQL

A data analyst is attempting to drop a table my_table. The analyst wants to delete all table metadata and data. They run the following command: DROP TABLE IF EXISTS my_table; While the object no longer appears when they run SHOW TABLES, the data files still exist. Which of the following describes why the data files still exist and the metadata files were deleted?
A. The table's data was larger than 10 GB
B. The table did not have a location
C. The table was external
D. The table's data was smaller than 10 GB

C. It can be used to produce dashboards that allow data exploration.
D. It can be used to make visualizations that can be shared with stakeholders.
E. It can be used to connect to third party BI tools.
Answer: B. It can be used to view metadata and data, as well as view/change permissions.

A data analyst created and is the owner of the managed table my_table. They now want to change ownership of the table to a single other user using Data Explorer. Which of the following approaches can the analyst use to complete the task?
A. Edit the Owner field in the table page by removing their own account
B. Edit the Owner field in the table page by selecting All Users
C. Edit the Owner field in the table page by selecting the new owner's account
D. Edit the Owner field in the table page by selecting the Admins group
E. Edit the Owner field in the table page by removing all access
Answer: C. Edit the Owner field in the table page by selecting the new owner's account

A data analyst has a managed table table_name in database database_name. They would now like to remove the table from the database and all of the data files associated with the table. The rest of the tables in the database must continue to exist. Which of the following commands can the analyst use to complete the task without producing an error?
A. DROP DATABASE database_name;
B. DROP TABLE database_name.table_name;
C. DELETE TABLE database_name.table_name;
D. DELETE TABLE table_name FROM database_name;
E. DROP TABLE table_name FROM database_name;
Answer: B. DROP TABLE database_name.table_name;

A data analyst runs the following command: INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers; What is the result of running this command?
A. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, and any duplicate data is deleted.
B. The command fails because it is written incorrectly.
C. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, including any duplicate data.
D. The suppliers table now contains the data from the new_suppliers table, and the new_suppliers table now contains the data from the suppliers table.
E. The suppliers table now contains only the data from the new_suppliers table.
Answer: B. The command fails because it is written incorrectly.

A data engineer is working with a nested array column products in table transactions. They want to expand the table so each unique item in products for each row has its own row where the transaction_id column is duplicated as necessary. They are using the following incomplete command: SELECT transaction_id, ___________ AS product FROM transactions; Which of the following lines of code can they use to fill in the blank in the above code block so that it successfully completes the task?
A. array_distinct(products)
B. explode(products)
C. reduce(products)
D. array(products)
E. flatten(products)
Answer: B. explode(products)

A data analysis team is working with the table_bronze SQL table as a source for one of its most complex projects. A stakeholder of the project notices that some of the downstream data is duplicative. The analysis team identifies table_bronze as the source of the duplication. Which of the following queries can be used to deduplicate the data from table_bronze and write it to a new table table_silver?
A. CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze;
B. CREATE TABLE table_silver AS INSERT * FROM table_bronze;
C. CREATE TABLE table_silver AS MERGE DEDUPLICATE * FROM table_bronze;
D. INSERT INTO TABLE table_silver SELECT * FROM table_bronze;
E. INSERT OVERWRITE TABLE table_silver SELECT * FROM table_bronze;
Answer: A. CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze;
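The explode and deduplication answers above can be tried out locally with a small sketch. This is not Databricks code: the first part is a plain-Python analogue of explode(products), and the second part runs the option A query against an in-memory SQLite database as a stand-in for Databricks SQL. The sample rows and the column names id and region are made up for illustration.

```python
import sqlite3

# Plain-Python analogue of:
#   SELECT transaction_id, explode(products) AS product FROM transactions;
# Each array element gets its own row, and transaction_id is repeated.
transactions = [(1, ["apple", "pear"]), (2, ["milk"])]  # hypothetical sample data
exploded = [
    (transaction_id, product)
    for transaction_id, products in transactions
    for product in products
]

# SQLite stand-in (an assumption, not Databricks SQL) for the deduplication query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_bronze (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO table_bronze VALUES (?, ?)",
    [(1, "east"), (1, "east"), (2, "west")],  # first two rows are exact duplicates
)
conn.execute("CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze")
silver = conn.execute("SELECT id, region FROM table_silver ORDER BY id").fetchall()

print(exploded)
print(silver)
```

SELECT DISTINCT * only removes rows that match in every column, so rows that differ in any one column are kept in table_silver.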

E. SELECT price(customer_spend, customer_units) AS customer_price FROM customer_summary
Answer: E. SELECT price(customer_spend, customer_units) AS customer_price FROM customer_summary

A data analyst has been asked to count the number of customers in each region and has written the following query: SELECT region, count() AS number_of_customers FROM customers ORDER BY region; If there is a mistake in the query, which of the following describes the mistake?
A. The query is using count(), which will count all the customers in the customers table, no matter the region.
B. The query is missing a GROUP BY region clause.
C. The query is using ORDER BY, which is not allowed in an aggregation.
D. There are no mistakes in the query.
E. The query is selecting region, but region should only occur in the ORDER BY clause.
Answer: B. The query is missing a GROUP BY region clause.

How can a data analyst determine if query results were pulled from the cache?
A. Go to the Query History tab and click on the text of the query. The slideout shows if the results came from the cache.
B. Go to the Alerts tab and check the Cache Status alert.
C. Go to the Queries tab and click on Cache Status. The status will be green if the results from the last run came from the cache.
D. Go to the SQL Warehouse (formerly SQL Endpoints) tab and click on Cache. The Cache file will show the contents of the cache.
E. Go to the Data tab and click Last Query. The details of the query will show if the results came from the cache.
Answer: A. Go to the Query History tab and click on the text of the query. The slideout shows if the results came from the cache.

Which of the following statements about a refresh schedule is incorrect?
A. A query can be refreshed anywhere from 1 minute to 2 weeks.
B. Refresh schedules can be configured in the Query Editor.
C. A query being refreshed on a schedule does not use a SQL Warehouse (formerly known as SQL Endpoint).
D. A refresh schedule is not the same as an alert.
E. You must have workspace administrator privileges to configure a refresh schedule.
Answer: C. A query being refreshed on a schedule does not use a SQL Warehouse (formerly known as SQL Endpoint).

A data analyst creates a Databricks SQL Query where the result set has the following schema: region STRING, number_of_customer INT. When the analyst clicks on the "Add visualization" button on the SQL Editor page, which of the following types of visualizations will be selected by default?
A. Violin Chart
B. Line Chart
C. Bar Chart
D. Histogram
E. There is no default. The user must choose a visualization type.
Answer: C. Bar Chart

A data analyst has created a Query in Databricks SQL, and now they want to create two data visualizations from that Query and add both of those data visualizations to the same Databricks SQL Dashboard. Which of the following steps will they need to take when creating and adding both data visualizations to the Databricks SQL Dashboard?
A. They will need to alter the Query to return two separate sets of results.
B. They will need to add two separate visualizations to the dashboard based on the same Query.
C. They will need to create two separate dashboards.
D. They will need to decide on a single data visualization to add to the dashboard.
E. They will need to copy the Query and create one data visualization per query.
Answer: B. They will need to add two separate visualizations to the dashboard based on the same Query.
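For the customers-per-region question earlier in this section, the corrected query with GROUP BY region can be sketched as follows. The example uses an in-memory SQLite database as a stand-in for Databricks SQL, and the customer_id column and sample rows are assumptions for illustration. SQLite is more lenient than Databricks SQL about selecting a non-aggregated column without GROUP BY, so only the corrected form is shown here.

```python
import sqlite3

# SQLite stand-in (an assumption, not Databricks SQL) with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "east"), (2, "east"), (3, "west")],
)

# GROUP BY region makes the count apply per region rather than to the whole table.
counts = conn.execute(
    """
    SELECT region, count(*) AS number_of_customers
    FROM customers
    GROUP BY region
    ORDER BY region
    """
).fetchall()

print(counts)
```

ORDER BY is still allowed alongside the aggregation; it only sorts the grouped result.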

D. The area chart will convert to a Dashboard Parameter.
Answer: B. The area chart will use whatever is selected in the Dashboard Parameter along with all of the other visualizations in the dashboard that use the same parameter.

A data analyst has been asked to configure an alert for a query that returns the income in the accounts_receivable table for a date range. The date range is configurable using a Date query parameter. The Alert does not work. Which of the following describes why the Alert does not work?
A. Alerts don't work with queries that access tables.
B. Queries that return results based on dates cannot be used with Alerts.
C. The wrong query parameter is being used. Alerts only work with Date and Time query parameters.
D. Queries that use query parameters cannot be used with Alerts.
E. The wrong query parameter is being used. Alerts only work with dropdown list query parameters, not dates.
Answer: D. Queries that use query parameters cannot be used with Alerts.

Which of the following statements about adding visual appeal to visualizations in the Visualization Editor is incorrect?
A. Visualization scale can be changed.
B. Data Labels can be formatted.
C. Colors can be changed.
D. Borders can be added.
E. Tooltips can be formatted.
Answer: D. Borders can be added.

A data team has been given a series of projects by a consultant that need to be implemented in the Databricks Lakehouse Platform. Which of the following projects should be completed in Databricks SQL?
A. Testing the quality of data as it is imported from a source
B. Tracking usage of feature variables for machine learning projects
C. Combining two data sources into a single, comprehensive dataset
D. Segmenting customers into like groups using a clustering algorithm
E. Automating complex notebook-based workflows with multiple tasks
Answer: C. Combining two data sources into a single, comprehensive dataset

A data organization has a team of engineers developing data pipelines following the medallion architecture using Delta Live Tables. While the data analysis team working on a project is using gold-layer tables from these pipelines, they need to perform some additional processing of these tables prior to performing their analysis. Which of the following terms is used to describe this type of work?
A. Data blending
B. Last-mile dashboarding
C. Data testing
D. Last-mile ETL
E. Data enhancement
Answer: D. Last-mile ETL

Which of the following statements describes descriptive statistics?
A. A branch of statistics that uses summary statistics to quantitatively describe and summarize data.
B. A branch of statistics that uses a variety of data analysis techniques to infer properties of an underlying distribution of probability.
C. A branch of statistics that uses quantitative variables that must take on a finite or countably infinite set of values.
D. A branch of statistics that uses summary statistics to categorically describe and summarize data.
E. A branch of statistics that uses quantitative variables that must take on an uncountable set of values.
Answer: A. A branch of statistics that uses summary statistics to quantitatively describe and summarize data.

In which of the following situations will the mean value and median value of a variable be meaningfully different?
A. When the variable contains no outliers
B. When the variable contains no missing values
C. When the variable is of the boolean type
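As background for the descriptive statistics questions above, the following minimal sketch computes two summary statistics, the mean and the median, on made-up numbers using Python's standard statistics module. It is only an illustration of how the two measures respond to a single extreme value; the data and variable names are hypothetical.

```python
from statistics import mean, median

# Hypothetical values: the second list replaces one value with an extreme one.
typical = [10, 11, 12, 13, 14]
with_extreme_value = [10, 11, 12, 13, 1000]

summary = {
    "typical": (mean(typical), median(typical)),
    "with_extreme_value": (mean(with_extreme_value), median(with_extreme_value)),
}

for name, (avg, mid) in summary.items():
    print(f"{name}: mean={avg}, median={mid}")
```

The median depends only on the middle of the sorted values, while the mean uses every value, so a single extreme entry moves the mean far more than the median.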

A) Delta Sharing allows data present in only Delta format to be sharable inside the organization only
B) Delta Sharing is a protocol to share data present inside Databricks with other organizations
C) Delta Sharing is the industry's first open protocol for secure data sharing, making it simple to share data with other organizations regardless of where the data lives
D) Delta Sharing is a protocol to share data present inside Databricks within the organization
Answer: C) Delta Sharing is the industry's first open protocol for secure data sharing, making it simple to share data with other organizations regardless of where the data lives

Which of these is the incorrect approach to handling complex data types?
A) Explore and Collect
B) Support for Complex Data types is yet to be introduced
C) User Defined Function
D) Lambda Function
Answer: B) Support for Complex Data types is yet to be introduced

If a parameter has been added to a dashboard, how will it affect the visualizations?
A) The parameter will be used to fetch data for all the visualizations which use this parameter
B) The parameter will be used to fetch data for some visualizations but not for others though they use the same parameter
C) The parameter is not added if a parameterized query/visualization is added
D) The parameter will not make any difference to the visualizations
Answer: A) The parameter will be used to fetch data for all the visualizations which use this parameter

What are the uses of the Delta Lake transaction log?
A) To provide ACID transaction capabilities
B) To track changes
C) All of the above
D) None of the above
Answer: C) All of the above

Which of the following is NOT a benefit of data governance?
A) Consistent and high data quality for analytics and machine learning
B) Data democratization for decision-making
C) Increased data complexity
D) Reduced time to insight
Answer: C) Increased data complexity

Which of the following is incorrect about Auto Loader in Databricks SQL?
A) Loads JSON, Parquet, and CSV files
B) Ensures file contents are loaded only once
C) Removes duplicates in the file
D) Automatically uploads files
Answer: C) Removes duplicates in the file

What are the various warehouse types available in Databricks SQL when creating a SQL Warehouse?
A) Serverless warehouse only
B) Serverless, Classic, and Pro
C) Enterprise, OnPrem and Classic
D) Classic and Pro Warehouse
Answer: B) Serverless, Classic, and Pro

For data ingestion in Databricks SQL using existing files, which of the following formats is not supported by Databricks?
A) Delta
B) JSON
C) Parquet
D) Microsoft Word
E) CSV
Answer: D) Microsoft Word

Which of the following is a benefit of Delta Lake?
A) Time Travel capabilities
B) All of the above
C) None of the above
D) ACID Transactions
E) Schema Enforcement
Answer: B) All of the above

Which of the following options is not available on the Databricks SQL landing page?
A) Query history
B) Partner Connect
C) Recent Queries and Dashboards
D) Blog posts
E) Documentation
Answer: A) Query history

A new data analyst has joined your team. He has recently been added to the company's Databricks workspace as [email protected]. The data analyst should be able to query the table sales in the database retail. The new data analyst has been granted USAGE on the database retail already. Which of the following commands can be used to grant the appropriate permission to the new data analyst?
A) GRANT USAGE ON TABLE sales TO [email protected];
B) GRANT USAGE ON TABLE [email protected] TO sales;
C) GRANT SELECT ON TABLE [email protected] TO sales;
D) GRANT SELECT ON TABLE sales TO [email protected];
E) GRANT CREATE ON TABLE sales TO [email protected];
Answer: D) GRANT SELECT ON TABLE sales TO [email protected];

What is data cleaning?
A) This is a process that involves adding additional information to the data
B) This is a process that involves filling in missing data
C) This is the process that involves identifying or correcting the errors in the data
D) This is a process that involves moving the data from the bronze layer to the silver layer
Answer: C) This is the process that involves identifying or correcting the errors in the data

How can the owner of a schema be changed to a specific user?
A) Go to SQL Warehouses > Click on the SQL Warehouse > Change owner
B) Once set, the owner cannot be changed
C) Go to Workspace > Change Owner
D) Go to Data Explorer > Click on the schema > Click on owner option under the schema name and change it to the other username
Answer: D) Go to Data Explorer > Click on the schema > Click on owner option under the schema name and change it to the other username

How are materialized views refreshed?
A) Automatically every hour
B) Manually by the user
C) Only when there are changes in upstream datasets
D) According to the updated schedule of the pipeline
Answer: D) According to the updated schedule of the pipeline

Which of the following operation(s) is/are possible from Data Explorer?
A) All of the above
B) View schema details
C) View Data warehouse, Tables, Locations Detail, Query History, Sample data
D) None of the above
E) Grant and revoke permissions
Answer: A) All of the above