MASTER DATABRICKS: UNIFIED DATA ANALYTICS & AI PLATFORM GUIDE
Databricks-Certified-Data-Analyst-Associate Latest Version: 7.0 Practice Exam Questions And Correct Answers (Verified Answers)

Which of the following layers of the medallion architecture is most commonly used by data analysts?
A. None of these layers are used by data analysts
B. Gold
C. All of these layers are used equally by data analysts
D. Silver
E. Bronze
ANSWER: B. Gold

A data analyst has recently joined a new team that uses Databricks SQL, but the analyst has never used Databricks before. The analyst wants to know where in Databricks SQL they can write and execute SQL queries. On which of the following pages can the analyst write and execute SQL queries?
A. Data page
B. Dashboards page
C. Queries page
D. Alerts page
E. SQL Editor page
ANSWER: E. SQL Editor page

Which of the following describes how Databricks SQL should be used in relation to other business intelligence (BI) tools like Tableau, Power BI, and Looker?
A. As an exact substitute with the same level of functionality
B. As a substitute with less functionality
C. As a complete replacement with additional functionality
D. As a complementary tool for professional-grade presentations
E. As a complementary tool for quick in-platform BI work
ANSWER: E. As a complementary tool for quick in-platform BI work

Which of the following approaches can be used to connect Databricks to Fivetran for data ingestion?
A. Use Workflows to establish a SQL warehouse (formerly known as a SQL endpoint) for Fivetran to interact with
B. Use Delta Live Tables to establish a cluster for Fivetran to interact with
C. Use Partner Connect's automated workflow to establish a cluster for Fivetran to interact with

[...]

changes can the data analyst make to reduce the start-up time for the endpoint while managing costs?
A. Reduce the SQL endpoint cluster size
B. Increase the SQL endpoint cluster size
C. Turn off the Auto stop feature
D. Increase the minimum scaling value
E.
Use a Serverless SQL endpoint
ANSWER: E. Use a Serverless SQL endpoint

A data engineering team has created a Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables. The micro-batches are triggered every minute. A data analyst has created a dashboard based on this gold-level data. The project stakeholders want to see the results in the dashboard updated within one minute or less of new data becoming available within the gold-level tables. Which of the following cautions should the data analyst share prior to setting up the dashboard to complete this task?
A. The required compute resources could be costly
B. The gold-level tables are not appropriately clean for business reporting
C. The streaming data is not an appropriate data source for a dashboard
D. The streaming cluster is not fault tolerant
E. The dashboard cannot be refreshed that quickly
ANSWER: A. The required compute resources could be costly

Which of the following approaches can be used to ingest data directly from cloud-based object storage?
A. Create an external table while specifying the DBFS storage path to FROM
B. Create an external table while specifying the DBFS storage path to PATH
C. It is not possible to directly ingest data from cloud-based object storage
D. Create an external table while specifying the object storage path to FROM
E. Create an external table while specifying the object storage path to LOCATION
ANSWER: E. Create an external table while specifying the object storage path to LOCATION

A data analyst wants to create a dashboard with three main sections: Development, Testing, and Production. They want all three sections on the same dashboard, but they want to clearly designate the sections using text on

[...]

A data analyst is attempting to drop a table my_table. The analyst wants to delete all table metadata and data.
They run the following command:
DROP TABLE IF EXISTS my_table;
While the object no longer appears when they run SHOW TABLES, the data files still exist. Which of the following describes why the data files still exist and the metadata files were deleted?
A. The table's data was larger than 10 GB
B. The table did not have a location
C. The table was external
D. The table's data was smaller than 10 GB
E. The table was managed
ANSWER: C. The table was external

Which of the following should data analysts consider when working with personally identifiable information (PII) data?
A. Organization-specific best practices for PII data
B. Legal requirements for the area in which the data was collected
C. None of these considerations
D. Legal requirements for the area in which the analysis is being performed
E. All of these considerations
ANSWER: E. All of these considerations

Delta Lake stores table data as a series of data files, but it also stores a lot of other information. Which of the following is stored alongside data files when using Delta Lake?
A. None of these
B. Table metadata, data summary visualizations, and owner account information
C. Table metadata
D. Data summary visualizations
E. Owner account information
ANSWER: C. Table metadata

Which of the following is an advantage of using a Delta Lake-based data lakehouse over common data lake solutions?
A. ACID transactions
B. Flexible schemas
C. Data deletion

[...]

A. Edit the Owner field in the table page by removing their own account
B. Edit the Owner field in the table page by selecting All Users
C. Edit the Owner field in the table page by selecting the new owner's account
D. Edit the Owner field in the table page by selecting the Admins group
E. Edit the Owner field in the table page by removing all access
ANSWER: C. Edit the Owner field in the table page by selecting the new owner's account

A data analyst has a managed table table_name in database database_name.
They would now like to remove the table from the database and all of the data files associated with the table. The rest of the tables in the database must continue to exist. Which of the following commands can the analyst use to complete the task without producing an error?
A. DROP DATABASE database_name;
B. DROP TABLE database_name.table_name;
C. DELETE TABLE database_name.table_name;
D. DELETE TABLE table_name FROM database_name;
E. DROP TABLE table_name FROM database_name;
ANSWER: B. DROP TABLE database_name.table_name;

A data analyst runs the following command:
INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers;
What is the result of running this command?
A. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, and any duplicate data is deleted.
B. The command fails because it is written incorrectly.
C. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, including any duplicate data.
D. The suppliers table now contains the data from the new_suppliers table, and the new_suppliers table now contains the data from the suppliers table.
E. The suppliers table now contains only the data from the new_suppliers table.
ANSWER: C. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, including any duplicate data.

A data engineer is working with a nested array column products in table transactions. They want to expand the

[...]

B. CREATE TABLE table_silver AS INSERT * FROM table_bronze;
C. CREATE TABLE table_silver AS MERGE DEDUPLICATE * FROM table_bronze;
D. INSERT INTO TABLE table_silver SELECT * FROM table_bronze;
E. INSERT OVERWRITE TABLE table_silver SELECT * FROM table_bronze;
ANSWER: A. CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze;

A business analyst has been asked to create a data entity/object called sales_by_employee. It should always stay up-to-date when new data are added to the sales table.
The new entity should have the columns sales_person, which will be the name of the employee from the employees table, and sales, which will be all sales for that particular sales person. Both the sales table and the employees table have an employee_id column that is used to identify the sales person. Which of the following code blocks will accomplish this task?
ANSWER:
CREATE OR REPLACE VIEW sales_by_employee AS
SELECT employees.employee_name sales_person, sales.sales
FROM sales
JOIN employees ON employees.employee_id = sales.employee_id;

In which of the following situations should a data analyst use higher-order functions?
A. When custom logic needs to be applied to simple, unnested data
B. When custom logic needs to be converted to Python-native code
C. When custom logic needs to be applied at scale to array data objects
D. When built-in functions are taking too long to perform tasks
E. When built-in functions need to run through the Catalyst Optimizer
ANSWER: C. When custom logic needs to be applied at scale to array data objects

A data analyst has created a user-defined function using the following line of code:
CREATE FUNCTION price(spend DOUBLE, units DOUBLE) RETURNS DOUBLE RETURN spend / units;
Which of the following code blocks can be used to apply this function to the customer_spend and customer_units columns of the table customer_summary to create column customer_price?

[...]

A. The query is using count(*), which will count all the customers in the customers table, no matter the region.
B. The query is missing a GROUP BY region clause.
C. The query is using ORDER BY, which is not allowed in an aggregation.
D. There are no mistakes in the query.
E. The query is selecting region, but region should only occur in the ORDER BY clause.
ANSWER: B. The query is missing a GROUP BY region clause.

How can a data analyst determine if query results were pulled from the cache?
A. Go to the Query History tab and click on the text of the query.
The slideout shows if the results came from the cache.
B. Go to the Alerts tab and check the Cache Status alert.
C. Go to the Queries tab and click on Cache Status. The status will be green if the results from the last run came from the cache.
D. Go to the SQL Warehouse (formerly SQL Endpoints) tab and click on Cache. The Cache file will show the contents of the cache.
E. Go to the Data tab and click Last Query. The details of the query will show if the results came from the cache.
ANSWER: A. Go to the Query History tab and click on the text of the query. The slideout shows if the results came from the cache.

Which of the following statements about a refresh schedule is incorrect?
A. A query can be refreshed anywhere from 1 minute to 2 weeks.
B. Refresh schedules can be configured in the Query Editor.
C. A query being refreshed on a schedule does not use a SQL Warehouse (formerly known as SQL Endpoint).
D. A refresh schedule is not the same as an alert.
E. You must have workspace administrator privileges to configure a refresh schedule.
ANSWER: C. A query being refreshed on a schedule does not use a SQL Warehouse (formerly known as SQL Endpoint).

A data analyst creates a Databricks SQL Query where the result set has the following schema:
region STRING
number_of_customer INT
When the analyst clicks on the "Add visualization" button on the SQL Editor page, which

[...]

E. They will need to copy the Query and create one data visualization per query.
ANSWER: B. They will need to add two separate visualizations to the dashboard based on the same Query.

A data analyst has been asked to provide a list of options on how to share a dashboard with a client. It is a security requirement that the client does not gain access to any other information, resources, or artifacts in the database. Which of the following approaches cannot be used to share the dashboard and meet the security requirement?
A.
Download the Dashboard as a PDF and share it with the client.
B. Set a refresh schedule for the dashboard and enter the client's email address in the "Subscribers" box.
C. Take a screenshot of the dashboard and share it with the client.
D. Generate a Personal Access Token that is good for 1 day and share it with the client.
E. Download a PNG file of the visualizations in the dashboard and share them with the client.
ANSWER: D. Generate a Personal Access Token that is good for 1 day and share it with the client.

A data analyst has been asked to produce a visualization that shows the flow of users through a website. Which of the following is used for visualizing this type of flow?
A. Heatmap
B. Choropleth
C. Word Cloud
D. Pivot Table
E. Sankey
ANSWER: E. Sankey

An analyst writes a query that contains a query parameter. They then add an area chart visualization to the query. While adding the area chart visualization to a dashboard, the analyst chooses "Dashboard Parameter" for the query parameter associated with the area chart. Which of the following statements is true?
A. The area chart will use whatever is selected in the Dashboard Parameter while all of the other visualizations will remain unchanged regardless of their parameter use.
B. The area chart will use whatever is selected in the Dashboard Parameter along with all of the other visualizations in the dashboard that use the same parameter.