Databricks Certified Associate Developer for Apache Spark 3.0 – Python exam, Exams of Nursing

Alabama State University (ASU), Nursing

This document covers various topics related to the Databricks Certified Associate Developer for Apache Spark 3.0 – Python exam. It provides explanations and examples for concepts such as the Spark driver, worker nodes, slots, tasks, stages, shuffles, transformations, actions, execution/deployment modes, out-of-memory errors, storage levels, broadcast variables, data partitioning, DataFrames, and SQL UDFs. It is likely intended to serve as a study guide or reference for people preparing for that exam, and its level of technical detail suggests it will be most useful to university students or lifelong learners with a strong background in data engineering, big data processing, and Apache Spark.

Typology: Exams

2023/2024

Available from 08/13/2024

Ellah1 🇺🇸


Databricks Certified Associate Developer for Apache Spark 3.0 – Python exam

What is a Spark driver? - The Spark driver is the node on which the Spark application's main method runs to coordinate the application. It contains the SparkContext object and is responsible for scheduling the execution of work on the worker nodes in cluster mode.

What are worker nodes in cluster-mode Spark? - Worker nodes are machines that host the executors responsible for the execution of tasks.

What are slots? - Slots are resources for parallelization within a Spark application (typically one per executor core, with each slot running one task at a time).

What is a combination of a block of data and a set of transformations that runs on a single executor? - Task

What is a group of tasks that can be executed in parallel to compute the same set of operations on potentially multiple machines? - Stage


What is a shuffle? - A shuffle is the process by which data is compared across partitions. In practice this means redistributing rows between partitions (and often between executors) so that related records end up together.

If you have a DF with more partitions than you have (single-core) executors, what happens? - Performance will be suboptimal because not all data can be processed at the same time. Shuffle commands will create a large number of connections. There is increased overhead associated with managing resources for data processing for each task. There is an increased risk of out-of-memory errors depending on the size of the executors.

Which of the following operations will trigger evaluation? A) df.filter() B) df.distinct() C) df.intersect() D) df.join() E) df.count() - E) df.count(). count() is an action; the other four are transformations, which Spark evaluates lazily.
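The partitions-versus-executors answer above can be made concrete with a little arithmetic. The following is a minimal plain-Python sketch, not Spark code: the function name `task_waves` and its parameters are hypothetical, and it assumes one task per partition, one slot per executor core, and tasks of roughly equal duration.

```python
import math


def task_waves(num_partitions: int, num_executors: int, cores_per_executor: int = 1) -> int:
    """Estimate how many rounds ("waves") of tasks a stage needs.

    Simplifying assumptions: one task per partition, one slot per
    executor core, and all tasks take roughly the same time.
    """
    total_slots = num_executors * cores_per_executor
    if total_slots <= 0:
        raise ValueError("at least one slot is required")
    return math.ceil(num_partitions / total_slots)


# 8 partitions on 4 single-core executors: only 4 tasks can run at once.
print(task_waves(num_partitions=8, num_executors=4))  # 2
# 9 partitions: the ninth task runs alone in a third wave.
print(task_waves(num_partitions=9, num_executors=4))  # 3
# Partitions <= slots: everything is processed in a single wave.
print(task_waves(num_partitions=4, num_executors=4))  # 1
```

Under these assumptions, any partition count above the slot count means some data has to wait for a later wave, which is the sense in which "not all data can be processed at the same time".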


What is an out-of-memory error in Spark? - An out-of-memory error occurs when either the driver or an executor does not have enough memory to collect or process the data allocated to it.

Which of the following is the default storage level for persist() for a non-streaming DataFrame/Dataset? A) MEMORY_AND_DISK B) MEMORY_AND_DISK_SER C) DISK_ONLY D) MEMORY_ONLY_SER E) MEMORY_ONLY - A) MEMORY_AND_DISK

What is a broadcast variable? - A broadcast variable is entirely cached on each worker node, so it doesn't need to be shipped or shuffled between nodes within each stage.
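To illustrate why a broadcast variable avoids shuffling, here is a plain-Python model of a join against a small broadcast lookup table. It is only a sketch and uses no Spark API: the function `broadcast_join`, the column meanings, and the sample data are all hypothetical, and Python lists stand in for partitions.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # (storeId, sales): hypothetical columns


def broadcast_join(
    partitions: List[List[Row]], lookup: Dict[int, str]
) -> List[List[Tuple[int, float, str]]]:
    """Join every partition against its own full copy of a small lookup table."""
    joined = []
    for part in partitions:
        local_copy = dict(lookup)  # stands in for the copy cached on each worker
        joined.append(
            [
                (store_id, sales, local_copy[store_id])
                for store_id, sales in part
                if store_id in local_copy
            ]
        )
    return joined


partitions = [[(1, 10.0), (2, 5.0)], [(2, 7.5), (3, 1.0)]]
divisions = {1: "North", 2: "South"}  # the small table that is "broadcast"
print(broadcast_join(partitions, divisions))
```

Each partition is processed using only its own rows plus the local copy of the lookup table, so no rows move between partitions.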


Which of the following operations is most likely to induce a skew in the size of your data's partitions? A) df.collect() B) df.cache() C) df.repartition(n) D) df.coalesce(n) E) df.persist() - D) df.coalesce(n). coalesce() merges existing partitions without a full shuffle, so uneven inputs stay uneven, whereas repartition() performs a full shuffle that spreads rows evenly.

What data structures are Spark DataFrames built on top of? - RDDs (resilient distributed datasets)

What is the code block needed to return a DataFrame containing only column 'storeId' and column 'division' from a DataFrame called 'storesDF'? - storesDF.select("storeId", "division")
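The difference between coalesce(n) and repartition(n) can be sketched with partition sizes alone. This is a simplified plain-Python model, not Spark's actual algorithm: the helper names are hypothetical, the model merges adjacent partitions (real Spark also considers data locality), and it treats a full shuffle as a perfectly even redistribution.

```python
from typing import List


def coalesce_sizes(sizes: List[int], n: int) -> List[int]:
    """Merge adjacent partitions into n groups without splitting any partition."""
    k = len(sizes)
    if n < 1:
        raise ValueError("n must be at least 1")
    if n >= k:
        return list(sizes)  # coalesce never increases the partition count
    bounds = [i * k // n for i in range(n + 1)]
    return [sum(sizes[bounds[i]:bounds[i + 1]]) for i in range(n)]


def repartition_sizes(sizes: List[int], n: int) -> List[int]:
    """Spread all rows evenly over n partitions, as an idealized full shuffle would."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(sum(sizes), n)
    return [base + (1 if i < extra else 0) for i in range(n)]


sizes = [100, 5, 5, 5, 5, 5]          # row counts per partition (hypothetical)
print(coalesce_sizes(sizes, 2))       # [110, 15]: the skew survives the merge
print(repartition_sizes(sizes, 2))    # [63, 62]: rows are rebalanced
```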


What is the code that returns a new DF from a DF 'storesDF' where column 'numberOfManagers' is the constant integer 1? - storesDF.withColumn("numberOfManagers", lit(1))

Which of the following operations can be used to split an array column into an individual DataFrame row for each element in the array? A) extract() B) split() C) explode() D) arrays_zip() E) unpack() - C) explode()

What code returns a new DataFrame where column "storeCategory" is an all-lowercase version of column "storeCategory" in DataFrame "storesDF"? - storesDF.withColumn("storeCategory", lower(col("storeCategory")))


The code block shown below contains an error. The code block is intended to return a new DataFrame where column division from DataFrame storesDF has been renamed to column state and column managerName from DataFrame storesDF has been renamed to column managerFullName. Identify the error. Code block: (storesDF.withColumnRenamed("state", "division") .withColumnRenamed("managerFullName", "managerName")) - The first argument to operation withColumnRenamed() should be the old column name and the second argument should be the new column name. Because withColumnRenamed() is a no-op when the named column does not exist, the block as written (assuming storesDF has no columns named state or managerFullName) returns storesDF unchanged instead of raising an error.

What is the code that returns a DataFrame where rows in DataFrame "storesDF" containing missing values in every column have been dropped? - storesDF.na.drop("all")

Which of the following operations fails to return a DataFrame where every row is unique? A) DataFrame.distinct()


Fill in the blanks in the block below to return a new DF with the mean of column 'sqft' from DF 'storesDF' in col 'sqftMean'. storesDF.__1__(__2__(__3__).alias("sqftMean")) - 1 - agg 2 - mean 3 - col("sqft")

Which of the following code blocks returns the number of rows in DF 'storesDF'? A. storesDF.withColumn("numberOfRows", count()) B. storesDF.withColumn(count().alias("numberOfRows")) C. storesDF.countDistinct() D. storesDF.count() E. storesDF.agg(count()) - D. storesDF.count()


What is the code block which returns the sum of values in column 'sqft' in DF 'storesDF' grouped by distinct values in col 'division'? - storesDF.groupBy("division").agg(sum(col("sqft")))

What is the code block which returns a DF containing summary statistics only for column 'sqft' in DF 'storesDF'? - storesDF.describe("sqft")

Which of the following operations can be used to sort the rows of a DataFrame? A) sort() and orderBy() B) orderby() C) sort() and orderby() D) orderBy() E) sort() - A) sort() and orderBy()


__1__.__2__.__3__ - 1) storesDF 2) first() 3) sqft

How do you print the schema of a DataFrame? - DataFrame.printSchema()

In what order should the lines of code below be run in order to create and register a SQL UDF named "ASSESS_PERFORMANCE" using the Python function assessPerformance and apply it to column 'customerSatisfaction' in table 'stores'? Lines of code:

1. spark.udf.register("ASSESS_PERFORMANCE", assessPerformance)
2. spark.sql("SELECT customerSatisfaction, assessPerformance(customerSatisfaction) AS result FROM stores")
3. spark.udf.register(assessPerformance, "ASSESS_PERFORMANCE")
4. spark.sql("SELECT customerSatisfaction, ASSESS_PERFORMANCE(customerSatisfaction) AS result FROM stores")

1 -> 4 (register the function under its SQL name first, then call it by that name in the query)
