DAG Authoring Practice Exam, Exams of Technology

Focuses exclusively on DAG design, authoring best practices, efficient task structuring, modularization techniques, XCom usage, retries, SLAs, and Airflow UI analysis. Candidates work through complex DAG-writing exercises, troubleshooting dependency cycles, optimizing code for maintainability, designing idempotent tasks, and implementing branching logic. Emphasis is placed on real-world pipeline reliability and scalability.

Typology: Exams

2025/2026

Available from 01/11/2026

shilpi-jain-1
DAG Authoring Practice Exam
**Question 1.** Which class is used to define a DAG in Airflow?
A) airflow.models.BaseOperator
B) airflow.models.DAG
C) airflow.operators.PythonOperator
D) airflow.sensors.BaseSensorOperator
Answer: B
Explanation: The DAG class (`airflow.models.DAG`) is the primary object for defining a
workflow’s structure.
**Question 2.** What does the `schedule_interval` parameter control?
A) When the DAG file is parsed
B) The time between consecutive DAG runs
C) The maximum number of active tasks
D) The default retry delay for tasks
Answer: B
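Explanation: `schedule_interval` defines the periodicity of DAG runs (e.g., daily, hourly). From Airflow 2.4 onward the same values are passed through the `schedule` argument, which supersedes `schedule_interval`.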
**Question 3.** Which of the following is a preset schedule expression?
A) `@weekly`
B) `cron(0 12 * * *)`
C) `timedelta(hours=2)`
D) `0 0 * * *`
Answer: A
Explanation: `@weekly` is a built-in preset; the others are custom cron or timedelta expressions.
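For reference, each preset is shorthand for an ordinary cron string. The sketch below lists the common ones in plain Python, with no Airflow import; `resolve_schedule` is a hypothetical helper name used only for illustration.

```python
# Cron equivalents of Airflow's common schedule presets.
CRON_PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}


def resolve_schedule(expression: str) -> str:
    """Return the cron string for a preset, or the expression unchanged."""
    return CRON_PRESETS.get(expression, expression)


print(resolve_schedule("@weekly"))     # 0 0 * * 0
print(resolve_schedule("0 12 * * *"))  # 0 12 * * *
```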
**Question 4.** In Airflow, what is the purpose of `default_args`?

A) To set global variables for the scheduler
B) To provide default parameters for all tasks in the DAG
C) To define the DAG’s execution timezone
D) To configure the Airflow webserver
Answer: B
Explanation: `default_args` supplies common arguments (e.g., `retries`, `owner`) to every task unless overridden.
**Question 5.** Which parameter limits the number of concurrent DAG runs?
A) `max_active_tasks`
B) `max_active_runs`
C) `concurrency`
D) `pool`
Answer: B
Explanation: `max_active_runs` caps how many runs of the same DAG can be active simultaneously.
**Question 6.** What does idempotent task design ensure?
A) Tasks always run in parallel
B) Re-running a task does not cause duplicate side effects
C) Tasks never fail
D) Tasks ignore upstream failures
Answer: B
Explanation: Idempotency means a task can be executed multiple times without changing the final state beyond the first run.
**Question 7.** Which operator is best suited for executing a shell command?

B) Choose one downstream path based on a condition
C) Run multiple tasks in parallel automatically
D) Push XComs without returning a value
Answer: B
Explanation: It returns a `task_id` (or a list of task IDs) that determines which branch continues.
**Question 11.** What is the primary benefit of using a `TaskGroup`?
A) To enforce retry policies across tasks
B) To visually group related tasks in the UI
C) To automatically parallelize tasks
D) To create dynamic task mapping
Answer: B
Explanation: `TaskGroup` provides a logical container that improves DAG readability in the UI.
**Question 12.** Which sensor mode releases the worker slot while waiting?
A) `poke`
B) `reschedule`
C) `timeout`
D) `heartbeat`
Answer: B
Explanation: In `reschedule` mode the sensor gives up its worker slot between checks and is scheduled again at the next poke interval.
**Question 13.** Which hook enables interaction with an Amazon S3 bucket?
A) `PostgresHook`
B) `S3Hook`
C) `MySqlHook`
D) `HttpHook`
Answer: B
Explanation: `S3Hook` provides methods for uploading, downloading, and listing objects in S3.
**Question 14.** How can a task push a value to XCom without explicitly calling `xcom_push`?
A) By returning the value from a Python callable (TaskFlow API)
B) By setting `push_xcom=False` in the operator
C) By using `Variable.set`
D) By defining a `default_args` entry `xcom=True`
Answer: A
Explanation: In the TaskFlow API, the return value of a `@task` function is automatically pushed to XCom.
**Question 15.** Which method retrieves a variable stored in Airflow’s metadata DB?
A) `Variable.get()`
B) `Connection.get()`
C) `os.getenv()`
D) `settings.get()`
Answer: A
Explanation: `Variable.get()` fetches a user-defined variable.
**Question 16.** To avoid heavy computation at import time, where should you place code that creates dynamic tasks?
A) At the top level of the DAG file
B) Inside the DAG context manager (`with DAG(...) as dag:`)
C) Inside a Python callable used by a `PythonOperator`
D) In the `default_args` dictionary

Explanation: `macros.datetime` exposes Python’s `datetime.datetime` class, so datetime objects can be built inside templated fields.
**Question 20.** What does the `retries` parameter control?
A) Number of times a task will be attempted after failure
B) Number of parallel tasks allowed in a DAG
C) Number of DAG runs retained in the UI
D) Number of times the scheduler will reload the DAG file
Answer: A
Explanation: `retries` defines how many additional attempts a task gets after an initial failure.
**Question 21.** Which parameter sets a hard limit on how long a task may run?
A) `execution_timeout`
B) `retry_delay`
C) `dagrun_timeout`
D) `sla`
Answer: A
Explanation: `execution_timeout` raises an `AirflowTaskTimeout` exception if the task exceeds the specified duration.
**Question 22.** Where should you define a callback that runs after any task fails?
A) `on_success_callback` in `default_args`
B) `on_failure_callback` in the operator definition
C) `sla_miss_callback` in the DAG definition
D) `trigger_rule` set to `all_failed`
Answer: B
Explanation: `on_failure_callback` is invoked when a task ends in a failed state.
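Questions 20 and 22 interact: `retries` counts the extra attempts after the first one, and the failure callback fires only once the task has finally failed. The following is a plain-Python simulation of that behaviour, not Airflow code; `run_with_retries` and `flaky` are hypothetical names, and the sketch assumes `retries` is zero or greater.

```python
from typing import Callable, Optional


def run_with_retries(
    task: Callable[[], object],
    retries: int,
    on_failure_callback: Optional[Callable[[Exception], None]] = None,
) -> object:
    """Run `task` once, then up to `retries` more times if it raises."""
    last_error: Optional[Exception] = None
    for _ in range(retries + 1):  # 1 initial attempt + `retries` retries
        try:
            return task()
        except Exception as err:
            last_error = err
    # Reached only when every attempt failed.
    if on_failure_callback is not None:
        on_failure_callback(last_error)
    raise last_error


attempts = []


def flaky() -> str:
    """Fail on the first two calls, succeed on the third."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient error")
    return "done"


result = run_with_retries(flaky, retries=2)
print(result, len(attempts))  # done 3
```

With `retries=2` the task is attempted three times in total, which is why the third call is still allowed to succeed.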

**Question 23.** Which construct allows you to fan out a list of values into separate task instances?
A) `TaskGroup`
B) Dynamic Task Mapping (`expand`)
C) `SubDagOperator`
D) `BranchPythonOperator`
Answer: B
Explanation: Dynamic Task Mapping (`expand` or `expand_kwargs`) creates a task instance for each element of a list/dict at runtime.
**Question 24.** Which Airflow component is responsible for reading DAG files and creating DAG objects?
A) Executor
B) Scheduler
C) Webserver
D) Worker
Answer: B
Explanation: The Scheduler parses DAG files, creates DAG runs, and schedules tasks.
**Question 25.** What is the effect of setting `schedule_interval=None`?
A) The DAG runs continuously without pause
B) The DAG is triggered only manually
C) The DAG runs every minute
D) The DAG inherits the schedule from its parent DAG
Answer: B
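The fan-out described in Question 23 can be pictured as one task instance per input element. The sketch below only models which arguments each mapped instance would receive; it is plain Python rather than Airflow code, `expand_instances` is a hypothetical helper, and it assumes that mapping over several keyword arguments yields their cross product.

```python
from itertools import product
from typing import Any, Dict, List


def expand_instances(**mapped: List[Any]) -> List[Dict[str, Any]]:
    """List the keyword arguments each mapped task instance would receive."""
    names = list(mapped)
    return [dict(zip(names, combo)) for combo in product(*mapped.values())]


single = expand_instances(path=["a.csv", "b.csv", "c.csv"])
crossed = expand_instances(x=[1, 2], y=[10, 20])
print(len(single))   # 3
print(crossed[0])    # {'x': 1, 'y': 10}
print(len(crossed))  # 4
```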

Explanation: `FileSensor` polls the filesystem for the existence of a file.
**Question 29.** How does the `reschedule` mode of a sensor affect the task's state?
A) The task stays in `running` until the condition is met
B) The task moves to `up_for_reschedule` and releases its slot
C) The task fails after the first poke attempt
D) The task is automatically marked as skipped after timeout
Answer: B
Explanation: In `reschedule` mode, the task is marked `up_for_reschedule`, freeing the worker slot.
**Question 30.** Which Airflow macro provides the start of the data interval for the current run?
A) `{{ data_interval_start }}`
B) `{{ ds }}`
C) `{{ prev_ds }}`
D) `{{ next_ds }}`
Answer: A
Explanation: `data_interval_start` is the datetime marking the beginning of the logical period for the run.
**Question 31.** What is the purpose of the `pool` parameter on a task?
A) To limit the number of concurrent DAG runs
B) To restrict how many tasks can run simultaneously for a given resource
C) To define a reusable set of default arguments
D) To group tasks for visual organization
Answer: B
Explanation: Pools enforce concurrency limits on tasks that share a common external resource.
**Question 32.** Which of the following statements about `SubDagOperator` is true?
A) It is the recommended way to create dynamic task mapping
B) It runs a separate DAG as a sub-task within the parent DAG
C) It automatically retries failed tasks in the sub-dag
D) It cannot be used with the TaskFlow API
Answer: B
Explanation: `SubDagOperator` executes another DAG as a single task in the parent DAG.
**Question 33.** In the context of Airflow testing, what does `airflow dags test` do?
A) Validates the DAG file syntax only
B) Executes a DAG run locally without a scheduler
C) Starts a full Airflow cluster for integration testing
D) Generates a DAG diagram as an image file
Answer: B
Explanation: `airflow dags test <dag_id> <execution_date>` runs the DAG immediately in the current process.
**Question 34.** Which of the following is a best practice for handling secrets in Airflow?
A) Hard-code them in the DAG file
B) Store them in Airflow Variables with plain text
C) Use Connections with encrypted passwords or a secret backend
D) Pass them via command-line arguments to the scheduler
Answer: C
Explanation: Connections can store credentials securely and can be integrated with secret backends (Vault, AWS Secrets Manager).
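The pool behaviour from Question 31 resembles a counting semaphore: a task must take a slot before it runs and returns the slot when it finishes. Here is a toy illustration using only the standard library; the `Pool` class is a hypothetical stand-in and not Airflow's implementation.

```python
import threading
import time


class Pool:
    """Toy stand-in for an Airflow pool: at most `slots` tasks run at once."""

    def __init__(self, slots: int) -> None:
        self._semaphore = threading.Semaphore(slots)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0  # highest number of tasks seen running together

    def run(self, work) -> None:
        with self._semaphore:  # block until a slot is free
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                work()
            finally:
                with self._lock:
                    self.running -= 1


pool = Pool(slots=2)
threads = [
    threading.Thread(target=pool.run, args=(lambda: time.sleep(0.05),))
    for _ in range(6)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(pool.peak <= 2)  # True
```

Six tasks are submitted, yet no more than two ever hold a slot at the same time, which is the guarantee a pool gives to the shared resource behind it.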

**Question 38.** Which method is used to retrieve an XCom value from another task?
A) `task_instance.xcom_pull(task_ids='my_task')`
B) `Variable.get('my_xcom')`
C) `XCom.get_one(task_id='my_task')`
D) `ti.xcom_push('key', value)`
Answer: A
Explanation: `xcom_pull` fetches the value; you specify the source task ID and optionally the key.
**Question 39.** In the TaskFlow API, how do you pass data from one task to another?
A) Using `Variable.set` and `Variable.get`
B) Returning a value from the upstream task and receiving it as a function argument in the downstream task
C) Using global Python variables
D) Writing to a temporary file on the worker
Answer: B
Explanation: The TaskFlow API automatically pushes the return value to XCom and injects it into the downstream task’s argument.
**Question 40.** Which of the following is true about `max_active_tasks` in a DAG definition?
A) It limits the number of tasks that can run across all DAGs
B) It limits the number of tasks that can run concurrently for that specific DAG
C) It is deprecated and has no effect in Airflow 2+
D) It only applies to tasks in a TaskGroup
Answer: B
Explanation: `max_active_tasks` caps concurrent running tasks for the particular DAG.
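The hand-off in Questions 38 and 39 can be made concrete with a toy in-memory store. This is a sketch in plain Python, not the Airflow API: `ToyXCom`, `extract`, and `total` are hypothetical names, and the only Airflow detail assumed is that a return value is stored under the default key `return_value`.

```python
from typing import Any, Dict, Tuple

XCOM_RETURN_KEY = "return_value"  # default key used for a task's return value


class ToyXCom:
    """In-memory stand-in for the XCom table, keyed by (task_id, key)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Any] = {}

    def push(self, task_id: str, value: Any, key: str = XCOM_RETURN_KEY) -> None:
        self._store[(task_id, key)] = value

    def pull(self, task_ids: str, key: str = XCOM_RETURN_KEY) -> Any:
        return self._store.get((task_ids, key))


def extract() -> list:
    return [1, 2, 3]


def total(values: list) -> int:
    return sum(values)


xcom = ToyXCom()
# What TaskFlow does implicitly: store the upstream return value...
xcom.push("extract", extract())
# ...and hand it to the downstream function as an argument.
result = total(xcom.pull(task_ids="extract"))
print(result)  # 6
```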

**Question 41.** What does the `catchup=False` flag do when set on a DAG?
A) Prevents the DAG from being scheduled at all
B) Skips backfilling of missed intervals when the scheduler starts
C) Forces the DAG to run only once
D) Enables parallel execution of past runs
Answer: B
Explanation: `catchup=False` tells Airflow not to create DAG runs for intervals that were missed.
**Question 42.** Which decorator creates a reusable DAG factory function?
A) `@dag`
B) `@task`
C) `@operator`
D) `@sensor`
Answer: A
Explanation: `@dag` decorates a function that returns a DAG object, allowing parameterized DAG creation.
**Question 43.** How can you limit a DAG’s concurrency across the entire Airflow environment?
A) Set `max_active_runs` in the DAG definition
B) Define a pool with limited slots and assign the DAG’s tasks to it
C) Use `concurrency` in `default_args`
D) Set `dag_concurrency` in `airflow.cfg`
Answer: B
Explanation: Pools enforce a global slot limit; assigning all tasks to a pool caps overall concurrency.
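The effect of `catchup` in Question 41 is easiest to see by counting intervals. The sketch below is plain Python with a hypothetical `runs_to_create` helper; it assumes a fixed-length schedule, a DAG that is switched on at `now` with no earlier runs, and that with `catchup=False` only the most recent completed interval gets a run.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def runs_to_create(
    start_date: datetime, interval: timedelta, now: datetime, catchup: bool
) -> List[Interval]:
    """Return the completed data intervals that would each get a DAG run."""
    intervals: List[Interval] = []
    begin = start_date
    while begin + interval <= now:  # a run is created only after its interval ends
        intervals.append((begin, begin + interval))
        begin += interval
    if not catchup:
        return intervals[-1:]  # keep only the latest completed interval
    return intervals


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 5, 12, 0)
day = timedelta(days=1)
print(len(runs_to_create(start, day, now, catchup=True)))    # 4
print(runs_to_create(start, day, now, catchup=False)[0][0])  # 2024-01-04 00:00:00
```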

**Question 47.** In dynamic task mapping, which method creates the mapped tasks?
A) `task.expand()`
B) `task.map()`
C) `task.group()`
D) `task.branch()`
Answer: A
Explanation: The `expand` (or `expand_kwargs`) method triggers dynamic mapping of a task.
**Question 48.** Which of the following best describes the `on_success_callback`?
A) Runs after a DAG completes successfully
B) Runs after an individual task succeeds
C) Runs before any task starts
D) Runs only when a task is retried
Answer: B
Explanation: `on_success_callback` is attached to a task and fires when that task succeeds.
**Question 49.** Which Airflow built-in preset schedule runs at midnight UTC every day?
A) `@hourly`
B) `@daily`
C) `@weekly`
D) `@monthly`
Answer: B
Explanation: `@daily` translates to `0 0 * * *` (midnight UTC).
**Question 50.** Which argument of an operator controls the amount of time to wait between retries?
A) `retry_exponential_backoff`
B) `retry_delay`
C) `max_retry_delay`
D) `retry_timeout`
Answer: B
Explanation: `retry_delay` is a timedelta specifying the pause before each retry attempt.
**Question 51.** What is the effect of setting `depends_on_past=True` for a task?
A) The task will wait for its previous run to complete successfully before executing the current run
B) The task will ignore upstream dependencies
C) The task will run only once per DAG
D) The task will be skipped if any upstream task fails
Answer: A
Explanation: `depends_on_past` enforces serial execution across consecutive DAG runs for that task.
**Question 52.** Which of the following is NOT a valid way to pass a templated field to an operator?
A) Using `{{ ds }}` in the `bash_command` of a `BashOperator`
B) Setting `sql="{{ params.my_query }}"` in a `PostgresOperator`
C) Providing a Python dictionary directly to `template_fields` at runtime
D) Using `{{ ti.xcom_pull(task_ids='t1') }}` in a templated argument
Answer: C
Explanation: `template_fields` is a class attribute defining which parameters are rendered; you cannot set it dynamically per run.
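The point behind Question 52 is that rendering is driven by a class attribute (named `template_fields` in Airflow), so only the attributes listed there are touched. A toy model of that rule follows; `ToyOperator` is hypothetical, and a small regular expression stands in for Jinja, so only simple `{{ name }}` placeholders are handled.

```python
import re
from typing import Any, Dict


class ToyOperator:
    """Toy operator: only attributes named in `template_fields` are rendered."""

    template_fields = ("command",)

    def __init__(self, command: str, label: str) -> None:
        self.command = command
        self.label = label

    def render(self, context: Dict[str, Any]) -> None:
        """Replace `{{ name }}` placeholders in every templated attribute."""
        pattern = re.compile(r"\{\{\s*(\w+)\s*\}\}")
        for name in self.template_fields:
            raw = getattr(self, name)
            rendered = pattern.sub(lambda m: str(context[m.group(1)]), raw)
            setattr(self, name, rendered)


op = ToyOperator(command="echo {{ ds }}", label="run for {{ ds }}")
op.render({"ds": "2024-01-01"})
print(op.command)  # echo 2024-01-01
print(op.label)    # run for {{ ds }}
```

`label` keeps its placeholder because it is not listed in `template_fields`, which mirrors why an argument outside that tuple is passed through unrendered.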

A) Return a dictionary and use `@task(multiple_outputs=True)`
B) Return a list and set `multiple_outputs=False`
C) Use `ti.xcom_push` manually for each key
D) It is not possible; tasks can only return a single value
Answer: A
Explanation: Setting `multiple_outputs=True` causes Airflow to split a dict into separate XCom entries.
**Question 57.** Which parameter can be used to limit the number of times a DAG run is retried after failure?
A) `max_active_runs`
B) `retries` (on the DAG level) – not available; retries are task-level only
C) `dagrun_retry_delay`
D) `dagrun_timeout`
Answer: B
Explanation: DAGs themselves do not have a retry count; retries are defined per task.
**Question 58.** What is the purpose of the `airflow.models.baseoperator.BaseOperator` class?
A) It defines the execution environment for the scheduler
B) It provides the common interface and metadata for all operators
C) It handles database connections automatically
D) It renders Jinja templates for tasks
Answer: B
Explanation: `BaseOperator` is the parent class for all operators, containing common attributes like `task_id`, `retries`, etc.
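To picture the splitting behind `multiple_outputs=True`, the sketch below computes which XCom keys a returned dictionary would produce. It is plain Python with a hypothetical `xcom_entries` helper, and it assumes the whole dictionary is also kept under the default `return_value` key alongside the per-key entries.

```python
from typing import Any, Dict


def xcom_entries(return_value: Any, multiple_outputs: bool) -> Dict[str, Any]:
    """Return the XCom key/value pairs a task's return value would produce."""
    entries: Dict[str, Any] = {"return_value": return_value}
    if multiple_outputs:
        if not isinstance(return_value, dict):
            raise TypeError("multiple_outputs=True expects a dict return value")
        entries.update(return_value)  # one extra entry per dictionary key
    return entries


stats = {"rows": 120, "errors": 3}
print(sorted(xcom_entries(stats, multiple_outputs=False)))  # ['return_value']
print(sorted(xcom_entries(stats, multiple_outputs=True)))   # ['errors', 'return_value', 'rows']
```

A downstream task can then pull just `rows` or just `errors` by key instead of unpacking the full dictionary.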

**Question 59.** Which of the following statements about `ExternalTaskSensor` is true?
A) It waits for a file to appear on a remote system
B) It pauses a DAG until a task in a different DAG finishes successfully
C) It monitors the health of an external HTTP endpoint
D) It automatically retries on failure without configuration
Answer: B
Explanation: `ExternalTaskSensor` checks the state of a task in another DAG.
**Question 60.** How can you retrieve the Airflow connection URI for a given connection ID inside a task?
A) `Variable.get('conn_id')`
B) `BaseHook.get_connection('conn_id').get_uri()`
C) `os.getenv('AIRFLOW_CONN_CONN_ID')`
D) `Connection.get('conn_id').uri`
Answer: B
Explanation: `BaseHook.get_connection` returns a `Connection` object whose `get_uri()` method yields the full URI.
**Question 61.** Which of the following is a recommended practice for organizing DAG code in a large project?
A) Place all DAGs in a single Python file
B) Use a `dags/` folder with sub-folders per domain and keep business logic in separate modules
C) Store DAG definitions directly in the Airflow metadata DB
D) Encode all configuration values as global variables at the top of each DAG file
Answer: B
Explanation: Modular organization improves maintainability and reduces parsing time.
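As a rough picture of what the URI in Question 60 contains, the sketch below splits one with the standard library. It does not call Airflow; the URI, the connection ID, and both helper names are made up for illustration, and the `AIRFLOW_CONN_` prefix reflects the environment-variable convention that option C alludes to.

```python
from urllib.parse import urlsplit


def conn_env_var(conn_id: str) -> str:
    """Environment variable name checked for a connection ID."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"


def describe_uri(uri: str) -> dict:
    """Split a connection URI into the parts a task usually needs."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "login": parts.username,
        "schema": parts.path.lstrip("/"),
    }


uri = "postgres://etl_user:secret@db.example.com:5432/analytics"  # made-up URI
print(conn_env_var("my_postgres"))  # AIRFLOW_CONN_MY_POSTGRES
print(describe_uri(uri)["host"])    # db.example.com
```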