Mastering Task Dependencies in Apache Airflow

Oct 11, 2024 · 1 min read

Welcome to Day 5! Today we explore defining task dependencies for complex, real-world data workflows.

Basic Dependencies

Linear Dependencies

extract >> transform >> load

Fan-out/Fan-in Patterns

download_launches >> [get_pictures, download_metadata]
[get_pictures, download_metadata] >> notify

Branching

Use BranchPythonOperator for conditional logic. Its callable returns the task_id (or list of task_ids) to run next; all other downstream tasks are skipped:

from airflow.operators.python import BranchPythonOperator

def choose_path():
    # condition is a placeholder for your own branching logic
    return "task_A" if condition else "task_B"

branch = BranchPythonOperator(
    task_id='branch_task',
    python_callable=choose_path
)

Trigger Rules

- all_success (default): all parent tasks completed successfully
- all_failed: all parent tasks failed
- all_done: all parents done, regardless of state
- one_failed: at least one parent failed
- one_success: at least one parent succeeded
- none_failed: no parents failed (succeeded or skipped)

Sharing Data with XComs

# Inside a task's callable, push and pull via the task instance (ti):
ti.xcom_push(key='data', value=my_data)             # in task_A's callable
data = ti.xcom_pull(task_ids='task_A', key='data')  # in task_B's callable

Note: XComs are stored in the metadata database, so use them only for small payloads, not large datasets.

Taskflow API

Simplify Python task chaining:

from airflow.decorators import task

@task
def extract():
    data = {"records": [1, 2, 3]}  # placeholder payload
    return data

@task
def transform(data):
    transformed_data = [r * 2 for r in data["records"]]  # placeholder transform
    return transformed_data

@task
def load(transformed_data):
    print("Loading data:", transformed_data)

The @task decorator converts each function into an Airflow task automatically, and passing one task's return value into another both wires the dependency and moves the data via XCom.

Author: Aditya Paliwal, Data Engineer
Data Engineer with 4+ years of experience in implementing and deploying end-to-end data pipelines in production environments. Passionate about combining data engineering with cutting-edge machine learning and AI technologies to create intelligent, data-driven products.