Scheduling DAGs in Airflow

Oct 9, 2024 · 1 min read

Welcome to Day 3! Today, we’re diving into scheduling DAGs, an essential component of automating workflows in Airflow.

Scheduling Options

1. Unscheduled DAGs

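For DAGs that run only when triggered manually, set schedule_interval to None. Note that Airflow 2.4 and later deprecate schedule_interval in favor of the equivalent schedule argument: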

import datetime as dt

from airflow import DAG

dag = DAG(
    dag_id="01_unscheduled",
    start_date=dt.datetime(2019, 1, 1),
    schedule_interval=None,  # No schedule: runs only on manual trigger
)

2. Regular Intervals

Use preset intervals such as @hourly, @daily, or @weekly. An optional end_date stops scheduling after that date:

dag = DAG(
    dag_id="03_with_end_date",
    schedule_interval="@daily",
    start_date=dt.datetime(2019, 1, 1),
    end_date=dt.datetime(2019, 1, 5),
)
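Each preset is shorthand for a cron expression (cron syntax is the next option). The mapping below follows the presets listed in the Airflow documentation. The names PRESET_TO_CRON and to_cron are made up for this sketch and are not part of Airflow.

```python
# Illustrative only: PRESET_TO_CRON and to_cron are hypothetical names, not Airflow APIs.
PRESET_TO_CRON = {
    "@hourly": "0 * * * *",   # at the start of every hour
    "@daily": "0 0 * * *",    # every day at midnight
    "@weekly": "0 0 * * 0",   # Sunday at midnight
    "@monthly": "0 0 1 * *",  # midnight on the first day of the month
    "@yearly": "0 0 1 1 *",   # midnight on January 1
}


def to_cron(schedule: str) -> str:
    """Return the cron expression behind a preset, or the input unchanged."""
    return PRESET_TO_CRON.get(schedule, schedule)


print(to_cron("@daily"))
```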

3. Cron-based Intervals

For fine-grained control, use a five-field cron expression (minute, hour, day of month, month, day of week):

schedule_interval="0 0 * * *"  # Every day at midnight
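To make the five positions easier to read, here is a small standard-library sketch that labels each field. CronFields and split_cron are hypothetical helpers written for this post. They only split the string and do not validate values or compute run times.

```python
from typing import NamedTuple


class CronFields(NamedTuple):
    """The five positional fields of a standard cron expression."""
    minute: str
    hour: str
    day_of_month: str
    month: str
    day_of_week: str


def split_cron(expression: str) -> CronFields:
    """Label each field of a five-field cron expression (no value validation)."""
    parts = expression.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return CronFields(*parts)


# "0 0 * * *": minute 0, hour 0, any day of month, any month, any weekday
print(split_cron("0 0 * * *"))
# "30 9 * * 1-5": 09:30 on Monday through Friday
print(split_cron("30 9 * * 1-5"))
```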

4. Frequency-based Intervals

For intervals that cron can't express cleanly, such as every three days, pass a timedelta:

dag = DAG(
    dag_id="03_frequency_based",
    schedule_interval=dt.timedelta(days=3),
    start_date=dt.datetime(2019, 1, 1),
)
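A timedelta schedule counts forward from start_date. The sketch below uses only the standard library to list the interval boundaries this DAG would get. The interval_starts helper is hypothetical and only models the arithmetic, not the scheduler itself.

```python
import datetime as dt
from typing import List


def interval_starts(start: dt.datetime, step: dt.timedelta, count: int) -> List[dt.datetime]:
    """Return the first `count` interval start times, counting from `start`."""
    return [start + i * step for i in range(count)]


step = dt.timedelta(days=3)
starts = interval_starts(dt.datetime(2019, 1, 1), step, 4)

# Print each interval as "start -> end"
for begin in starts:
    print(begin.date(), "->", (begin + step).date())
```

Airflow triggers a run once its interval has ended, so the first run here would start on 2019-01-04 and cover January 1 to 4. Because the count continues from start_date, the three-day rhythm does not reset at month boundaries the way a day-of-month step in cron would.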

Backfilling

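Control historical runs with the catchup parameter. When catchup is enabled (the default in Airflow 2), the scheduler creates a run for every interval that has passed since start_date. Setting it to False skips those past intervals: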

dag = DAG(
    dag_id="09_no_catchup",
    schedule_interval="@daily",
    start_date=dt.datetime(2019, 1, 1),
    catchup=False,  # Skip past intervals; start from the most recent one
)
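The difference is easiest to see by counting runs. The following is a simplified model of the behavior, not the scheduler's actual implementation. The functions completed_intervals and runs_to_create are hypothetical, and the "now" timestamp is an assumed value chosen for illustration.

```python
import datetime as dt
from typing import List, Tuple

Interval = Tuple[dt.datetime, dt.datetime]


def completed_intervals(start: dt.datetime, step: dt.timedelta, now: dt.datetime) -> List[Interval]:
    """List every schedule interval that has fully ended by `now`."""
    intervals = []
    begin = start
    while begin + step <= now:
        intervals.append((begin, begin + step))
        begin += step
    return intervals


def runs_to_create(start: dt.datetime, step: dt.timedelta, now: dt.datetime, catchup: bool) -> List[Interval]:
    """With catchup, run every completed interval; without it, only the latest one."""
    done = completed_intervals(start, step, now)
    return done if catchup else done[-1:]


start = dt.datetime(2019, 1, 1)
now = dt.datetime(2019, 1, 5, 12)  # assumed "current time" for this example
daily = dt.timedelta(days=1)

print(len(runs_to_create(start, daily, now, catchup=True)))   # all completed days
print(len(runs_to_create(start, daily, now, catchup=False)))  # latest completed day only
```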

Best Practices

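  • Atomicity: Each task should do one well-defined unit of work and either finish completely or fail without leaving partial results behind
  • Idempotency: Rerunning a task with the same inputs should produce the same result, with no duplicated or extra output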
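Both properties matter most when backfilling, because the same interval may be run more than once. Here is a minimal sketch of a task body that aims for both, using only the standard library. The function write_daily_stats, the file naming, and the JSON payload are assumptions made for illustration. A real task would read its date from the Airflow run context.

```python
import datetime as dt
import json
import os
import tempfile
from pathlib import Path


def write_daily_stats(output_dir: Path, day: dt.date, events: list) -> Path:
    """Write one stats file per day, overwriting any earlier result for that day."""
    # Idempotent: the target name depends only on the date, and the file is
    # overwritten rather than appended to, so reruns give the same result.
    target = output_dir / f"stats_{day.isoformat()}.json"
    payload = json.dumps({"date": day.isoformat(), "count": len(events)})

    # Atomic: write to a temporary file in the same directory, then rename it
    # into place, so a failure never leaves a half-written target file.
    fd, tmp_name = tempfile.mkstemp(dir=output_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(payload)
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target


with tempfile.TemporaryDirectory() as folder:
    out = Path(folder)
    first = write_daily_stats(out, dt.date(2019, 1, 1), ["a", "b"]).read_text()
    second = write_daily_stats(out, dt.date(2019, 1, 1), ["a", "b"]).read_text()
    print(first == second, len(list(out.iterdir())))
```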

Stay tuned for Day 4!

Aditya Paliwal
Data Engineer
Data Engineer with 4+ years of experience in implementing and deploying end-to-end data pipelines in production environments. Passionate about combining data engineering with cutting-edge machine learning and AI technologies to create intelligent, data-driven products.