<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data Engineering |</title><link>https://www.adityapaliwal.net/tags/data-engineering/</link><atom:link href="https://www.adityapaliwal.net/tags/data-engineering/index.xml" rel="self" type="application/rss+xml"/><description>Data Engineering</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Fri, 11 Oct 2024 00:00:00 +0000</lastBuildDate><image><url>https://www.adityapaliwal.net/media/icon_hu_982c5d63a71b2961.png</url><title>Data Engineering</title><link>https://www.adityapaliwal.net/tags/data-engineering/</link></image><item><title>Mastering Task Dependencies in Apache Airflow</title><link>https://www.adityapaliwal.net/blog/airflow-day5/</link><pubDate>Fri, 11 Oct 2024 00:00:00 +0000</pubDate><guid>https://www.adityapaliwal.net/blog/airflow-day5/</guid><description>&lt;p>Welcome to Day 5! Today we explore defining task dependencies for complex, real-world data workflows.&lt;/p>
&lt;h2 id="basic-dependencies">Basic Dependencies&lt;/h2>
&lt;h3 id="linear-dependencies">Linear Dependencies&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">extract&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">transform&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">load&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="fan-outfan-in-patterns">Fan-out/Fan-in Patterns&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">download_launches&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">get_pictures&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">download_metadata&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="n">get_pictures&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">download_metadata&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">notify&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="branching">Branching&lt;/h2>
&lt;p>Use &lt;code>BranchPythonOperator&lt;/code> for conditional logic:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">airflow.operators.python&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">BranchPythonOperator&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">choose_path&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="s2">&amp;#34;task_A&amp;#34;&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="n">condition&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="s2">&amp;#34;task_B&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">branch&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BranchPythonOperator&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">task_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;branch_task&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">python_callable&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">choose_path&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="trigger-rules">Trigger Rules&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Trigger Rule&lt;/th>
&lt;th>Behavior&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>all_success&lt;/code>&lt;/td>
&lt;td>(default) All parent tasks completed successfully&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>all_failed&lt;/code>&lt;/td>
&lt;td>All parent tasks failed&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>all_done&lt;/code>&lt;/td>
&lt;td>All parents done, regardless of state&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>one_failed&lt;/code>&lt;/td>
&lt;td>At least one parent failed&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>one_success&lt;/code>&lt;/td>
&lt;td>At least one parent succeeded&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>none_failed&lt;/code>&lt;/td>
&lt;td>No parents failed (succeeded or skipped)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
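&lt;p>To make the table concrete, here is a minimal pure-Python sketch of how each rule could be evaluated from the final states of a task&amp;rsquo;s parents. It is not Airflow&amp;rsquo;s own implementation: the function name &lt;code>should_run&lt;/code> and the plain-string states are assumptions made for illustration.&lt;/p>

```python
# Illustrative only: Airflow evaluates trigger rules internally.
# `should_run` and the string states below are hypothetical names for this sketch.

FINISHED = {"success", "failed", "upstream_failed", "skipped"}
FAILED = {"failed", "upstream_failed"}


def should_run(rule: str, parent_states: list[str]) -> bool:
    """Return True if a task with `rule` may run given its parents' states."""
    all_done = all(state in FINISHED for state in parent_states)
    if rule == "all_success":
        return all(state == "success" for state in parent_states)
    if rule == "all_failed":
        return all(state in FAILED for state in parent_states)
    if rule == "all_done":
        return all_done
    if rule == "one_failed":
        return any(state in FAILED for state in parent_states)
    if rule == "one_success":
        return any(state == "success" for state in parent_states)
    if rule == "none_failed":
        return all_done and not any(state in FAILED for state in parent_states)
    raise ValueError(f"unknown trigger rule: {rule}")


# A join task after a branch sees one parent succeed and the other skipped.
join_parents = ["success", "skipped"]
runs_with_default = should_run("all_success", join_parents)
runs_with_none_failed = should_run("none_failed", join_parents)
```

&lt;p>In this sketch the join task does not run under the default &lt;code>all_success&lt;/code> rule but does run under &lt;code>none_failed&lt;/code>, which is why the latter is a common choice for tasks that follow a branch.&lt;/p>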
&lt;h2 id="sharing-data-with-xcoms">Sharing Data with XComs&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># In task_A: ti is the TaskInstance taken from the task context&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ti&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">xcom_push&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;data&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">value&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">my_data&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># In task_B: pull the value that task_A pushed&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">my_data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ti&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">xcom_pull&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">task_ids&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;task_A&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;data&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Note&lt;/strong>: XComs are stored in the metadata database, so use them only for small values such as IDs, paths, or counts, not for large datasets.&lt;/p>
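&lt;p>One way to respect that limit is to write the data to external storage and share only a reference to it. The sketch below shows the idea in plain Python under stated assumptions: the &lt;code>fake_xcom&lt;/code> dict is a hypothetical stand-in for the XCom table, and the &lt;code>produce&lt;/code> and &lt;code>consume&lt;/code> functions stand in for two task callables.&lt;/p>

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the XCom table, which Airflow keeps in its metadata database.
fake_xcom: dict[str, str] = {}


def produce(storage_dir: Path) -> None:
    """Write the (potentially large) dataset to storage and share only its path."""
    rows = [{"id": i, "value": i * i} for i in range(1000)]
    path = storage_dir / "rows.json"
    path.write_text(json.dumps(rows))
    fake_xcom["data_path"] = str(path)  # a short string is cheap to store


def consume() -> int:
    """Read the path back and load the dataset from storage."""
    rows = json.loads(Path(fake_xcom["data_path"]).read_text())
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    produce(Path(tmp))
    row_count = consume()
```

&lt;p>Only the file path travels between the two functions, so the size of the shared value stays constant no matter how large the dataset grows.&lt;/p>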
&lt;h2 id="taskflow-api">TaskFlow API&lt;/h2>
&lt;p>Simplify Python task chaining:&lt;/p>
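&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">airflow.decorators&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">task&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@task&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">extract&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="c1"># Placeholder payload standing in for a real data source&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@task&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">transform&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="c1"># Example transformation: double each value&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">value&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">2&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">value&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">data&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@task&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">load&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">transformed_data&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Loading data:&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">transformed_data&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Call inside a DAG definition: passing return values sets the dependencies and moves data via XCom&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">load&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">transform&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">extract&lt;/span>&lt;span class="p">()))&lt;/span>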
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>@task&lt;/code> decorator converts each function into an Airflow task automatically.&lt;/p></description></item><item><title>Scheduling DAGs in Airflow</title><link>https://www.adityapaliwal.net/blog/airflow-day3/</link><pubDate>Wed, 09 Oct 2024 00:00:00 +0000</pubDate><guid>https://www.adityapaliwal.net/blog/airflow-day3/</guid><description>&lt;p>Welcome to Day 3! Today, we&amp;rsquo;re diving into scheduling DAGs, an essential component of automating workflows in Airflow.&lt;/p>
&lt;h2 id="scheduling-options">Scheduling Options&lt;/h2>
&lt;h3 id="1-unscheduled-dags">1. Unscheduled DAGs&lt;/h3>
&lt;p>For DAGs that only run when manually triggered:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">dag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DAG&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dag_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;01_unscheduled&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">start_date&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">dt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2019&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">schedule_interval&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">None&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="2-regular-intervals">2. Regular Intervals&lt;/h3>
&lt;p>Use predefined intervals like &lt;code>@daily&lt;/code>, &lt;code>@hourly&lt;/code>, &lt;code>@weekly&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">dag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DAG&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dag_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;03_with_end_date&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">schedule_interval&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;@daily&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">start_date&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">dt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2019&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">end_date&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">dt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2019&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">5&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="3-cron-based-intervals">3. Cron-based Intervals&lt;/h3>
&lt;p>For fine-grained control:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">schedule_interval&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;0 0 * * *&amp;#34;&lt;/span> &lt;span class="c1"># Every day at midnight&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="4-frequency-based-intervals">4. Frequency-based Intervals&lt;/h3>
&lt;p>For custom intervals using &lt;code>timedelta&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">dag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DAG&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dag_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;03_frequency_based&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">schedule_interval&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">dt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">timedelta&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">days&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">start_date&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">dt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2019&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="backfilling">Backfilling&lt;/h2>
&lt;p>Control historical runs with the &lt;code>catchup&lt;/code> parameter:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">dag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DAG&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dag_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;09_no_catchup&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">schedule_interval&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;@daily&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">start_date&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">dt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2019&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">catchup&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span> &lt;span class="c1"># Skip past intervals; schedule runs from the most recent interval onward&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="best-practices">Best Practices&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Atomicity&lt;/strong>: Each task should do one unit of work and either succeed completely or fail without leaving partial results&lt;/li>
&lt;li>&lt;strong>Idempotency&lt;/strong>: Rerunning a task with the same inputs should produce the same result, with no duplicated or changed output&lt;/li>
&lt;/ul>
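&lt;p>As a rough illustration of idempotency, the sketch below writes a report whose file name depends only on the run date and overwrites it on every run. The function &lt;code>write_daily_report&lt;/code>, the file naming scheme, and the sample rows are all hypothetical; the point is only that a rerun leaves the output unchanged.&lt;/p>

```python
import tempfile
from pathlib import Path


def write_daily_report(output_dir: Path, run_date: str, rows: list[str]) -> Path:
    """Idempotent write: the target depends only on run_date and is overwritten."""
    target = output_dir / f"report_{run_date}.csv"
    target.write_text("\n".join(rows) + "\n")  # overwrite, never append
    return target


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    # Running the same task twice for the same date...
    first = write_daily_report(out, "2019-01-01", ["a,1", "b,2"]).read_text()
    second = write_daily_report(out, "2019-01-01", ["a,1", "b,2"]).read_text()
    # ...leaves exactly one file with identical content.
    files_written = len(list(out.iterdir()))
```

&lt;p>Appending to a shared file instead would duplicate rows on every rerun, which is the kind of side effect idempotent tasks avoid.&lt;/p>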
&lt;p>Stay tuned for Day 4!&lt;/p></description></item><item><title>Crafting Your First Real Airflow DAG</title><link>https://www.adityapaliwal.net/blog/airflow-day2/</link><pubDate>Tue, 08 Oct 2024 00:00:00 +0000</pubDate><guid>https://www.adityapaliwal.net/blog/airflow-day2/</guid><description>&lt;p>Welcome back to Day 2! Today, we&amp;rsquo;ll explore the anatomy of a DAG and get hands-on by writing a simple ETL workflow.&lt;/p>
&lt;h2 id="writing-your-first-workflow-dag">Writing Your First Workflow (DAG)&lt;/h2>
&lt;p>Here&amp;rsquo;s a complete ETL pipeline for an art gallery:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">airflow&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">DAG&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">datetime&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">datetime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">airflow.operators.python&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">PythonOperator&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">pandas&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">pd&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">DAG&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">dag_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;art_gallery_etl_2024&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">start_date&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">year&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">2024&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">month&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">day&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">hour&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">9&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">minute&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">schedule&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;@daily&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">catchup&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">max_active_runs&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">render_template_as_native_obj&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">dag&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">def&lt;/span> &lt;span class="nf">extract_art_data_callable&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Extracting art piece data from gallery records&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="k">return&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="s2">&amp;#34;date_acquired&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;2022-09-15&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="s2">&amp;#34;artist&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Vincent van Gogh&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="s2">&amp;#34;title&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Starry Night&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="s2">&amp;#34;details&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                &lt;span class="s2">&amp;#34;type&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Painting&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                &lt;span class="s2">&amp;#34;dimensions&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;73.7 cm x 92.1 cm&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">extract_art_data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PythonOperator&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">task_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;extract_art_data&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">python_callable&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">extract_art_data_callable&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">def&lt;/span> &lt;span class="nf">transform_art_data_callable&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">raw_data&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">transformed_data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                &lt;span class="n">raw_data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;date_acquired&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                &lt;span class="n">raw_data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;artist&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                &lt;span class="n">raw_data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;title&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                &lt;span class="n">raw_data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;details&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;type&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                &lt;span class="n">raw_data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;details&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;dimensions&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="k">return&lt;/span> &lt;span class="n">transformed_data&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">transform_art_data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PythonOperator&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">task_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;transform_art_data&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">python_callable&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">transform_art_data_callable&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">op_kwargs&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s2">&amp;#34;raw_data&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;{{ ti.xcom_pull(task_ids=&amp;#39;extract_art_data&amp;#39;) }}&amp;#34;&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">def&lt;/span> &lt;span class="nf">load_art_data_callable&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">transformed_data&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">loaded_data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DataFrame&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">transformed_data&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">loaded_data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">columns&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;date_acquired&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;artist&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;title&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;art_type&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;dimensions&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loaded_data&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">load_art_data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PythonOperator&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">task_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;load_art_data&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">python_callable&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">load_art_data_callable&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">op_kwargs&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s2">&amp;#34;transformed_data&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;{{ ti.xcom_pull(task_ids=&amp;#39;transform_art_data&amp;#39;) }}&amp;#34;&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">extract_art_data&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">transform_art_data&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">load_art_data&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="key-components">Key Components&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>DAG&lt;/strong>: Defines the workflow with properties like &lt;code>dag_id&lt;/code>, &lt;code>start_date&lt;/code>, and &lt;code>schedule&lt;/code>&lt;/li>
&lt;li>&lt;strong>Tasks&lt;/strong>: Each task performs a single action (extract, transform, load)&lt;/li>
&lt;li>&lt;strong>Operators&lt;/strong>: Define what kind of task is executed (PythonOperator for Python functions)&lt;/li>
&lt;li>&lt;strong>Dependencies&lt;/strong>: Using &lt;code>&amp;gt;&amp;gt;&lt;/code> to define execution order&lt;/li>
&lt;/ul>
&lt;h2 id="handling-task-failures">Handling Task Failures&lt;/h2>
&lt;p>Airflow excels at handling failures:&lt;/p>
&lt;ul>
&lt;li>View detailed logs from the UI&lt;/li>
&lt;li>Selectively rerun just the failed task&lt;/li>
&lt;li>Successful tasks don&amp;rsquo;t need to be rerun&lt;/li>
&lt;/ul>
&lt;p>Stay tuned for Day 3, where we&amp;rsquo;ll dive into scheduling!&lt;/p></description></item><item><title>Introduction to Data Pipelines with Apache Airflow</title><link>https://www.adityapaliwal.net/blog/airflow-day1/</link><pubDate>Mon, 07 Oct 2024 00:00:00 +0000</pubDate><guid>https://www.adityapaliwal.net/blog/airflow-day1/</guid><description>&lt;p>Welcome to the first post of my 15-day series on learning Apache Airflow! Every day, I&amp;rsquo;ll break down one chapter from the book &lt;em>&amp;ldquo;Data Pipelines with Apache Airflow&amp;rdquo;&lt;/em> by Bas Harenslak &amp;amp; Julian Rutger de Ruiter.&lt;/p>
&lt;h2 id="what-is-apache-airflow">What is Apache Airflow?&lt;/h2>
&lt;p>Apache Airflow is an open-source platform that allows you to orchestrate complex data workflows. It was developed to automate the scheduling and monitoring of tasks involved in data pipelines.&lt;/p>
&lt;h2 id="key-concepts">Key Concepts&lt;/h2>
&lt;h3 id="workflows--dags-directed-acyclic-graphs">Workflows &amp;amp; DAGs (Directed Acyclic Graphs)&lt;/h3>
&lt;ul>
&lt;li>Airflow manages workflows by representing them as DAGs&lt;/li>
&lt;li>A DAG is a collection of tasks connected by directed dependencies with no cycles, so there is always a valid order in which to run them&lt;/li>
&lt;li>Example: A data pipeline where raw data is extracted, transformed, and loaded into a database&lt;/li>
&lt;/ul>
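&lt;p>To see why the "acyclic" part matters, here is a minimal sketch using Python's standard-library &lt;code>graphlib&lt;/code> module (Python 3.9+) rather than Airflow itself. The task names are taken from the extract, transform, load example above; the dictionary layout is an assumption made for the illustration.&lt;/p>

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# A valid DAG always yields an order in which every task follows its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)

# Making extract depend on load closes a loop, so no valid order exists.
cyclic = dict(pipeline, extract={"load"})
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

&lt;p>The first graph resolves to a single execution order, while the second is rejected because a cycle leaves no task that can safely run first.&lt;/p>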
&lt;h3 id="tasks-in-airflow">Tasks in Airflow&lt;/h3>
&lt;ul>
&lt;li>Each node in a DAG represents a task&lt;/li>
&lt;li>Tasks can be anything from running a Python script to querying a database&lt;/li>
&lt;li>Tasks run in parallel or sequentially based on dependencies&lt;/li>
&lt;/ul>
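&lt;p>How dependencies decide what can run in parallel can be sketched with the standard-library &lt;code>graphlib&lt;/code> module (Python 3.9+). This is not Airflow code, and the task names below are hypothetical; each batch simply groups the tasks whose upstream work has finished.&lt;/p>

```python
from graphlib import TopologicalSorter

# Hypothetical fan-out/fan-in pipeline: two cleaning tasks depend on one extract.
deps = {
    "extract": set(),
    "clean_orders": {"extract"},
    "clean_users": {"extract"},
    "load": {"clean_orders", "clean_users"},
}

sorter = TopologicalSorter(deps)
sorter.prepare()

batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
    batches.append(ready)
    sorter.done(*ready)                 # mark the batch finished to unlock successors

print(batches)
```

&lt;p>Tasks that land in the same batch have no dependency on each other, so they could run in parallel; tasks in different batches must run sequentially.&lt;/p>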
&lt;h3 id="schedulers">Schedulers&lt;/h3>
&lt;ul>
&lt;li>Airflow has a scheduler that triggers DAG runs on schedule and starts each task once its dependencies are met&lt;/li>
&lt;li>You can schedule DAGs to run at specific intervals (daily, hourly, etc.)&lt;/li>
&lt;/ul>
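&lt;p>As a rough sketch of what "the right time" means for an interval-based schedule: in Airflow 2.x, a daily run is triggered once the day it covers has ended. The plain-Python helper below only illustrates that timing; &lt;code>daily_runs&lt;/code> is a hypothetical function, not part of Airflow.&lt;/p>

```python
from datetime import datetime, timedelta


def daily_runs(start_date, count):
    """Return (interval_start, interval_end, trigger_time) for the first `count` daily runs."""
    runs = []
    for i in range(count):
        interval_start = start_date + timedelta(days=i)
        interval_end = interval_start + timedelta(days=1)
        # The run covering an interval is triggered once that interval has ended.
        runs.append((interval_start, interval_end, interval_end))
    return runs


runs = daily_runs(datetime(2024, 10, 1), 3)
for start, end, trigger in runs:
    print(f"data for {start:%Y-%m-%d} is processed by the run triggered at {trigger:%Y-%m-%d %H:%M}")
```

&lt;p>With a start date of 1 October 2024, the first run in this sketch is triggered at midnight on 2 October, after the first full day of data is available.&lt;/p>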
&lt;h2 id="setting-up-apache-airflow">Setting Up Apache Airflow&lt;/h2>
&lt;h3 id="1-set-up-a-python-environment">1. Set Up a Python Environment&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">python3 -m venv airflow_venv
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">source&lt;/span> airflow_venv/bin/activate
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="2-install-apache-airflow">2. Install Apache Airflow&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">AIRFLOW_VERSION&lt;/span>&lt;span class="o">=&lt;/span>2.6.3
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">PYTHON_VERSION&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="k">$(&lt;/span>python --version &lt;span class="p">|&lt;/span> cut -d &lt;span class="s2">&amp;#34; &amp;#34;&lt;/span> -f &lt;span class="m">2&lt;/span> &lt;span class="p">|&lt;/span> cut -d &lt;span class="s2">&amp;#34;.&amp;#34;&lt;/span> -f 1-2&lt;span class="k">)&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">CONSTRAINT_URL&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;https://raw.githubusercontent.com/apache/airflow/constraints-&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">AIRFLOW_VERSION&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">/constraints-&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">PYTHON_VERSION&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">.txt&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">pip install &lt;span class="s2">&amp;#34;apache-airflow==&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">AIRFLOW_VERSION&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> --constraint &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">CONSTRAINT_URL&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="3-initialize-the-database">3. Initialize the Database&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">airflow db init
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="4-create-a-user">4. Create a User&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">airflow users create &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --username admin &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --firstname FIRST_NAME &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --lastname LAST_NAME &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --role Admin &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --email admin@example.com
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="5-start-airflow">5. Start Airflow&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">airflow webserver --port &lt;span class="m">8080&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">airflow scheduler
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="my-first-dag">My First DAG&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">airflow&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">DAG&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">airflow.operators.python&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">PythonOperator&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">datetime&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">datetime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DAG&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;my_first_dag&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">start_date&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2024&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">schedule_interval&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;@daily&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">my_first_task&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Hello, this is my first Airflow task!&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">task&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PythonOperator&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">task_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;print_hello&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">python_callable&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">my_first_task&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dag&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">dag&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Stay tuned for Day 2, where I&amp;rsquo;ll cover Airflow&amp;rsquo;s architecture in detail!&lt;/p></description></item></channel></rss>