My First Data Pipeline Project Using Airflow, Docker & Postgres (COVID API Edition)

Mohamed Hussain S

Hey Devs 👋,

If you’re starting out in data engineering or curious how real-world data pipelines work, this post is for you.

As an Associate Data Engineer Intern, I wanted to go beyond watching tutorials and actually build a working pipeline: one that pulls real-world data daily, processes it, stores it, and is fully containerized.
So I picked something simple but meaningful: global COVID-19 stats.

Here's a breakdown of what I built, how it works, and what I learned.


📊 What This Pipeline Does

This mini-project automates the following:

✅ Pulls daily global COVID-19 stats from a public API
✅ Uses Airflow to schedule and monitor the task
✅ Stores the results in a PostgreSQL database
✅ Runs everything inside Docker containers

It's a beginner-friendly, end-to-end project to get your hands dirty with core data engineering tools.


🧰 The Tech Stack

  • Python – for the main fetch/store logic
  • Airflow – to orchestrate and schedule tasks
  • PostgreSQL – for storing daily data
  • Docker – to containerize and simplify setup
  • disease.sh API – open-source COVID-19 stats API


βš™οΈ How It Works (Behind the Scenes)

  1. Airflow DAG triggers once per day
  2. A Python script sends a request to the COVID-19 API
  3. Parses the JSON response
  4. Inserts the cleaned data into a PostgreSQL table
  5. Logs everything (success/failure) into Airflow's UI
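
The DAG wiring for those five steps can be this small. Here's a minimal sketch, not the exact code in my repo: the dag_id, task_id, schedule, and retry settings are illustrative, and the callable below is a placeholder (the real fetch and insert parts are sketched later in this post).

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_and_insert():
    # Placeholder body: the real callable pulls the API data and writes it to
    # Postgres (both halves are sketched further down in this post).
    print("fetch + insert goes here")


with DAG(
    dag_id="covid_stats_daily",                  # illustrative name
    start_date=datetime(2025, 6, 1),
    schedule_interval="@daily",                  # step 1: trigger once per day
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    PythonOperator(
        task_id="fetch_and_store_covid_stats",   # steps 2-4 run inside the callable
        python_callable=fetch_and_insert,        # Airflow handles step 5 (logging, retries)
    )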

Everything runs locally via docker-compose; one command and you're up and running.


πŸ—‚οΈ Project Structure

airflow-docker/
├── dags/               # Airflow DAG (main logic)
├── scripts/            # Python file to fetch + insert data
├── docker-compose.yaml # Setup for Airflow + Postgres
├── logs/               # Logs generated by Airflow
└── plugins/            # (Optional) Airflow plugins

You can check the full repo here:
👉 GitHub: mohhddhassan/covid-data-pipeline


🧠 Key Learnings

✅ How to build and run a simple Airflow DAG
✅ Using Docker to spin up services like Postgres & Airflow
✅ How Python connects to a DB and inserts structured data
✅ Observing how tasks are logged, retried, and managed in Airflow

This small project gave me confidence in how the core parts of a pipeline talk to each other.


πŸ” Sample Output from API

Here's a snippet of the JSON response from the API:

{
  "cases": 708128930,
  "deaths": 7138904,
  "recovered": 0,
  "updated": 1717689600000
}
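
Fetching that takes only a few lines with requests. A minimal sketch, assuming disease.sh's global-stats endpoint (error handling is kept deliberately thin here):

import requests

# Global totals endpoint on disease.sh
API_URL = "https://disease.sh/v3/covid-19/all"

resp = requests.get(API_URL, timeout=30)
resp.raise_for_status()   # surface HTTP errors so Airflow marks the task as failed
data = resp.json()        # dict with cases, deaths, recovered, updated (ms timestamp)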

And here's a sample SQL insert triggered via Python:

INSERT INTO covid_stats (date, total_cases, total_deaths, recovered)
VALUES ('2025-06-06', 708128930, 7138904, 0);
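
And a minimal sketch of the Python side of that insert using psycopg2. The connection values and the covid_stats schema below are assumptions for illustration, not copied from my repo; in practice the credentials come from docker-compose / an Airflow connection.

from datetime import date

import psycopg2

def insert_stats(data):
    # Connection values are assumed local defaults, not the repo's real config
    conn = psycopg2.connect(
        host="localhost", port=5432,
        dbname="airflow", user="airflow", password="airflow",
    )
    with conn, conn.cursor() as cur:   # 'with conn' commits the transaction on success
        cur.execute(
            "INSERT INTO covid_stats (date, total_cases, total_deaths, recovered) "
            "VALUES (%s, %s, %s, %s);",
            (date.today(), data["cases"], data["deaths"], data["recovered"]),
        )
    conn.close()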

🔧 What's Next?

I'm planning to:

🚧 Add deduplication logic so re-runs don't insert the same data twice (see the sketch after this list)
📊 Maybe create a Streamlit dashboard on top of the database
⚙️ Play with sensors, templates, and XComs in Airflow
⚡ Extend the pipeline with ClickHouse for OLAP-style analytics
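
For the deduplication item, one option I'm leaning towards is an upsert keyed on the date column. A sketch, assuming covid_stats gets a UNIQUE constraint on date (the current table doesn't have one yet, and the helper name is illustrative):

UPSERT_SQL = """
    INSERT INTO covid_stats (date, total_cases, total_deaths, recovered)
    VALUES (%s, %s, %s, %s)
    ON CONFLICT (date) DO UPDATE
       SET total_cases  = EXCLUDED.total_cases,
           total_deaths = EXCLUDED.total_deaths,
           recovered    = EXCLUDED.recovered;
"""

def upsert_stats(conn, day, data):
    # conn is a psycopg2 connection (as in the insert sketch above); re-running
    # the DAG for the same day now updates the row instead of duplicating it.
    with conn, conn.cursor() as cur:
        cur.execute(UPSERT_SQL, (day, data["cases"], data["deaths"], data["recovered"]))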


📌 Why You Should Try Something Like This

If you're learning data engineering:

  • Start small, but make it real
  • Use public APIs to practice fetching and storing data
  • Wrap it with orchestration + containerization; it's closer to the real thing

This project taught me way more than passively following courses ever could.


🙋‍♂️ About Me

Mohamed Hussain S
Associate Data Engineer Intern
LinkedIn | GitHub


🚀 Learning in public, one pipeline at a time.
