The objective of this project is to build a complete data pipeline to extract, transform, store, and visualize Kenya's external debt data between 2010 and 2024 using the World Bank's public data API.
Project requirements:
- Python for scripting and data processing
- Pandas for data transformation
- PostgreSQL for data storage
- Grafana for interactive dashboards
- World Bank API as the data source
Steps:
1. Extracting external debt data for Kenya from the World Bank API
import requests

# DT.DOD.DECT.CD = external debt stocks, total (DOD, current US$)
debt_url = "https://api.worldbank.org/v2/country/KE/indicator/DT.DOD.DECT.CD?format=json"
response = requests.get(debt_url)
data = response.json()
records = data[1]  # The second element contains the actual data
print(records)
Sample JSON output for the records:
[
  {
    "indicator": { "id": "DT.DOD.DECT.CD", "value": "External debt stocks, total (DOD, current US$)" },
    "country": { "id": "KE", "value": "Kenya" },
    "countryiso3code": "KEN",
    "date": "2024",
    "value": null,
    "unit": "",
    "obs_status": "",
    "decimal": 0
  }
]
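Note that the default call returns the most recent observations first and paginates results. A slightly more defensive fetch (a sketch; date and per_page are standard World Bank API query parameters) pins the request to the 2010–2024 window and fails loudly on HTTP errors:

import requests

# Sketch: restrict the request to 2010-2024 and guard against HTTP errors
base_url = "https://api.worldbank.org/v2/country/KE/indicator/DT.DOD.DECT.CD"
params = {"format": "json", "date": "2010:2024", "per_page": 100}

response = requests.get(base_url, params=params, timeout=30)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses
records = response.json()[1]  # The second element holds the observations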
2. Transforming the dataset
- Extracting the required fields (country, date, and value)
import pandas as pd

df = pd.DataFrame([{
    'country': d['country']['value'],
    'date': d['date'],
    'value': d['value']
} for d in records])
This uses a list comprehension to loop through each dictionary d in the records list and extract a simpler dictionary with the three key-value pairs we need.
- Drop or handle any missing values
df = df.dropna()  # Drop rows with missing values (e.g. years the API reports as null)
- Ensuring correct formatting and consistency of values
Convert the date column to a datetime value for easier time-series analysis
df['date'] = pd.to_datetime(df['date'], format='%Y')
Sample output of our transformed data
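The sample can be reproduced by printing the first rows and column types (a quick inspection sketch):

print(df.head())   # First rows: country, date, value
print(df.dtypes)   # date should now be datetime64[ns]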
3. Loading the transformed data into a PostgreSQL database
- Import required packages
import os
import pandas as pd
from sqlalchemy import create_engine
from dotenv import load_dotenv
- Prepare a .env file (optional but recommended)
Create a .env file to store your database credentials:
DB_USER=your_username
DB_PASS=your_password
DB_HOST=your_database_host
DB_PORT=your_database_port
DB_NAME=your_database_name
- Get DB credentials from .env
Use python-dotenv to read the .env file securely and fetch variables from the environment.
load_dotenv()  # Load environment variables from the .env file

user = os.getenv("DB_USER")
password = os.getenv("DB_PASS")
host = os.getenv("DB_HOST")
port = os.getenv("DB_PORT")
dbname = os.getenv("DB_NAME")
- Create SQLAlchemy connection string
Create a connection string using your credentials. The engine is your connection to the PostgreSQL database.
engine = create_engine(f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{dbname}")
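As a quick sanity check before loading anything (a sketch, not part of the original pipeline), you can open a connection and run a trivial query:

from sqlalchemy import text

# Sketch: verify the credentials and network path with a no-op query
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
print("Connected to PostgreSQL")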
- Load the DataFrame into PostgreSQL
Use df.to_sql() to write your DataFrame to the database:
df.to_sql('external_debtKe', con=engine, if_exists='replace', index=False)
Confirm that the data has loaded successfully into the PostgreSQL database.
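One way to confirm the load (a sketch; the quoted table name is needed because to_sql() preserved the capital K) is to read the row count back with pandas:

# Sketch: read the row count back from the table created above
count = pd.read_sql('SELECT COUNT(*) AS n_rows FROM "external_debtKe"', engine)
print(count)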
4. Visualizing the data in Grafana by connecting to your PostgreSQL database
Add PostgreSQL as a Grafana data source using your database credentials to enable time-series visualization of the data:
- Line Chart displaying total external debt from 2010 to 2024
- Bar Chart visualizing the annual change in debt using calculated year-over-year differences (see the query sketch below)
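For the bar chart, the year-over-year difference can be computed in SQL with a window function. A sketch of the panel query (Grafana's time-series panels expect a column aliased as "time"; the exact panel setup is done in Grafana's SQL editor), previewed here through pandas with the engine from step 3:

# Sketch of the Grafana panel query: LAG() computes each year's change
yoy_sql = '''
SELECT
    date AS "time",
    value - LAG(value) OVER (ORDER BY date) AS yoy_change
FROM "external_debtKe"
ORDER BY date;
'''
print(pd.read_sql(yoy_sql, engine))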
Insights & Observations
Kenya’s external debt has grown every year between 2010 and 2024.
No single year showed a decline, which suggests that debt repayment is limited or is outpaced by new borrowing.
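This observation can be checked directly against the transformed DataFrame (a sketch):

# Sketch: every year-over-year change should be positive if debt rose each year
annual_change = df.sort_values('date')['value'].diff().dropna()
print((annual_change > 0).all())  # Expect True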
Conclusion
This project demonstrates how open data APIs, Python, SQL, and Grafana can be combined into a real-world data engineering solution. By building a full pipeline, we gained insight not just into Kenya's debt trajectory, but also into the technical processes required to turn raw data into meaningful analysis.