If you’re working with location data, geocoding — the process of converting addresses into geographic coordinates — is often a key step. While many APIs offer geocoding, handling large lists of addresses while respecting rate limits can be a challenge.
In this article, we’ll walk through a Python script that solves exactly this problem using Geoapify’s Geocoding API. We’ll read addresses from a file, process them in rate-limited batches, and write the results to a newline-delimited JSON (NDJSON) file.
🔗 GitHub Repository: geoapify/maps-api-code-samples
🧩 What This Script Does
This script:
- Reads a list of addresses from a file.
- Sends asynchronous requests to the Geoapify Geocoding API.
- Respects API rate limits (5 requests/second).
- Optionally filters geocoding by country.
- Saves results to a file in NDJSON format.
Perfect for developers processing large CSV/Excel exports or building internal tools with address lookups.
📥 1. Reading Addresses from File
```python
with open(input_file, 'r') as f:
    addresses = f.read().strip().splitlines()
```
What it does:
Reads the input file line by line, strips extra whitespace, and stores the addresses in a list.
Why it’s needed:
Prepares a clean list of addresses for batch processing. Each line in the input file represents a separate address.
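If your export may contain blank lines or stray whitespace, a slightly more defensive variant (an optional tweak, not part of the original script) skips empty entries:

```python
with open(input_file, 'r') as f:
    # Keep only non-empty lines, trimming surrounding whitespace.
    addresses = [line.strip() for line in f if line.strip()]
```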
🧮 2. Batching Requests According to Rate Limit
```python
# 'it' is the itertools module, imported as: import itertools as it
addresses = list(it.batched(addresses, REQUESTS_PER_SECOND))
```
What it does:
Splits the address list into smaller batches, each containing at most `REQUESTS_PER_SECOND` addresses (e.g., 5 per batch).
Why it’s needed:
Geoapify enforces a maximum number of API requests per second. Batching ensures we never send more than the allowed number of requests per second.
📝 If you’re using Python < 3.12, use the drop-in batching function shown in the Requirements section below.
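For illustration, here’s what `itertools.batched` produces on a small list. Note that it yields tuples, and the final batch may be shorter than the requested size:

```python
import itertools as it

addresses = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(list(it.batched(addresses, 5)))
# -> [('a', 'b', 'c', 'd', 'e'), ('f', 'g')]
```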
🚀 3. Asynchronous Execution of Requests
```python
tasks = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for batch in addresses:
        logger.info(batch)
        tasks.extend([executor.submit(geocode_address, address, api_key, country_code)
                      for address in batch])
        sleep(1)
```
What it does:
- Uses a thread pool to send multiple requests in parallel.
- Submits one thread per address.
- Waits 1 second between batches to comply with Geoapify's rate limit.
Why it’s needed:
Parallelism accelerates processing by making multiple requests simultaneously. `sleep(1)` ensures the API’s requests-per-second quota isn’t exceeded.
🌍 4. Geocoding Function
```python
def geocode_address(address, api_key, country_code):
    params = {
        'format': 'json',
        'text': address,
        'limit': 1,
        'apiKey': api_key
    }
    if country_code:
        params['filter'] = 'countrycode:' + country_code
    try:
        response = requests.get(GEOAPIFY_API_URL, params=params)
        if response.status_code == 200:
            data = response.json()
            if len(data['results']) > 0:
                return data['results'][0]
            else:
                return {"error": "Not found"}
        else:
            logger.warning(f"Failed to geocode address '{address}': {response.text}")
            return {}
    except Exception as e:
        logger.error(f"Error while geocoding address '{address}': {e}")
        return {}
```
What it does:
Sends a request to the Geoapify Geocoding API with the given address and optional country code.
Parses the response and returns the top geocoding result as a dictionary. If no result is found, or an error occurs, it returns a fallback dictionary with an error message.
Why it’s needed:
Encapsulates the geocoding logic in a reusable function. Handles:
- URL building and query parameters,
- Optional filtering by country for accuracy,
- Error handling and logging,
- Response validation.
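For context, here’s how the function might be called on its own. Treat this as a sketch: `GEOAPIFY_API_URL` is assumed to point at Geoapify’s geocode search endpoint, and the placeholder key is yours to replace.

```python
GEOAPIFY_API_URL = 'https://api.geoapify.com/v1/geocode/search'

result = geocode_address('Eiffel Tower, Paris', 'YOUR_GEOAPIFY_API_KEY', 'fr')
if result and 'error' not in result:
    # With format=json, each result carries lat, lon, and a formatted address.
    print(result['lat'], result['lon'], result['formatted'])
```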
📚 Docs:
- Geoapify Geocoding API Docs
- Python `requests.get()`
- Python `dict` type
- Python `try`/`except`
- Python `logging` module
⏳ 5. Waiting for All Requests to Complete
```python
wait(tasks, return_when=ALL_COMPLETED)
results = [task.result() for task in tasks]
```
What it does:
Blocks until all geocoding requests have completed, then collects results into a list.
Why it’s needed:
Ensures that all asynchronous jobs finish before the output is saved. Prevents writing incomplete or partial results.
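Both thread-pool snippets (steps 3 and 5) build on the standard library’s `concurrent.futures` module; the imports they assume look like this:

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED
from time import sleep
```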
📝 6. Writing Results to NDJSON File
```python
with open(output_file, 'w') as f:
    for result in results:
        f.write(json.dumps(result) + '\n')
```
What it does:
Writes results as newline-delimited JSON objects to a file — a format known as NDJSON.
Why it’s needed:
NDJSON is ideal for large-scale processing. It’s readable line-by-line, can be streamed, and integrates well with tools like `jq`, Elasticsearch, and data pipelines.
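To see why line-by-line matters, here’s a minimal sketch of consuming the output one record at a time, without loading the whole file into memory (field names follow the geocoder’s JSON format):

```python
import json

with open('results.ndjson') as f:
    for line in f:
        record = json.loads(line)
        # Skip empty records and entries that failed to geocode.
        if record and 'error' not in record:
            print(record.get('formatted'), record.get('lat'), record.get('lon'))
```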
▶️ How to Use It
1. Save your addresses to a `.txt` file (one per line):

```
1600 Amphitheatre Parkway, Mountain View, CA
Eiffel Tower, Paris
Brandenburger Tor, Berlin
```
2. Run the script:

```bash
python geocode_addresses.py \
  --api_key=YOUR_GEOAPIFY_API_KEY \
  --input=addresses.txt \
  --output=results.ndjson \
  --country_code=us
```
- `--api_key`: Your Geoapify API key.
- `--input`: Input file containing addresses.
- `--output`: Output file in NDJSON format.
- `--country_code`: Optional ISO country code (e.g., `us`, `fr`, `de`) to increase accuracy.
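Inside the script, these flags map to a command-line parser. Here’s a sketch of what the definitions might look like with the standard library’s `argparse` (the actual option setup lives in the repository):

```python
import argparse

parser = argparse.ArgumentParser(description='Batch-geocode addresses via Geoapify')
parser.add_argument('--api_key', required=True, help='Your Geoapify API key')
parser.add_argument('--input', required=True, help='Input file, one address per line')
parser.add_argument('--output', required=True, help='Output NDJSON file')
parser.add_argument('--country_code', default=None, help='Optional ISO country code filter')
args = parser.parse_args()
```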
🛠️ Requirements
The script relies on the `requests` package for HTTP calls (install it with `pip install requests`); everything else comes from the Python standard library.
Python 3.12+ is required for `itertools.batched`.
If you’re on an older version, you can define your own batching function:
```python
import itertools

def batched(iterable, n):
    # Yield successive lists of up to n items from the iterable.
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, n))
        if not batch:
            break
        yield batch
```
📦 Use Case Scenarios
- Clean and validate customer address lists.
- Pre-process logistics/delivery points.
- Enrich event registration data with geo-coordinates.
🔍 Conclusion
With just a few lines of Python, you can build a robust geocoding pipeline that respects API rate limits and scales to thousands of addresses. This script is a great foundation you can extend with features like:
- Retry logic (see the sketch below)
- Address deduplication
- Integration with Pandas or Google Sheets
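For instance, a retry wrapper around the geocoding function from step 4 could look like the following. This is a hypothetical sketch: it retries only transient failures, which the script signals with an empty dict, and treats a definitive "Not found" as final.

```python
import time

def geocode_with_retry(address, api_key, country_code, retries=3):
    # An empty dict means a network error or a non-200 response, so retry;
    # a 'Not found' result is definitive and returned immediately.
    result = {}
    for attempt in range(retries):
        result = geocode_address(address, api_key, country_code)
        if result:
            return result
        time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    return result
```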
Try it yourself — and happy geocoding!