Geoapify

Batch Geocode Addresses in Python with Geoapify API

If you’re working with location data, geocoding — the process of converting addresses into geographic coordinates — is often a key step. While many APIs offer geocoding, handling large lists of addresses while respecting rate limits can be a challenge.

In this article, we’ll walk through a Python script that solves exactly this problem using Geoapify’s Geocoding API. We’ll read addresses from a file, process them in rate-limited batches, and write the results to a newline-delimited JSON (NDJSON) file.

🔗 GitHub Repository: geoapify/maps-api-code-samples


🧩 What This Script Does

This script:

  • Reads a list of addresses from a file.
  • Sends asynchronous requests to the Geoapify Geocoding API.
  • Respects API rate limits (5 requests/second).
  • Optionally filters geocoding by country.
  • Saves results to a file in NDJSON format.

Perfect for developers processing large CSV/Excel exports or building internal tools with address lookups.
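The snippets below assume a couple of module-level constants along the following lines (illustrative values; see the repository for the exact script):

```python
# Illustrative configuration assumed by the snippets below.
# The endpoint is Geoapify's Geocoding API; the rate limit of
# 5 requests/second matches the free-tier quota mentioned above.
GEOAPIFY_API_URL = "https://api.geoapify.com/v1/geocode/search"
REQUESTS_PER_SECOND = 5
```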


📥 1. Reading Addresses from File

with open(input_file, 'r') as f:
    addresses = f.read().strip().splitlines()

What it does:
Reads the input file line by line, strips extra whitespace, and stores the addresses in a list.

Why it’s needed:
Prepares a clean list of addresses for batch processing. Each line in the input file represents a separate address.
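Note that splitlines() keeps interior blank lines as empty strings. If your input file may contain them, a small filter (a sketch, not part of the original script) keeps the list clean:

```python
# Simulated file contents with a stray blank line in the middle.
raw = "221B Baker Street, London\n\n1600 Pennsylvania Ave, Washington\n"

# Trim surrounding whitespace and drop empty lines.
addresses = [line.strip() for line in raw.splitlines() if line.strip()]
```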


🧮 2. Batching Requests According to Rate Limit

import itertools as it  # itertools.batched requires Python 3.12+

addresses = list(it.batched(addresses, REQUESTS_PER_SECOND))

What it does:
Splits the address list into batches of REQUESTS_PER_SECOND addresses each (e.g., 5 per batch).

Why it’s needed:
Geoapify enforces a maximum number of API requests per second. Batching ensures we never send more than the allowed number of requests per second.


📝 If you're using Python < 3.12, see the Requirements section below for a drop-in replacement for itertools.batched.
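To see what batching produces, here is the split for seven addresses at five requests per second, using a version-independent helper equivalent to itertools.batched:

```python
from itertools import islice

def batched(iterable, n):
    # Pure-Python equivalent of itertools.batched (Python 3.12+):
    # yields successive tuples of up to n items.
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch

addrs = [f"address {i}" for i in range(7)]
batches = list(batched(addrs, 5))
# First batch holds 5 addresses, the second holds the remaining 2.
```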

🚀 3. Asynchronous Execution of Requests

tasks = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for batch in addresses:
        logger.info(batch)
        tasks.extend([executor.submit(geocode_address, address, api_key, country_code) for address in batch])
        sleep(1)

What it does:

  • Uses a thread pool to send multiple requests in parallel.
  • Submits one task per address (the pool runs up to 10 at a time).
  • Waits 1 second between batches to comply with Geoapify's rate limit.

Why it’s needed:
Parallelism accelerates processing by making multiple requests simultaneously. sleep(1) ensures the API's request-per-second quota isn’t exceeded.
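The pacing pattern can be demonstrated in isolation with a stand-in for the geocoding call (a sketch; the real script submits geocode_address instead):

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def fake_geocode(address):
    # Stand-in for the real API call.
    return {"query": address}

batches = [["a", "b"], ["c"]]
tasks = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for batch in batches:
        tasks.extend(executor.submit(fake_geocode, addr) for addr in batch)
        sleep(0.1)  # 1 second in the real script; shortened here

# Futures keep submission order, so results line up with the input.
results = [t.result() for t in tasks]
```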


🌍 4. Geocoding Function

def geocode_address(address, api_key, country_code):
    params = {
        'format': 'json',
        'text': address,
        'limit': 1,
        'apiKey': api_key
    }
    if country_code:
        params['filter'] = 'countrycode:' + country_code

    try:
        response = requests.get(GEOAPIFY_API_URL, params=params)
        if response.status_code == 200:
            data = response.json()
            if len(data['results']) > 0:
                return data['results'][0]
            else:
                return {"error": "Not found"}
        else:
            logger.warning(f"Failed to geocode address '{address}': {response.status_code} {response.text}")
            return {}
    except Exception as e:
        logger.error(f"Error while geocoding address '{address}': {e}")
        return {}

What it does:
Sends a request to the Geoapify Geocoding API with the given address and optional country code.
Parses the response and returns the top geocoding result as a dictionary. If no result is found, it returns a dictionary with an error message; if the request fails or raises an exception, it logs the problem and returns an empty dictionary.

Why it’s needed:
Encapsulates the geocoding logic in a reusable function. Handles:

  • URL building and query parameters,
  • Optional filtering by country for accuracy,
  • Error handling and logging,
  • Response validation.
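For example, the parameters for a request with a country filter serialize to a query string like this (a standard-library sketch; the apiKey value is a placeholder):

```python
from urllib.parse import urlencode

params = {
    "format": "json",
    "text": "Eiffel Tower, Paris",
    "limit": 1,
    "apiKey": "YOUR_KEY",  # placeholder, not a real key
}
params["filter"] = "countrycode:fr"  # optional country filter

# requests builds this query string internally from the params dict.
query = urlencode(params)
```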


⏳ 5. Waiting for All Requests to Complete

wait(tasks, return_when=ALL_COMPLETED)
results = [task.result() for task in tasks]

What it does:
Blocks until all geocoding requests have completed, then collects results into a list.

Why it’s needed:
Ensures that all asynchronous jobs finish before the output is saved. Prevents writing incomplete or partial results.
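The same pattern in a self-contained form, using a trivial function in place of the geocoding calls:

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

with ThreadPoolExecutor(max_workers=4) as executor:
    # Submit a few quick jobs: 2**0, 2**1, 2**2, 2**3.
    tasks = [executor.submit(pow, 2, n) for n in range(4)]
    wait(tasks, return_when=ALL_COMPLETED)  # block until every future is done
    results = [task.result() for task in tasks]
# results == [1, 2, 4, 8]
```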


📝 6. Writing Results to NDJSON File

with open(output_file, 'w') as f:
    for result in results:
        f.write(json.dumps(result) + '\n')

What it does:
Writes results as newline-delimited JSON objects to a file — a format known as NDJSON.

Why it’s needed:
NDJSON is ideal for large-scale processing. It’s readable line-by-line, can be streamed, and integrates well with tools like jq, Elasticsearch, and data pipelines.
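Reading the results back is just as simple, since each line parses independently. A small sketch (the sample records are illustrative):

```python
import json

# Two sample NDJSON lines, standing in for the script's output file.
ndjson_text = '{"lat": 48.858, "lon": 2.294}\n{"error": "Not found"}\n'

# Parse line by line; no need to load the whole file as one JSON document.
records = [json.loads(line) for line in ndjson_text.splitlines() if line]
```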



▶️ How to Use It

1. Save your addresses to a .txt file (one per line):

1600 Amphitheatre Parkway, Mountain View, CA  
Eiffel Tower, Paris  
Brandenburger Tor, Berlin

2. Run the script:

python geocode_addresses.py \
  --api_key=YOUR_GEOAPIFY_API_KEY \
  --input=addresses.txt \
  --output=results.ndjson \
  --country_code=us
  • --api_key: Your Geoapify API key.
  • --input: Input file containing addresses.
  • --output: Output file in NDJSON format.
  • --country_code: Optional ISO country code (e.g., us, fr, de) to increase accuracy.
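The flag parsing can be sketched with argparse; the flag names match those above, though the script's actual implementation may differ (see the repository):

```python
import argparse

parser = argparse.ArgumentParser(description="Batch geocode addresses via Geoapify")
parser.add_argument("--api_key", required=True, help="Geoapify API key")
parser.add_argument("--input", required=True, help="Input file, one address per line")
parser.add_argument("--output", required=True, help="Output NDJSON file")
parser.add_argument("--country_code", default=None, help="Optional ISO country code filter")

# Parsing an explicit argv list here for illustration;
# the script itself would call parser.parse_args() on sys.argv.
args = parser.parse_args(["--api_key=KEY", "--input=in.txt", "--output=out.ndjson"])
```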

🛠️ Requirements

Apart from the requests library (install with pip install requests), only the Python standard library is used.

Python 3.12+ is required for itertools.batched.
If you’re on an older version, you can define your own batching function:

import itertools

def batched(iterable, n):
    # Yields successive lists of up to n items; equivalent in spirit to
    # itertools.batched, which yields tuples.
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, n))
        if not batch:
            break
        yield batch

📦 Use Case Scenarios

  • Clean and validate customer address lists.
  • Pre-process logistics/delivery points.
  • Enrich event registration data with geo-coordinates.

🔍 Conclusion

With just a few lines of Python, you can build a robust geocoding pipeline that respects API rate limits and scales to thousands of addresses. This script is a great foundation you can extend with features like:

  • Retry logic
  • Address deduplication
  • Integration with Pandas or Google Sheets
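For instance, retry logic could be layered on with a small wrapper like this (an illustrative sketch, not part of the script):

```python
from time import sleep

def with_retries(func, attempts=3, delay=0.1):
    # Retry a callable a few times before giving up; illustrative only.
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(delay)

# Demo: a function that fails twice, then succeeds on the third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_retries(flaky)
```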

Try it yourself — and happy geocoding!
