If you’re working with location data, geocoding — the process of converting addresses into geographic coordinates — is often a key step. While many APIs offer geocoding, handling large lists of addresses while respecting rate limits can be a challenge.
In this article, we’ll walk through a Python script that solves exactly this problem using Geoapify’s Geocoding API. We’ll read addresses from a file, process them in rate-limited batches, and write the results to a newline-delimited JSON (NDJSON) file.
🔗 GitHub Repository: geoapify/maps-api-code-samples
🧩 What This Script Does
This script:
- Reads a list of addresses from a file.
- Sends asynchronous requests to the Geoapify Geocoding API.
- Respects API rate limits (5 requests/second).
- Optionally filters geocoding by country.
- Saves results to a file in NDJSON format.
Perfect for developers processing large CSV/Excel exports or building internal tools with address lookups.
📥 1. Reading Addresses from File
```python
with open(input_file, 'r') as f:
    addresses = f.read().strip().splitlines()
```
What it does:
Reads the input file line by line, strips extra whitespace, and stores the addresses in a list.
Why it’s needed:
Prepares a clean list of addresses for batch processing. Each line in the input file represents a separate address.
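If your export may contain blank lines or stray whitespace, a slightly more defensive variant (an optional tweak, not part of the original script) skips empty entries:

```python
with open(input_file, 'r') as f:
    # Keep only non-empty lines, trimming surrounding whitespace.
    addresses = [line.strip() for line in f if line.strip()]
```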
🧮 2. Batching Requests According to Rate Limit
```python
# 'it' is the itertools module, imported as: import itertools as it
addresses = list(it.batched(addresses, REQUESTS_PER_SECOND))
```
What it does:
Splits the address list into smaller batches, each containing at most `REQUESTS_PER_SECOND` addresses (e.g., 5 per batch).
Why it’s needed:
Geoapify enforces a maximum number of API requests per second. Batching ensures we never send more than the allowed number of requests per second.
📝 If you’re using Python < 3.12, use the drop-in batching function shown in the Requirements section below.
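For illustration, here’s what `itertools.batched` produces on a small list. Note that it yields tuples, and the final batch may be shorter than the requested size:

```python
import itertools as it

addresses = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(list(it.batched(addresses, 5)))
# -> [('a', 'b', 'c', 'd', 'e'), ('f', 'g')]
```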
🚀 3. Asynchronous Execution of Requests
```python
tasks = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for batch in addresses:
        logger.info(batch)
        tasks.extend([executor.submit(geocode_address, address, api_key, country_code)
                      for address in batch])
        sleep(1)
```
What it does:
- Uses a thread pool to send multiple requests in parallel.
- Submits one thread per address.
- Waits 1 second between batches to comply with Geoapify's rate limit.
Why it’s needed:
Parallelism accelerates processing by making multiple requests simultaneously. `sleep(1)` ensures the API’s requests-per-second quota isn’t exceeded.
🌍 4. Geocoding Function
```python
def geocode_address(address, api_key, country_code):
    params = {
        'format': 'json',
        'text': address,
        'limit': 1,
        'apiKey': api_key
    }
    if country_code:
        params['filter'] = 'countrycode:' + country_code
    try:
        response = requests.get(GEOAPIFY_API_URL, params=params)
        if response.status_code == 200:
            data = response.json()
            if len(data['results']) > 0:
                return data['results'][0]
            else:
                return {"error": "Not found"}
        else:
            logger.warning(f"Failed to geocode address '{address}': {response.text}")
            return {}
    except Exception as e:
        logger.error(f"Error while geocoding address '{address}': {e}")
        return {}
```
What it does:
Sends a request to the Geoapify Geocoding API with the given address and optional country code.
Parses the response and returns the top geocoding result as a dictionary. If no result is found, or an error occurs, it returns a fallback dictionary with an error message.
Why it’s needed:
Encapsulates the geocoding logic in a reusable function. Handles:
- URL building and query parameters,
- Optional filtering by country for accuracy,
- Error handling and logging,
- Response validation.
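For context, here’s how the function might be called on its own. Treat this as a sketch: `GEOAPIFY_API_URL` is assumed to point at Geoapify’s geocode search endpoint, and the placeholder key is yours to replace.

```python
GEOAPIFY_API_URL = 'https://api.geoapify.com/v1/geocode/search'

result = geocode_address('Eiffel Tower, Paris', 'YOUR_GEOAPIFY_API_KEY', 'fr')
if result and 'error' not in result:
    # With format=json, each result carries lat, lon, and a formatted address.
    print(result['lat'], result['lon'], result['formatted'])
```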
📚 Docs:
- Geoapify Geocoding API Docs
- Python `requests.get()`
- Python `dict` type
- Python `try`/`except`
- Python `logging` module
⏳ 5. Waiting for All Requests to Complete
```python
wait(tasks, return_when=ALL_COMPLETED)
results = [task.result() for task in tasks]
```
What it does:
Blocks until all geocoding requests have completed, then collects results into a list.
Why it’s needed:
Ensures that all asynchronous jobs finish before the output is saved. Prevents writing incomplete or partial results.
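Both thread-pool snippets (steps 3 and 5) build on the standard library’s `concurrent.futures` module; the imports they assume look like this:

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED
from time import sleep
```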
📝 6. Writing Results to NDJSON File
```python
with open(output_file, 'w') as f:
    for result in results:
        f.write(json.dumps(result) + '\n')
```
What it does:
Writes results as newline-delimited JSON objects to a file — a format known as NDJSON.
Why it’s needed:
NDJSON is ideal for large-scale processing. It’s readable line-by-line, can be streamed, and integrates well with tools like `jq`, Elasticsearch, and data pipelines.
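To see why line-by-line matters, here’s a minimal sketch of consuming the output one record at a time, without loading the whole file into memory (field names follow the geocoder’s JSON format):

```python
import json

with open('results.ndjson') as f:
    for line in f:
        record = json.loads(line)
        # Skip empty records and entries that failed to geocode.
        if record and 'error' not in record:
            print(record.get('formatted'), record.get('lat'), record.get('lon'))
```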
▶️ How to Use It
1. Save your addresses to a `.txt` file (one per line):

```
1600 Amphitheatre Parkway, Mountain View, CA
Eiffel Tower, Paris
Brandenburger Tor, Berlin
```
2. Run the script:

```bash
python geocode_addresses.py \
  --api_key=YOUR_GEOAPIFY_API_KEY \
  --input=addresses.txt \
  --output=results.ndjson \
  --country_code=us
```
- `--api_key`: Your Geoapify API key.
- `--input`: Input file containing addresses.
- `--output`: Output file in NDJSON format.
- `--country_code`: Optional ISO country code (e.g., `us`, `fr`, `de`) to increase accuracy.
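Inside the script, these flags map to a command-line parser. Here’s a sketch of what the definitions might look like with the standard library’s `argparse` (the actual option setup lives in the repository):

```python
import argparse

parser = argparse.ArgumentParser(description='Batch-geocode addresses via Geoapify')
parser.add_argument('--api_key', required=True, help='Your Geoapify API key')
parser.add_argument('--input', required=True, help='Input file, one address per line')
parser.add_argument('--output', required=True, help='Output NDJSON file')
parser.add_argument('--country_code', default=None, help='Optional ISO country code filter')
args = parser.parse_args()
```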
🛠️ Requirements
The script relies on the `requests` package for HTTP calls (install it with `pip install requests`); everything else comes from the Python standard library.
Python 3.12+ is required for `itertools.batched`.
If you’re on an older version, you can define your own batching function:
```python
import itertools

def batched(iterable, n):
    # Yield successive lists of up to n items from the iterable.
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, n))
        if not batch:
            break
        yield batch
```
📦 Use Case Scenarios
- Clean and validate customer address lists.
- Pre-process logistics/delivery points.
- Enrich event registration data with geo-coordinates.
🔍 Conclusion
With just a few lines of Python, you can build a robust geocoding pipeline that respects API rate limits and scales to thousands of addresses. This script is a great foundation you can extend with features like:
- Retry logic (see the sketch below)
- Address deduplication
- Integration with Pandas or Google Sheets
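For instance, a retry wrapper around the geocoding function from step 4 could look like the following. This is a hypothetical sketch: it retries only transient failures, which the script signals with an empty dict, and treats a definitive "Not found" as final.

```python
import time

def geocode_with_retry(address, api_key, country_code, retries=3):
    # An empty dict means a network error or a non-200 response, so retry;
    # a 'Not found' result is definitive and returned immediately.
    result = {}
    for attempt in range(retries):
        result = geocode_address(address, api_key, country_code)
        if result:
            return result
        time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    return result
```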
Try it yourself — and happy geocoding!