Build a Real-Time Geolocation Analytics API with Tinybird

#analytics #database #api #tutorial

Geolocation data is central to applications that track user movements, analyze spatial patterns, or deliver location-based services. Whether you're tracking delivery vehicles in real time, analyzing foot traffic in a retail store, or personalizing content by user location, you need to process and analyze that data efficiently. In this tutorial, you'll learn how to build a real-time API for processing and analyzing geolocation data using Tinybird.

Tinybird is a data analytics backend for software developers. You use Tinybird to build real-time analytics APIs without needing to set up or manage the underlying infrastructure. Tinybird offers local-first development workflows, git-based deployments, resource definitions as code, and features for AI-native developers.

By using Tinybird's data sources and pipes, you can ingest, transform, and serve geolocation data through low-latency APIs. This tutorial walks through creating a data source to store geolocation events, transforming that data, and deploying API endpoints for retrieving a user's location history, analyzing location events over time, and finding nearby users based on coordinates.

Understanding the data

Imagine your data looks like this:

{"user_id": "user_396", "latitude": 2023638.396, "longitude": 2023548.396, "accuracy": 11, "event_type": "departed", "device_id": "device_396", "timestamp": "2025-04-13 09:32:45"}
{"user_id": "user_169", "latitude": 1828226.169, "longitude": 1828136.169, "accuracy": 64, "event_type": "exited_region", "device_id": "device_169", "timestamp": "2025-04-13 19:46:15"}

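This data represents location events captured from users, including their IDs, coordinates (latitude and longitude), the accuracy of the location reading, the type of event (e.g., check_in, departed, exited_region), the device ID, and the timestamp of the event. Note that the coordinates in this generated sample are placeholders: real latitudes fall between -90 and 90, and real longitudes between -180 and 180. To store this data in Tinybird, you create a data source with the following schema: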

DESCRIPTION >
    Stores location events with user IDs, coordinates, and timestamps

SCHEMA >
    `user_id` String `json:$.user_id`,
    `latitude` Float64 `json:$.latitude`,
    `longitude` Float64 `json:$.longitude`,
    `accuracy` Float32 `json:$.accuracy`,
    `event_type` String `json:$.event_type`,
    `device_id` String `json:$.device_id`,
    `timestamp` DateTime `json:$.timestamp`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, user_id, event_type"

This schema sorts data by timestamp, user_id, and event_type, and partitions it by month, so queries that filter on a time range can skip data outside that range. If most of your queries filter on user_id first, placing user_id earlier in the sorting key may serve them better.

To ingest data into this data source, Tinybird's Events API lets you stream JSON/NDJSON events from your application frontend or backend with a single HTTP request. Its low latency makes it a good fit for geolocation data:

curl -X POST "https://api.europe-west2.gcp.tinybird.co/v0/events?name=location_events" \
     -H "Authorization: Bearer $TB_ADMIN_TOKEN" \
     -d '{
       "user_id": "user_123",
       "latitude": 40.7128,
       "longitude": -74.0060,
       "accuracy": 15.5,
       "event_type": "check_in",
       "device_id": "device_abc",
       "timestamp": "2023-10-15 14:30:00"
     }'
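The same request can be assembled from application code. The following Python sketch builds (but does not send) an Events API request, serializing a batch of events as NDJSON, one JSON object per line. The helper names validate_event and build_events_request are hypothetical, and the coordinate range check is an assumption added for illustration rather than something Tinybird enforces.

```python
import json
import urllib.parse
import urllib.request

# Host taken from the curl example above; adjust it to your workspace region.
TINYBIRD_HOST = "https://api.europe-west2.gcp.tinybird.co"


def validate_event(event: dict) -> None:
    """Reject events whose coordinates fall outside valid latitude/longitude ranges."""
    if not -90.0 <= event["latitude"] <= 90.0:
        raise ValueError(f"latitude out of range: {event['latitude']}")
    if not -180.0 <= event["longitude"] <= 180.0:
        raise ValueError(f"longitude out of range: {event['longitude']}")


def build_events_request(events: list[dict], token: str,
                         datasource: str = "location_events") -> urllib.request.Request:
    """Build a POST request for the Events API with an NDJSON body."""
    for event in events:
        validate_event(event)
    # NDJSON: one compact JSON document per line.
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    query = urllib.parse.urlencode({"name": datasource})
    return urllib.request.Request(
        url=f"{TINYBIRD_HOST}/v0/events?{query}",
        data=body,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the batch. Validating before sending keeps out-of-range coordinates from reaching the data source, where they would distort any later distance calculation.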

For event/streaming data, the Kafka connector provides a robust option for integrating with existing Kafka streams. For batch or file-based data, the Data Sources API and S3 connector offer flexible ingestion methods.

Transforming data and publishing APIs

Tinybird transforms data through pipes, which support batch transformations, real-time transformations (materialized views), and API endpoints.

User Location History

The user_location_history endpoint retrieves a user's location history within a specified time range:

DESCRIPTION >
    Retrieve location history for a specific user within a time range

NODE user_location_history_node
SQL >
    %
    SELECT
        user_id,
        latitude,
        longitude,
        accuracy,
        event_type,
        device_id,
        timestamp
    FROM location_events
    WHERE 
        user_id = {{String(user_id, '')}}
        AND timestamp >= {{DateTime(start_time, '2023-01-01 00:00:00')}}
        AND timestamp <= {{DateTime(end_time, '2023-12-31 23:59:59')}}
    ORDER BY timestamp DESC
    LIMIT {{Int32(limit, 1000)}}

TYPE endpoint

This pipe filters location events by user_id and a date/time range, orders the results by timestamp, and limits the output. The use of query parameters (user_id, start_time, end_time, limit) makes the API flexible, allowing for tailored queries.

Location Analytics

The location_analytics endpoint aggregates location event data over different time periods:

DESCRIPTION >
    Get analytics on location events by aggregating data over different time periods

NODE location_analytics_node
SQL >
    %
    SELECT
        {{String(time_bucket, 'hour')}} AS time_bucket,
        CASE
            WHEN {{String(time_bucket, 'hour')}} = 'hour' THEN toStartOfHour(timestamp)
            WHEN {{String(time_bucket, 'hour')}} = 'day' THEN toStartOfDay(timestamp)
            WHEN {{String(time_bucket, 'hour')}} = 'week' THEN toStartOfWeek(timestamp)
            WHEN {{String(time_bucket, 'hour')}} = 'month' THEN toStartOfMonth(timestamp)
            ELSE toStartOfHour(timestamp)
        END AS bucket_start,
        event_type,
        count() AS event_count,
        count(DISTINCT user_id) AS unique_users,
        avg(accuracy) AS avg_accuracy
    FROM location_events
    WHERE
        timestamp >= {{DateTime(start_time, '2023-01-01 00:00:00')}}
        AND timestamp <= {{DateTime(end_time, '2023-12-31 23:59:59')}}
        {% if defined(event_type) %}
        AND event_type = {{String(event_type)}}
        {% end %}
    GROUP BY
        time_bucket,
        bucket_start,
        event_type
    ORDER BY bucket_start DESC

TYPE endpoint

This pipe aggregates dynamically based on the time_bucket parameter (hour, day, week, or month; any other value falls back to hour). It filters events by time range and, when the event_type parameter is provided, by event type. For each bucket and event type, it returns the event count, the number of unique users, and the average accuracy.
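To make the aggregation concrete, here is a small pure-Python sketch of the same bucketing and grouping logic, useful for checking expected results against a handful of events. The function names bucket_start and aggregate are hypothetical, and the week bucket assumes weeks start on Sunday, which is ClickHouse's default mode for toStartOfWeek.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_start(ts: datetime, time_bucket: str) -> datetime:
    """Truncate a timestamp to the start of its hour, day, week, or month."""
    if time_bucket == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if time_bucket == "week":
        # Assumption: weeks start on Sunday (Python's weekday() has Monday = 0).
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight - timedelta(days=(ts.weekday() + 1) % 7)
    if time_bucket == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # "hour" and any unrecognized value fall back to hourly buckets.
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate(events: list[dict], time_bucket: str = "hour") -> list[dict]:
    """Group events by (bucket start, event type) and compute summary metrics."""
    groups = defaultdict(list)
    for event in events:
        ts = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M:%S")
        groups[(bucket_start(ts, time_bucket), event["event_type"])].append(event)

    rows = []
    for (start, event_type), members in groups.items():
        rows.append({
            "bucket_start": start,
            "event_type": event_type,
            "event_count": len(members),
            "unique_users": len({m["user_id"] for m in members}),
            "avg_accuracy": sum(m["accuracy"] for m in members) / len(members),
        })
    # Newest buckets first, matching ORDER BY bucket_start DESC.
    rows.sort(key=lambda row: row["bucket_start"], reverse=True)
    return rows
```

A reference implementation like this is only practical for small samples, but it gives you known-good numbers to compare against the endpoint's response during development.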

Nearby Users

The nearby_users endpoint finds users within a specified radius of given coordinates:

DESCRIPTION >
    Find users within a specified radius of a given coordinate

NODE nearby_users_node
SQL >
    %
    SELECT
        user_id,
        latitude,
        longitude,
        event_type,
        timestamp,
        111.111 *
        SQRT(
            POW(latitude - {{Float64(target_lat, 0.0)}}, 2) +
            POW((longitude - {{Float64(target_lon, 0.0)}}) * COS(PI() * {{Float64(target_lat, 0.0)}} / 180.0), 2)
        ) AS distance_km
    FROM location_events
    WHERE
        timestamp >= {{DateTime(start_time, '2023-01-01 00:00:00')}}
        AND timestamp <= {{DateTime(end_time, '2023-12-31 23:59:59')}}
        AND 111.111 *
        SQRT(
            POW(latitude - {{Float64(target_lat, 0.0)}}, 2) +
            POW((longitude - {{Float64(target_lon, 0.0)}}) * COS(PI() * {{Float64(target_lat, 0.0)}} / 180.0), 2)
        ) <= {{Float64(radius_km, 1.0)}}
    ORDER BY distance_km ASC
    LIMIT {{Int32(limit, 100)}}

TYPE endpoint

This pipe computes an approximate distance between a target location and each event's coordinates using an equirectangular projection: the latitude difference is taken as-is, the longitude difference is scaled by the cosine of the target latitude, and the result is converted to kilometers at roughly 111.111 km per degree. It then filters by distance and time range and limits the number of results. The approximation holds well for small radii. Note that the query returns matching events, so a user with several events in range appears more than once.
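A common form of the equirectangular approximation scales the longitude difference by the cosine of the latitude before applying the Pythagorean theorem. The Python sketch below implements that form and compares it with the haversine formula as a sanity check. The function names are hypothetical, and the 6371 km Earth radius is an assumed mean value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for haversine
KM_PER_DEGREE = 111.111   # kilometers per degree, the constant used in the SQL


def equirectangular_km(lat: float, lon: float,
                       target_lat: float, target_lon: float) -> float:
    """Approximate distance in km; accurate for points that are close together."""
    dlat = lat - target_lat
    # A degree of longitude shrinks with latitude, so scale it by cos(latitude).
    dlon = (lon - target_lon) * math.cos(math.radians(target_lat))
    return KM_PER_DEGREE * math.hypot(dlat, dlon)


def haversine_km(lat: float, lon: float,
                 target_lat: float, target_lon: float) -> float:
    """Great-circle distance in km on a sphere of radius EARTH_RADIUS_KM."""
    phi1, phi2 = math.radians(lat), math.radians(target_lat)
    dphi = phi2 - phi1
    dlambda = math.radians(target_lon - lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

For a radius of a few kilometers the two functions agree closely, which is why the cheaper approximation is usually adequate for a nearby-users filter. Over long distances or near the poles, haversine is the safer choice.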

Deploying to production

Deploying your project to Tinybird Cloud is straightforward with the tb --cloud deploy command. This command creates production-ready, scalable API Endpoints. Tinybird manages resources as code, which complements CI/CD workflows by allowing for automated deployments and version control of your data analytics pipelines. Securing your APIs is essential, and Tinybird provides token-based authentication to ensure that only authorized requests can access your endpoints. Here's how you might call one of the deployed endpoints:

curl -X GET "https://api.europe-west2.gcp.tinybird.co/v0/pipes/user_location_history.json?user_id=user_123&start_time=2023-01-01+00%3A00%3A00&end_time=2023-12-31+23%3A59%3A59&limit=500" \
     -H "Authorization: Bearer $TB_ADMIN_TOKEN"
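From application code, the main detail to get right is URL-encoding the query parameters, since the timestamps contain spaces and colons. The Python sketch below builds (but does not send) a request for the user_location_history endpoint. The helper name build_history_request is hypothetical, and the sketch passes the token in an Authorization header instead of the query string.

```python
import urllib.parse
import urllib.request

# Host taken from the curl example above; adjust it to your workspace region.
TINYBIRD_HOST = "https://api.europe-west2.gcp.tinybird.co"


def build_history_request(token: str, user_id: str, start_time: str,
                          end_time: str, limit: int = 1000) -> urllib.request.Request:
    """Build a GET request for the user_location_history endpoint."""
    params = {
        "user_id": user_id,
        "start_time": start_time,
        "end_time": end_time,
        "limit": limit,
    }
    # urlencode escapes spaces as "+" and colons as "%3A".
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        url=f"{TINYBIRD_HOST}/v0/pipes/user_location_history.json?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Passing the returned object to urllib.request.urlopen would execute the query. In production, a token scoped to read only this endpoint is a safer choice than an admin token.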

Conclusion

In this tutorial, you've built a real-time geolocation analytics API with Tinybird. You've learned how to ingest geolocation data, transform it to extract meaningful insights, and deploy scalable APIs for various geolocation analytics use cases. Tinybird simplifies the process of building and managing real-time data pipelines, allowing you to focus on creating value from your data. Sign up for Tinybird to build and deploy your first real-time data APIs in a few minutes. Tinybird is free to start, with no time limit and no credit card required, enabling you to experiment and scale your data projects efficiently.