I have many (~2000) locations with time series data. Each time series has millions of rows. I would like to store these in a Postgres database. My current approach is to have a table for each location time series, and a meta table which stores information about each location (coordinates, elevation etc). I am using Python/SQLAlchemy to create and populate the tables. I would like to have a relationship between the meta table and each time series table to do queries like "select all locations that have data between date A and date B" and "select all data for date A and export a csv with coordinates". What is the best way to create many tables with the same structure (only the name is different) and have a relationship with a meta table? Or should I use a different database design?
Currently I am using this type of approach to generate a lot of similar mappings:
from sqlalchemy import create_engine, MetaData
from sqlalchemy.types import Float, String, DateTime, Integer
from sqlalchemy import Column, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship, backref
Base = declarative_base()
def make_timeseries(name):
class TimeSeries(Base):
__tablename__ = name
table_name = Column(String(50), ForeignKey('locations.table_name'))
datetime = Column(DateTime, primary_key=True)
value = Column(Float)
location = relationship('Location', backref=backref('timeseries',
lazy='dynamic'))
def __init__(self, table_name, datetime, value):
self.table_name = table_name
self.datetime = datetime
self.value = value
def __repr__(self):
return "{}: {}".format(self.datetime, self.value)
return TimeSeries
class Location(Base):
__tablename__ = 'locations'
id = Column(Integer, primary_key=True)
table_name = Column(String(50), unique=True)
lon = Column(Float)
lat = Column(Float)
if __name__ == '__main__':
connection_string = 'postgresql://user:pw@localhost/location_test'
engine = create_engine(connection_string)
metadata = MetaData(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()
TS1 = make_timeseries('ts1')
# TS2 = make_timeseries('ts2') # this breaks because of the foreign key
Base.metadata.create_all(engine)
session.add(TS1("ts1", "2001-01-01", 999))
session.add(TS1("ts1", "2001-01-02", -555))
qs = session.query(Location).first()
print qs.timeseries.all()
This approach has some problems, most notably that if I create more than one TimeSeries the foreign key doesn't work. Previously I've used some work arounds, but it all seems like a big hack and I feel that there must be a better way of doing this. How should I organise and access my data?