26

In order to handle a growing database table, we are sharding on table name. So we could have database tables that are named like this:

table_md5one
table_md5two
table_md5three

All tables have the exact same schema.

How do we use SQLAlchemy and dynamically specify the tablename for the class that corresponds to this? Looks like the declarative_base() classes need to have tablename pre-specified.

There will eventually be too many tables to manually specify derived classes from a parent/base class. We want to be able to build a class that can have the tablename set up dynamically (maybe passed as a parameter to a function.)

1
  • There is an example of sharding by table name in the docs Commented May 21 at 20:13

8 Answers 8

18

OK, we went with the custom SQLAlchemy declaration rather than the declarative one.

So we create a dynamic table object like this:

from sqlalchemy import MetaData, Table, Column

def get_table_object(self, md5hash):
    metadata = MetaData()
    table_name = 'table_' + md5hash
    table_object = Table(table_name, metadata,
        Column('Column1', DATE, nullable=False),
        Column('Column2', DATE, nullable=False)
    )
    clear_mappers()
    mapper(ActualTableObject, table_object)
    return ActualTableObject

Where ActualTableObject is the class mapping to the table.

Sign up to request clarification or add additional context in comments.

4 Comments

This is what I've done in the past, and I'd also like to see a different solution.
@jkmacc @Suman I have added a solution with a custom Base class which might be helpful, depending on the use case.
Is there any builtin solution for sharding tables?
from the docs: "clear_mappers() is not for normal use, as there is literally no valid usage for it outside of very specific testing scenarios." docs.sqlalchemy.org/en/13/orm/…
16

In Augmenting the Base you find a way of using a custom Base class that can, for example, calculate the __tablename__ attribure dynamically:

class Base(object):
    @declared_attr
    def __tablename__(cls):
        return cls.__name__.lower()

The only problem here is that I don't know where your hash comes from, but this should give a good starting point.

If you require this algorithm not for all your tables but only for one you could just use the declared_attr on the table you are interested in sharding.

1 Comment

It is incorrect and throws InvalidRequestError if you try to do this on model definition rather than declarative base class.
9

Because I insist to use declarative classes with their __tablename__ dynamically specified by given parameter, after days of failing with other solutions and hours of studying SQLAlchemy internals, I come up with the following solution that I believe is simple, elegant and race-condition free.

def get_model(suffix):
    DynamicBase = declarative_base(class_registry=dict())

    class MyModel(DynamicBase):
        __tablename__ = 'table_{suffix}'.format(suffix=suffix)

        id = Column(Integer, primary_key=True)
        name = Column(String)
        ...

    return MyModel

Since they have their own class_registry, you will not get that warning saying:

This declarative base already contains a class with the same class name and module name as mypackage.models.MyModel, and will be replaced in the string-lookup table.

Hence, you will not be able to reference them from other models with string lookup. However, it works perfectly fine to use these on-the-fly declared models for foreign keys as well:

ParentModel1 = get_model(123)
ParentModel2 = get_model(456)

class MyChildModel(BaseModel):
    __tablename__ = 'table_child'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_1_id = Column(Integer, ForeignKey(ParentModel1.id))
    parent_2_id = Column(Integer, ForeignKey(ParentModel2.id))
    parent_1 = relationship(ParentModel1)
    parent_2 = relationship(ParentModel2)

If you only use them to query/insert/update/delete without any reference left such as foreign key reference from another table, they, their base classes and also their class_registry will be garbage collected, so no trace will be left.

Comments

8

you can write a function with tablename parameter and send back the class with setting appropriate attributes.

def get_class(table_name):

   class GenericTable(Base):

       __tablename__ = table_name

       ID= Column(types.Integer, primary_key=True)
       def funcation(self):
        ......
   return GenericTable

Then you can create a table using:

get_class("test").__table__.create(bind=engine)  # See sqlachemy.engine

2 Comments

This appears to me to be the most pythonic and effective solution to this question.
I change from accepted answer to this, but time cost on test is 100times ....
4

Try this

import zlib

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, BigInteger, DateTime, String

from datetime import datetime

BASE = declarative_base()
ENTITY_CLASS_DICT = {}


class AbsShardingClass(BASE):

    __abstract__ = True

def get_class_name_and_table_name(hashid):
    return 'ShardingClass%s' % hashid, 'sharding_class_%s' % hashid

def get_sharding_entity_class(hashid):
    """
    @param hashid: hashid
    @type hashid: int
    @rtype AbsClientUserAuth
    """

    if hashid not in ENTITY_CLASS_DICT:
        class_name, table_name = get_class_name_and_table_name(hashid)
        cls = type(class_name, (AbsShardingClass,),
                   {'__tablename__': table_name})
        ENTITY_CLASS_DICT[hashid] = cls

    return ENTITY_CLASS_DICT[hashid]

cls = get_sharding_entity_class(1)
print session.query(cls).get(100)

Comments

1

There is a way to do it with declarative base. You can directly modify the Table opbject mapped to the class.

class MyTableClass(Base)
  __tablename__ = "table"

  @classmethod
  def remap(cls,md5):
    cls.__tablename__ = f"table_{md5}"
    existing_mapper = class_mapper(cls)
    existing_mapper.mapped_table.name = f"table_{md5}"
    existing_mapper.local_table.name = f"table_{md5}"

  #other class methods

Now call the remap function before querying.

MyTableClass.remap(md5)

1 Comment

mapped_table is deprecated, we should use persist_selectable. Also, the existing_mapper.local_table.name = ... is not needed, since that parameter is already set by the persist_selectable method. But thank you for the lead!
0

Instead of using imperative creating Table object, you can use usual declarative_base and make a closure to set a table name as the following:

def make_class(Base, table_name):
    class User(Base):
        __tablename__ = table_name
        id = Column(Integer, primary_key=True)
        name= Column(String)

    return User

Base = declarative_base()
engine = make_engine()
custom_named_usertable = make_class(Base, 'custom_name')
Base.metadata.create_all(engine)

session = make_session(engine)
new_user = custom_named_usertable(name='Adam')
session.add(new_user)
session.commit()
session.close()
engine.dispose()

Comments

0

Below seems to work but I'm new to SQLAlchemy, so I don't know what kind of problems this will cause:

class Base(DeclarativeBase, MappedAsDataclass):

    @classmethod
    def __init_subclass__(cls, **kwargs):
        cls.__tablename__ = cls.__name__.lower()
        super().__init_subclass__(**kwargs)

Atleast you don't need to create with functions

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.