5
\$\begingroup\$

Context

I'm a duck retailer. I rent ducks to farmers, and when they don't need them anymore, I get them back. In order to track which ducks I rented to whom, I have a PostGreSQL database that I keep updated with my operations.

When I record a rent operation I need to manipulate two tables: Ducks and RelationDuckFarmer.
Relevant attributes of Ducks in the rent context are id and duck_status (to check whether Ducky is healthy or has an issue of some kind).
Attributes at stake in RelationDuckFarmer are duck_id, farm_id, farmer_id (there may be several farmers in the same structure) and finally start_date and end_date (which delineate the renting period). When the rental is over, instances in RelationDuckFarmer are not deleted, we add an end_date instead (so as to keep an history of rents).

Rental conditions

  • Duck must be available & healthy for renting -> duck_status == Available
  • Duck must not be already allocated -> if start_date is not null then end_date must not be either.
  • start_date and end_date are timestamps generated by PostGreSQL CURRENT_TIMESTAMP keyword.

Code

def allocate_ducks(self, ducks, farm_id, farmer_id):
        
        keys = ['farm_id', 'duck_id', 'farmer_id', start_date]
        # Wrapper for SQL transaction
        with transaction() as (conn, cur):
            # Iterate over the given array of duck IDs
            for d in ducks:
                row_data = [farm_id, d, farmer_id]

                if duck_exists(cur, d) and duck_is_available(cur, d):
                    if duck_is_already_rented(cur, d):
                        raise DuckAlreadyRented(d)
                    else:
                        allocate_to_farm = sql.SQL('INSERT INTO {} ({}) VALUES ({}, CURRENT_TIMESTAMP)').format(
                            sql.Identifier('RelationDuckFarm'),
                            sql.SQL(', ').join(map(sql.Identifier, keys)),
                            sql.SQL(', ').join(map(sql.Literal, row_data))
                        )
                        cur.execute(allocate_to_farm)
                        
                        change_status_to_rented = sql.SQL('UPDATE {} SET {}={} WHERE {}={}').format(
                            sql.Identifier('Ducks'),
                            sql.Identifier('duck_status'),
                            sql.Literal('Rented'),
                            sql.Identifier(id),
                            sql.Literal(d)
                        )
                        cur.execute(change_status_to_rented)
                else:
                    raise DuckDoesNotExist(d)
            return ducks

duck_exists:

def duck_exists(self, cur, duck_id):
    query = sql.SQL('SELECT COUNT(*) FROM {} WHERE {}={};').format(
        sql.Identifier('Ducks'),
        sql.Identifier('id'),
        sql.Literal(duck_id)
    )
    cur.execute(query)
    return cur.fetchone()

duck_is_available:

    def duck_is_available(self, cur, duck_id):
    query = sql.SQL('SELECT COUNT(*) FROM {} WHERE {}={} AND {}={}').format(
        sql.Identifier('Ducks'),
        sql.Identifier('id'),
        sql.Literal(duck_id),
        sql.Identifier('duck_status'),
        sql.Literal('Available')
    )
    cur.execute(query)
    return cur.fetchone()

duck_is_already_rented:

def duck_is_already_allocated(self, cur, duck_id):
    # Look for rows where the duck has a rental starting date but not an ending date (which would mean it's currently rented to s.o.)
    query = sql.SQL('SELECT COUNT(*) FROM {} WHERE {}={} AND {} IS NOT NULL AND {} IS NULL').format(
        sql.Identifier('RelationDuckFarm'),
        sql.Identifier('duck_id'),
        sql.Literal(duck_id),
        sql.Identifier('start_date'),
        sql.Identifier('end_date')
    )
    cur.execute(query)
    return cur.fetchone()

Issues

  • The functions I use to check whether it is possible to rent Ducky before actually creating a row in RelationDuckFarm already cost me three SQL queries. I can't really combine them in one because I need these checkers somewhere else as well. So that means I have to perform 5 different queries to rent one duck. Functioning, but not very efficient.
  • I use a for loop to go through the array containing all the ducks I'd like to rent. Which means that if I have n ducks, I'd then need 5*n queries to record my operations.

Question

How to refactor this code to optimize my rental system while still respecting the data model?

\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

Query formatting

query = sql.SQL('SELECT COUNT(*) FROM {} WHERE {}={} AND {} IS NOT NULL AND {} IS NULL').format(
    sql.Identifier('RelationDuckFarm'),
    sql.Identifier('duck_id'),
    sql.Literal(duck_id),
    sql.Identifier('start_date'),
    sql.Identifier('end_date'))

You're using format to place constants into the query. Contants like the table name. Why? It makes the query harder to read, and I can imagine it hurts performance (ever so slightly).

query = sql.SQL("""
SELECT COUNT(*) 
FROM RelationDuckFarm 

WHERE duck_id={} 
AND start_date IS NOT NULL 
AND end_date IS NULL
""").format(
    sql.Literal(duck_id))

Duplicate queries

Before you update the table, you perform 3 checks: Does the duck exist, is the duck available and is the duck not already rented? You could combine the first two checks into one query: if the duck doesn't exist, the second query will also return 0, because it has the same where-clause and then some. Additionally, when you rent out a duck, you also set that duck's status to 'rented'. This means that ducks that are already rented out, will never pass the first filter (because their status is not 'available'.

This means that your call to duck_is_available does everything you need to know. This does assume that you correctly set the duck's status back to available when the rent is over! I can't check that because you didn't provide the code. At the same time, if it's not consistently updated, why is it even there to begin with?

Combining queries

If the check above isn't sufficient, you can still combine the queries into one. I think that it's better to have two queries at different places that do what you need, than to split queries into small chuncks, forcing you to execute three every time.

select 

case when d.duck_status = 'available' then 1 else 0 end IsAvailable,
case when rdf.start_date IS NOT NULL AND rdf.end_date IS NULL then 1 else 0 end IsAlreadyRented

from Ducks d

left join RelationDuckFarm rdf on d.duck_id = rdf.duck_id

where d.duck_id = {}

This does join (and filter) on duck_id. Duck_id is a PK/index right?

Errors

You raise a DuckDoesNotExistError, even when a duck exists, but is not available. I'm not sure if you really need that distinction in errors.

All updates in one go:

What is your desired behaviour if some ducks exist, and some don't? Will all changes that have been made be rolled back? If so, you can probably update for all ducks at once after you checked whether the ducks were available.

Both the update and the insert statement can be adapted to add/update multiple rows at once. I'm not familiar with the database API you use in Python, so I can't help you much with that, but it shouldn't be too hard.

\$\endgroup\$
1
  • \$\begingroup\$ You're right, I set the duck's status back when I de-allocate. I didn't want to put too much code in the question body. \$\endgroup\$ Commented Jul 9, 2019 at 7:34

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.