added 45 characters in body

Source Link

edited Jul 20 at 21:39

42.3k
3
38
157

It appears to me that an app might plausibly loop over many entries, doing a .save() on each of them. In the interest of speed, consider making caller (or caller's context manager) responsible for issuing a single COMMIT. Consider offering a vector based Public API, where you accept a list of entries that are all committed together as one big transaction.

It appears to me that an app might plausibly loop over many entries, doing a .save() on each of them. In the interest of speed, consider making caller (or caller's context manager) responsible for issuing COMMIT. Consider offering a vector based Public API, where you accept a list of entries.

It appears to me that an app might plausibly loop over many entries, doing a .save() on each of them. In the interest of speed, consider making caller (or caller's context manager) responsible for issuing a single COMMIT. Consider offering a vector based Public API, where you accept a list of entries that are all committed together as one big transaction.

added 411 characters in body

Source Link

edited Jul 20 at 19:13

J_H

42.3k
3
38
157

automated tests

def test_repository() is very nice. But it's a little on the big side. Its three loops suggest that perhaps we'd like three smaller test functions which can individually signal Red or Green status.

If you inherit from unittest.TestCase you can take advantage of def setUp(self). And pytest offers its own approach to consolidating shared setup logic in a single place.

automated tests

def test_repository() is very nice. But it's a little on the big side. Its three loops suggest that perhaps we'd like three smaller test functions which can individually signal Red or Green status.

If you inherit from unittest.TestCase you can take advantage of def setUp(self). And pytest offers its own approach to consolidating shared setup logic in a single place.

Source Link

answered Jul 20 at 19:03

J_H

42.3k
3
38
157

We're using black, isort, ruff, and type checkers -- terrific! Lots of best practices in there. It makes my life as maintainer / reviewer much easier.

middleware

We've gone pretty far down the "batteries included" import sqlite3 path already. But if you start a new project, consider accessing sqlite via SqlAlchemy. That's how I always do it, to preserve flexibility for swapping in a postgres backend as a project matures.

The sqlite quirk about not enforcing FK by default is something my code simply never has to worry about. It turns out that is due to SqlAlchemy issuing PRAGMA foreign_keys = ON; behind the scenes, so I can get on with standard SQL semantics in the usual way.

Additionally your IDE can offer better support for column renames and indenting of queries when we inherit from Base and use the ORM to create queries. And declaring / verifying types such as INTEGER or BOOL is much more pleasant in that setting.

Your Sqlite3FK context manager is nice enough. No need to implement that functionality if you have a SqlAlchemy session at your disposal, which already attends to those same tasks. No need to document or test a new class, when a well documented and tested library offering the same features is already available.

Also, prefer to shorten each call site by doing the usual from pathlib import Path. No need to see that verbose "pathlib" mentioned by each call. Similarly for strftime() and now().

Kudos for making the ctor set the connection to None, so we have an easily read "table of contents" of the valid attributes.

maintenance chores

I feel one of the more predictable maintenance tasks that will crop up over this project's lifetime is "add column".

There's a bunch of places in the OP codebase where column names and types are listed. They're not all consolidated in one place, within a screenful or two of code. It seems like, with the current structure, {add, rename, delete} column tasks will be less pleasant than they might have been, and the existing unit tests might not reliably expose remaining "to-do" methods that should be updated with a new column name.

constructor conveniences

This is very nice, as it supports concise calls from the app layer:

    def __init__(self, db_path: str | pathlib.Path) -> None:

However, I feel this is a regrettable type:

    _db_path: str | pathlib.Path

Better to assign self._db_path = Path(db_path). Note that if p is already a Path, then Path(p) simply returns p, with no harm done.

Also, having a _db_path attribute at the class level and the instance level seems bad. Prefer to keep track of it strictly at the instance level.

This is fine: _CREATE_HASH_ARCHIVES: str = """CREATE TABLE ...
But I can't imagine there's a need to annotate that constant. I mean, surely pyright and mypy infer str on their own, right, without needing a hint?

boolean column

Consider renaming to is_deleted, for consistency with is_dir. And consider requiring NOT NULL.

Also, if you ever need to pass a security audit I predict that password column will be trouble.

many COMMITs

.save() does a tiny amount of work, on one row, and immediately issues a COMMIT.

It appears to me that an app might plausibly loop over many entries, doing a .save() on each of them. In the interest of speed, consider making caller (or caller's context manager) responsible for issuing COMMIT. Consider offering a vector based Public API, where you accept a list of entries.

debugs

Now that the code is mature enough for review, consider deleting the various _ = ... assignments. They usually begin with
_ = cur.execute( ..., and seem to be there to support single-step inspection within a debugger.

timezone

The _now() helper returns a "%Y-%m-%d %H:%M:%S" string which omits timezone. Consider tacking on a 4-digit " +ZZZZ" suffix. Alternatively, document that it returns a UTC timestamp. The OP code uses the local timezone setting when producing a timestamp, but there's no docstring that warns about the result's ambiguity.

If we serialize such a timestamp, say to an RDBMS, we should also serialize the zone offset, or document that the offset is zero. That way some future reader can deserialize it and know which instant in time it corresponds to.

"optional" helper

Expressions like fe.algo.value if fe.algo is not None else None crop up in a few places. Consider introducing a trivial helper that allows you to simplify each call site, making it more concise.

possible None

_fill_archive() does this:

        arch.deleted = bool(cast(int, row["deleted"]))

Recommend you precede that with assert arch. The pyright type checker seems to believe the archive record can be None at that point.

Stack Exchange Network

Return to Answer

automated tests

automated tests

middleware

maintenance chores

constructor conveniences

boolean column

many COMMITs

debugs

timezone

"optional" helper

possible None