We're using black, isort, ruff, and type checkers -- terrific!
Lots of best practices in there.
It makes my life as maintainer / reviewer much easier.
middleware
We've gone pretty far down the "batteries included"
import sqlite3 path already.
But if you start a new project, consider accessing sqlite via
SqlAlchemy.
That's how I always do it, to preserve flexibility for
swapping in a postgres backend as a project matures.
The sqlite quirk about not enforcing FK by default is something
my code simply never has to worry about.
It turns out that is due to SqlAlchemy issuing
PRAGMA foreign_keys = ON; behind the scenes,
so I can get on with standard SQL semantics in the usual way.
Additionally your IDE can offer better support for column renames
and indenting of queries when we inherit from Base and
use the ORM to create queries.
And declaring / verifying types such as INTEGER or BOOL is much
more pleasant in that setting.
Your Sqlite3FK context manager is nice enough.
No need to implement that functionality if you have
a SqlAlchemy session at your disposal, which already
attends to those same tasks.
No need to document or test a new class,
when a well documented and tested library offering the same features
is already available.
Also, prefer to shorten each call site by doing the usual
from pathlib import Path.
No need to see that verbose "pathlib" mentioned by each call.
Similarly for strftime() and now().
Kudos for making the ctor set the connection to None,
so we have an easily read "table of contents" of the valid attributes.
maintenance chores
I feel one of the more predictable maintenance tasks that will
crop up over this project's lifetime is "add column".
There's a bunch of places in the OP codebase where column names
and types are listed.
They're not all consolidated in one place, within a screenful or
two of code.
It seems like, with the current structure,
{add, rename, delete} column tasks will be less pleasant
than they might have been,
and the existing unit tests might not reliably expose
remaining "to-do" methods that should be updated with a new column name.
constructor conveniences
This is very nice, as it supports concise calls from the app layer:
def __init__(self, db_path: str | pathlib.Path) -> None:
However, I feel this is a regrettable type:
_db_path: str | pathlib.Path
Better to assign self._db_path = Path(db_path).
Note that if p is already a Path, then Path(p) simply returns p,
with no harm done.
Also, having a _db_path attribute at the class level and the instance level
seems bad. Prefer to keep track of it strictly at the instance level.
This is fine: _CREATE_HASH_ARCHIVES: str = """CREATE TABLE ...
But I can't imagine there's a need to annotate that constant.
I mean, surely pyright and mypy infer str on their own, right,
without needing a hint?
boolean column
Consider renaming to is_deleted, for consistency with is_dir.
And consider requiring NOT NULL.
Also, if you ever need to pass a security audit I predict that password column
will be trouble.
many COMMITs
.save() does a tiny amount of work, on one row, and immediately issues a COMMIT.
It appears to me that an app might plausibly loop over many entries,
doing a .save() on each of them.
In the interest of speed,
consider making caller (or caller's context manager) responsible
for issuing COMMIT.
Consider offering a vector based Public API,
where you accept a list of entries.
debugs
Now that the code is mature enough for review,
consider deleting the various _ = ... assignments.
They usually begin with
_ = cur.execute( ...,
and seem to be there to support single-step inspection within a debugger.
timezone
The _now() helper returns a "%Y-%m-%d %H:%M:%S" string which omits timezone.
Consider tacking on a 4-digit " +ZZZZ" suffix.
Alternatively, document that it returns a UTC timestamp.
The OP code uses the local timezone setting when producing a timestamp,
but there's no docstring that warns about the result's ambiguity.
If we serialize such a timestamp, say to an RDBMS,
we should also serialize the zone offset, or document that the offset is zero.
That way some future reader can deserialize it and know
which instant in time it corresponds to.
"optional" helper
Expressions like fe.algo.value if fe.algo is not None else None
crop up in a few places.
Consider introducing a trivial helper
that allows you to simplify each call site, making it more concise.
possible None
_fill_archive() does this:
arch.deleted = bool(cast(int, row["deleted"]))
Recommend you precede that with assert arch.
The pyright type checker seems to believe
the archive record can be None at that point.