Before digging into the technical details, I'd like to share some higher-level considerations:
The number of times you genuinely need to build your own database to meet a database requirement is low, and it's better to double-check whether an off-the-shelf solution, or a solution built on top of one, would be better suited. Too often, rewriting a database is reinventing the wheel for NIH reasons. I speak from experience; I am, for instance, working in a team of engineers that skipped this step several times and implemented file databases, home-made orchestrators...
Additionally, to make sure the need is effectively addressed, it's important to challenge the requirements of the database: persistence span, response times, write throughput, cluster efficiency, transaction support, etc. Understand that there is no silver-bullet design, as highlighted for example by the CAP theorem. If you aim for every feature, you will at best get a subset of them, and you'd be lucky if that subset were the one that matters most for business reasons.
With that being said:
You essentially need to associate a boolean with each of the objects you are persisting. Here it's their transactional state, but you could imagine future features producing an index over the persisted files, or any other content summary that could be handy for searching or similar.
That's why I believe the most future-proof solution would be an eventually-persisted "index" file: in its simplest form, a list of boolean values where the i-th entry is the status of the i-th log. The implementation only has to ensure the following (a minimal sketch follows the list):
- The index is updated in memory only once the persisting of a log is done and complete
- The index is flushed to disk at regular intervals. The cleanest approach is to keep at least two files, so that if a write is interrupted the backup can be used
- The index is loaded at startup and rebuilt if missing or corrupt.
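
For illustration, here is a minimal sketch of such an index in Python. The file names (`index.0`, `index.1`) and the JSON encoding are my own assumptions, not part of any existing library; it just demonstrates the three points above:

```python
import json
import os

# Hypothetical file names for the two alternating index copies.
INDEX_FILES = ["index.0", "index.1"]

class PersistIndex:
    def __init__(self, expected_count):
        self.flags = self._load(expected_count)

    def _load(self, expected_count):
        # Load at startup; fall back to an all-False index if both
        # copies are missing or corrupt. That only costs redundant
        # re-checking later, never correctness.
        for path in INDEX_FILES:
            try:
                with open(path) as f:
                    flags = json.load(f)
                if isinstance(flags, list) and len(flags) == expected_count:
                    return flags
            except (OSError, ValueError):
                continue
        return [False] * expected_count

    def mark_persisted(self, i):
        # Call this only *after* the i-th log is fully persisted.
        self.flags[i] = True

    def flush(self, generation):
        # Alternate between the two files so an interrupted write
        # leaves the other copy intact as a backup; the tmp+rename
        # additionally makes each individual write atomic on POSIX.
        path = INDEX_FILES[generation % 2]
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.flags, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
```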
With this implementation, you can be sure the files flagged truly are persisted and can be skipped. The reverse is not guaranteed, but you can live with some files being re-written (if persisting is idempotent) or checked in a more expensive way while the database is live, as false negatives will be low-frequency.
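
As a usage sketch of that startup path, assuming `logs`, an idempotent `persist_log(i)` and a more expensive `check_persisted(i)` are hypothetical stand-ins for your own code:

```python
index = PersistIndex(expected_count=len(logs))
for i, log in enumerate(logs):
    if index.flags[i]:
        continue                    # flagged entries are truly persisted
    if not check_persisted(i):      # rare false negative: verify, then
        persist_log(i)              # re-write idempotently if needed
    index.mark_persisted(i)
index.flush(generation=0)
```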