
We store our configuration files in S3. Each file is a JSON file. When a config changes, the old file is backed up and a new file replaces it. All of this happens behind a service, which also maintains optimistic locks to deal with concurrent writes. We record changes as audit logs, along with the author and a timestamp.

The current setup has been working reasonably well for us, but as we add more engineers and make more changes, we are seeing issues with the audit logs. It's hard to find out who last edited a given field, and when.

Since, as engineers, we use Git and its blame feature often, I feel it fits this use case nicely. Every change to the config can be modelled as a commit, and we can easily see the file history, who changed which lines, and more.

The main problem I see is that we would have to deal with Git on the CLI via plumbing commands, which can be difficult to work with. I don't think merge conflicts are likely, but I admit I don't know well enough to say they definitely won't happen when used this way.

Are there alternatives that provide this subset of Git blame's functionality with less complexity? Do you recommend this approach or not, and why? What are the best practices?


Edit: adding file and usage metrics

We have roughly 50 files, and the number stays stable unless we spin up new services, which is not often.

Each file can have anywhere from 10 lines to a few thousand, depending on the complexity of the config. Files are currently sorted by keys (we do have schemas for the configs, and the serialization library always produces a consistent order and formatting).
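For illustration, a consistent serialization like the one described (sorted keys, fixed formatting) can be sketched with the standard library alone; `canonical_dumps` is a hypothetical helper name, not part of any library:

```python
import json

def canonical_dumps(config: dict) -> str:
    # Sort keys and fix the indentation so the same logical config
    # always serializes to byte-identical text. Stable output is what
    # keeps line-based diffs (and hence git blame) meaningful.
    return json.dumps(config, sort_keys=True, indent=2) + "\n"
```

With this in place, two logically equal configs produce identical files regardless of the order their keys were inserted in.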

Both engineers and automated systems make changes to the configs, at a rate of a few tens of edits per day.

  • Welcome. If you want blame, you need a version control system. One alternative could be to find out whether the S3 service itself can version the files instead of just backing them up and replacing them; versioning may be an option you can activate. Commented Feb 24, 2023 at 7:27
  • Give us a bigger picture: how many files (2, 20, 2000)? How large are they (5 lines, 50 lines, 5000 lines), what are they good for, how many people are changing them how frequently (roughly - once a day, once a month, once a year)? Commented Feb 24, 2023 at 12:09
  • If you can enforce some canonical order on json objects (e.g. valid json, objects have unique keys sorted in C locale order) then merge conflicts are very unlikely. Commented Feb 27, 2023 at 19:22
  • I added the stats to give a clearer picture. Commented Mar 2, 2023 at 5:54

3 Answers


JSON is in essence a tree format. Tracking changes as changes to lines is awkward: lines can move without the meaning of the tree changing.

What would happen if you tracked changes in accordance with this tree nature?

There are still lots of decisions to make if you go down this path.

  • What is the scope of a change? What is identity within a tree? More concretely: if someone changes the value of a property, did they change the object? Did they change the object containing that object? And so on, up to the root.
    As with identity and copying, it depends on what domain object the tree is modeling.
  • What are the operations for changing (from which you calculate diffs)?
    • Only add and remove?
    • Add, remove, and inner change (for changes to a nested object)?
    • Changes to arrays: insert, remove, shift, rotate?

There are several options for an "embedded Git", depending on your tech stack. See, for example: https://git-scm.com/book/en/v2/Appendix-B%3A-Embedding-Git-in-your-Applications-Libgit2

Then, when you update your config file, you would pull-modify-commit-push in one go and hope for no conflicts. And you can still force-push to get a latest-wins strategy on your config.
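If a full libgit2 binding is more than you need, the same pull-modify-commit-push cycle can be sketched with plain subprocess calls to the git CLI (the repo path, author string, and helper names here are assumptions, not a library API):

```python
import os
import subprocess

def git(repo, *args):
    # Run a git command inside `repo`, raising on a non-zero exit.
    return subprocess.run(["git", "-C", repo, *args],
                          check=True, capture_output=True, text=True).stdout

def update_config(repo, filename, new_text, author, message):
    # Fast-forward only: refuse to create a merge rather than
    # risk a conflict in an unattended service.
    git(repo, "pull", "--ff-only")
    with open(os.path.join(repo, filename), "w") as f:
        f.write(new_text)
    git(repo, "add", filename)
    git(repo, "commit", "-m", message, f"--author={author}")
    git(repo, "push")
```

If the push is rejected because someone else got there first, the service can retry the whole cycle, which is effectively the same optimistic-locking loop the question already describes.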


I don't think merge conflicts are likely, but I admit I don't know well enough to say they definitely won't happen when used this way.

Merge conflicts only happen when you have merges. Merges only happen when development is non-linear, normally either because multiple people are involved or because each person uses more than one branch.

If a single process is converting a linear "audit log" into Git commits, then there will never be merges, and hence never merge conflicts.

The main problem I see is that we would have to deal with Git on the CLI via plumbing commands

Don't over-design this. Sure, you could probably make it more efficient by using plumbing commands to build the commit, but there really isn't any need; just copy the files to the Git repo and call git add and git commit with appropriate parameters.
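A minimal sketch of that porcelain-only approach, assuming a local clone with a configured committer identity (`record_change` and `who_last_touched` are hypothetical names); `--author` and `--date` let each commit mirror the author and timestamp already stored in the audit log:

```python
import os
import shutil
import subprocess

def record_change(repo, src_path, author, date, message):
    """Copy an updated config into the repo and commit it: no plumbing
    commands, just porcelain `git add` and `git commit`."""
    name = os.path.basename(src_path)
    shutil.copy(src_path, os.path.join(repo, name))
    subprocess.run(["git", "-C", repo, "add", name], check=True)
    subprocess.run(
        ["git", "-C", repo, "commit", "-m", message,
         f"--author={author}", f"--date={date}"],
        check=True, capture_output=True)

def who_last_touched(repo, filename, line_no):
    """Answer the original question with `git blame`: who last edited
    this line, parsed from the machine-readable --porcelain output."""
    out = subprocess.run(
        ["git", "-C", repo, "blame", "-L", f"{line_no},{line_no}",
         "--porcelain", filename],
        check=True, capture_output=True, text=True).stdout
    for line in out.splitlines():
        if line.startswith("author "):
            return line[len("author "):]
```

Because the configs are serialized with stable key order, the line number of a field is stable between edits, which is exactly what makes the blame lookup meaningful.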

Do you recommend this approach or not, and why?

IMO the big question is how "stable" your config files are. If they are stable, with a small change in the configuration mapping to a small change in the file, then Git is likely fine. If they are unstable, with large spurious changes when a file is re-saved (for example, a change in the order of items in the file), then git blame is likely to be much less useful.
