From 595a5b1b2135d4216131d94fc92d561d9ee32614 Mon Sep 17 00:00:00 2001 From: freghar Date: Thu, 23 Jul 2009 17:25:23 +0200 Subject: [PATCH] article: The reviewing maintainers problem An article about separating changes into several commits / patches so they're more easily reviewable / maintainable. It tries to explain (with examples) why one-big-commit style isn't any better. Signed-off-by: freghar --- html/08_The-reviewing-maintainers-problem.html | 103 +++++++++++++++++++++++++ 1 file changed, 103 insertions(+) create mode 100644 html/08_The-reviewing-maintainers-problem.html diff --git a/html/08_The-reviewing-maintainers-problem.html b/html/08_The-reviewing-maintainers-problem.html new file mode 100644 index 0000000..f00fe8b --- /dev/null +++ b/html/08_The-reviewing-maintainers-problem.html @@ -0,0 +1,103 @@ + + + + + + +Freghar's Gitlog - The reviewing maintainers problem + + + + + +

The reviewing maintainers problem

(should be analogy to dining philosophers / cigarette smokers / sleeping barber problems)

+ +

For those who didn't understand - today's article will be about patches, their maintainability and review process. I'll try to cover the subject more comprehensively, so I pretty much expect a "wall of text". Don't get scared, please.

+
+ +

Really common scenario

Many people tend to forget about documentation. Yet most of them use code comments at least, which has pretty logic explanation:

In-code comments are good to make the code more readable, which is important in more complicated code to express what a developer wanted to do by some function call / allocation / etc. They are mainly for small code hunks in one place in a file.
Commit message is some kind of a global, advanced comment. If used correctly, it can be really powerful (see below) to explain the purpose of several changes across files, a relation between them.
Finally, a text document / pdf / wiki and such sort of stuff is mainly for overall project / subsystem documentation, new API examples, RFC-like stuff and so on.

So a developer probably knows what he's making, has some idea in his head, thus in the short-term, he doesn't need anything but comments, which eliminates the "wtf I wrote this here?" problem.
+Furthermore, most of them have a very little idea about patch maintenance. That means we get a BIG bunch of changes, a lot of new functions, calls, tons of new code with a few comments, no documentation and a commit message "apply this, it works".

Now put yourselves to the role of a maintainer. Imagine you just got email with 1MB patch with "please apply this" comment. You have, however, some responsibility for the product and don't want to blindly apply every patch that comes to you, so the patch needs a review by someone you trust. Now find a good friend who review that 1M patch and who will try to understand how it works. You'd better prepare a truck full of beer for such person.

+
+ +

Heeding the solution

When you have a problem like this, it's best to go to someone who's not involved in IT. It can be your mother, girlfriend or wife. Such people often enlighten you with a new idea when you explain the core of a problem to them in a simplified form. Some problems have sometimes a simple solution, hidden behind the wall of programming complications.
+They will tell you eventually that the best way for a maintainer is to ask the patch author, why did he wrote this, why did he made that or let the author explain the patch structure to the programmer like a master explains things to an apprentice.
+There are, however, several maintainers along the path and explaining it to each of them individually would be ineffective. Here we eventually come to a documentation. An author can write some article about what his patch exactly does, but refering to line numbers in a patch (or pasting sections of it) is a long way to go if the patch is big, moreover - after several hours of reading, everyone would get too bored and forget things which were explained in the beginning.
+More intensive code comments aren't the solution either. The maintainer would have to jump between several files and get confused pretty soon.

The ideal way seems to be in the middle - using commit message together with proper change splits.
+What does that mean - don't use one commit for everything, split your work to several logically divided commits. When a maintainer starts to browse through them, he can see something like "aah yes, he added a line to this makefile for a new SQL update file".
+So a list of commits helps understand the maintainer(s) how the patch was formed from the beginning. They can see author's thoughts being realized, they can more easily see how the patch has evolved on a timeline. From a single change to a complex system.

+
+ +

Problems on the road

+People are lazy and sometimes dumb. They don't realize that proper documentation of their own project / patch is useful even to themselves in the long term. So they make the patch the way they understand it, they don't realize that someone else won't.
+Such things aren't always intentional. Say I want to fix this small error, it should take about 12 lines to change and 3 cans of beer, so I make very little comments, because it's going to be a small fix, easy to read. I can do it within 2 hours (including testing).
+It turns out, however, that I cannot. The bug is a lot bigger than I though, so I continue with fixing, concentrating on the bug, testing my fixes along the way. After eight hours I finally fix it with ~100 lines changed. Now - try to split it. Naah, I'm too lazy for that, so what? It works. Period.
+Tomorrow, I'll have no idea what I did today.

The right solution would be to commit the initial fix (which didn't work), continue in making the rest and commit between trying different ideas where the bug might be. So when you finally fix it, you can simply remove useless commits made before the real reason of the bug was found.
+You will simply have too many commits in the worst scenario, in such a case you can squash them. Splitting one large commit is a lot more complicated. You could possibly use git to soft-reset back and add your changes interactively by hunks, but still - it's never that easy.

+
+ +

Examples

A piece of history from the netfilter kernel subsystem

in one commit

+Add netfilter packet filtering engine
+

in several commits - all those are just commit headers/subjects, each of them has a proper description where needed

+Add nfnetlink layer.
+Add ctnetlink subsystem
+Add properly module refcounting for kernel netlink sockets.
+Core changes required by upcoming nfnetlink_queue code
+Add "nfnetlink_queue" netfilter queue handler over nfnetlink
+Add refcounting and /proc/net/netfilter interface to nfnetlink_queue
+Add new "nfnetlink_log" userspace packet logging facility
+...
+

Now consider which example is more understandable. The one where everything is mixed together wouldn't probably win it.

Another example from the recent Hyper-V implementation into Linux:

+[PATCH 01/54] ecryptfs: clean up attribute mess
+[PATCH 02/54] KOBJECT: remove struct kobj_type from struct kset
+[PATCH 03/54] KOBJECT: remove kobj_set_kset_s as no one is using it anymore
+[PATCH 04/54] kset: add kset_create_and_register function
+[PATCH 05/54] kset: convert fuse to use kset_create
+[PATCH 06/54] kset: convert securityfs to use kset_create
+[PATCH 07/54] kset: convert debugfs to use kset_create
+[PATCH 08/54] kset: convert configfs to use kset_create
+[PATCH 09/54] kset: convert ecryptfs to use kset_create
+[PATCH 10/54] kset: convert main fs kset to use kset_create
+[PATCH 11/54] kset: convert gfs2 to use kset_create
+[PATCH 12/54] kset: convert gfs2 dlm to use kset_create
+[PATCH 13/54] kset: convert dlm to use kset_create
+[PATCH 14/54] kset: convert pci hotplug to use kset_create_and_register
+......
+

Please note that it's not needed to commit each new line into separate commit. The number of commits highly depends on what you implemented / fixed. It can be a single commit for two-line fix, 20 commits for some bigger feature.

+
+ +

The key thing is always brain. Use it. Separate your idea in logically divided parts into commits, like new 20 line function + calls to it into one. Always commit fixes separately, do not try to --amend them to the original commit (in case of a typo fix), that belongs to the final cleanup stage.

Speaking of cleanup, your series of patches shouldn't have any "fix this typo" commits when presented to a maintainer. Imagine that you write something, commit it, then write something else, commit, then realize you did a typo in the first commit, so fix it and make another commit. In the end, you should do an interactive rebase, where you can move the typo fix commit after the one that introduced the typo and change "pick" to "squash" (ie. "include the fix in the previous commit"). That's cleanup made easy.

+
+ +

I hope you got my idea in this article and won't ignore it completely. It's a serious thing. Documentation isn't only "for those fools that can't read the code".

+ +

Freghar

+ + + + + -- 2.11.4.GIT