
I want to reconstruct the commit tree of a repository using only git log output. To get the log I use the following command:

git log --parents --all --source --numstat

My output looks like this (unnecessary metadata omitted):

commit e32c46de36343a0cdad2eac18b5167c0a2831f4d 55dae2809b9e8484ab2466adb6cbed0b1a48fbc9 c070bfc4ed1610d12a1500e307f1323ce9f91653 refs/origin/some_branch
Date:   <commit date>

commit 6d5b6ed00daea7abbb1643cbdd6d2c9d12b5c10a eb539e82860c8c56d18a57e1121d691484aa62cf refs/tags/one_more_tag
Date:   <commit date>
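As a minimal sketch (Python, not part of the original question), the commit lines in this output can be split into a hash, its parent hashes and the --source ref. It assumes full 40-character hex hashes and that the ref, when present, is the last token on the line, as in the sample above; parse_log_commit_lines is a hypothetical helper name.

```python
import re

# Assumption: full 40-character hex SHA-1 object names.
HASH_RE = re.compile(r"^[0-9a-f]{40}$")


def parse_log_commit_lines(log_text):
    """Parse the 'commit ...' lines of `git log --parents --all --source`.

    Returns a dict mapping each commit hash to a tuple
    (list_of_parent_hashes, source_ref_or_None). Other lines
    (Date:, Merge:, numstat rows, indented messages) are ignored.
    """
    commits = {}
    for line in log_text.splitlines():
        if not line.startswith("commit "):
            continue
        tokens = line.split()[1:]  # drop the literal word "commit"
        sha, rest = tokens[0], tokens[1:]
        # Assumption: --source appends the ref as a final non-hash token.
        ref = None
        if rest and not HASH_RE.match(rest[-1]):
            ref = rest.pop()
        commits[sha] = (rest, ref)
    return commits
```

Commit message lines are indented in git log output, so they never start with "commit " and are skipped.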

What algorithm should I use to correctly reconstruct the commit tree?

  • Just so we're on the same page here, what do you mean by "commit tree"? i.e. what output are you aiming to achieve? Commented Nov 24, 2017 at 18:43
  • @OliverCharlesworth I mean "commit tree" as in the commit graph (like the git log --graph output) Commented Nov 24, 2017 at 18:45
  • Ok, so what is wrong with git log --graph? Commented Nov 24, 2017 at 18:47
  • Just so you know, the "commit tree" in git refers to the tree that the commit has associated with it. The "tree" in this case would be the state of the files that belong to that commit. As opposed to the tree of commits, which you seem to be looking for. Commented Mar 13, 2024 at 15:03

1 Answer


The %P style parent hashes from git log give you the outgoing arcs for the graph, with each commit's hash (%H) giving you the node ID for each vertex. Using git rev-list --parents HEAD would give you more or less the minimal input needed to construct a graph:

89ea799ffcc5c8a0547d3c9075eb979256ee95b8 3505ddecbdd4a2eaf3d2aaea746dc18d7a0b6a6b 5a1f5c3060427375de30d609d72ac032516be4c2
3505ddecbdd4a2eaf3d2aaea746dc18d7a0b6a6b e539a834555583a482c780b11927b885f8777e90
e539a834555583a482c780b11927b885f8777e90 36d75581a4966f5d56403d1a01d60e4dc45a8ab0 00ec50e56d136de41f43c33f39cdbee83f3e4458
36d75581a4966f5d56403d1a01d60e4dc45a8ab0 5066a008bb6a810f908c020de02a646cf3c92f34 049e64aa5024a9fa2fbe6116d413da1ffc7f497a
...

Constructing the graph is now trivial: if you're into graph algorithms, you can see that the above is your G = <V, E> right there. In other words, you're already done: you have G. The first column lists all the vertices, and the second and later columns on each line are the outgoing arcs from that line's vertex to its parents.
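A minimal Python sketch of that construction, assuming the input is complete git rev-list --parents output (so every parent also appears in the first column of some line); build_graph is a hypothetical name, and arcs are stored as (child, parent) pairs in parent order so the first parent stays first:

```python
def build_graph(rev_list_output):
    """Build G = <V, E> from `git rev-list --parents` output.

    Each line is "<commit> <parent1> <parent2> ...". The first column
    gives a vertex; every later column gives an outgoing arc
    (child -> parent).
    """
    vertices = set()
    edges = []  # (child, parent) pairs, in the order git lists parents
    for line in rev_list_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        child, parents = fields[0], fields[1:]
        vertices.add(child)
        for parent in parents:
            edges.append((child, parent))
    return vertices, edges
```

For a real repository, the input could come from subprocess.run(["git", "rev-list", "--parents", "--all"], capture_output=True, text=True, check=True).stdout; git log --all --format='%H %P' produces lines of the same shape.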

Drawing the graph, however, is potentially much harder, depending on what problem(s) you want to solve. If git log --graph (perhaps with --oneline) does not do it for you, you will need to be more specific.


7 Comments

Thank you for the answer. Actually, I want to know the moment when a new branch is created and when it is merged into another. I tried to build a graph and ran into a problem. Consider two commits: c070bf 55dae 97e2a and e324g 55dae c070bf. If I'm on the right track, after the first commit HEAD is at c070bf (after the merge into 55dae), and the second is a merge of c070bf into 55dae again. I don't understand how one can merge into the same hash twice.
Those two sets of values, <c070bf 55dae 97e2a> and <e324g 55dae c070bf>, say that c070bf has 55dae and 97e2a as parents (c070bf has outgoing arcs to both 55dae and 97e2a, so c070bf is a merge of the other two), and e324g has 55dae and c070bf as parents (so e324g is a merge of 55dae and c070bf, which is itself a merge).
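For the underlying "when is a branch created and merged" question, a rough Python sketch can flag merges as commits with two or more parents and fork points as commits with two or more children. It assumes a parents_of dict mapping each hash to its parent list (for example built from the rev-list output); classify_commits is a hypothetical name. Git does not record when a branch name was created, so a fork point only shows where history diverged, not which side was the "new" branch.

```python
from collections import defaultdict


def classify_commits(parents_of):
    """Classify commits given a dict: commit -> list of parent hashes.

    Returns (merges, fork_points, roots):
    - merges: commits with two or more parents.
    - fork_points: commits with two or more children, i.e. places
      where lines of development split.
    - roots: commits with no parents.
    """
    # Invert the child -> parent arcs to get parent -> children.
    children_of = defaultdict(list)
    for child, parents in parents_of.items():
        for parent in parents:
            children_of[parent].append(child)
    merges = {c for c, ps in parents_of.items() if len(ps) >= 2}
    fork_points = {p for p, cs in children_of.items() if len(cs) >= 2}
    roots = {c for c, ps in parents_of.items() if not ps}
    return merges, fork_points, roots
```

With the two commits from the comment above (plus 97e2a and 55dae filled in as ordinary commits), both e324g and c070bf come out as merges and 55dae as a fork point.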
Incidentally, I'd add that e324g is not a valid Git hash (the g gives it away).
e324g is just an example; the original hash does not contain a g, of course. Representing the git log output as a graph G = <V, E> helps me, thanks, but now I'm looking for an algorithm to obtain all commits on a particular branch: a path from v1 to v2 where v1 and v2 have the same ref. A Dijkstra-like algorithm for all paths from v1 to v2 gives many results, and I don't know how to distinguish them. Do you have any ideas?
I'm not sure what you mean by "have the same ref". If two vertices have the same node ID, they're really just one vertex, not two, and there's no need for any path at all. If you follow outgoing arcs from v1 leading to v2, though, there can indeed be multiple paths, and in normal usage there's nothing to distinguish them: all are valid paths, and all these commits are "between" the two commits. Git adds one guarantee, which is that the first parent of each commit node (i.e. the first outgoing arc from each child to its first parent) is the commit that was current during a merge. For non-merge [cont]
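Building on that first-parent guarantee, a minimal Python sketch walks only first-parent arcs back from a tip, in the spirit of git log --first-parent, which yields one path instead of all of them. It reuses the assumed parents_of dict; first_parent_path is a hypothetical helper. Treating the first-parent chain as "the branch" is itself an assumption: a fast-forward merge, for example, leaves no merge commit, so commits made on another branch end up on the chain.

```python
def first_parent_path(parents_of, start, stop=None):
    """Follow only first-parent arcs from `start` back toward `stop`.

    At each merge, take the first parent (the commit that was checked
    out when the merge was made) and ignore the merged-in side.
    Stops at `stop` (inclusive) or at a root commit. Returns the
    visited commits newest first.
    """
    path = []
    current = start
    while current is not None:
        path.append(current)
        if current == stop:
            break
        parents = parents_of.get(current, [])
        current = parents[0] if parents else None
    return path
```

If stop is not on the first-parent chain, the walk simply runs until it reaches a root commit.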