Computing desk
< August 31	<< Aug \| September \| Oct >>	September 2 >

Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.

September 1

Estimating nodes in a tree for depth-first search

I need to traverse a large tree with a depth-first search. I need an estimate of the number of nodes to get an idea if it is feasible to do it. What I have in mind is to start the search but at each node, select one edge at random. Keep track of the branching factor at each level. Then repeat this maybe 10⁵ times and get an average branching factor at each level. Then multiply the branching factors to get an estimate of the total number of nodes.

Should this give a reasonable estimate of the total number of nodes? Bubba73 ^{You talkin' to me?} 22:18, 1 September 2025 (UTC)[reply]

If you know nothing about the tree in advance you will still not know, as the branches may go to any huge depth, including those branches you don't go near. Are you going to go full depth 10⁵ times? That would give you some sort of probabilistic idea, but not any certainty. Graeme Bartlett (talk) 10:18, 2 September 2025 (UTC)[reply]

~~Do you have any way of knowing or estimating the total number of nodes? That in conjunction with your "branching factor" might give a better estimate.~~ —scs (talk) 10:48, 2 September 2025 (UTC)[reply]

I think the point of the described sampling method is to obtain an estimate of the total number of nodes, to be used to determine whether a full traversal is feasible. ‑‑Lambiam 14:44, 2 September 2025 (UTC)[reply]

Ah, yes. (And: "D'oh!") The emphasis on depth-first in the title had me assuming that the question was whether specifically depth-first traversal was possible, and the intent was to estimate the depth of the tree. —scs (talk) 21:24, 2 September 2025 (UTC)[reply]

Assume you have a finite tree with a gazillion nodes that is completely linear: each internal node has exactly one child. With your method, you compute, each of these 10⁵ times, the product of the branching factors as being 1^gazillion = 1, which is not a good estimate. ‑‑Lambiam 15:09, 2 September 2025 (UTC)[reply]

I was planning to go different random routes, each as deeply as I could, for enough trials to get an estimate. I picked 10⁵ as an example. I could do more. The tree is largely unknown - I don't know how many edges a node has until I get to it, but the number of edges for each will be between 0 and 8. Very few paths will reach the maximum depth. The object is to get the path that goes to the maximum depth of the tree. Bubba73 ^{You talkin' to me?} 22:58, 2 September 2025 (UTC)[reply]

But if there is only one route, as is the case in a strictly linear tree with a branching factor of 1 at all levels (except for terminal nodes), a probe will produce 1, this being the product of the branching factors along its route. It does not matter if you run 10⁰ probes or 10⁹⁹ probes – the answer is always 1, regardless of the depth of the tree.

Here is a mod to your algorithm. The number is computed in a variable E as you return in a probe backwards along the route from the terminal node to the root (which is easily coded if you use recursion; otherwise you need to fill and keep an array of branching factors). On returning from visiting a terminal node, E is set to the value 1. On returning from a visit to an internal node with b children, E is set to 1 + bE. If the branching factors going down along the random route are b₀, b₁, ..., b_n−1, this computes, for the root node,

E = 1 + b₀ + b₀b_i + ··· + b₀b₁···b_n−1.

In the special case of a tree in which all nodes at the same depth i have the same number b_i of children, this is the exact total number of nodes. ‑‑Lambiam 00:07, 3 September 2025 (UTC)[reply]

You can also compute E directly going forward, using an extra variable P. Initially, set E = 0, P = 1. When visiting a new node (the first of which is the root), set E = E + P and then P = Pb, where b is the number of children of the node. If the node being visited is terminal, we are now done (with this probe). Otherwise, move, randomly, one level down and repeat the process. ‑‑Lambiam 00:16, 3 September 2025 (UTC)[reply]

Thanks, I'll study this. There is certainly not only one route to a terminal node, but there might be only one path that goes to the maximum depth. But if there is more than one path that does get to the maximum depth, I'd like to get all of them. (I expect at most only a few of them. though.) (And I am using recursion.) Bubba73 ^{You talkin' to me?} 01:19, 3 September 2025 (UTC)[reply]

If there are more paths from the root to a given node, the graph is not a tree. To get an unbiased estimate of the number of nodes, you need to know more about the composition of the graph than can be obtained by these random probes. Confluence of paths implies that the method I sketched overestimates the number of notes. An example of a bad case is given by the family of graphs of the form

                  /-->--\ /-->--\ /-->--\       /-->--\
                 o       o       o       o ... o       o ,
                  \-->--/ \-->--/ \-->--/       \-->--/

for which, if there are N nodes, the expected value of E equals 2^N − 1. If the constant branching factor is not specifically 2 but, more generally, b, you get E = (b^N − 1) / (b − 1). ‑‑Lambiam 10:54, 3 September 2025 (UTC)[reply]

If the number q of incoming arcs at a node (its "indegree") is known, you can modify the algorithm by multiplying at each level by b/q instead of b (its "outdegree") – but use q = 1 for the root node instead of q = 0 . Then for a true tree, q = 1 at all nodes, so this then reverts to the tree version above. For the bad cases in my last comment above, q = b at all levels beyond the root, so P remains 1 and the final value of E is the number of nodes. So this works for two cases at extreme ends of a spectrum, one being no node sharing at all (tree) and the other being maximal node sharing at each level. I have not thoroughly analyzed this for new bad cases, but this property is encouraging. It makes it plausible that this will in general give good results for rooted DAGs in which nodes can be assigned a level such that the root is the sole node at level 0 and an arc from a node at level ℓ always goes to some node at level ℓ + 1. ‑‑Lambiam 20:32, 3 September 2025 (UTC)[reply]

It is a tree. I haven't implemented the idea presented here yet, but I have implemented my original idea. I did 10⁶ random paths and gathered data on the depth and branching factor. Then I multiplied the average branching factors together and got a number > 10³⁹ for the estimated number of nodes. I realized that this is wrong because it assumes that each branch of the tree goes to the same depth. Actually most branches terminate pretty quickly. Bubba73 ^{You talkin' to me?} 21:33, 4 September 2025 (UTC)[reply]

You could also test various algorithms on trees generated in such a way that you know their node count. ‑‑Lambiam 23:27, 4 September 2025 (UTC)[reply]

Thanks, I have two ideas to work on. Bubba73 ^{You talkin' to me?} 19:08, 6 September 2025 (UTC)[reply]

Wikipedia:Reference desk/Archives/Computing/2025 September 1

Contents

September 1

Estimating nodes in a tree for depth-first search