Wikipedia:Reference desk/Archives/Computing/2025 September 1

Computing desk
< August 31 << Aug | September | Oct >> September 2 >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


September 1

edit
edit

I need to traverse a large tree with a depth-first search. I need an estimate of the number of nodes to get an idea if it is feasible to do it. What I have in mind is to start the search but at each node, select one edge at random. Keep track of the branching factor at each level. Then repeat this maybe 105 times and get an average branching factor at each level. Then multiply the branching factors to get an estimate of the total number of nodes.

Should this give a reasonable estimate of the total number of nodes? Bubba73 You talkin' to me? 22:18, 1 September 2025 (UTC)[reply]

If you know nothing about the tree in advance you will still not know, as the branches may go to any huge depth, including those branches you don't go near. Are you going to go full depth 105 times? That would give you some sort of probabilistic idea, but not any certainty. Graeme Bartlett (talk) 10:18, 2 September 2025 (UTC)[reply]
Do you have any way of knowing or estimating the total number of nodes? That in conjunction with your "branching factor" might give a better estimate.scs (talk) 10:48, 2 September 2025 (UTC)[reply]
I think the point of the described sampling method is to obtain an estimate of the total number of nodes, to be used to determine whether a full traversal is feasible.  ​‑‑Lambiam 14:44, 2 September 2025 (UTC)[reply]
Ah, yes. (And: "D'oh!") The emphasis on depth-first in the title had me assuming that the question was whether specifically depth-first traversal was possible, and the intent was to estimate the depth of the tree. —scs (talk) 21:24, 2 September 2025 (UTC)[reply]
Assume you have a finite tree with a gazillion nodes that is completely linear: each internal node has exactly one child. With your method, you compute, each of these 105 times, the product of the branching factors as being 1gazillion = 1, which is not a good estimate.  ​‑‑Lambiam 15:09, 2 September 2025 (UTC)[reply]
I was planning to go different random routes, each as deeply as I could, for enough trials to get an estimate. I picked 105 as an example. I could do more. The tree is largely unknown - I don't know how many edges a node has until I get to it, but the number of edges for each will be between 0 and 8. Very few paths will reach the maximum depth. The object is to get the path that goes to the maximum depth of the tree. Bubba73 You talkin' to me? 22:58, 2 September 2025 (UTC)[reply]
But if there is only one route, as is the case in a strictly linear tree with a branching factor of 1 at all levels (except for terminal nodes), a probe will produce 1, this being the product of the branching factors along its route. It does not matter if you run 100 probes or 1099 probes – the answer is always 1, regardless of the depth of the tree.
Here is a mod to your algorithm. The number is computed in a variable E as you return in a probe backwards along the route from the terminal node to the root (which is easily coded if you use recursion; otherwise you need to fill and keep an array of branching factors). On returning from visiting a terminal node, E is set to the value 1. On returning from a visit to an internal node with b children, E is set to 1 + bE. If the branching factors going down along the random route are b0, b1, ..., bn−1, this computes, for the root node,
E = 1 + b0 + b0bi + ··· + b0b1···bn−1.
In the special case of a tree in which all nodes at the same depth i have the same number bi of children, this is the exact total number of nodes.  ​‑‑Lambiam 00:07, 3 September 2025 (UTC)[reply]
You can also compute E directly going forward, using an extra variable P. Initially, set E = 0, P = 1. When visiting a new node (the first of which is the root), set E = E + P and then P = Pb, where b is the number of children of the node. If the node being visited is terminal, we are now done (with this probe). Otherwise, move, randomly, one level down and repeat the process.  ​‑‑Lambiam 00:16, 3 September 2025 (UTC)[reply]
Thanks, I'll study this. There is certainly not only one route to a terminal node, but there might be only one path that goes to the maximum depth. But if there is more than one path that does get to the maximum depth, I'd like to get all of them. (I expect at most only a few of them. though.) (And I am using recursion.) Bubba73 You talkin' to me? 01:19, 3 September 2025 (UTC)[reply]
If there are more paths from the root to a given node, the graph is not a tree. To get an unbiased estimate of the number of nodes, you need to know more about the composition of the graph than can be obtained by these random probes. Confluence of paths implies that the method I sketched overestimates the number of notes. An example of a bad case is given by the family of graphs of the form
                  /-->--\ /-->--\ /-->--\       /-->--\
                 o       o       o       o ... o       o ,
                  \-->--/ \-->--/ \-->--/       \-->--/
for which, if there are N nodes, the expected value of E equals 2N − 1. If the constant branching factor is not specifically 2 but, more generally, b, you get E = (bN − 1) / (b − 1).  ​‑‑Lambiam 10:54, 3 September 2025 (UTC)[reply]
If the number q of incoming arcs at a node (its "indegree") is known, you can modify the algorithm by multiplying at each level by b/q instead of b (its "outdegree") – but use q = 1 for the root node instead of q = 0 . Then for a true tree, q = 1 at all nodes, so this then reverts to the tree version above. For the bad cases in my last comment above, q = b at all levels beyond the root, so P remains 1 and the final value of E is the number of nodes. So this works for two cases at extreme ends of a spectrum, one being no node sharing at all (tree) and the other being maximal node sharing at each level. I have not thoroughly analyzed this for new bad cases, but this property is encouraging. It makes it plausible that this will in general give good results for rooted DAGs in which nodes can be assigned a level such that the root is the sole node at level 0 and an arc from a node at level always goes to some node at level + 1.  ​‑‑Lambiam 20:32, 3 September 2025 (UTC)[reply]
It is a tree. I haven't implemented the idea presented here yet, but I have implemented my original idea. I did 106 random paths and gathered data on the depth and branching factor. Then I multiplied the average branching factors together and got a number > 1039 for the estimated number of nodes. I realized that this is wrong because it assumes that each branch of the tree goes to the same depth. Actually most branches terminate pretty quickly. Bubba73 You talkin' to me? 21:33, 4 September 2025 (UTC)[reply]
You could also test various algorithms on trees generated in such a way that you know their node count.  ​‑‑Lambiam 23:27, 4 September 2025 (UTC)[reply]
Thanks, I have two ideas to work on. Bubba73 You talkin' to me? 19:08, 6 September 2025 (UTC)[reply]