Efficient algorithm for detecting cycles in a directed graph

Question

Is there an efficient algorithm for detecting cycles within a directed graph?

I have a directed graph representing a schedule of jobs that need to be executed, a job being a node and a dependency being an edge. I need to detect the error case of a cycle within this graph leading to cyclic dependencies.

It would be better to detect all cycles so they could be fixed in one go.

nbro · Accepted Answer · 2015-07-28 22:06:00Z

220

Tarjan's strongly connected components algorithm has O(|E| + |V|) time complexity.

For other algorithms, see Strongly connected components on Wikipedia.
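
As a rough sketch of how strongly connected components translate into a cycle report: every component with more than one node contains a cycle, and a single-node component is cyclic only if that node has an edge to itself. The Python below is an illustrative recursive version of Tarjan's algorithm. The names `tarjan_scc` and `cyclic_components`, the `jobs` data and the dict-of-lists graph format are assumptions made for this example, not part of the original answer.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of `graph`.

    `graph` maps each node to a list of its successors (assumed format).
    """
    index = {}        # discovery order of each node
    lowlink = {}      # smallest index reachable from the node's subtree
    stack = []        # nodes of the components still being built
    on_stack = set()
    components = []

    def visit(u):
        index[u] = lowlink[u] = len(index)
        stack.append(u)
        on_stack.add(u)
        for v in graph.get(u, ()):
            if v not in index:
                visit(v)
                lowlink[u] = min(lowlink[u], lowlink[v])
            elif v in on_stack:
                lowlink[u] = min(lowlink[u], index[v])
        if lowlink[u] == index[u]:
            # u is the root of a component: pop the component off the stack
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == u:
                    break
            components.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return components


def cyclic_components(graph):
    """Keep only the components that actually contain a cycle."""
    return [
        c for c in tarjan_scc(graph)
        if len(c) > 1 or c[0] in graph.get(c[0], ())  # self-loop check
    ]


# Hypothetical job graph: a -> b -> c -> a is a cycle, e depends on itself.
jobs = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": [], "e": ["e"]}
print(cyclic_components(jobs))
```

Each reported component tells you which jobs are tangled together, although it does not list the individual cycles inside that component. Being recursive, this version is also subject to Python's recursion limit on very deep graphs.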

edited Jul 28, 2015 at 22:06

nbro


answered Nov 4, 2008 at 11:35

aku


8 Comments

Peter Over a year ago

How does finding the strongly connected components tell you about the cycles that exist in the graph?

Cédric Guillemette Over a year ago

Maybe somebody can confirm, but Tarjan's algorithm does not seem to handle cycles made of a node pointing directly to itself, like A->A.

mgiuca Over a year ago

@Cedrik Right, not directly. This isn't a flaw in Tarjan's algorithm, but in the way it is used for this question. Tarjan's algorithm doesn't directly find cycles; it finds strongly connected components. Of course, any SCC with a size greater than 1 implies a cycle. A node that is not on any cycle forms a singleton SCC by itself. The problem is that a node whose only cycle is a self-loop also ends up in a singleton SCC. So you need a separate check for self-loops, which is pretty trivial.

optimusfrenk Over a year ago

(all strongly connected components in the graph) != (all cycles in the graph)

KGhatak Over a year ago

@aku: A three-color DFS also has the same runtime, O(|E| + |V|). Using white (never visited), grey (the current node is visited, but not all of its reachable nodes are yet) and black (all reachable nodes are visited along with the current one) color coding, if a grey node finds another grey node then we have a cycle. [Pretty much what is in Cormen's algorithms book.] I wonder whether Tarjan's algorithm has any benefit over such a DFS!


nbro · Accepted Answer · 2015-07-28 22:09:56Z

Given that this is a schedule of jobs, I suspect that at some point you are going to sort them into a proposed order of execution.

If that's the case, then a topological sort implementation may in any case detect cycles. UNIX tsort certainly does. I think it is likely that it is therefore more efficient to detect cycles at the same time as tsorting, rather than in a separate step.

So the question might become, "how do I most efficiently tsort", rather than "how do I most efficiently detect loops". To which the answer is probably "use a library", but failing that the following Wikipedia article:

http://en.wikipedia.org/wiki/Topological_sorting

has the pseudo-code for one algorithm, and a brief description of another from Tarjan. Both have O(|V| + |E|) time complexity.

A topological sort can detect cycles, inasmuch as it relies on a depth-first search algorithm, but you need additional bookkeeping to actually detect cycles. See Kurt Peek's correct answer.
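
If Python 3.9 or later is available, the standard library's `graphlib` module is one example of the "use a library" route: `TopologicalSorter` raises `CycleError` when no valid order exists. The sketch below assumes a hypothetical `jobs` mapping from each job to the jobs it depends on; the job names and the `execution_order` helper are invented for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical schedule: each job maps to the jobs it depends on.
jobs = {
    "deploy": ["test"],
    "test": ["build"],
    "build": ["fetch"],
    "fetch": [],
}


def execution_order(dependencies):
    """Return (order, None) for a valid schedule, or (None, cycle) otherwise."""
    try:
        return list(TopologicalSorter(dependencies).static_order()), None
    except CycleError as err:
        # The second element of err.args is a list of nodes forming one cycle.
        return None, err.args[1]


print(execution_order(jobs))

broken = dict(jobs, fetch=["deploy"])  # introduces a cyclic dependency
print(execution_order(broken))
```

Note that this reports a single cycle per run, so fixing several cyclic dependencies may take several passes.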

Jthorpe · Accepted Answer · 2022-12-14 21:49:25Z

According to Lemma 22.11 of Cormen et al., Introduction to Algorithms (CLRS):

A directed graph G is acyclic if and only if a depth-first search of G yields no back edges.

This has been mentioned in several answers; here I'll also provide a code example based on chapter 22 of CLRS.

[Figure: the example graph, CLRS Figure 22.4]

[Image: CLRS' pseudocode for depth-first search, procedures DFS and DFS-VISIT]

In the example in CLRS Figure 22.4, the graph consists of two DFS trees: one consisting of nodes u, v, x, and y, and the other of nodes w and z. Each tree contains one back edge: one from x to v and another from z to z (a self-loop).

The key realization is that a back edge is encountered when, in the DFS-VISIT function, while iterating over the neighbors v of u, a node is encountered with the GRAY color.

The following Python code is an adaptation of CLRS' pseudocode with an if clause added which detects cycles:

import collections


class Graph(object):
    def __init__(self, edges):
        self.edges = edges
        self.adj = Graph._build_adjacency_list(edges)

    @staticmethod
    def _build_adjacency_list(edges):
        adj = collections.defaultdict(list)
        for edge in edges:
            adj[edge[0]].append(edge[1])
            adj[edge[1]]  # touch the key so nodes without outgoing edges exist too
        return adj


def dfs(G):
    discovered = set()
    finished = set()

    for u in G.adj:
        if u not in discovered and u not in finished:
            discovered, finished = dfs_visit(G, u, discovered, finished)


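def dfs_visit(G, u, discovered, finished):
    # While u is in `discovered` it is GRAY: it lies on the current DFS path
    discovered.add(u)

    for v in G.adj[u]:
        # Detect cycles: an edge to a GRAY node is a back edge
        if v in discovered:
            print(f"Cycle detected: found a back edge from {u} to {v}.")
            # Stop scanning u's neighbors; without this, the check below
            # would recurse into the GRAY node v again and never terminate
            break

        # Recurse into DFS tree (BLACK nodes are already finished)
        if v not in finished:
            dfs_visit(G, v, discovered, finished)

    # u turns BLACK: it leaves the current path
    discovered.remove(u)
    finished.add(u)

    return discovered, finished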


if __name__ == "__main__":
    G = Graph([
        ('u', 'v'),
        ('u', 'x'),
        ('v', 'y'),
        ('w', 'y'),
        ('w', 'z'),
        ('x', 'v'),
        ('y', 'x'),
        ('z', 'z')])

    dfs(G)

Note that in this example, the time in CLRS' pseudocode is not captured because we're only interested in detecting cycles. There is also some boilerplate code for building the adjacency list representation of a graph from a list of edges.

When this script is executed, it prints the following output:

Cycle detected: found a back edge from x to v.
Cycle detected: found a back edge from z to z.

These are exactly the back edges in the example in CLRS Figure 22.4.

I get RecursionError: maximum recursion depth exceeded while calling a Python object for this code.
@zino Looks like there should be a break after the cycle is detected. I tried adding it but the edit queue is full.
nit: discovered, finished = dfs_visit(G, u, discovered, finished) can be replaced with: dfs_visit(G, u, discovered, finished) and dfs-visit can return None
@zino @A_P I edited the example to add that missing break statement after a cycle has been detected.
@ChristianLong You can add a call to adj[edge[1]] in the for loop of _build_adjacency_list, which has the effect of initializing the keys for nodes that don't appear as edge[0] for any edge (which will generally happen in graphs without a cycle). (Added to the answer just now.)
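
For deep graphs, the recursion limit mentioned in the comments can be sidestepped with an explicit stack. The following is a sketch rather than the answer's own code: `find_back_edges` and the dict-of-lists input are assumptions for this example. Unlike the recursive version, it keeps scanning a node's neighbors after finding a back edge, so it reports every back edge the traversal meets.

```python
WHITE, GRAY, BLACK = 0, 1, 2


def find_back_edges(adj):
    """Return every back edge (u, v) found by an iterative DFS of `adj`.

    `adj` maps each node to a list of its successors (assumed format).
    """
    color = {}
    back_edges = []
    for root in adj:
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GRAY
        # Each stack entry pairs a node with an iterator over its neighbors.
        stack = [(root, iter(adj.get(root, ())))]
        while stack:
            u, neighbors = stack[-1]
            for v in neighbors:
                state = color.get(v, WHITE)
                if state == GRAY:
                    back_edges.append((u, v))  # v is an ancestor of u
                elif state == WHITE:
                    color[v] = GRAY
                    stack.append((v, iter(adj.get(v, ()))))
                    break  # descend into v before u's other neighbors
            else:
                # All neighbors handled: u is finished.
                color[u] = BLACK
                stack.pop()
    return back_edges


# The same example graph as in the answer (CLRS Figure 22.4).
adj = {
    "u": ["v", "x"], "v": ["y"], "w": ["y", "z"],
    "x": ["v"], "y": ["x"], "z": ["z"],
}
print(find_back_edges(adj))  # [('x', 'v'), ('z', 'z')]
```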

nbro · Accepted Answer · 2015-07-28 22:15:09Z

40

The simplest way to do it is to do a depth first traversal (DFT) of the graph.

If the graph has n vertices, this is an O(n) time complexity algorithm. Since you will possibly have to do a DFT starting from each vertex, the total complexity becomes O(n^2).

You have to maintain a stack containing all vertices in the current depth first traversal, with its first element being the root node. If you come across an element which is already in the stack during the DFT, then you have a cycle.

edited Jul 28, 2015 at 22:15

nbro


answered Apr 21, 2009 at 1:14

phys wizard

7 Comments

Peter Over a year ago

This would be true for a "regular" graph, but is false for a directed graph. For example, consider the "diamond dependency diagram" with four nodes: A with edges pointing to B and C, each of which has an edge pointing to D. Your DFT traversal of this diagram from A would incorrectly conclude that the "loop" was actually a cycle - although there is a loop, it is not a cycle because it cannot be traversed by following the arrows.

Deepak Over a year ago

@peter can you please explain how DFT from A will incorrectly conclude that there is a cycle?

Peter Over a year ago

@Deepak - In fact, I misread the answer from "phys wizard": where he wrote "in the stack" I thought "has already been found". It would indeed be sufficient (for detecting a directed loop) to check for dupes "in the stack" during the execution of a DFT. One upvote for each of you.

James Wierzba Over a year ago

Why do you say the time complexity is O(n) while you suggest checking the stack to see if it already contains a visited node? Scanning the stack adds time to the O(n) runtime because it has to be scanned on each new node. You can achieve O(n) if you mark the nodes as visited.

Luke Hutchison Over a year ago

As Peter said, this is incomplete for directed graphs. See Kurt Peek's correct answer.


Armin Primadi · Accepted Answer · 2016-05-26 08:16:56Z

In my opinion, the most understandable algorithm for detecting a cycle in a directed graph is the graph coloring algorithm.

Basically, the graph coloring algorithm walks the graph in a DFS manner (Depth First Search, which means that it explores a path completely before exploring another path). When it finds a back edge, it marks the graph as containing a loop.

For an in depth explanation of the graph coloring algorithm, please read this article: http://www.geeksforgeeks.org/detect-cycle-direct-graph-using-colors/

I also provide an implementation of graph coloring in JavaScript: https://github.com/dexcodeinc/graph_algorithm.js/blob/master/graph_algorithm.js

nbro · Accepted Answer · 2015-07-28 22:12:37Z

29

Start with a DFS: a cycle exists if and only if a back edge is discovered during the DFS. This follows from the white-path theorem.

edited Jul 28, 2015 at 22:12

nbro


answered Dec 30, 2010 at 20:02

Ajay Garg


4 Comments

Jonathan Prieto-Cubides Over a year ago

Yes, I think the same, but this isn't enough. I posted my approach at cs.stackexchange.com/questions/7216/find-the-simple-cycles-in-a-directed-graph

Manohar Reddy Poreddy Over a year ago

True. Ajay Garg only describes how to find "a cycle", which is a partial answer to this question. Your link talks about finding all cycles, as the question asks, but it looks like it uses the same approach as Ajay Garg while also exploring all possible DFS trees.

Luke Hutchison Over a year ago

This is incomplete for directed graphs. See Kurt Peek's correct answer.

sia Over a year ago

It doesn't answer the question; the question asks for a way to find all the cycles.

Aaron Digulla · Accepted Answer · 2008-11-04 12:15:46Z

If you can't add a "visited" property to the nodes, use a set (or map) and just add all visited nodes to the set unless they are already in the set. Use a unique key or the address of the objects as the "key".

This also gives you the information about the "root" node of the cyclic dependency which will come in handy when a user has to fix the problem.

Another solution is to try to find the next dependency to execute. For this, you must have some stack where you can remember where you are now and what you need to do next. Check if a dependency is already on this stack before you execute it. If it is, you've found a cycle.

While this might seem to have a complexity of O(N*M) you must remember that the stack has a very limited depth (so N is small) and that M becomes smaller with each dependency that you can check off as "executed" plus you can stop the search when you found a leaf (so you never have to check every node -> M will be small, too).

In MetaMake, I created the graph as a list of lists and then deleted every node as I executed them which naturally cut down the search volume. I never actually had to run an independent check, it all happened automatically during normal execution.

If you need a "test only" mode, just add a "dry-run" flag which disables the execution of the actual jobs.

Yuwen · Accepted Answer · 2013-04-13 03:30:12Z

8

There is no algorithm that can find all the cycles in a directed graph in time polynomial in the number of nodes. Suppose the directed graph has n nodes and every pair of nodes is connected in both directions, which means you have a complete graph. Then every subset of two or more of these n nodes lies on a cycle, and there are 2^n - n - 1 such subsets, so merely listing the cycles takes exponential time.

That said, suppose you have an efficient (non-stupid) algorithm which can tell you the number of directed cycles in a graph. You can first find the strongly connected components and then apply your algorithm to each of these components, since cycles only exist within the components and not between them.
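
To see how quickly the number of cycles grows, one can count the elementary cycles of a small complete directed graph by brute force. The sketch below is an illustration only: `count_simple_cycles` and `complete_digraph` are made-up names, and this is a plain exhaustive search, not Johnson's algorithm. It roots every cycle at its smallest node so that each cycle is counted exactly once.

```python
def count_simple_cycles(adj):
    """Count elementary cycles by exhaustive DFS (exponential time).

    `adj` maps each node to a list of successors; nodes must be orderable.
    """
    count = 0

    def extend(start, u, on_path):
        nonlocal count
        for v in adj[u]:
            if v == start:
                count += 1  # closed a cycle back to its smallest node
            elif v > start and v not in on_path:
                on_path.add(v)
                extend(start, v, on_path)
                on_path.remove(v)

    for start in adj:
        extend(start, start, {start})
    return count


def complete_digraph(n):
    """Every ordered pair of distinct nodes is an edge; no self-loops."""
    return {u: [v for v in range(n) if v != u] for u in range(n)}


for n in range(2, 7):
    print(n, count_simple_cycles(complete_digraph(n)))
```

For n = 2 to 6 this prints the counts 1, 5, 20, 84 and 409, so the growth is even faster than the number of node subsets.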

answered Apr 13, 2013 at 3:30

Yuwen


1 Comment

user152468 Over a year ago

True, if the number of nodes is taken as the size of the input. You could also describe the runtime complexity in terms of the number of edges or even cycles, or a combination of these measures. The algorithm "Finding all the elementary circuits of a directed graph" by Donald B. Johnson has polynomial running time given by O((n + e)(c + 1)) where n is the number of nodes, e the number of edges and c the number of elementary circuits of the graph. And here is my Java implementation of this algorithm: github.com/1123/johnson.

Rpant · Accepted Answer · 2013-01-24 01:49:29Z

4

I implemented this in SML (imperative programming). Here is the outline. Find all the nodes that have either an indegree or an outdegree of 0. Such nodes cannot be part of a cycle, so remove them, along with all of their incoming and outgoing edges. Recursively apply this process to the resulting graph. If at the end you are not left with any node or edge, the graph does not have any cycles; otherwise it does.
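
The pruning procedure can be sketched as follows. This is a Python illustration rather than the SML implementation mentioned in the answer; `has_cycle_by_pruning` and the edge-list input are assumptions, and the repeated rescanning favors clarity over speed.

```python
def has_cycle_by_pruning(edges):
    """Repeatedly drop nodes with in-degree or out-degree 0.

    `edges` is an iterable of (source, target) pairs (assumed format).
    Returns True if some nodes survive the pruning, i.e. a cycle exists.
    """
    edges = set(edges)
    nodes = {n for edge in edges for n in edge}
    while nodes:
        sources = {u for u, _ in edges}
        targets = {v for _, v in edges}
        # A node lacking incoming or outgoing edges cannot lie on a cycle.
        removable = {n for n in nodes if n not in sources or n not in targets}
        if not removable:
            return True  # every remaining node has both kinds of edges
        nodes -= removable
        edges = {(u, v) for u, v in edges
                 if u not in removable and v not in removable}
    return False


print(has_cycle_by_pruning([("a", "b"), ("b", "c"), ("a", "c")]))  # False
print(has_cycle_by_pruning([("a", "b"), ("b", "c"), ("c", "a")]))  # True
```

The nodes left over when the loop stops are a superset of the nodes on cycles, since a node sitting on a path between two cycles also survives.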

answered Jan 24, 2013 at 1:49

Rpant


Community · Accepted Answer · 2017-04-13 12:57:54Z

https://mathoverflow.net/questions/16393/finding-a-cycle-of-fixed-length I like this solution the best, especially for cycles of length 4 :)

Also, phys wizard says you have to do O(V^2) work. I believe that we need only O(V) / O(V+E). If the graph is connected, then DFS will visit all nodes. If the graph has connected subgraphs, then each time we run a DFS on a vertex of such a subgraph we will find its connected vertices and won't have to consider them for the next run of the DFS. Therefore the claim that a DFS must be run for each vertex is incorrect.

Steve · Accepted Answer · 2008-11-04 13:35:57Z

1

The way I do it is to do a Topological Sort, counting the number of vertices visited. If that number is less than the total number of vertices in the DAG, you have a cycle.
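
One reading of this answer is Kahn's algorithm, in which a job is emitted once all of its prerequisites have been emitted; if fewer jobs come out than went in, the leftovers are on a cycle or downstream of one. The sketch below follows that reading. `kahn_order` is a hypothetical name, and the dict-of-lists input is an assumption.

```python
from collections import deque


def kahn_order(adj):
    """Topologically sort `adj` (node -> list of successors).

    Returns (order, leftover): `leftover` is empty exactly when the graph
    is acyclic; otherwise it holds the nodes that could not be ordered.
    """
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] = indegree.get(v, 0) + 1

    ready = deque(u for u, d in indegree.items() if d == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj.get(u, ()):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    leftover = [u for u in indegree if indegree[u] > 0]
    return order, leftover


# b and c depend on each other, so only a and d can be ordered.
order, leftover = kahn_order({"a": ["b"], "b": ["c"], "c": ["b"], "d": []})
print(order, leftover)
```

Comparing `len(order)` with the number of vertices is the count check described in the answer.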

answered Nov 4, 2008 at 13:35

Steve


5 Comments

sleske Over a year ago

That does not make sense. If the graph has cycles, there is no topological sorting, which means any correct algorithm for topological sorting will abort.

Oleg Mikheev Over a year ago

from wikipedia: Many topological sorting algorithms will detect cycles too, since those are obstacles for topological order to exist.

nbro Over a year ago

@OlegMikheev Yes, but Steve is saying " If that number is less than the total number of vertices in the DAG, you have a cycle", which does not make sense.

maaartinus Over a year ago

@nbro I'd bet, they mean a variant of topological sorting algorithm which aborts when no topological sorting exists (and then they don't visit all vertices).

Plagon Over a year ago

If you do a topological sort on a graph with a cycle, you will end up with an order that has the least number of bad edges (order number > order number of neighbour). But after you have done the sorting, it's easy to detect those bad edges, which tells you the graph has a cycle.

Bhagwati Malav · Accepted Answer · 2017-04-12 05:48:45Z

0

As you said, you have a set of jobs that need to be executed in a certain order. A topological sort gives you the required order for scheduling the jobs (or for dependency problems, if the graph is a directed acyclic graph). Run a DFS and maintain a list, adding each finished node at the beginning of the list. If you encounter a node that is still on the current DFS path (visited but not yet finished), then you have found a cycle in the given graph.

answered Apr 12, 2017 at 5:48

Bhagwati Malav


nbro · Accepted Answer · 2015-07-28 22:16:31Z

-1

If DFS finds an edge that points to an already-visited vertex, you have a cycle there.

edited Jul 28, 2015 at 22:16

nbro


answered May 12, 2013 at 7:16

mafonya


7 Comments

noisy cat Over a year ago

Fails on 1,2,3: 1,2; 1,3; 2,3;

noisy cat Over a year ago

@JakeGreene Look here: i.imgur.com/tEkM5xy.png Simple enough to understand. Let's say you start from 0. Then you go to node 1; there are no more paths from there, so the recursion goes back. Now you visit node 2, which has an edge to vertex 1, which was visited already. In your opinion you would have a cycle then, and you do not really have one.

Jake Greene Over a year ago

@kittyPL That graph does not contain a cycle. From Wikipedia: "A directed cycle in a directed graph is a sequence of vertices starting and ending at the same vertex such that, for each two consecutive vertices of the cycle, there exists an edge directed from the earlier vertex to the later one" You have to be able to follow a path from V that leads back to V for a directed cycle. mafonya's solution works for the given problem

noisy cat Over a year ago

@JakeGreene Of course it does not. Using your algorithm and starting from 1 you would detect a cycle anyway... This algorithm is just bad... Usually it would be sufficient to walk backwards whenever you encounter a visited vertex.

Kyrra Over a year ago

@kittyPL DFS does work to detect cycles from the given starting node. But when doing DFS you must color visited nodes to distinguish a cross-edge from back-edge. First time visiting a vertex it turns grey, then you turn it black once all its edges have been visited. If when doing the DFS you hit a grey vertex then that vertex is an ancestor (ie: you have a cycle). If the vertex is black then it's just a cross-edge.
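
The distinction Kyrra describes can be checked on noisy cat's counterexample (edges 1->2, 1->3, 2->3). In this sketch the function names are made up for illustration: `naive_has_cycle` flags any edge to an already visited node, as the answer suggests, while `stack_has_cycle` flags only edges to nodes that are still on the current DFS path (the grey ones).

```python
def naive_has_cycle(adj):
    """Flawed check: treats any edge to a visited node as a cycle."""
    visited = set()

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v in visited or visit(v):
                return True
        return False

    return any(u not in visited and visit(u) for u in adj)


def stack_has_cycle(adj):
    """Correct check: only edges to nodes on the current path count."""
    visited, on_path = set(), set()

    def visit(u):
        visited.add(u)
        on_path.add(u)  # u is grey
        for v in adj[u]:
            if v in on_path or (v not in visited and visit(v)):
                return True
        on_path.remove(u)  # u is black
        return False

    return any(u not in visited and visit(u) for u in adj)


acyclic = {1: [2, 3], 2: [3], 3: []}
print(naive_has_cycle(acyclic))  # True: a false positive
print(stack_has_cycle(acyclic))  # False: the graph has no directed cycle
```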


nbro · Accepted Answer · 2015-07-28 22:20:12Z

-13

If a graph satisfies this property

|e| > |v| - 1

then the graph contains at least one cycle.

edited Jul 28, 2015 at 22:20

nbro


answered Oct 28, 2010 at 10:33

dharmendra singh


3 Comments

Dr. Hans-Peter Störr Over a year ago

That might be true for undirected graphs, but certainly not for directed graphs.

user152468 Over a year ago

A counter example would be A->B, B->C, A->C.

Debanjan Dhar Over a year ago

Not all vertices have edges.
