465

Is there an efficient algorithm for detecting cycles within a directed graph?

I have a directed graph representing a schedule of jobs that need to be executed, a job being a node and a dependency being an edge. I need to detect the error case of a cycle within this graph leading to cyclic dependencies.

It would be better to detect all cycles so they could be fixed in one go.


14 Answers

220

Tarjan's strongly connected components algorithm has O(|E| + |V|) time complexity.

For other algorithms, see Strongly connected components on Wikipedia.
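As a rough illustration of how the components relate to cycles, here is a minimal recursive sketch of Tarjan's algorithm in Python. The names `tarjan_scc` and `cyclic_components` and the dict-of-lists graph format are assumptions made for this example, not part of any library. A component indicates a cycle if it has more than one node, or if its single node has an edge to itself.

```python
def tarjan_scc(adj):
    """Return the strongly connected components of a directed graph.

    adj maps each node to a list of its successors.
    """
    index = {}        # discovery order of each node
    lowlink = {}      # smallest index reachable from the node's DFS subtree
    stack = []        # nodes whose component is not complete yet
    on_stack = set()
    components = []

    def visit(u):
        index[u] = lowlink[u] = len(index)
        stack.append(u)
        on_stack.add(u)
        for v in adj.get(u, ()):
            if v not in index:
                visit(v)
                lowlink[u] = min(lowlink[u], lowlink[v])
            elif v in on_stack:
                lowlink[u] = min(lowlink[u], index[v])
        if lowlink[u] == index[u]:
            # u is the root of a component: pop the component off the stack
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == u:
                    break
            components.append(component)

    for u in list(adj):
        if u not in index:
            visit(u)
    return components


def cyclic_components(adj):
    """Components that contain a cycle: size > 1, or a node with a self-loop."""
    return [c for c in tarjan_scc(adj)
            if len(c) > 1 or c[0] in adj.get(c[0], ())]
```

Being recursive, this sketch can hit Python's recursion limit on very deep graphs; an explicit stack would avoid that.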


8 Comments

How does finding the strongly connected components tell you about the cycles that exist in the graph?
Maybe somebody can confirm, but Tarjan's algorithm does not seem to support cycles of nodes pointing directly to themselves, like A->A.
@Cedrik Right, not directly. This isn't a flaw in Tarjan's algorithm, but in the way it is used for this question. Tarjan doesn't directly find cycles; it finds strongly connected components. Of course, any SCC with a size greater than 1 implies a cycle. A node that is not on any cycle forms a singleton SCC by itself. The problem is that a node with a self-loop can also end up in an SCC by itself. So you need a separate check for self-loops, which is pretty trivial.
(all strongly connected components in the graph) != (all cycles in the graph)
@aku: A three-color DFS also has the same O(|E| + |V|) running time. Using white (never visited), grey (the current node is visited, but not all of its reachable nodes are) and black (all reachable nodes are visited, along with the current one) color coding, if a grey node finds another grey node, then we have a cycle (this is pretty much what Cormen's algorithms book does). I wonder whether Tarjan's algorithm has any benefit over such a DFS.
88

Given that this is a schedule of jobs, I suspect that at some point you are going to sort them into a proposed order of execution.

If that's the case, then a topological sort implementation may in any case detect cycles. UNIX tsort certainly does. I think it is likely that it is therefore more efficient to detect cycles at the same time as tsorting, rather than in a separate step.

So the question might become, "how do I most efficiently tsort", rather than "how do I most efficiently detect loops". To which the answer is probably "use a library", but failing that the following Wikipedia article:

http://en.wikipedia.org/wiki/Topological_sorting

has the pseudo-code for one algorithm, and a brief description of another from Tarjan. Both have O(|V| + |E|) time complexity.
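For the "use a library" route in Python specifically, the standard library's `graphlib` module (Python 3.9+) sorts and reports a cycle in one step. The sketch below is only an illustration; the `execution_order` helper and the job names are made up for this example.

```python
import graphlib  # standard library since Python 3.9


def execution_order(dependencies):
    """Return (order, cycle); exactly one of the two is None.

    dependencies maps each job to the jobs that must run before it.
    """
    sorter = graphlib.TopologicalSorter(dependencies)
    try:
        return list(sorter.static_order()), None
    except graphlib.CycleError as exc:
        # The second element of args holds one detected cycle as a list of nodes
        return None, exc.args[1]


jobs = {"deploy": ["test"], "test": ["build"], "build": []}
print(execution_order(jobs))

jobs["build"] = ["deploy"]  # introduce a cyclic dependency
print(execution_order(jobs))
```

Note that `CycleError` reports only one cycle, so this does not cover the "find all cycles" part of the question.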

1 Comment

A topological sort can detect cycles, inasmuch as it relies on a depth-first search algorithm, but you need additional bookkeeping to actually detect cycles. See Kurt Peek's correct answer.
87

According to Lemma 22.11 of Cormen et al., Introduction to Algorithms (CLRS):

A directed graph G is acyclic if and only if a depth-first search of G yields no back edges.

This has been mentioned in several answers; here I'll also provide a code example based on chapter 22 of CLRS. The example graph is illustrated below.

[Image: the example graph]

CLRS' pseudo-code for depth-first search reads:

[Image: CLRS pseudo-code for DFS and DFS-VISIT]

In the example in CLRS Figure 22.4, the graph consists of two DFS trees: one consisting of nodes u, v, x, and y, and the other of nodes w and z. Each tree contains one back edge: one from x to v and another from z to z (a self-loop).

The key realization is that a back edge is encountered when, in the DFS-VISIT function, while iterating over the neighbors v of u, a node is encountered with the GRAY color.

The following Python code is an adaptation of CLRS' pseudocode with an if clause added which detects cycles:

import collections


class Graph(object):
    def __init__(self, edges):
        self.edges = edges
        self.adj = Graph._build_adjacency_list(edges)

    @staticmethod
    def _build_adjacency_list(edges):
        adj = collections.defaultdict(list)
        for edge in edges:
            adj[edge[0]].append(edge[1])
            adj[edge[1]] # side effect only
        return adj


def dfs(G):
    discovered = set()
    finished = set()

    for u in G.adj:
        if u not in discovered and u not in finished:
            discovered, finished = dfs_visit(G, u, discovered, finished)


def dfs_visit(G, u, discovered, finished):
    discovered.add(u)

    for v in G.adj[u]:
        # Detect cycles
        if v in discovered:
            print(f"Cycle detected: found a back edge from {u} to {v}.")
            break

        # Recurse into DFS tree
        if v not in finished:
            dfs_visit(G, v, discovered, finished)

    discovered.remove(u)
    finished.add(u)

    return discovered, finished


if __name__ == "__main__":
    G = Graph([
        ('u', 'v'),
        ('u', 'x'),
        ('v', 'y'),
        ('w', 'y'),
        ('w', 'z'),
        ('x', 'v'),
        ('y', 'x'),
        ('z', 'z')])

    dfs(G)

Note that in this example, the time in CLRS' pseudocode is not captured because we're only interested in detecting cycles. There is also some boilerplate code for building the adjacency list representation of a graph from a list of edges.

When this script is executed, it prints the following output:

Cycle detected: found a back edge from x to v.
Cycle detected: found a back edge from z to z.

These are exactly the back edges in the example in CLRS Figure 22.4.

6 Comments

I get RecursionError: maximum recursion depth exceeded while calling a Python object for this code.
@zino Looks like there should be a break after the cycle is detected. I tried adding it but the edit queue is full.
nit: discovered, finished = dfs_visit(G, u, discovered, finished) can be replaced with: dfs_visit(G, u, discovered, finished) and dfs-visit can return None
@zino @A_P I edited the example to add that missing break statement after a cycle has been detected.
@ChristianLong You can add a call to adj[edge[1]] in the for loop of _build_adjacency_list, which has the effect of initializing the keys for nodes that don't appear as edge[0] for at least some edge (which will generally happen in graphs without a cycle). I added this to the answer just now.
40

The simplest way to do it is to do a depth first traversal (DFT) of the graph.

If the graph has n vertices, this is an O(n) time complexity algorithm. Since you will possibly have to do a DFT starting from each vertex, the total complexity becomes O(n^2).

You have to maintain a stack containing all vertices in the current depth first traversal, with its first element being the root node. If you come across an element which is already in the stack during the DFT, then you have a cycle.
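A possible Python sketch of this stack-based idea is shown here. The name `find_cycle` and the dict-of-lists graph format are assumptions for the example. It keeps the current path both as a list and as a set so that the "already in the stack" check takes constant time, and it skips nodes that were fully explored earlier so that no node is expanded twice.

```python
def find_cycle(adj):
    """Return one directed cycle as a list of nodes, or None if there is none.

    adj maps each node to a list of its successors.
    """
    finished = set()          # nodes whose descendants are fully explored
    for root in adj:
        if root in finished:
            continue
        path = [root]         # the current DFS path, root first
        on_path = {root}      # same nodes as a set, for O(1) membership tests
        iterators = [iter(adj.get(root, ()))]
        while path:
            for v in iterators[-1]:
                if v in on_path:
                    # v is an ancestor of the current node: cycle found
                    return path[path.index(v):] + [v]
                if v not in finished:
                    # descend into v and continue from its successors
                    path.append(v)
                    on_path.add(v)
                    iterators.append(iter(adj.get(v, ())))
                    break
            else:
                # all successors explored: backtrack
                done = path.pop()
                on_path.discard(done)
                finished.add(done)
                iterators.pop()
    return None
```

Because the traversal uses an explicit stack instead of recursion, deep graphs do not run into Python's recursion limit.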

7 Comments

This would be true for a "regular" graph, but is false for a directed graph. For example, consider the "diamond dependency diagram" with four nodes: A with edges pointing to B and C, each of which has an edge pointing to D. Your DFT traversal of this diagram from A would incorrectly conclude that the "loop" was actually a cycle - although there is a loop, it is not a cycle because it cannot be traversed by following the arrows.
@peter can you please explain how DFT from A will incorrectly conclude that there is a cycle?
@Deepak - In fact, I misread the answer from "phys wizard": where he wrote "in the stack" I thought "has already been found". It would indeed be sufficient (for detecting a directed loop) to check for dupes "in the stack" during the execution of a DFT. One upvote for each of you.
Why do you say the time complexity is O(n) while you suggest checking the stack to see if it already contains a visited node? Scanning the stack adds time to O(n) runtime because it has to scan the stack on each new node. You can achieve O(n) if you mark the nodes visited
As Peter said, this is incomplete for directed graphs. See Kurt Peek's correct answer.
35

In my opinion, the most understandable algorithm for detecting a cycle in a directed graph is the graph coloring algorithm.

Basically, the graph coloring algorithm walks the graph in a DFS manner (Depth First Search, which means that it explores a path completely before exploring another path). When it finds a back edge, it marks the graph as containing a loop.

For an in-depth explanation of the graph coloring algorithm, please read this article: http://www.geeksforgeeks.org/detect-cycle-direct-graph-using-colors/

Also, I provide an implementation of graph coloring in JavaScript https://github.com/dexcodeinc/graph_algorithm.js/blob/master/graph_algorithm.js


29

Start with a DFS: a cycle exists if and only if a back edge is discovered during the DFS. This follows from the white-path theorem.

4 Comments

Yes, I think the same, but this isn't enough. I posted my approach at cs.stackexchange.com/questions/7216/find-the-simple-cycles-in-a-directed-graph
True. Ajay Garg only explains how to find "a cycle", which is a partial answer to this question. Your link talks about finding all cycles, as the question asks, but it looks like it uses the same approach as Ajay Garg while also covering all possible DFS trees.
This is incomplete for directed graphs. See Kurt Peek's correct answer.
It doesn't answer the question; the question asks for a way to find all the cycles.
9

If you can't add a "visited" property to the nodes, use a set (or map) and just add all visited nodes to the set unless they are already in the set. Use a unique key or the address of the objects as the "key".

This also gives you the information about the "root" node of the cyclic dependency which will come in handy when a user has to fix the problem.

Another solution is to try to find the next dependency to execute. For this, you must have some stack where you can remember where you are now and what you need to do next. Check if a dependency is already on this stack before you execute it. If it is, you've found a cycle.

While this might seem to have a complexity of O(N*M) you must remember that the stack has a very limited depth (so N is small) and that M becomes smaller with each dependency that you can check off as "executed" plus you can stop the search when you found a leaf (so you never have to check every node -> M will be small, too).

In MetaMake, I created the graph as a list of lists and then deleted every node as I executed them which naturally cut down the search volume. I never actually had to run an independent check, it all happened automatically during normal execution.

If you need a "test only" mode, just add a "dry-run" flag which disables the execution of the actual jobs.


8

No algorithm can list all the cycles in a directed graph in time polynomial in the number of nodes, because the number of cycles itself can be exponential. Suppose the directed graph has n nodes and every pair of nodes is connected in both directions, so that you have a complete graph. Then any subset of at least two of these n nodes forms at least one cycle, and there are 2^n - n - 1 such subsets, so no polynomial-time algorithm exists.

That said, if you have an efficient algorithm that finds (or counts) the directed cycles in a graph, you can first find the strongly connected components and then apply your algorithm to each component, since cycles exist only within components and never between them.
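To get a feel for the growth, here is a small brute-force enumeration sketch in Python. It is not Johnson's algorithm or any other optimized method; the names `simple_cycles` and `complete_digraph` are made up for this example, and it assumes the nodes can be compared with each other. Each cycle is reported exactly once, starting from its smallest node.

```python
def simple_cycles(adj):
    """Yield every simple directed cycle once, as a list of nodes.

    Brute-force sketch: each cycle is reported starting from its smallest
    node, so nodes must be mutually comparable.
    """
    def extend(start, path, on_path):
        for v in adj.get(path[-1], ()):
            if v == start:
                yield list(path)          # closed a cycle back to the start
            elif v > start and v not in on_path:
                path.append(v)
                on_path.add(v)
                yield from extend(start, path, on_path)
                on_path.discard(path.pop())

    for start in sorted(adj):
        yield from extend(start, [start], {start})


def complete_digraph(n):
    """Graph on nodes 0..n-1 with an edge between every ordered pair."""
    return {u: [v for v in range(n) if v != u] for u in range(n)}


for n in range(2, 7):
    print(n, sum(1 for _ in simple_cycles(complete_digraph(n))))
```

The counts printed for the complete graphs grow very quickly with n, which is why listing every cycle is only practical for small or sparse dependency graphs.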

1 Comment

True, if the number of nodes is taken as the size of the input. You could also describe the runtime complexity in terms of the number of edges or even cycles, or a combination of these measures. The algorithm "Finding all the elementary circuits of a directed graph" by Donald B. Johnson has polynomial running time given by O((n + e)(c + 1)) where n is the number of nodes, e the number of edges and c the number of elementary circuits of the graph. And here is my Java implementation of this algorithm: github.com/1123/johnson.
4

I implemented this in SML (in an imperative style). Here is the outline: find all the nodes that have an indegree or an outdegree of 0. Such nodes cannot be part of a cycle, so remove them, together with all their incoming and outgoing edges. Recursively apply this process to the resulting graph. If at the end you are not left with any node or edge, the graph has no cycles; otherwise it has at least one.
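The outline could look roughly like this in Python (not the original SML code; the name `trim_acyclic_nodes` and the edge-list input format are assumptions for this sketch). It works on edges only, so isolated nodes are ignored from the start.

```python
def trim_acyclic_nodes(edges):
    """Repeatedly delete nodes with in-degree 0 or out-degree 0.

    edges is an iterable of (source, target) pairs. Returns the set of
    nodes that survive; the graph has a cycle iff that set is non-empty.
    """
    remaining = set(edges)
    while True:
        sources = {u for u, _ in remaining}   # nodes with out-degree > 0
        targets = {v for _, v in remaining}   # nodes with in-degree > 0
        # an edge survives only if its source still has an incoming edge
        # and its target still has an outgoing edge
        kept = {(u, v) for u, v in remaining
                if u in targets and v in sources}
        if kept == remaining:
            break
        remaining = kept
    return {node for edge in remaining for node in edge}
```

The surviving nodes are those on a cycle or on a path between two cycles, which narrows down where to look. This simple version rescans all edges on every pass, so it is not linear time.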


2

https://mathoverflow.net/questions/16393/finding-a-cycle-of-fixed-length I like this solution the best, especially for cycles of length 4 :)

Also, phys wizard says you have to do O(V^2). I believe that we need only O(V)/O(V+E). If the graph is connected, then DFS will visit all nodes. If the graph has several connected subgraphs, then each time we run a DFS from a vertex of one subgraph we find all the vertices connected to it and won't have to consider them in the next run of the DFS. Therefore it is incorrect that the DFS may have to be run from every vertex.


1

The way I do it is to do a Topological Sort, counting the number of vertices visited. If that number is less than the total number of vertices in the DAG, you have a cycle.
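One way to read this is as Kahn's algorithm, which repeatedly outputs a node with no remaining incoming edges. The sketch below assumes that interpretation; the name `kahn_order` and the dict-of-lists format are choices made for this example.

```python
from collections import deque


def kahn_order(adj):
    """Kahn's algorithm. Returns (order, has_cycle).

    adj maps each node to a list of its successors.
    """
    indegree = {u: 0 for u in adj}
    for successors in adj.values():
        for v in successors:
            indegree[v] = indegree.get(v, 0) + 1

    ready = deque(u for u, d in indegree.items() if d == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj.get(u, ()):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # nodes on a cycle, and nodes downstream of one, never reach in-degree 0
    return order, len(order) < len(indegree)
```

If `has_cycle` is true, `order` is only a partial order covering the nodes that could be scheduled before the sort got stuck.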

5 Comments

That does not make sense. If the graph has cycles, there is no topological sorting, which means any correct algorithm for topological sorting will abort.
from wikipedia: Many topological sorting algorithms will detect cycles too, since those are obstacles for topological order to exist.
@OlegMikheev Yes, but Steve is saying " If that number is less than the total number of vertices in the DAG, you have a cycle", which does not make sense.
@nbro I'd bet, they mean a variant of topological sorting algorithm which aborts when no topological sorting exists (and then they don't visit all vertices).
If you do a topological sort on a graph with a cycle, you will end up with an order that has the least number of bad edges (order number > order number of the neighbour). But after you have done the sorting, it's easy to detect those bad edges, and thereby detect that the graph has a cycle.
0

As you said, you have a set of jobs that need to be executed in a certain order. A topological sort gives you the required order for scheduling the jobs (or for dependency problems, if the graph is a directed acyclic graph). Run a DFS and maintain a list, adding each node at the beginning of the list. If you encounter a node that is already visited and still on the current DFS path, then you have found a cycle in the given graph.


-1

If DFS finds an edge that points to an already-visited vertex, you have a cycle there.

7 Comments

Fails on 1,2,3: 1,2; 1,3; 2,3;
@JakeGreene Look here: i.imgur.com/tEkM5xy.png Simple enough to understand. Let's say you start from 0. Then you go to node 1; there are no more paths from there, so the recursion goes back. Now you visit node 2, which has an edge to vertex 1, which was visited already. In your opinion you would have a cycle then, and you do not really have one.
@kittyPL That graph does not contain a cycle. From Wikipedia: "A directed cycle in a directed graph is a sequence of vertices starting and ending at the same vertex such that, for each two consecutive vertices of the cycle, there exists an edge directed from the earlier vertex to the later one" You have to be able to follow a path from V that leads back to V for a directed cycle. mafonya's solution works for the given problem
@JakeGreene Of course it does not. Using your algorithm and starting from 1 you would detect a cycle anyway... This algorithm is just bad... Usually it would be sufficient to walk backwards whenever you encounter a visited vertex.
@kittyPL DFS does work to detect cycles from the given starting node. But when doing DFS you must color visited nodes to distinguish a cross-edge from back-edge. First time visiting a vertex it turns grey, then you turn it black once all its edges have been visited. If when doing the DFS you hit a grey vertex then that vertex is an ancestor (ie: you have a cycle). If the vertex is black then it's just a cross-edge.
-13

If a graph satisfies this property

|e| > |v| - 1

then the graph contains at least one cycle.

3 Comments

That might be true for undirected graphs, but certainly not for directed graphs.
A counter example would be A->B, B->C, A->C.
Not all vertices have edges.
