3
\$\begingroup\$

Given set of words that are lexicographically sorted, return lexicographic order.

E.g:

abc
acd
bcc
bed
bdc
dab

The order of letters for the given example would be

a->b->c->e->d

Time:

Part-1:

Complexity of constructing graph: \$O(n * m)\$, where \$n\$ is number of words and \$m\$ is max length of any word.

Part-2:

Topological sort: \$O(V + E)\$, where \$V\$ is number of vertices and \$E\$ is number of edges

Space:

\$O(V + E)\$ - entire graph is stored

Looking for request code review, optimizations and best practices. Also verifying if how would final answer for complexity look like.

E.g: Would it be \$O(n*m + E + V)\$?

class GraphLexico<T> implements Iterable<T> {

    /* A map from nodes in the graph to sets of outgoing edges.  Each
     * set of edges is represented by a map from edges to doubles.
     */
    private final Map<T, List<T>> graph = new HashMap<T, List<T>>();

    /**
     *  Adds a new node to the graph. If the node already exists then its a
     *  no-op.
     * 
     * @param node  Adds to a graph. If node is null then this is a no-op.
     * @return      true if node is added, false otherwise.
     */
    public boolean addNode(T node) {
        if (node == null) {
            throw new NullPointerException("The input node cannot be null.");
        }
        if (graph.containsKey(node)) return false;

        graph.put(node, new ArrayList<T>());
        return true;
    }

    /**
     * Given the source and destination node it would add an arc from source 
     * to destination node. If an arc already exists then the value would be 
     * updated the new value.
     *  
     * @param source                    the source node.
     * @param destination               the destination node.
     * @param length                    if length if 
     * @throws NullPointerException     if source or destination is null.
     * @throws NoSuchElementException   if either source of destination does not exists. 
     */
    public void addEdge (T source, T destination) {
        if (source == null || destination == null) {
            throw new NullPointerException("Source and Destination, both should be non-null.");
        }
        if (!graph.containsKey(source) || !graph.containsKey(destination)) {
            throw new NoSuchElementException("Source and Destination, both should be part of graph");
        }
        /* A node would always be added so no point returning true or false */
        graph.get(source).add(destination);
    }

    /**
     * Given a node, returns the edges going outward that node,
     * as an immutable map.
     * 
     * @param node The node whose edges should be queried.
     * @return An immutable view of the edges leaving that node.
     * @throws NullPointerException   If input node is null.
     * @throws NoSuchElementException If node is not in graph.
     */
    public List<T> edgesFrom(T node) {
        if (node == null) {
            throw new NullPointerException("The node should not be null.");
        }
        List<T> edges = graph.get(node);
        if (edges == null) {
            throw new NoSuchElementException("Source node does not exist.");
        }
        return Collections.unmodifiableList(graph.get(node));
    }

    /**
     * Returns the iterator that travels the nodes of a graph.
     * 
     * @return an iterator that travels the nodes of a graph.
     */
    @Override public Iterator<T> iterator() {
        return graph.keySet().iterator();
    }
}

public final class LexicographicalSort {

    private LexicographicalSort() {}

    /**
     * Returns the list of characters in lexicographically sorted order.
     * 
     * Note that if entire information needed to determine lexicographical 
     * order is not present then results are unreliable.
     * 
     * @param dictionary the list of words ordered in lexicographical order
     */
    public static List<Character> lexigoGraphicOrder(List<String> dictionary) {
        final GraphLexico<Character> graph = new GraphLexico<Character>();
        for (int i = 0; i < dictionary.size() - 1; i++) {
            createGraph(dictionary.get(i),  dictionary.get(i + 1), graph);
        }
        return topologicalSort(graph);
    }

    /**
     * Creates a DAG based on the lexicographical order.
     * 
     * 
     * @param string1   the first string with higher placement/priority in dictionary
     * @param string2   the second string with lesser placement/priority in dictionary
     * @param graph     the DAG to be constructed.
     */
    private static void createGraph(String string1, String string2, GraphLexico<Character> graph) {
        char[] ch1 = string1.toCharArray();
        char[] ch2 = string2.toCharArray();

        // pick the smaller length
        int minLength = ch1.length > ch2.length ? ch2.length : ch1.length;

        for (int i = 0; i < minLength; i++) {
            if (ch1[i] != ch2[i]) {
                graph.addNode(ch1[i]);
                graph.addNode(ch2[i]);
                graph.addEdge(ch1[i], ch2[i]);
                return;
            }
        }
    }


    /**
     * Running the topological sort, on the constructed graph
     * 
     * 
     * @param graph     the DAG determining priority of characters
     * @return          the characters in lexicographic order
     */
    private static  List<Character> topologicalSort(GraphLexico<Character> graph) {
        final GraphLexico<Character> reverseGraph = reverseGraph(graph);

        final List<Character> result = new ArrayList<Character>();
        final Set<Character> visited = new HashSet<Character>();
        final Set<Character> finished = new HashSet<Character>();

        for (Character node : reverseGraph) {
            explore(node, result, visited, finished, reverseGraph);
        }

        return result;
    }


    private static void explore (Character node, List<Character> result, Set<Character> visited, 
                                    Set<Character> finished, GraphLexico<Character> reverseGraph) {

        if (visited.contains(node)) {
            if (finished.contains(node)) return;
            else throw new IllegalArgumentException("Cycle detected. ");
        }

        visited.add(node);

        for(Character currNode : reverseGraph.edgesFrom(node)) {
            explore(currNode, result, visited, finished, reverseGraph);
        }

        finished.add(node);
        result.add(node);
    }

    private static GraphLexico<Character> reverseGraph(GraphLexico<Character> graph) {
        final GraphLexico<Character> graphRev = new GraphLexico<Character>();

        for (Character node : graph) {
            graphRev.addNode(node);
        }

        for (Character node : graph) {
            for (Character neighbors : graph.edgesFrom(node)) {
                graphRev.addEdge(neighbors, node);
            }
        }

        return graphRev;
    }
}

Followed by testing

public class LexicographicalSortTest {

    @Test
    public void testLexicoGraphicalSort() {
        List<String> list = new ArrayList<String>();
        list.add("abc");
        list.add("acd");
        list.add("bcc");
        list.add("bed");
        list.add("bdc");
        list.add("dab");

        List<Character> expectedList = new ArrayList<Character>();
        expectedList.add('a');
        expectedList.add('b');
        expectedList.add('c');
        expectedList.add('e');
        expectedList.add('d');

        assertEquals(expectedList, LexicographicalSort.lexigoGraphicOrder(list));
    }

}
\$\endgroup\$

2 Answers 2

6
\$\begingroup\$

You code has lots of strengths. Its main weakness is the lack of explanation for the algorithm.

  1. Why are you building the reverse graph? A reverse post order visiting of the graph is a topo sort order. Just get the post order and then reverse it! Saves lots of effort and space.

  2. I can't grok how your topo sort is finding DAG roots. You can't afford to start searching at nodes that don't have zero parents. Keeping a parent count for each character makes it super-easy to find the roots.

  3. Your cycle detection is too complex. A cycle occurs if and only if a recursive call to explore encounters a node that's already on the stack. Just keep an "active" set. Add a node before a recursive call to explore and remove it afterward. If you encounter a node that's active, you've found a cycle. The active set will generally be much smaller than the node set, so this is a tiny performance win as well.

Additional things to think about:

  1. You can use a LinkedHashSet to maintain the post order of visits and keep track of already-visited nodes efficiently with one data structure instead of two.

  2. Using sets instead of lists of successors in the graph removes a run time dependency on graph degree. This is a small thing.

  3. Learn how to use JavaDoc format comments. It's worth the time in the long run.

  4. A few of your names are deceptive. A dictionary in code usually refers to a hash of strings. Etc., etc.

  5. Not sure why you broke strings into character arrays.

  6. It's simpler to have the graph edge adder create missing nodes than to deem missing nodes an error condition.

I liked this problem so well I coded my own version to check some of the ideas above:

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

public class LexiDetective {

    /**
     * A little unit test of the lexicographic detective.
     */
    public static void main(String[] args) {
        String [] words = { "abc", "acd", "bcc", "bed", "bdc", "dab" };
        CharacterGraph graph = new CharacterGraph();
        graph.insertWordList(words);
        try {
            List<Character> order = graph.topoSort();
            for (Character ch : order) {
                System.out.print(ch);
            }
            System.out.println();
        } catch (Exception ex) {
            // Topo sort might find cycle and raise this exception.
            System.err.println(ex.getMessage());
        }
    }
}

/**
 * Symbol graph specialized for characters and enhanced to deal
 * with lexicographically ordered word lists.
 */
class CharacterGraph extends SymbolGraph<Character> {

    /**
     * Insert a lexicographically ordered word pair into the
     * character graph using the first non-equal character pair to
     * add an edge.
     * 
     * @param x first word in lexicographic order
     * @param y second word
     */
    void insertWordPair(String x, String y) {
        int len = Math.min(x.length(), y.length());
        for (int i = 0; i < len; i++) {
            char cx = x.charAt(i);
            char cy = y.charAt(i);
            if (cx != cy) {
                addEdge(cx, cy);
                break;
            }
        }
    }

    /**
     * Insert a lexicographically ordered word list into the character
     * graph by inserting each adjacent word pair.
     * 
     * @param list lexicographically ordered word list
     */
    void insertWordList(String [] list) {
        for (int i = 0; i < list.length - 1; i++) {
            insertWordPair(list[i], list[i + 1]);
        }
    }
}

/**
 * A class for graphs with symbols as nodes.  Includes topological sort
 * in symbol order.
 * 
 * @param <Symbol> ordered symbol type
 */
class SymbolGraph<Symbol> {

    /**
     * Information about a symbol and its successors in the graph.
     *
     * @param <T> symbol type
     */
    static class NodeData<T> {

        /**
         * Count of symbols with this one as successor.
         */
        int parentCount = 0;
        /**
         * Set of successor symbols.
         */
        final Set<T> successors = new HashSet<>();
    }

    /**
     * Graph adjacencies stored as a map from symbols to node data. The node
     * data stores information about the symbol and its successors.
     */
    Map<Symbol, NodeData> adjacencies = new HashMap<>();

    /**
     * Add a node to the graph unless it's already there.
     *
     * @param a datum for node
     * @return node, either existing or newly created
     */
    NodeData<Symbol> addNode(Symbol a) {
        NodeData<Symbol> data = adjacencies.get(a);
        if (data == null) {
            data = new NodeData<>();
            adjacencies.put(a, data);
        }
        return data;
    }

    /**
     * Add an edge to the graph unless it's already there.
     *
     * @param a edge origin
     * @param b edge destination
     */
    void addEdge(Symbol a, Symbol b) {
        NodeData<Symbol> aData = addNode(a);
        if (!aData.successors.contains(b)) {
            aData.successors.add(b);
            NodeData<Symbol> bData = addNode(b);
            ++bData.parentCount;
        }
    }

    /**
     * Visit the graph rooted at given symbol in post order, accumulating an
     * ordered set of visited symbols.
     *
     * @param a the symbol for the (sub)graph to search
     * @param visited an ordered set that gives the post visit order
     */
    void postOrderVisit(Symbol a, Set<Symbol> ancestors, LinkedHashSet<Symbol> visited) 
            throws Exception {
        if (ancestors.contains(a)) {
            throw new Exception("Cycle detected. No post order exists.");
        }
        if (!visited.contains(a)) {
            NodeData<Symbol> data = adjacencies.get(a);
            if (data != null) {
                ancestors.add(a);
                for (Symbol aSuccessor : data.successors) {
                    postOrderVisit(aSuccessor, ancestors, visited);
                }
                visited.add(a);
                ancestors.remove(a);
            }
        }
    }

    /**
     * Topologically sort the symbol graph and return the result.
     *
     * @return topological sort of symbols
     */
    List<Symbol> topoSort() throws Exception {
        Set<Symbol> ancestors = new HashSet<>();
        LinkedHashSet<Symbol> visited = new LinkedHashSet<>();
        // Loop through all the symbols and their data.
        for (Entry<Symbol, NodeData> pair : adjacencies.entrySet()) {
            // Search each root (symbol with no parents).
            if (pair.getValue().parentCount == 0) {
                postOrderVisit(pair.getKey(), ancestors, visited);
            }
        }
        // Reverse the post order visit to get a topo sort.
        List<Symbol> order = new ArrayList<>(visited);
        Collections.reverse(order);
        return order;
    }
}
\$\endgroup\$
4
\$\begingroup\$

I would make a small change to the test case, to make it more concise:

public class LexicographicalSortTest {

    @Test
    public void testLexicoGraphicalSort() {
        List<String> list = Arrays.asList("abc", "acd", "bcc", "bed", "bdc", "dab");

        List<Character> expectedList = Arrays.asList("abced".toCharArray());

        assertEquals(expectedList, LexicographicalSort.lexigoGraphicOrder(list));
    }

}
\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.