Measure first. If you have a performance problem, make sure you know where it is. A simple, low-tech way to do this is to start throwing time markers into the code and printing the elapsed time for different phases of the work. In your case, the graph-building is the costly part of the process, so there's no need to waste time thinking about how to optimize the BFS code.

from time import time

t1 = time()
...              # Do some work.
t2 = time()
...              # Do some other work
t3 = time()

print('Phase 1', t2 - t1)
print('Phase 2', t3 - t2)
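
If you want something slightly more reusable than raw time markers, a small context manager does the same job. This is just a sketch, not code from your program; the name timed and the labels are illustrative:

from contextlib import contextmanager
from time import perf_counter

@contextmanager
def timed(label):
    # Print the elapsed wall-clock time for the enclosed block.
    start = perf_counter()
    yield
    print(label, perf_counter() - start)

with timed('Build graph'):
    ...          # Graph-building code.

with timed('BFS'):
    ...          # Search code.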

Your algorithm and the one from the discussion forum are quite different. Yes, they both build a graph and then use BFS on it. But, as noted, the performance problem lies in the graph building. Your algorithm is O(N*N): nested loops over wordList. But the other solution is effectively O(N) because the inner loop is over the positions in the current word, not the entire word list.

# Your approach: O(N*N).
for i in range(len(wordList)):
    ...
    for j in range(i+1, len(wordList)):
        ...

# Discussion forum: O(N*m), where m is tiny, effectively a constant.
for word in word_list:
    for i in range(len(word)):
        ...

Too much computation: word differences are not relevant. A clue that your algorithm is inefficient comes from realizing that we don't actually care about word distances. Our goal is to build a graph linking immediate neighbors (those having a difference of exactly 1), but your algorithm does the work of actually computing all pairwise differences.

Invert the strategy: create linking keys. Suppose you realize that you don't need the word differences. The problem becomes: how do we find immediate neighbors (those having a difference of exactly 1) without actually computing any differences? That's not an easy question to answer, of course, and to some extent it requires a leap of insight. Nonetheless, there is still a general lesson to be learned. The wildcard-based strategy in the discussion forum achieves that goal by figuring out a way to map each word to some linking keys, such that immediate neighbors will share a common key. The way to do that is to convert each word (a sequence of characters) into other sequences of characters, where each character in turn gets replaced by a generic value (an underscore in the case at hand).

# The words
hot    dog

# Their linking keys
_ot    _og
h_t    d_g
ho_    do_
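
Here is a minimal sketch of that idea (my own illustration, not the exact code from the discussion forum): collect words into buckets keyed by their linking keys, so that immediate neighbors meet in a shared bucket and no differences are ever computed. The function name and the sample word list are just for demonstration.

from collections import defaultdict

def build_buckets(word_list):
    # Map each linking key to the words that share it. Words differing
    # in exactly one position always meet in at least one bucket, so no
    # pairwise comparison is ever needed.
    buckets = defaultdict(list)
    for word in word_list:
        for i in range(len(word)):
            key = word[:i] + '_' + word[i + 1:]
            buckets[key].append(word)
    return buckets

buckets = build_buckets(['hot', 'dot', 'dog', 'lot', 'log', 'cog'])
print(buckets['_ot'])    # ['hot', 'dot', 'lot']
print(buckets['do_'])    # ['dot', 'dog']

Building the buckets is O(N*m) for N words of length m, and during BFS a word's neighbors are simply the other members of its m buckets.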
