Measure first. If you have a performance problem, make sure you know where it is. A simple, low-tech way to do this is to start throwing time markers into the code and printing the elapsed time for different phases of the work. In your case, the graph-building is the costly part of the process, so there's no need to waste time thinking about how to optimize the BFS code.
from time import time
t1 = time()
... # Do some work.
t2 = time()
... # Do some other work.
t3 = time()
print('Phase 1', t2 - t1)
print('Phase 2', t3 - t2)
Your algorithm and the one from the discussion forum are quite different.
Yes, they both build a graph and then use BFS on it. But, as noted, the
performance problem lies in the graph building. Your algorithm is O(N*N):
nested loops over wordList. But the other solution is effectively O(N)
because the inner loop is over the positions in the current word, not the
entire word list.
# Your approach: O(N*N) -- nested loops over the word list.
for i in range(len(wordList)):
    ...
    for j in range(i + 1, len(wordList)):
        ...

# Discussion forum: O(N*m), where m (the word length) is tiny, effectively a constant.
for word in word_list:
    for i in range(len(word)):
        ...
Too much computation: word differences are not relevant. A clue that your algorithm is inefficient comes from realizing that we don't actually care about word distances. Our goal is to build a graph linking immediate neighbors (words differing in exactly one position), yet your algorithm does the work of computing all pairwise differences.
Invert the strategy: create linking keys. Suppose you realize that you don't need the word differences. The problem becomes: how do we find immediate neighbors (those having a difference of exactly 1) without actually computing any differences? That's not an easy question to answer, of course, and to some extent it requires a leap of insight. Nonetheless, there is a general lesson to be learned. The wildcard-based strategy in the discussion forum achieves the goal by mapping each word to a set of linking keys, such that immediate neighbors are guaranteed to share a common key. The way to do that is to convert each word (a sequence of characters) into several other sequences, each with exactly one character replaced by a generic value (an underscore in the case at hand).
# The word
hot
# Its linking keys
_ot
h_t
ho_
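As a sketch of that key-building step (the function name and sample word list here are just illustrative, not from your code):

```python
from collections import defaultdict

def build_buckets(word_list):
    """Map each wildcard key to the words that produce it.

    Two words share a key exactly when they agree everywhere except
    (possibly) the wildcard position, so every pair of words within a
    bucket is a pair of immediate neighbors. One pass over the words,
    with a tiny inner loop over positions: O(N*m) overall.
    """
    buckets = defaultdict(list)
    for word in word_list:
        for i in range(len(word)):
            key = word[:i] + '_' + word[i + 1:]
            buckets[key].append(word)
    return buckets

buckets = build_buckets(['hot', 'dot', 'dog', 'lot', 'log', 'cog'])
print(buckets['_ot'])  # ['hot', 'dot', 'lot']
```

From there, BFS can either treat each bucket as a clique of neighbors or expand neighbors lazily by generating a word's keys and looking them up.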