I found the source code for nltk.draw.dispersion and it seems there is a mistake in it.
def dispersion_plot(text, words, ignore_case=False, title="Lexical Dispersion Plot"):
    """
    Generate a lexical dispersion plot.

    :param text: The source text
    :type text: list(str) or iter(str)
    :param words: The target words
    :type words: list of str
    :param ignore_case: flag to set if case should be ignored when searching text
    :type ignore_case: bool
    :return: a matplotlib Axes object that may still be modified before plotting
    :rtype: Axes
    """

    try:
        import matplotlib.pyplot as plt
    except ImportError as e:
        raise ImportError(
            "The plot function requires matplotlib to be installed. "
            "See https://matplotlib.org/"
        ) from e

    word2y = {
        word.casefold() if ignore_case else word: y
        for y, word in enumerate(reversed(words))  # <--- HERE
    }
    xs, ys = [], []
    for x, token in enumerate(text):
        token = token.casefold() if ignore_case else token
        y = word2y.get(token)
        if y is not None:
            xs.append(x)
            ys.append(y)

    _, ax = plt.subplots()
    ax.plot(xs, ys, "|")
    ax.set_yticks(list(range(len(words))), words, color="C0")  # <--- HERE
    ax.set_ylim(-1, len(words))
    ax.set_title(title)
    ax.set_xlabel("Word Offset")
    return ax


if __name__ == "__main__":
    import matplotlib.pyplot as plt
    from nltk.corpus import gutenberg

    words = ["Elinor", "Marianne", "Edward", "Willoughby"]
    dispersion_plot(gutenberg.words("austen-sense.txt"), words)
    plt.show()
It calculates word2y using reversed(words):
for y, word in enumerate(reversed(words))
but later it calls ax.set_yticks() with words, where it should use reversed(words):
ax.set_yticks(list(range(len(words))), words, color="C0")
(Alternatively, it should calculate word2y without using reversed().) As a result, the tick labels are in the opposite order to the rows the marks are drawn on.
I added # <--- HERE comments in the code above to mark these two places.
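To make the mismatch concrete, here is a minimal sketch that reproduces only the index bookkeeping from the function above, without matplotlib. The names buggy_labels and fixed_labels are mine, for illustration only; they do not exist in nltk.

```python
words = ["Elinor", "Marianne", "Edward", "Willoughby"]

# Row index for each word, computed the same way as word2y in dispersion_plot.
word2y = {word: y for y, word in enumerate(reversed(words))}

# set_yticks(positions, words) puts words[i] at position i (the buggy labelling).
buggy_labels = dict(enumerate(words))

# Labelling that agrees with word2y: position i gets the i-th reversed word.
fixed_labels = dict(enumerate(reversed(words)))

for word, y in word2y.items():
    print(
        f"{word!r} is drawn on row {y}; "
        f"buggy label: {buggy_labels[y]!r}; fixed label: {fixed_labels[y]!r}"
    )
```

With four words, the marks for "Elinor" land on row 3, but the unreversed labels name that row "Willoughby".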
It may be worth reporting this as an issue.
For now, you can take the returned ax and call set_yticks with reversed to correct the labels.
In your code the list is targets instead of words:
ax = dispersion_plot(words, targets)
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")
Full working code:
import matplotlib.pyplot as plt
from nltk.draw.dispersion import dispersion_plot
words = ['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets = ['aa','bbb', 'f', 'cccc']
ax = dispersion_plot(words, targets)
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")
plt.show()
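If you would rather not depend on which nltk version is installed, one option is to compute the plot data yourself so that positions and labels agree by construction. This is only a sketch under my own assumptions: dispersion_points is a hypothetical helper, not part of nltk, and it copies the matching logic from the source shown above.

```python
def dispersion_points(text, words, ignore_case=False):
    """Return (xs, ys, labels) where labels[y] names the word drawn on row y."""
    # Reverse once and reuse the same list for both positions and labels,
    # so the first target word ends up on the top row.
    labels = list(reversed(words))
    word2y = {
        (word.casefold() if ignore_case else word): y
        for y, word in enumerate(labels)
    }
    xs, ys = [], []
    for x, token in enumerate(text):
        token = token.casefold() if ignore_case else token
        y = word2y.get(token)
        if y is not None:
            xs.append(x)
            ys.append(y)
    return xs, ys, labels


words = ['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets = ['aa', 'bbb', 'f', 'cccc']
xs, ys, labels = dispersion_points(words, targets)
```

The results can then be drawn with ax.plot(xs, ys, "|") and labelled with ax.set_yticks(list(range(len(labels))), labels), the same calls the nltk function uses.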

EDIT: It seems this problem was reported a few months ago and they added reversed() to the code on GitHub, so it will probably work correctly in the next version:
dispersion plot not working properly · Issue #3133 · nltk/nltk
dispersion plot not working properly by Apros7 · Pull Request #3134 · nltk/nltk
targets = reversed(targets)
or dispersion_plot(words, reversed(targets))
[...] Y in the correct order, and a second later it gives me Y in the wrong order, and the next time it shows Y in the correct order again. nltk 3.7, matplotlib 3.6.3, Python 3.10, Python shell bpython 0.24, system Linux Mint 21.
[...] reversed(words) in source code.