
Since last month, NLTK's dispersion_plot seems to draw the y (vertical) axis in reversed order on my machine. This is likely caused by my software versions (I am on a school virtual machine).

Versions: nltk 3.8.1, matplotlib 3.7.2, Python 3.9.13

code:

from nltk.draw.dispersion import dispersion_plot
words=['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets=['aa','bbb', 'f', 'cccc']
dispersion_plot(words, targets)


Expected: aa is present at the beginning and cccc at the end. Actual: it's backwards! Also notice that f should be completely absent; instead, bbb is absent.

conclusion: Y axis is backwards.

  • If the order is reversed, then maybe use targets = reversed(targets) or dispersion_plot(words, reversed(targets)). Commented Oct 10, 2023 at 0:33
  • @furas That's not what I'm asking. Have a look at the example. The data is in the correct order; it's a bug in the graph library, which shows the bars in the correct order but the LABELS in backwards order, making the graph misleading. Commented Oct 10, 2023 at 8:01
  • Maybe you should report it to NLTK's authors as a bug. Commented Oct 10, 2023 at 10:08
  • BTW: I ran the code, and sometimes it gives me Y in the correct order, a second later in the wrong order, and the next time in the correct order again. nltk 3.7, matplotlib 3.6.3, Python 3.10, Python shell bpython 0.24, system Linux Mint 21. Commented Oct 10, 2023 at 10:11
  • BTW: the source code for dispersion_plot is available. Maybe you can find what causes the problem. There is reversed(words) in the source code. Commented Oct 10, 2023 at 10:18

2 Answers


I found the source code for nltk.draw.dispersion, and it seems there is a mistake in it.

def dispersion_plot(text, words, ignore_case=False, title="Lexical Dispersion Plot"):
    """
    Generate a lexical dispersion plot.

    :param text: The source text
    :type text: list(str) or iter(str)
    :param words: The target words
    :type words: list of str
    :param ignore_case: flag to set if case should be ignored when searching text
    :type ignore_case: bool
    :return: a matplotlib Axes object that may still be modified before plotting
    :rtype: Axes
    """

    try:
        import matplotlib.pyplot as plt
    except ImportError as e:
        raise ImportError(
            "The plot function requires matplotlib to be installed. "
            "See https://matplotlib.org/"
        ) from e

    word2y = {
        word.casefold() if ignore_case else word: y
        for y, word in enumerate(reversed(words))  # <--- HERE
    }
    xs, ys = [], []
    for x, token in enumerate(text):
        token = token.casefold() if ignore_case else token
        y = word2y.get(token)
        if y is not None:
            xs.append(x)
            ys.append(y)

    _, ax = plt.subplots()
    ax.plot(xs, ys, "|")
    ax.set_yticks(list(range(len(words))), words, color="C0")  # <--- HERE
    ax.set_ylim(-1, len(words))
    ax.set_title(title)
    ax.set_xlabel("Word Offset")
    return ax



if __name__ == "__main__":
    import matplotlib.pyplot as plt

    from nltk.corpus import gutenberg

    words = ["Elinor", "Marianne", "Edward", "Willoughby"]
    dispersion_plot(gutenberg.words("austen-sense.txt"), words)
    plt.show()

It computes word2y from reversed(words):

for y, word in enumerate(reversed(words))

but later it calls ax.set_yticks() with words, where it should use reversed(words):

ax.set_yticks(list(range(len(words))), words, color="C0")

(or it should calculate word2y without using reversed()).

I marked both places with # <--- HERE in the code above.
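The mismatch can be reproduced without matplotlib. The following is only a sketch that mirrors the two marked lines using the question's targets list; buggy_labels and fixed_labels are names introduced here for illustration and do not appear in NLTK.

```python
targets = ['aa', 'bbb', 'f', 'cccc']

# Row index per word, computed the same way as word2y in the quoted source.
word2y = {word: y for y, word in enumerate(reversed(targets))}

# Labels as the quoted set_yticks call assigns them: tick y gets targets[y].
buggy_labels = dict(enumerate(targets))

# Labels that agree with word2y: tick y gets reversed(targets)[y].
fixed_labels = dict(enumerate(reversed(targets)))

for word, y in word2y.items():
    print(f"{word!r} is drawn on row {y}, "
          f"labelled {buggy_labels[y]!r}, should be {fixed_labels[y]!r}")
```

With these targets, the marks for aa land on the row labelled cccc and the empty row for f is labelled bbb, which matches what the question describes.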

This should probably be reported as an issue.

For now, you can take the returned ax and call set_yticks with the reversed list to correct the labels.
In your code the list is targets instead of words:

ax = dispersion_plot(words, targets)

ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")

Full working code

import matplotlib.pyplot as plt
from nltk.draw.dispersion import dispersion_plot

words = ['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets = ['aa','bbb', 'f', 'cccc']

ax = dispersion_plot(words, targets)
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")

plt.show()



EDIT: It seems this problem was reported a few months ago, and reversed() has already been added to the code on GitHub, so it will probably work in the next version:

dispersion plot not working properly · Issue #3133 · nltk/nltk

dispersion plot not working properly by Apros7 · Pull Request #3134 · nltk/nltk


Based on @furas's answer ❤️, I took it further and added an if condition that reverses the y tick labels only if they are actually backwards. This means the code will keep working once the library bug is fixed (which is meant to be soon).

import matplotlib.pyplot as plt
from nltk.draw.dispersion import dispersion_plot

targets = ['a', 'b']
filtered_text = ["a", "a", "b"]
my_plot = dispersion_plot(filtered_text, targets, ignore_case=True)

# THIS IS NEW: if the y tick labels are wrong, fix them (reverse them).
# Compare against a list: a bare reversed() iterator never equals a list.
expected = list(reversed(targets))
if [label.get_text() for label in my_plot.get_yticklabels()] != expected:
    my_plot.set_yticks(list(range(len(targets))), expected)

plt.show()

(I fixed the graph library locally and tested the code against both versions: it works with the old, broken library and with the new, fixed one.)
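If the check is needed in more than one place, it could be wrapped in a small function. This is a sketch only: fix_dispersion_labels is a hypothetical helper name, not part of NLTK, and it assumes that dispersion_plot draws the first target on the top row (as in the source quoted in the other answer) and that matplotlib is recent enough (3.5 or later) for set_yticks to accept labels.

```python
def fix_dispersion_labels(ax, targets):
    """Relabel the y axis only if the labels do not match the plotted rows.

    Hypothetical helper, not part of NLTK. Assumes targets[0] is drawn on
    the top row, i.e. at y = len(targets) - 1.
    Returns True if the labels were changed, False if they were already right.
    """
    expected = list(reversed(targets))  # labels for y = 0, 1, 2, ...
    current = [label.get_text() for label in ax.get_yticklabels()]
    if current == expected:
        return False
    ax.set_yticks(list(range(len(targets))), expected)
    return True
```

Usage would be ax = dispersion_plot(text, targets), then fix_dispersion_labels(ax, targets) before plt.show(). Calling it a second time changes nothing, since the labels already match by then.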
