I am trying to plot a pandas.DataFrame, but I get a ValueError that I cannot explain. Here is sample code that reproduces the problem:

import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO
import matplotlib.dates as mdates

weekday_fmt = mdates.DateFormatter('%a %H:%M')
test_csv = 'datetime,x1,x2,x3,x4,x5,x6\n' \
           '2021-12-06 00:00:00,8,42,14,23,12,2\n' \
           '2021-12-06 00:15:00,17,86,68,86,92,45\n' \
           '2021-12-06 00:30:00,44,49,81,26,2,95\n' \
           '2021-12-06 00:45:00,35,78,33,18,80,67'
test_df = pd.read_csv(StringIO(test_csv), index_col=0)
test_df.index = pd.to_datetime(test_df.index)
plt.figure()
ax = test_df.plot()
ax.set_xlabel('Weekly aggregation')
ax.set_ylabel('y-label')
fig = plt.gcf()
fig.set_size_inches(12.15, 5)
ax.get_legend().remove()
ax.xaxis.set_major_formatter(weekday_fmt) # This and the following line are the ones causing the issues
ax.xaxis.set_minor_formatter(weekday_fmt)
plt.show()

If the two formatter lines are removed, the code runs without problems, but with them in place I get: ValueError: Date ordinal 27312480 converts to 76749-01-12T00:00:00.000000 (using epoch 1970-01-01T00:00:00), but Matplotlib dates must be between year 0001 and 9999.

The reason seems to be that pandas and matplotlib convert datetimes in incompatible ways. This could probably be circumvented by not using the built-in plot function of pandas. Is there another way? Thanks!

My package versions are:

pandas                    1.3.4 
numpy                     1.19.5 
matplotlib                3.4.2 
python                    3.8.10
  • It looks like the issue is caused by the fact that you have only one date. Try adding a new row with a different date ('2021-12-07 00:00:00,35,78,33,18,80,67') and it works fine. Not sure why; you should probably report this case to the matplotlib mailing list / tracker to check whether it is a bug. Commented Jan 7, 2022 at 11:20
  • Use ax = test_df.plot(x_compat=True) to enable compatibility mode for the x axis, i.e. use Python datetime instead of pandas datetime. That way, matplotlib can do its job and format correctly. There is an example in the pandas docs. Commented Jan 7, 2022 at 11:28
  • You can't mix pandas datetime conversion with matplotlib's locators or formatters. Use pandas formatters if you are going to use pandas plotting. Commented Jan 7, 2022 at 11:45
  • @mozway: The problem also occurred when I had multiple dates. The CSV string shown was only a subsample of what I originally had. Commented Jan 7, 2022 at 12:11
  • @MrFuppes: x_compat seems to do the trick, at least in the small test case. I will try to run it on the larger dataset now, but would expect it to work. Thanks for pointing this out. Commented Jan 7, 2022 at 12:12

1 Answer


Thanks to the comments by Jody Klymak and MrFuppes, I found the answer to simply be ax = test_df.plot(x_compat=True). For anyone who stumbles upon this in the future, the rest of this answer explains what is happening.
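As a minimal sketch, the question's example with that one change looks roughly like this. The Agg backend, the legend=False and figsize keywords, and the explicit canvas.draw() call are additions for this sketch (so that it runs without a display and actually triggers tick formatting); they are not part of the original post.

```python
from io import StringIO

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import pandas as pd

test_csv = (
    "datetime,x1,x2,x3,x4,x5,x6\n"
    "2021-12-06 00:00:00,8,42,14,23,12,2\n"
    "2021-12-06 00:15:00,17,86,68,86,92,45\n"
    "2021-12-06 00:30:00,44,49,81,26,2,95\n"
    "2021-12-06 00:45:00,35,78,33,18,80,67"
)
test_df = pd.read_csv(StringIO(test_csv), index_col=0)
test_df.index = pd.to_datetime(test_df.index)

weekday_fmt = mdates.DateFormatter("%a %H:%M")

# x_compat=True is the actual fix: the x-values are then matplotlib date numbers
ax = test_df.plot(x_compat=True, legend=False, figsize=(12.15, 5))
ax.set_xlabel("Weekly aggregation")
ax.set_ylabel("y-label")
ax.xaxis.set_major_formatter(weekday_fmt)
ax.xaxis.set_minor_formatter(weekday_fmt)

# Force a render so the formatters are really applied to the tick values
ax.figure.canvas.draw()

# The axis limits now convert back to dates in the expected range
x_low, x_high = ax.get_xlim()
print(mdates.num2date(x_low), mdates.num2date(x_high))
```

In an interactive session, plt.show() would take the place of the canvas.draw() call.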

When you call the plot function, pandas takes over the formatting of the x-ticks (and possibly other features). The x-values it passes to matplotlib are not necessarily what one would expect: in the example above, ax.get_xlim() returns (27312480.0, 27312525.0), which matplotlib's DateFormatter tries to read as a date ordinal and cannot convert to a valid date. Using x_compat=True makes pandas hand matplotlib values that it can interpret as dates, and the formatting then happens there. Since none of this was clear to me from the error message, I hope this post helps others who search for it.
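The numbers can be checked with the standard library alone. The sketch below assumes that the axis values pandas chose here count minutes since 1970-01-01; that is inferred from the arithmetic, not stated in the error message. It contrasts this with the days-since-epoch reading that the error message describes.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
x_low = 27312480  # lower x-limit reported by ax.get_xlim() in the example

# Read as minutes since the epoch, the value is the first timestamp in the data
as_minutes = EPOCH + timedelta(minutes=x_low)
print(as_minutes)  # 2021-12-06 00:00:00

# Read as days since the epoch, it lies far beyond year 9999,
# which Python's datetime (like matplotlib) cannot represent
try:
    EPOCH + timedelta(days=x_low)
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)  # True
```

The 45-unit width of the reported limits also fits this reading, since the sample data spans 45 minutes.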


3 Comments

I really wonder why this option is so hard to find - pandas docs on plot don't specify it as it is not part of the plot accessor class.
I agree. Given that I work with pandas daily and matplotlib occasionally, such a fundamental issue should be more apparent, particularly in the error message, but I suppose pandas cannot really emit a warning at this point since that is already matplotlib's domain. The question remains why pandas doesn't just use matplotlib formatting instead of creating its own world...
Achieving really nice plots with matplotlib requires a lot of fine-tuning in my experience (occasional user as well...), so I see a reason why you'd want a good default "pandas-style" formatter. But it should be clear from the docs how to disable the default. When I find the time, I'll grep for the kwarg in the pandas source code and see if this is worth raising an issue on their GitHub.
