I've read a number of the questions on this site about datetime, Timestamp, matplotlib's date2num, etc. However, I'm curious about what the "correct" way to plot some data is. Say I have a dataframe whose index is a pandas DatetimeIndex. I can plot the data with pandas directly or with matplotlib:

print(df.index)
# = DatetimeIndex(['2018-01-01 20:00:00', ..., '2018-01-03 04:00:00'],
#                 dtype='datetime64[ns]',
#                 name=u'DateTime',
#                 length=385,
#                 freq=None)

my_axis.plot(df)
print(my_axis.get_xlim())  # = (736695.72708333354, 736697.14791666681)

# vs 

df.plot(ax=my_axis)
print(my_axis.get_xlim())  # = (25247280.0, 25249200.0)

However, the range for the "x axis" is totally different between them. If I mix plotting (I need to use matplotlib directly for broken_barh), then I don't see all of the data since they have such different x coordinates. Is there an accepted best practice for this?

EDIT to add working example below

I'm open to upgrading versions if needed. I've tried with:

# Python2 Versions:
Python: 2.7.14
Numpy: 1.13.3
Pandas: 0.20.3
Matplotlib: 2.0.0

# Python3 Version (same results)
Python: 3.6.3
Numpy: 1.12.1
Pandas: 0.19.2
Matplotlib: 2.0.0

If I only use pandas to plot x and y, then both of them show up correctly. If I only use matplotlib, then they both show up correctly. However, if I try to plot one with pandas and the other with matplotlib, then they don't work (See image at bottom). My preference would be to "normally" use pandas, so that I only have to edit the DateTime index when plotting with matplotlib. I included two commented attempts at this, neither of which worked.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

start = '2018-01-02 03:00:00'
end = '2018-01-02 11:00:00'

data = pd.DataFrame({'DateTime': pd.date_range(start=start, end=end, freq='1H'),
                     'x': [1,2,3,4,5,4,3,2,1],
                     'y': [5,4,3,2,1,2,3,4,5]})
data = data.set_index('DateTime')
#print(data)

ax0 = plt.subplot(211)
ax1 = plt.subplot(212, sharex=ax0)

# Pandas for both
data['x'].plot(ax=ax0)
#data['y'].plot(ax=ax1)

# Matplotlib for both
#ax0.plot(data.index, data['x'])
ax1.plot(data.index, data['y'])

# Other attempts to make matplotlib plot work with pandas
# (but they produce same image as below)
#ax1.plot([mdates.date2num(d) for d in data.index], data['y'])
#ax1.plot(data.index.to_pydatetime(), data['y'])

plt.savefig('test.png')

[Image: test.png]

  • No, there is no "best practice". Often using pandas is easier, while matplotlib allows for more control. In any case, mixing the two will most often lead to problems because they use different date format conventions. Commented Jan 20, 2018 at 10:23
  • That's pretty much what I've found. I'm plotting about 4 plots and I'd prefer to just use pandas, but I need to use matplotlib for one of them (broken_barh). Is there a way to get matplotlib to use the dates from pandas? Commented Jan 20, 2018 at 17:24
  • Yes. If you want to show what problem you have by using a minimal reproducible example of the issue, and clearly state which versions you have, one may help here. Commented Jan 20, 2018 at 17:44
  • Added a working example to OP Commented Jan 20, 2018 at 18:34
  • Seeing just this, after some headaches while trying to use matplotlib.dates.num2date aimlessly... Interesting how the time conventions may be different. Commented Jan 10, 2019 at 13:34

1 Answer

The data units in matplotlib and pandas date plots are completely different. You can see this by not sharing any axes and printing the axis limits.

import pandas as pd
import matplotlib.pyplot as plt

start = '2018-01-02 03:00:00'
end = '2018-01-02 11:00:00'

data = pd.DataFrame({'DateTime': pd.date_range(start=start, end=end, freq='1H'),
                     'x': [1,2,3,4,5,4,3,2,1],
                     'y': [5,4,3,2,1,2,3,4,5]})
data = data.set_index('DateTime')

ax0 = plt.subplot(211)
ax1 = plt.subplot(212)

# Pandas
data['x'].plot(ax=ax0)
# Matplotlib
ax1.plot(data.index, data['y'])

print(ax0.get_xlim())  # (420795.0, 420803.0)
print(ax1.get_xlim())  # (736696.10833333328, 736696.47500000009)

plt.show()

It is hence clear that you cannot share the axes (sharex=ax0) if you plot on the one axis values in the range (420795.0, 420803.0) and values in the range (736696.108, 736696.475) on the other one.
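To see where the two ranges come from, here is a small sketch using only the standard library. It assumes that pandas draws an hourly series in whole hours since 1970-01-01, and that matplotlib 2.0 counts days since 0001-01-01 plus one. Both assumptions are consistent with the limits printed above. Newer matplotlib releases (3.3 and later) default to a 1970-01-01 epoch, so the second number would differ there. The variable names are illustrative only.

```python
from datetime import datetime

# First timestamp in the example data
first = datetime(2018, 1, 2, 3, 0, 0)
unix_epoch = datetime(1970, 1, 1)

# Assumed pandas convention for an hourly series: whole hours since 1970-01-01
hours_since_epoch = (first - unix_epoch).total_seconds() / 3600

# Assumed matplotlib 2.0 convention: days since 0001-01-01 plus one,
# which equals the proleptic Gregorian ordinal plus the fraction of the day
mpl_days = first.toordinal() + first.hour / 24.0

print(hours_since_epoch)  # 420795.0
print(mpl_days)           # 736696.125
```

The first value matches the lower pandas limit exactly. The second sits just inside the lower matplotlib limit, which includes the default axis margin.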

So if for any reason you need to use a matplotlib plot on one of the shared axes, you need to use matplotlib for all other shared axes as well.
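A minimal sketch of that approach, assuming the same example data: every shared axis is drawn with matplotlib, and the index is converted once with date2num so that broken_barh can take its (start, width) pairs in the same units. The bar positions and the `one_hour` helper are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5, 4, 3, 2, 1],
     "y": [5, 4, 3, 2, 1, 2, 3, 4, 5]},
    index=pd.date_range("2018-01-02 03:00:00", "2018-01-02 11:00:00",
                        periods=9, name="DateTime"),
)

fig, (ax0, ax1) = plt.subplots(2, 1, sharex=True)

# Convert the index once into matplotlib date numbers (unit: days)
times = mdates.date2num(data.index.to_pydatetime())

# Line plot drawn by matplotlib rather than by data['x'].plot(ax=ax0)
ax0.plot(times, data["x"].values)

# broken_barh expects (start, width) pairs in x data units, here days
one_hour = 1.0 / 24.0  # hypothetical helper: one hour expressed in days
bars = [(times[1], 2 * one_hour), (times[5], one_hour)]
ax1.broken_barh(bars, (0, 1))

# Tell the shared x axis to interpret its numbers as dates
ax1.xaxis_date()
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
```

Because both axes use the same date numbers, their x limits agree and all of the data is visible. Call fig.savefig(...) or plt.show() afterwards to view the result.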
