This is my first time asking a question here, so apologies if I'm doing anything wrong. I have this data in a pandas DataFrame:

     Year  Month  PassengerCountSum        Date  DateOrd   Prediction
0    2006      9        2720100.000  2006-09-01   732555  2815063.471
1    2007      5        3056934.000  2007-05-01   732797  2908360.055
2    2012      2        2998119.000  2012-02-01   734534  3578013.633
3    2008      4        3029021.000  2008-04-01   733133  3037895.807
4    2006     10        2834959.000  2006-10-01   732585  2826629.163
...   ...    ...                ...         ...      ...          ...
124  2007      7        3382382.000  2007-07-01   732858  2931876.962
125  2009      6        3419595.000  2009-06-01   733559  3202128.637
126  2012      9        3819379.000  2012-09-01   734747  3660130.047
127  2013     10        3910790.000  2013-10-01   735142  3812411.661
128  2011      6        3766323.000  2011-06-01   734289  3483560.480

I need to make a graph with Date on the x-axis and PassengerCountSum on the y-axis. I also need to show the Prediction values as a linear regression line.

There is no problem when I do this:

plt.plot(df_pass_by_year_pd['Date'] , df_pass_by_year_pd['Prediction'])

It paints a perfect linear regression.

But when I replace df_pass_by_year_pd['Prediction'] with df_pass_by_year_pd['PassengerCountSum'] to show the real values from the DataFrame, like this:

plt.plot(df_pass_by_year_pd['Date'] , df_pass_by_year_pd['PassengerCountSum'])

The graph goes crazy and draws things I don't really understand.

plot

Does anyone see the problem? Thanks, all!

I have tried changing the column type and reshaping the array, but I'm pretty new to all of this, so any help or tip is welcome.

  • The points are connected in the order they are encountered in the dataframe. As all the points of the prediction lie on one line, you don't see the wiggling. You can sort the dataframe on Date to obtain a plot going left to right. Commented Dec 28, 2022 at 14:03
  • That was the problem, as @chrslg pointed out too. Sorted the DataFrame and everything looks fine now. Thank you very much. Commented Dec 28, 2022 at 14:30

1 Answer

Your data are not sorted, so plt.plot draws a line between each pair of consecutive points (x, y), (x', y') in row order.

You had the same problem with the prediction too. You believe you see one straight line, but what you actually saw was a myriad of straight line segments superimposed. Since your predictions are perfectly aligned, you didn't notice it.

Fix: sort your data by date before plotting.
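A minimal sketch of that fix, assuming the column names from the question. The small `df` here is a hypothetical stand-in for `df_pass_by_year_pd`, built from the first five rows shown in the question, and `plot_sorted` is an illustrative helper name, not something from the original code:

```python
import pandas as pd

# Hypothetical stand-in for df_pass_by_year_pd: the first five rows
# from the question, deliberately left in their unsorted order.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2006-09-01", "2007-05-01", "2012-02-01", "2008-04-01", "2006-10-01",
    ]),
    "PassengerCountSum": [2720100.0, 3056934.0, 2998119.0, 3029021.0, 2834959.0],
    "Prediction": [2815063.471, 2908360.055, 3578013.633, 3037895.807, 2826629.163],
})

# Sort rows chronologically so consecutive rows are consecutive dates.
df_sorted = df.sort_values("Date").reset_index(drop=True)


def plot_sorted(frame):
    """Plot actual values and predictions against Date, in date order."""
    import matplotlib.pyplot as plt  # imported here so the sorting part runs without it

    ordered = frame.sort_values("Date")
    plt.plot(ordered["Date"], ordered["PassengerCountSum"], label="PassengerCountSum")
    plt.plot(ordered["Date"], ordered["Prediction"], label="Prediction")
    plt.legend()
    plt.show()
```

Calling `plot_sorted(df)` should then draw both lines from left to right. If you only want to see the individual observations, passing a marker-only format string such as `'o'` to `plt.plot` avoids connecting the points at all, so row order no longer matters.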

1 Comment

Oh my, I feel pretty ashamed, it looks so obvious now, hahaha. Thank you, that was the problem. I sorted the DataFrame and everything looks fine now.