How to plot two columns of a pandas data frame using points

Question

I have a pandas dataframe and would like to plot the values from one column against the values from another column. Fortunately, dataframes have a plot method that seems to do what I need:

df.plot(x='col_name_1', y='col_name_2')

Unfortunately, it looks like the available plot styles (listed here after the kind parameter) do not include points. I can use lines, bars, or even density, but not points. Is there a workaround for this?

sodd · Accepted Answer · 2016-04-04 12:34:53Z

153

You can specify the style of the plotted line when calling df.plot:

df.plot(x='col_name_1', y='col_name_2', style='o')

The style argument can also be a dict or list, e.g.:

import numpy as np
import pandas as pd

d = {'one' : np.random.rand(10),
     'two' : np.random.rand(10)}

df = pd.DataFrame(d)

df.plot(style=['o','rx'])

All the accepted style formats are listed in the documentation of matplotlib.pyplot.plot.
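A dict works the same way as the list above, but ties each style to a column name instead of a position. The following is a minimal sketch, not part of the original answer; the seeded generator and the Agg backend line are assumptions added so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame({"one": rng.random(10), "two": rng.random(10)})

# Map each column name to a matplotlib format string:
# 'o' = circle markers, 'rx' = red x markers, no connecting line in either case.
ax = df.plot(style={"one": "o", "two": "rx"})

# Each column becomes one Line2D object carrying its own marker.
markers = [line.get_marker() for line in ax.get_lines()]
print(markers)
```

The dict form is handy when the column order may change, since the list form assigns styles by position.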


edited Apr 4, 2016 at 12:34

answered Jul 23, 2013 at 14:33

sodd


1 Comment

Dan Over a year ago

If you're not able to see multiple column lines / points, then check the dtypes of the dataframe and convert it from object to numeric columns. This issue wasted at least a few hours for me.
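A small sketch of the check described in this comment, assuming the columns arrived as strings (for example from a CSV with a stray non-numeric cell). The sample values and the choice of errors="coerce" are illustrative assumptions, not something the commenter specified.

```python
import pandas as pd

# Hypothetical data: numbers stored as text, with one bad cell.
raw = pd.DataFrame({
    "col_name_1": ["1.5", "2.0", "3.25"],
    "col_name_2": ["10", "oops", "30"],
})
print(raw.dtypes)  # non-numeric dtypes: df.plot would skip or reject these columns

# Convert every column; values that cannot be parsed become NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)
print(numeric)
```

After the conversion, numeric.plot(x='col_name_1', y='col_name_2', style='o') behaves as in the answer above; rows containing NaN are simply not drawn.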

Community · Accepted Answer · 2017-05-23 12:34:47Z

For this (and most plotting) I would not rely on the Pandas wrappers to matplotlib. Instead, just use matplotlib directly:

import matplotlib.pyplot as plt
plt.scatter(df['col_name_1'], df['col_name_2'])
plt.show() # Depending on whether you use IPython or interactive mode, etc.

and remember that you can access a NumPy array of the column's values with df.col_name_1.values for example.
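For reference, here is the same idea written against an explicit Axes object, as a sketch rather than the answerer's exact code. It uses Series.to_numpy(), which is the accessor current pandas documentation recommends over .values; the sample data and the Agg backend line are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"col_name_1": rng.random(20), "col_name_2": rng.random(20)})

# Pull plain NumPy arrays out of the columns.
x = df["col_name_1"].to_numpy()
y = df["col_name_2"].to_numpy()

fig, ax = plt.subplots()
points = ax.scatter(x, y)      # one marker per row
ax.set_xlabel("col_name_1")    # matplotlib does not label axes for you
ax.set_ylabel("col_name_2")
```

Passing the Series directly to scatter also works; extracting arrays mainly matters when you want to transform the values first.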

I ran into trouble using this with Pandas default plotting in the case of a column of Timestamp values with millisecond precision. In trying to convert the objects to datetime64 type, I also discovered a nasty issue: < Pandas gives incorrect result when asking if Timestamp column values have attr astype >.

Dr. Arslan · Accepted Answer · 2019-07-03 17:12:29Z

Pandas uses matplotlib as its library for basic plots. The easiest way in your case is the following:

import pandas as pd
import numpy as np

#creating sample data 
sample_data={'col_name_1':np.random.rand(20),
      'col_name_2': np.random.rand(20)}
df= pd.DataFrame(sample_data)
df.plot(x='col_name_1', y='col_name_2', style='o')

However, I would recommend seaborn as an alternative if you want more customized plots without dropping down to the basic level of matplotlib. In that case the solution is the following:

import pandas as pd
import seaborn as sns
import numpy as np

#creating sample data 
sample_data={'col_name_1':np.random.rand(20),
      'col_name_2': np.random.rand(20)}
df= pd.DataFrame(sample_data)
sns.scatterplot(x="col_name_1", y="col_name_2", data=df)

shantanu pathak · Accepted Answer · 2019-09-20 17:25:09Z

2

In recent versions of pandas you can use the df.plot.scatter function directly:

import pandas as pd

df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=['length', 'width', 'species'])
ax1 = df.plot.scatter(x='length',
                      y='width',
                      c='DarkBlue')

https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.plot.scatter.html
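The species column in that dataframe is not used by the call above. As a sketch of one way to use it, the c argument can also name a column, in which case each point is colored by that column's value; the colormap choice and the Agg backend line here are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd

df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=["length", "width", "species"])

# c names a column, so marker colors are mapped from its values
# through the chosen colormap; pandas also adds a colorbar.
ax2 = df.plot.scatter(x="length", y="width", c="species", colormap="viridis")

# The values behind the colors are stored on the scatter collection.
color_values = [int(v) for v in ax2.collections[0].get_array()]
```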

answered Sep 20, 2019 at 17:25

shantanu pathak


cottontail · Accepted Answer · 2024-04-04 07:26:29Z

When this question was posted, scatter plot was a separate function in pandas. Since pandas 0.13, you can use kind='scatter' to plot a scatter plot from two columns.

df = pd.DataFrame({'colA': np.random.rand(10), 'colB': np.random.rand(10)})
df.plot(x='colA', y='colB', kind='scatter')

If you want to change the marker (e.g. to x), you can use the marker= parameter:

df.plot(x='colA', y='colB', kind='scatter', marker='x')

How is this different from `df.plot(style='o')`?

Under the hood, df.plot defaults to a matplotlib line plot (i.e. Axes.plot() or plt.plot), so passing style= is similar to plt.plot(x, y, 'o'). In particular, this creates an entry in Axes.lines, which stores the marker attributes.

On the other hand, df.plot(kind='scatter') (or df.plot.scatter) uses Axes.scatter of matplotlib; this creates an Axes.collections object to store marker attributes.
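The difference can be checked by inspecting the Axes that each call returns. This is a small sketch using the same column names as above; the seeded data and the Agg backend line are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"colA": rng.random(10), "colB": rng.random(10)})

# Line plot drawn with markers only: the points live in Axes.lines.
ax_line = df.plot(x="colA", y="colB", style="o")
print(len(ax_line.lines), len(ax_line.collections))

# True scatter plot: the points live in Axes.collections.
ax_scatter = df.plot(x="colA", y="colB", kind="scatter")
print(len(ax_scatter.lines), len(ax_scatter.collections))
print(type(ax_scatter.collections[0]).__name__)
```

This matters when you later restyle the points: a Line2D has one marker style for all points, while a scatter collection can hold per-point sizes and colors.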

One significant difference is when you want to change marker size; with kind='scatter', you have to use s= like plt.scatter but with a line plot, you have to use ms= instead. The following two function calls produce the same output.

df.plot(x='colA', y='colB', kind='scatter', s=36)  # case 1
df.plot(x='colA', y='colB', style='o', ms=6)       # case 2

You can read this Q/A about why marker size values should be different in order for these methods to produce the same output.
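In short, s is a marker area in points squared while ms is a marker width in points, so matching sizes satisfy s = ms ** 2. A sketch that reads both values back from the plotted artists, with seeded sample data and the Agg backend line as assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"colA": rng.random(10), "colB": rng.random(10)})

ms = 6          # marker size for line plots, in points
s = ms ** 2     # equivalent scatter size, in points squared

ax1 = df.plot(x="colA", y="colB", kind="scatter", s=s)
ax2 = df.plot(x="colA", y="colB", style="o", ms=ms)

scatter_size = float(ax1.collections[0].get_sizes()[0])  # area-based size
line_size = float(ax2.lines[0].get_markersize())         # width-based size
print(scatter_size, line_size)
```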
