I have a dataframe df1 with two columns, freq and RN, sorted in ascending order by freq.

    In [2]: df1.head()
    Out[2]: 
       freq  RN
    147 1   181
    56  1   848
    149 1   814
    25  1   829

I want to draw a scatter plot with RN on the x axis and freq on the y axis, where the x values appear in ascending order of their y values, i.e. I want the x axis to read 841,848,835,... in the row order of df1, which is sorted by ascending freq.

Now if I write plt.scatter('RN', 'freq',data=df1), the x axis in the output is not ordered by ascending freq. It is arranged in its own natural ascending order, like 800,801,...,860.

Note: plt.bar('RN', 'freq',data=df1) works the way I want.

How do I change it?

1 Answer

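  • If the RN column is numeric, matplotlib places each point at its numeric x position, so the axis runs in numeric order regardless of the row order.
  • To keep the row order, convert the RN column to str. matplotlib then treats the values as categories and places them along the axis in order of first appearance.
    • This works best if the values in RN are unique. If they are not unique, all the freq values for a repeated RN are plotted at the same x position.
    • If RN is not unique, the plot API has no way to differentiate one occurrence of a value from another.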
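
If every row needs its own slot on the axis even when RN repeats, one alternative is to plot against row positions and use RN only for the tick labels. This is a minimal sketch and not part of the approach above: the small frame is made-up test data, and the names positions and labels are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up test data; RN 848 appears twice on purpose.
df = pd.DataFrame({"freq": [3, 1, 2, 1], "RN": [848, 181, 814, 848]})

# Sort by freq; mergesort is stable, so ties keep their original order.
df = df.sort_values("freq", kind="mergesort").reset_index(drop=True)

# Use the row position as x so each row gets its own slot,
# and show RN only as the tick label for that slot.
positions = list(range(len(df)))
labels = df["RN"].astype(str).tolist()

fig, ax = plt.subplots()
ax.scatter(positions, df["freq"])
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel("RN")
ax.set_ylabel("freq")
```

With this layout a repeated RN shows up as two separate ticks with the same label, rather than two points stacked on one tick.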

Non-Unique RN (a)

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # create test data
    np.random.seed(365)
    data = {'freq': np.random.randint(20, size=(20,)), 'RN': np.random.randint(800, 900, size=(20,))}
    df = pd.DataFrame(data)

    # convert RN to a str type
    df.RN = df.RN.astype(str)

    # sort freq
    df.sort_values('freq', ascending=True, inplace=True)

    # plot
    plt.scatter('RN', 'freq', data=df)

Non-Unique RN (b)

  • Use pandas.DataFrame.groupby to group non-unique RNs together, so that each RN has a single combined freq value.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # create test data
    np.random.seed(365)
    data = {'freq': np.random.randint(20, size=(20,)), 'RN': np.random.randint(800, 900, size=(20,))}
    df = pd.DataFrame(data)

    # convert RN to a str type
    df.RN = df.RN.astype(str)

    # combine non-unique RN with groupby (summing freq) and sort by freq
    dfg = df.groupby('RN', as_index=False)['freq'].sum().sort_values('freq')

    # plot
    plt.scatter('RN', 'freq', data=dfg)

Unique RN

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # create test data
    np.random.seed(365)
    data = {'freq': np.random.randint(20, size=(20,)), 'RN': np.arange(800, 820)}
    df = pd.DataFrame(data)

    # convert RN to a str type
    df.RN = df.RN.astype(str)

    # sort freq
    df.sort_values('freq', ascending=True, inplace=True)

    # plot
    plt.scatter('RN', 'freq', data=df)
