
I'm working on a pandas DataFrame problem and am having trouble converting the data into a usable format for a scatter plot.

Here is my code. Please let me know what I am doing wrong and how I can correct it going forward. Honest criticism is welcome, as I am a beginner.

# Import Data
df = pd.read_csv(filepath + 'BaltimoreData.csv')

df = df.dropna()
print(df.head(20))
# These are two categories within the data
df.plot(df['Bachelors degree'], df['Median Income'])

# Plotting the Data
df.plot(kind = 'scatter', x = 'Bachelor degree', y = 'Median Income')
df.plot(kind = 'density')
  • Forget the code, where's your data? Please print(df.head(20)) and post its output here. Commented Oct 22, 2017 at 23:11
  • I added the heading so you can see the first 20 lines of data. Commented Oct 23, 2017 at 22:58
  • Unfortunately, I don't have access to your computer, so I cannot load your data from your filepath. While it seems your issue was resolved this time, please look at how to provide a minimal reproducible example in the future which helps us give you better answers. Commented Oct 23, 2017 at 22:59

2 Answers


Simply plot y against x as below, where df is your DataFrame and x and y are your independent and dependent variables, respectively:

import matplotlib.pyplot as plt
import pandas

plt.scatter(x=df['Bachelors degree'], y=df['Median Income'])
plt.show()

2 Comments

When I run that I get the following error message: could not convert string to float: '$37,678 '
Well you've got Median Income formatted as a string - read_csv is detecting the dollar sign and assuming you're working with strings (i.e. text). You could simply change it to be formatted as a number in your CSV.
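If editing the CSV isn't convenient, the column can also be cleaned inside pandas. Here is a minimal sketch, assuming the values all look like '$37,678 ' as in the error message. The sample frame is invented for illustration and is not the asker's data:

```python
import pandas as pd

# Hypothetical sample standing in for BaltimoreData.csv; only the
# '$37,678 ' format comes from the error message, the rest is made up.
df = pd.DataFrame({
    "Bachelors degree": [12.5, 30.1, 45.0],
    "Median Income": ["$37,678 ", "$52,100 ", "$81,250 "],
})

# Remove the dollar sign and thousands separators, trim whitespace,
# then convert what is left to floats.
df["Median Income"] = (
    df["Median Income"]
    .str.replace(r"[$,]", "", regex=True)
    .str.strip()
    .astype(float)
)

print(df.dtypes)
```

After this, 'Median Income' is numeric and can be passed to plt.scatter. If some rows hold other text (for example 'N/A'), pd.to_numeric(..., errors='coerce') may be a more forgiving final step than astype(float).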

You can use the scatter plot method built into pandas.

import pandas
import matplotlib.pyplot as plt
plt.style.use('ggplot')
df.plot.scatter(x='Bachelors degree', y='Median Income');
plt.show()

1 Comment

So I made some adjustments to the code so it looks like this: df.dropna(axis = 0, how = 'any'); plt.style.use('ggplot'); df.plot.scatter(x = df['Bachelors degree'], y = df['Median Income']); plt.show(). However, it is still throwing the error that it cannot index with vector containing NA/NaN values.
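Two things in that snippet are likely behind the error. First, df.dropna(...) returns a new DataFrame rather than changing df, so the result has to be assigned back. Second, df.plot.scatter expects column names for x and y, not the Series themselves. A rough sketch under those assumptions, using an invented sample frame with numeric columns:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with a few missing values, for illustration only.
df = pd.DataFrame({
    "Bachelors degree": [12.5, np.nan, 45.0, 30.1],
    "Median Income": [37678.0, 52100.0, np.nan, 61000.0],
})

# dropna returns a new DataFrame; without the assignment the NaN rows stay in df.
df = df.dropna(axis=0, how="any")

plt.style.use("ggplot")
# Pass the column names as strings, not df['...'] Series.
ax = df.plot.scatter(x="Bachelors degree", y="Median Income")
plt.show()
```

If 'Median Income' is still stored as text such as '$37,678 ', it needs to be converted to numbers before this will plot.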
