1

I am new to Pandas. I am trying to make a data set with ZIP Code, Population in that ZIP Code, and Number of Counties in the ZIP Code.

I get the data from census website: https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_county_rel_10.txt

I am trying with the following code, but it is not working. Could you help me to figure out the correct code? I have a hunch that the error is due to data frame or sorts related to data type. But I can not work out the correct code to make it right. Please let me know your thoughts. Thank you in advance!

import pandas as pd

df = pd.read_csv("zcta_county_rel_10.txt", dtype={'ZCTA5': str, 'STATE': str, 'COUNTY': str}, usecols=['ZCTA5', 'STATE', 'COUNTY', 'ZPOP'])

zcta_pop = df.drop_duplicates(subset={'ZCTA5', 'ZPOP'}).drop(['STATE', 'COUNTY'], 1)

zcta_ct_county = df['ZCTA5'].value_counts()

zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']

pre_merge_1 = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')[['ZCTA5', 'ZPOP', 'CT_COUNTY']]

Here is my error message:

Traceback (most recent call last):    
File "<stdin>", line 1, in <module>    
File "/usr/local/python27/lib/python2.7/site-packages/pandas/tools/merge.py", line 58, in merge copy=copy, indicator=indicator)   
File "/usr/local/python27/lib/python2.7/site-packages/pandas/tools/merge.py", line 473, in __init__ 'type {0}'.format(type(right)))    
ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>

SOLUTION

import pandas as pd
df = pd.read_csv("zcta_county_rel_10.txt", dtype={'ZCTA5': str, 'STATE': str, 'COUNTY': str}, usecols=['ZCTA5', 'STATE', 'COUNTY', 'ZPOP'])
zcta_pop = df.drop_duplicates(subset={'ZCTA5', 'ZPOP'}).drop(['STATE', 'COUNTY'], 1)
zcta_ct_county = df['ZCTA5'].value_counts().reset_index()
zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']
pre_merge_1 = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')[['ZCTA5', 'ZPOP', 'CT_COUNTY']]
2
  • If you've answered your own question, you should post the solution as an answer and mark it as "accepted" (instead of leaving it in the question itself). Commented May 24, 2017 at 15:11
  • Thanks for the reminder, @jezrael helped me to solve it. The solution is posted at the bottom of my post. And jezrael's answer is accepted below. Commented May 24, 2017 at 15:25

1 Answer 1

1

I think you need add reset_index, because output of value_counts is Series and need DataFrame with 2 columns:

zcta_ct_county = df['ZCTA5'].value_counts().reset_index()
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.