I'm trying to merge the following two DataFrames on the SICcode column:

df.head(5)

    SICcode     Catcode     Category                            SICname     MultSIC
0   111         A1500   Wheat, corn, soybeans and cash grain    Wheat        X
1   112         A1600   Other commodities (incl rice, peanuts)  Rice         X
2   115         A1500   Wheat, corn, soybeans and cash grain    Corn         X
3   116         A1500   Wheat, corn, soybeans and cash grain    Soybeans     X
4   119         A1500   Wheat, corn, soybeans and cash grain    Cash grains  X

df.columns.tolist()

['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']  

merged.head()


   2012 NAICS Code  2002to2007 NAICS  SICcode
0           111110            111110      116
1           111120            111120      119
2           111130            111130      119
3           111140            111140      111
4           111150            111150      115

merged.columns.tolist()
['2012 NAICS Code', '2002to2007 NAICS', 'SICcode']

When I try to merge them with the following code:

merged = pd.merge(merged, df, how='left', on='SICcode')

I get a KeyError: 'SICcode'. I tried to set the dtype of one of the DataFrames, but when I do, I also receive a KeyError.

If anyone has an idea about this or would like more information, please let me know.

  • 1
    What's the code that's giving the error? pd.merge(df, ef, on='SICcode') should work unless you happen to have a space in the name. Commented Apr 22, 2016 at 18:55
  • 1
    Can you include the actual code which produces the error? Commented Apr 22, 2016 at 18:56
  • 1
    I think it should be merged=ef.merge(df, how='left', on='SICcode') Commented Apr 22, 2016 at 18:58
  • 1
    @MichaelPerdue, are you using/reading this data for your DF? Could you also post df.columns.tolist() for both DFs? Commented Apr 22, 2016 at 19:03
  • 1
    Hi @MichaelPerdue, it seems that the column names are different; one of them probably contains a blank space. Commented Apr 22, 2016 at 19:09

1 Answer


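Pay attention to the first column name: it starts with an invisible byte order mark (\ufeff), so the column is not actually called 'SICcode':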

In [27]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0)

In [28]: df.columns.tolist()
Out[28]: ['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']

In [29]: df['SICcode']

...

KeyError: 'SICcode'

In [30]: df['\ufeffSICcode'].head()
Out[30]:
0    111
1    112
2    115
3    116
4    119
Name: SICcode, dtype: int64
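
If the DataFrame is already loaded, one possible workaround is to strip the byte order mark from the column names in place. The following is only a minimal sketch: the small frame is made up to mimic the column list shown above, and it is not the data from the question.

```python
import pandas as pd

# Hypothetical frame whose first column name carries a leading BOM (\ufeff),
# mimicking the column list shown above.
df = pd.DataFrame({
    '\ufeffSICcode': [111, 112],
    'Catcode': ['A1500', 'A1600'],
})

# Strip a leading BOM from every column name; other names are left unchanged.
df.columns = df.columns.str.lstrip('\ufeff')

print(df.columns.tolist())
```

After this, df['SICcode'] can be selected and used as a merge key under its plain name.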

As @unutbu said in his comment, adding encoding='utf-8_sig' to the pd.read_csv() call should fix this problem, because that codec drops the leading byte order mark while decoding the file:

In [31]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0, encoding='utf-8_sig')

In [32]: df.columns.tolist()
Out[32]: ['SICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
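
To try the whole flow without downloading anything, here is a rough, self-contained sketch. The CSV bytes and the small NAICS frame are invented for illustration (they are not the files from the question), and the codec is spelled 'utf-8-sig', the documented name of the same codec used above. Depending on the pandas version, the default reader may already drop the BOM, so the sketch only shows the explicit-encoding path.

```python
import io

import pandas as pd

# Made-up CSV bytes that begin with a UTF-8 byte order mark (BOM).
raw = '\ufeffSICcode,Catcode\n111,A1500\n116,A1500\n'.encode('utf-8')

# Decoding with 'utf-8-sig' removes the BOM, so the header is plain 'SICcode'.
df = pd.read_csv(io.BytesIO(raw), encoding='utf-8-sig')

# Hypothetical second frame, standing in for the NAICS table in the question.
naics = pd.DataFrame({
    '2012 NAICS Code': [111110, 111140],
    'SICcode': [116, 111],
})

# With matching column names, the left merge no longer raises KeyError.
merged = pd.merge(naics, df, how='left', on='SICcode')
print(merged)
```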

1 Comment

@MaxU and unutbu: Problem Solved! Thank you both for pointing this out to me and offering a solution. I was reading the column names from the return on .head(). I had no idea that something like this could come up.
