I'm trying to perform some text analysis on a pandas dataframe, but am having some trouble with the flow. Alternatively, maybe I'm just not getting it... PS - I'm a Python beginner-ish.

Dataframe example:

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})


     Document   Type
0    a          7
1    1          E
2    a          Y
3    6          6
4    7          C
5    N          9

I'm trying to build a flow that does one thing if 'Document' or 'Type' is a number and something else if it is not.

Here is a simple function to return whether 'Document' is a number (edited to show how I am trying some if/then flow on the field):

def fn(dfname):
    if dfname['Document'].apply(str.isdigit):
        dfname['Check'] = 'Y'
    else:
        dfname['Check'] = 'N'

Now, I apply it to the dataframe:

df.apply(fn(df), axis=0)

I get this error back:

TypeError: ("'NoneType' object is not callable", u'occurred at index Document')

From the error message, it looks like I am not handling the index correctly. Can anyone see where I am going wrong?

Lastly - this may or may not be related to the issue, but I am really struggling with how indexes work in pandas. I think I have run into more issues with the index than any other issue.

  • You should use bool rather than 'Y' and 'N'... ! Commented Jan 21, 2014 at 22:58

2 Answers

You're close.

The thing to realize about apply on a Series is that the function you pass should operate on a single scalar value and return the result you want for that value. With that in mind:

import pandas as pd

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})

def fn(val):
    if str(val).isdigit():
        return 'Y'
    else:
        return 'N'

df['check'] = df['Document'].apply(fn)

gives me:

  Document Type check
0        a    7     N
1        1    E     Y
2        a    Y     N
3        6    6     Y
4        7    C     Y
5        N    9     N

Edit:

Just want to clarify that when using apply on a Series, you should write functions that accept scalar values. When using apply on a DataFrame, however, the functions should accept either full columns (when axis=0 -- the default) or full rows (when axis=1).
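To make the three cases concrete, here is a minimal sketch using the dataframe from the question. The variable names (doc_is_digit, digits_per_column, both_digits) are made up for illustration, and the lambdas are just one way to write the functions.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

# Series.apply: the function receives one scalar value at a time.
doc_is_digit = df['Document'].apply(lambda val: str(val).isdigit())

# DataFrame.apply with axis=0 (the default): the function receives
# each column as a Series. Here it counts the digit strings per column.
digits_per_column = df.apply(lambda col: col.str.isdigit().sum(), axis=0)

# DataFrame.apply with axis=1: the function receives each row as a Series,
# so individual fields can be looked up by column name.
both_digits = df.apply(
    lambda row: row['Document'].isdigit() and row['Type'].isdigit(),
    axis=1,
)

print(doc_is_digit.tolist())
print(digits_per_column.to_dict())
print(both_digits.tolist())
```

The row-wise form (axis=1) is the one to reach for when the decision depends on more than one column, as in the 'Document' or 'Type' check from the question.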

2 Comments

OK - I think I just figured it out - to use functions on a dataframe, you have to use (should use) apply. So, I can chain together functions by using apply inside of the main function. Is that right (does that make sense)?
@mikebmassey That sounds possible in theory, but it also sounds like a mess. I would avoid that situation.

It's worth noting that you can do this (without using apply, so more efficiently) using str.contains:

In [11]: df['Document'].str.contains(r'^\d+$')
Out[11]: 
0    False
1     True
2    False
3     True
4     True
5    False
Name: Document, dtype: bool

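Here the regex anchors ^ and $ match the start and end of the string respectively, and \d+ matches one or more digits, so only values made up entirely of digits return True.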
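If the 'Y'/'N' column from the question is still wanted, the boolean result can be mapped to labels without apply. This is a rough sketch: it swaps in Series.str.isdigit in place of the regex (an alternative, not what the answer above uses), and the names is_digit and check are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

# Vectorized digit test: one boolean per row, no Python-level function call per value.
is_digit = df['Document'].str.isdigit()

# Translate the booleans into the 'Y'/'N' labels used in the question.
df['check'] = is_digit.map({True: 'Y', False: 'N'})

print(df)
```

As the comment on the question suggests, keeping the boolean column itself (is_digit) is often more convenient than 'Y'/'N' strings, since it can be used directly for filtering, e.g. df[is_digit].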
