1

Let's say I have the following simple situation:

import pandas as pd

def multiply(row):
    global results
    results.append(row[0] * row[1])

def main():
    results = []
    df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}])
    df.apply(multiply, axis=1)
    print(results)

if __name__ == '__main__':
    main()

This results in the following traceback:

Traceback (most recent call last):

  File "<ipython-input-2-58ca95c5b364>", line 1, in <module>
    main()

  File "<ipython-input-1-9bb1bda9e141>", line 11, in main
    df.apply(multiply, axis=1)

  File "C:\Users\bbritten\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4262, in apply
    ignore_failures=ignore_failures)

  File "C:\Users\bbritten\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4358, in _apply_standard
    results[i] = func(v)

  File "<ipython-input-1-9bb1bda9e141>", line 5, in multiply
    results.append(row[0] * row[1])

NameError: ("name 'results' is not defined", 'occurred at index 0')

I know that I can move results = [] to the if statement to get this example to work, but is there a way to keep the structure I have now and make it work?

2 Answers

4

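You must declare results outside the functions, at module level, and remove the results = [] line from main() (otherwise main() prints its own empty local list), like: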

import pandas as pd

results = []

def multiply(row):
    # the rest of your code...

UPDATE

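Also note that global is only needed when a function assigns to (rebinds) a module-level name. Since results.append(...) mutates the existing list in place rather than rebinding the name, you don't need the global declaration at the beginning of the function. Example: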

def multiply(row):
    # global results -> This is not necessary!
    results.append(row[0] * row[1])
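
The difference between mutating a module-level object and rebinding a module-level name can be sketched in plain Python, without pandas. The names counter, record, bump_without_global, and bump_with_global are hypothetical and exist only for this illustration:

```python
counter = 0
results = []

def record(value):
    # Mutates the list that the module-level name refers to.
    # The name itself is not reassigned, so no `global` is needed.
    results.append(value)

def bump_without_global():
    # The assignment makes `counter` local to this function, so reading
    # it on the right-hand side raises UnboundLocalError.
    counter = counter + 1

def bump_with_global():
    # Rebinding the module-level name requires the `global` declaration.
    global counter
    counter = counter + 1

record(2)
record(12)
bump_with_global()

try:
    bump_without_global()
except UnboundLocalError as exc:
    error = exc  # kept only so the failure can be inspected afterwards
```

In the question's code the list is only appended to, which is why dropping the global line from multiply is safe once results lives at module level.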

5 Comments

So the variables must be truly global in nature, and cannot be defined in a local scope, such as what I was doing. Is that what you're saying? I'm likely not going to use the right terminology here, but is there a way to access the environment of a calling function? So, for example, can I specify that multiply should use the environment of main for variables?
This still won't fix your problem.
Yes. If you want a global variable, you must declare it outside the scope of any function; the global keyword just says that the variable you're manipulating is the global one rather than a new one created locally inside this function. If you want to make a variable "environment specific", you should use classes instead.
Understood. Thanks for the help!
@cᴏʟᴅsᴘᴇᴇᴅ Technically this answer does fix the problem. It fixes the TraceBack the OP posted in their question. However as you noted in your (now deleted) answer, if the OP wants to use multiply with df.apply(), they need to return an actual value.
0

To keep using results as a global, you must move it outside of the function; I don't think there is any other way to make the global approach work.

Alternatively, keep results inside main and pass it as a parameter to the multiply function.
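
A minimal sketch of the parameter-passing approach, assuming a reasonably recent pandas release: DataFrame.apply forwards extra positional arguments to the function through its args parameter. The columns are read by label (row['a'], row['b']) instead of by position, and multiply also returns the product so that apply has a value for each row; both are choices made for this sketch, not part of the original code:

```python
import pandas as pd

def multiply(row, results):
    # `results` is the list created in main(), passed in explicitly.
    product = row['a'] * row['b']
    results.append(product)
    return product

def main():
    results = []
    df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}])
    # args supplies the extra positional arguments after the row itself.
    df.apply(multiply, axis=1, args=(results,))
    print(results)
    return results

if __name__ == '__main__':
    main()
```

This keeps the structure from the question, with results local to main. Collecting values through a side effect relies on apply calling the function exactly once per row, so using the Series that apply returns is usually the more robust option.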
