2

I have a dataset in which I want to create a new column based on the division of two other columns, using a for-loop with if-conditions.

This is the dataset, with the empty 'solo_fare' column created beforehand.

The task is to loop through each row and divide 'Fare' by 'relatives' to get the per-passenger fare. However, there are certain if-conditions to follow: passengers in this category should see per-passenger prices between 3 and 8.


The code I have tried here doesn't seem to fill in the 'solo_fare' rows at all. It leaves the column empty (the same as the df above).

for i in range(0, len(fare_result)):
    p = fare_result.iloc[i]['Fare']/fare_result.iloc[i]['relatives']
    q = fare_result.iloc[i]['Fare']
    r = fare_result.iloc[i]['relatives']
    
    # if relatives == 0, return original Fare amount
    if (r == 0):
        fare_result.iloc[i]['solo_fare'] = q
    # if the divided fare is below 3 or more than 8, return original Fare amount again
    elif (p < 3) or (p > 8):
        fare_result.iloc[i]['solo_fare'] = q
    # else, return the divided fare to get solo_fare
    else:
        fare_result.iloc[i]['solo_fare'] = p 

How can I get this to work?

1
  • Did you get the "A value is trying to be set on a copy of a slice from a DataFrame" warning? Commented Apr 25, 2022 at 7:15

3 Answers

4

You should probably not use a loop for this; use loc instead.

If you first create the 'solo_fare' column and give every row the default value from 'Fare', you can then overwrite the value for the rows that meet the conditions you have set out:

fare_result['solo_fare'] = fare_result['Fare']

# overwrite solo_fare only where the divided fare lies between 3 and 8
fare_result.loc[(
    (fare_result.Fare / fare_result.relatives) >= 3) & (
    (fare_result.Fare / fare_result.relatives) <= 8), 'solo_fare'] = (
        fare_result.Fare / fare_result.relatives)
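
For reference, here is a minimal self-contained sketch of the same idea, written with Series.between and Series.where instead of a loc assignment. The sample values are made up for illustration and are not from the question's dataset; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real fare_result comes from the question's dataset.
fare_result = pd.DataFrame({
    "Fare": [7.25, 20.0, 30.0, 8.05, 100.0],
    "relatives": [0, 4, 2, 1, 20],
})

# Per-passenger fare. Dividing by zero gives inf (or NaN for 0/0),
# and neither value passes the between() check below.
per_passenger = fare_result["Fare"] / fare_result["relatives"]

# between() is inclusive on both ends by default, matching ">= 3 and <= 8".
in_range = per_passenger.between(3, 8)

# Keep the divided fare where it is in range; otherwise fall back to Fare.
fare_result["solo_fare"] = per_passenger.where(in_range, fare_result["Fare"])
```

Computing the ratio once and reusing it avoids repeating the division three times, which is the main difference from the loc version above.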

2 Comments

This does work. Out of curiosity, can you spot a reason why my for-loop method did not work?
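The issue is with how you are setting the values in the solo_fare column. fare_result.iloc[i]['solo_fare'] = p is chained indexing: iloc[i] can return a copy of the row, so the assignment may never reach fare_result. If you replace it with fare_result.loc[i, 'solo_fare'] = p it should work (given the default integer index), but using a loop for this process is bad practice and relatively slow.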
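
A minimal sketch of the loop with that single change applied, assuming the default RangeIndex so that position i and label i coincide. The sample values are hypothetical:

```python
import pandas as pd

# Hypothetical sample data with the default RangeIndex (labels 0, 1, 2).
fare_result = pd.DataFrame({
    "Fare": [7.25, 20.0, 30.0],
    "relatives": [0, 4, 2],
})
fare_result["solo_fare"] = float("nan")

for i in range(len(fare_result)):
    q = fare_result.loc[i, "Fare"]
    r = fare_result.loc[i, "relatives"]

    if r == 0:
        # No relatives: keep the original fare and skip the division.
        value = q
    else:
        p = q / r
        # Keep the divided fare only when it lies between 3 and 8.
        value = p if 3 <= p <= 8 else q

    # A single .loc call with (row, column) writes into fare_result itself,
    # unlike the chained fare_result.iloc[i]['solo_fare'] = ... form.
    fare_result.loc[i, "solo_fare"] = value
```

If the index is not the default one, fare_result.iloc[i, fare_result.columns.get_loc('solo_fare')] = value would be the positional equivalent.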
0

Did you try to initialize the new column first?

By that I mean that the statement fare_result.iloc[i]['solo_fare'] = q only assigns the value q to the field solo_fare of row i.

The issue is that, at that moment, row i does not have any solo_fare key. Hence, you are only filling the last value of your table here.

To solve this issue, try declaring the solo_fare column before the for loop like:

import numpy as np
fare_result['solo_fare'] = np.nan

1 Comment

Yes, the solo_fare column was already initialized beforehand. The df image in the question is the starting point after initializing. I also tried it with = np.nan, with the same issue.
0

One way to do this is to define a row-wise function and apply it to the dataframe:

# row-wise function (mockup)
def foo(fare, relative):
    # your logic here. Mine just serves as example
    if relative > 100:
        res = fare/relative
    elif (relative < 10):
        res = fare
    else:
        res = 10
    return res

Then apply it to the dataframe (row-wise):

fare_result['solo_fare'] = fare_result.apply(lambda row: foo(row['Fare'], row['relatives']), axis=1)
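
As a sketch of how the mockup might look with the rule from the question filled in (keep the divided fare only when it is between 3 and 8, otherwise keep the original fare). The function name solo_fare and the sample values are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

def solo_fare(fare, relatives):
    # No relatives: nothing to divide by, so keep the original fare.
    if relatives == 0:
        return fare
    per_passenger = fare / relatives
    # Keep the divided fare only when it lies between 3 and 8.
    return per_passenger if 3 <= per_passenger <= 8 else fare

# Hypothetical sample data standing in for the question's dataset.
fare_result = pd.DataFrame({
    "Fare": [7.25, 20.0, 30.0],
    "relatives": [0, 4, 2],
})

# axis=1 passes each row to the lambda as a Series.
fare_result["solo_fare"] = fare_result.apply(
    lambda row: solo_fare(row["Fare"], row["relatives"]), axis=1
)
```

This is easier to read than a boolean mask when the rule has several branches, though apply with axis=1 still calls the function once per row and is usually slower than a vectorized expression.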
