Pandas: returning last element of column value

Question

I created the following function to retrieve data from an internal incident management system:

import pandas as pd

def get_issues(session, query):
    block_size = 50
    start = 0
    all_issues = []

    # Page through the search results until an empty block comes back
    while True:
        issues = session.search_issues(query, start, block_size, expand='changelog')
        if len(issues) == 0:  # no more issues
            break
        start += len(issues)
        all_issues.extend(issues)

    rows = []
    for issue in all_issues:
        # Reset for every issue so a value from the previous issue is never reused
        groups = None
        for history in issue.changelog.histories:
            for item in history.items:
                if item.field == 'status' and item.toString == 'Pending':
                    groups = issue.fields.customfield_02219

        rows.append({
            'key':        issue.key,
            'issue_type': issue.fields.issuetype,
            'creator':    issue.fields.creator,
            'business':   issue.fields.customfield_082011,
            'groups':     groups,
        })

    # Build the DataFrame once at the end (DataFrame.append was removed in pandas 2.0)
    return pd.DataFrame(rows)

I use this function to create a dataframe df using:

df = get_issues(the_session, the_query)

The resulting dataset looks similar to the following:

    key       issue_type       creator        business         groups
0   MED-184   incident         Smith, J       Mercedes         [Finance, Accounting, Billing]
1   MED-186   incident         Jones, M       Mercedes         [Finance, Accounting]
2   MED-187   incident         Williams, P    Mercedes         [Accounting, Sales, Executive, Tax]
3   MED-188   incident         Smith, J       BMW              [Sales, Executive, Tax, Finance]

When I call dtypes on df, I get:

key          object
issue_type   object
creator      object
business     object
groups       object

I would like to keep only the last element of each list in the groups column, so that the dataframe looks like:

    key       issue_type       creator        business         groups
0   MED-184   incident         Smith, J       Mercedes         Billing
1   MED-186   incident         Jones, M       Mercedes         Accounting
2   MED-187   incident         Williams, P    Mercedes         Tax
3   MED-188   incident         Smith, J       BMW              Finance

I tried to amend the function above, as follows:

groups = issue.fields.customfield_02219[-1]

But I get an error saying that the field can't be indexed:

TypeError: 'NoneType' object is not subscriptable
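This TypeError means customfield_02219 is None for at least one issue, so indexing it needs a guard. A minimal sketch, assuming the field holds either a list or None; last_or_none is a hypothetical helper name, not part of the issue tracker's API:

```python
def last_or_none(value):
    """Return the last element of a non-empty list or tuple, else None."""
    # Guards against None (field not set) and empty lists alike
    if isinstance(value, (list, tuple)) and value:
        return value[-1]
    return None

# Inside get_issues, instead of indexing the field directly:
# groups = last_or_none(issue.fields.customfield_02219)

print(last_or_none(['Finance', 'Accounting', 'Billing']))  # Billing
print(last_or_none(None))  # None
print(last_or_none([]))  # None
```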

I also tried to create another column using:

df['groups_new'] = df['groups']:[-1]

But this returns the original groups column with all elements.

Does anyone have any ideas as to how to accomplish this?

Thanks!

########################################################

UPDATE

print(df.info()) results in the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   activity         7 non-null      object
 1   approvals        8 non-null      object
 2   business         13 non-null     object
 3   created          13 non-null     object
 4   creator          13 non-null     object
 5   region_a         5 non-null      object
 6   issue_type       13 non-null     object
 7   key              13 non-null     object
 8   materiality      13 non-null     object
 9   region_b         5 non-null      object
 10  resolution       2 non-null      object
 11  resolution_time  1 non-null      object
 12  target           13 non-null     object
 13  region_b         5 non-null      object
dtypes: object(14)
memory usage: 1.5+ KB
None

pakpe · Accepted Answer · 2021-03-13 02:19:42Z

1

Here it is:

df['new_group'] = df.apply(lambda x: x['groups'][-1], axis = 1)

UPDATE: If you get an IndexError with this, it means that at least one of your lists is empty. You can try this:

df['new_group'] = df.apply(lambda x: x['groups'][-1] if x['groups'] else None, axis = 1)

EXAMPLE:

df = pd.DataFrame({'key':[121,234,147], 'groups':[[111,222,333],[34,32],[]]})
print(f'ORIGINAL DATAFRAME:\n{df}\n')

df['new_group'] = df.apply(lambda x: x['groups'][-1] if x['groups'] else None, axis = 1)
print(f'FINAL DATAFRAME:\n{df}')

OUTPUT:
ORIGINAL DATAFRAME:
   key           groups
0  121  [111, 222, 333]
1  234         [34, 32]
2  147               []

FINAL DATAFRAME:
   key           groups  new_group
0  121  [111, 222, 333]      333.0
1  234         [34, 32]       32.0
2  147               []        NaN
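As an alternative to apply, the pandas .str accessor also indexes list-valued elements, and it returns NaN for empty lists and missing values instead of raising. A minimal sketch with made-up mock data:

```python
import pandas as pd

# Mock data: two normal lists, one empty list, one missing value
df = pd.DataFrame({
    'groups': [['Finance', 'Accounting', 'Billing'], ['Finance', 'Accounting'], [], None],
})

# .str[-1] takes the last element of each list; [] and None become NaN
df['new_group'] = df['groups'].str[-1]
print(df)
```

This avoids a Python-level lambda per row and handles the empty-list case from the comments without an explicit check.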

edited Mar 13, 2021 at 2:19

answered Mar 13, 2021 at 1:25


3 Comments

equanimity Over a year ago

This results in an IndexError: list index out of range.

pakpe Over a year ago

@equanimity Then one of the lists in the groups column must be empty. See my updated answer.

equanimity Over a year ago

Yes, @pakpe, there were a few lists that were empty. Your updated solution works perfectly. Thank you!

Vanderlei Filho · Accepted Answer · 2021-03-13 08:02:56Z

0

UPDATE: demonstration handling empty values

To get only the last element of each value (a Python list) in the 'groups' column, you can apply the following lambda and assign the result back to the 'groups' column:

df['groups'] = df['groups'].apply(lambda x: x.pop() if x else None)

Working demonstration:

import pandas as pd

# Code for mocking the dataframe
data = {
    'key': ["MED-184", "MED-186", "MED-187"],
    'issue_type': ['incident', 'incident', 'incident'],
    'creator': ['Smith, J', 'Jones, M', 'Williams, P'],
    'business': ['Mercedes', 'Mercedes', 'Mercedes'],
    'groups': [['Finance', 'Accounting', 'Billing'], ['Finance', 'Accounting'], None]
}
df = pd.DataFrame.from_dict(data)

# print old dataframe:
print(df)

# Execute the line below to transform the dataframe 
# into one with only the last values in the group column.
df['groups'] = df['groups'].apply(lambda x: x.pop() if x else None)

# print new transformed dataframe:
print(df)
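Two caveats with the pop() version: x.pop() removes the last element from each list, so the list objects stored in the frame (and any other references to them) are modified, and float('nan'), which pandas commonly uses for missing values, is truthy, so "if x" alone does not filter it out. A minimal non-mutating sketch; last_item is a hypothetical helper name:

```python
import pandas as pd

groups = [['Finance', 'Accounting', 'Billing'], ['Finance', 'Accounting'], None, float('nan'), []]
df = pd.DataFrame({'groups': groups})

def last_item(x):
    # Only real, non-empty lists are indexed; None, NaN and [] map to None.
    # x[-1] reads the element without changing the list (unlike x.pop()).
    return x[-1] if isinstance(x, list) and x else None

df['last_group'] = df['groups'].apply(last_item)
print(df)
```

Writing to a new column also keeps the original lists available for later checks.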

I hope this answer helps you.

edited Mar 13, 2021 at 8:02

answered Mar 13, 2021 at 1:14


5 Comments

equanimity Over a year ago

The solution suggested by @Vanderlei Munhoz results in an error on the production data: AttributeError: 'NoneType' object has no attribute 'pop' (probably because it's a special object type)

Vanderlei Filho Over a year ago

That is correct: if any value inside the "groups" column isn't a list, the AttributeError exception will be raised. What are the real types of the "groups" column in the production dataframe @equanimity?

equanimity Over a year ago

How would I see the actual types of the values in an object-dtype column?

Vanderlei Filho Over a year ago

What is the output of "print(df.info())"? df being the production dataframe. @equanimity

Vanderlei Filho Over a year ago

I see the output, but there is no "groups" column displayed (I guess the name is different). Check whether there are any NaN/empty values in the column with the lists. If this is the case, you will need to handle these empty values first (for example, by replacing them with empty lists) before calling the .pop() method as I suggested. Please let me know if this is the case so I can update my answer accordingly. @equanimity
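On the question of seeing the actual types in an object column: df.info() only reports the column dtype (object), not what each value is. One way to check, sketched here with made-up mixed values, is to count the type name of every element:

```python
import pandas as pd

# Mock column mixing the kinds of values discussed in the comments
df = pd.DataFrame({'groups': [['Finance', 'Billing'], None, float('nan'), 'Tax']})

# Map each value to its Python type name, then count occurrences
type_counts = df['groups'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

Any entry other than list (for example NoneType or float) points to rows that need handling before indexing or calling .pop().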
