
I currently have a wide-format pandas DataFrame with the schema

idx, user, task_0, task_1, ... , task_n, task_result_0, task_result_1 ..., task_result_n, some_other_attribute_0, some_other_attribute_1, ..., some_other_attribute_n

Each user was given the n tasks in a random order. For example:

0, Bob, building-task, writing-task, reading-task, building-result, ...
1, Alice, writing-task, building-task, reading-task, writing-result, ...

Columns that share a numeric suffix are related to each other. For example, the information in task_0 belongs with task_result_0.

I want to reorder each row so that the tasks appear in the same order for every user, with all rows looking like:

0, Bob, building-task, writing-task, reading-task, building-result, ...
1, Alice, building-task, writing-task, reading-task, building-result, ...

I'm completely stumped on how to tackle this.

  • The examples you give and the description don't correlate with regards to the _0 suffix, and the notation is different (dashes vs underscores). Also, do you have column headers in your input, or are you trying to sort purely by task name on each individual line? My understanding is you wish to put all tasks first and then all results after, but the same order within each set? Commented Apr 7, 2017 at 1:56
  • I should have clarified: the column headers are user, task_0, task_1, ... The values don't have any common suffix. For example, the rows Bob, a, b, 0.3, 0.7 and Alice, b, a, 0.62, 0.95 should be sorted to Bob, a, b, 0.3, 0.7 and Alice, a, b, 0.95, 0.62. Commented Apr 7, 2017 at 19:12

1 Answer


Sort the values within each row, separately for the task columns and for the task_result columns.

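# Sort the columns by name (axis must be passed by keyword in recent pandas).
d1 = df.sort_index(axis=1)
# Sort the values within each row; result_type='broadcast' keeps the column labels.
tasks = d1.filter(regex=r'^task_\d+$').apply(sorted, axis=1, result_type='broadcast')
results = d1.filter(regex=r'^task_result_\d+$').apply(sorted, axis=1, result_type='broadcast')
d1[['idx', 'user']].join(tasks).join(results)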

   idx   user         task_0        task_1    task_result_0   task_result_1
0    0    Bob  building-task  writing-task  building-result  writing-result
1    1  Alice  building-task  writing-task  building-result  writing-result
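
Note that sorting tasks and results independently only keeps each pair together when the result values happen to sort in the same order as the task names. With numeric results, as in the asker's comment, that will generally not hold. A minimal sketch of one alternative, assuming NumPy is available: compute the per-row ordering of the task names once and apply that same permutation to every paired column group. The sample data and the `task_cols` / `result_cols` lists below are illustrative assumptions modeled on the comment, not the asker's real frame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: task_i is paired with task_result_i.
df = pd.DataFrame({
    "user": ["Bob", "Alice"],
    "task_0": ["a", "b"],
    "task_1": ["b", "a"],
    "task_result_0": [0.3, 0.62],
    "task_result_1": [0.7, 0.95],
})

# Assumed column groups; list any further paired groups the same way.
task_cols = ["task_0", "task_1"]
result_cols = ["task_result_0", "task_result_1"]

# For each row, the permutation that puts the task names in alphabetical order.
order = np.argsort(df[task_cols].to_numpy(), axis=1, kind="stable")

out = df.copy()
# Apply the same per-row permutation to the tasks and to the paired results.
for cols in (task_cols, result_cols):
    out[cols] = np.take_along_axis(df[cols].to_numpy(), order, axis=1)

print(out)
```

Here Alice's row becomes a, b, 0.95, 0.62, so each result travels with its task. Any other column group tied to the tasks could be added to the loop in the same way.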

extra credit
However, maybe not every user is assigned the same tasks...
In that case, count how often each value occurs in each row with value_counts

df.set_index(['idx', 'user']).apply(pd.Series.value_counts, axis=1)

           building-task  writing-task  building-result  writing-result
idx user                                                               
0   Bob                1             1                1               1
1   Alice              1             1                1               1
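
The top-level pd.value_counts function is deprecated in recent pandas releases. As a swapped-in alternative (not the approach shown above), the same kind of per-user indicator table can be sketched by melting to long format and cross-tabulating. The sample frame is a hypothetical one in which Alice was given a different task than Bob, and the `long` and `counts` names are illustrative.

```python
import pandas as pd

# Hypothetical sample data: the two users were not given identical task sets.
df = pd.DataFrame({
    "idx": [0, 1],
    "user": ["Bob", "Alice"],
    "task_0": ["building-task", "writing-task"],
    "task_1": ["writing-task", "reading-task"],
})

# One row per (user, task slot) instead of one column per slot.
long = df.melt(id_vars=["idx", "user"], value_name="task")

# Count how many times each task appears for each user; absent tasks count as 0.
counts = pd.crosstab([long["idx"], long["user"]], long["task"])

print(counts)
```

A 0 in the resulting table marks a task the user never received, which is easier to filter on than a missing value.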