Split dataframe into several columns, Python

Question

As a result of some operations, I am getting a dataframe that looks like this:

                0                                    1
0      (aut, aut)                           (1.0, 0.0)
1      (isr, pcn)   (0.0621031946211, 0.0840317734128)
2      (wlf, gum)   (0.00971778368827, 0.787082275372)
3      (lka, are)  (0.184325574632, 2.37291167033e-07)
4      (mmr, brb)  (-0.00659784629805, 0.854498462056)
5      (umi, mar)  (0.136002437743, 0.000146047773528)
6      (rwa, arm)  (0.143873473167, 5.82638804266e-05)

But I need to split this dataframe in something that looks like this:

      iso_a  iso_b     value_1               value_2
0      aut    aut        1.0                   0.0
1      isr    pcn   0.062103194621     0.0840317734128
2      wlf    gum   0.009717783688     0.787082275372
3      lka    are   0.184325574632     2.37291167033e-07
4      mmr    brb  -0.006597846298     0.854498462056
5      umi    mar   0.136002437743     0.000146047773528
6      rwa    arm   0.143873473167     5.82638804266e-05

Are the elements of your DataFrame tuples (which is what they look like) or string representations of tuples (which they could also be)?
– DSM, Commented May 24, 2017 at 0:53
Sorry, the original data is a dictionary like the following: {('aut', 'aut'): (1.0, 0.0), ('isr', 'pcn'): (0.06210319462108603, 0.084031773412780841), ('wlf', 'gum'): (0.0097177836882651521, 0.78708227537249009), ('lka', 'are'): (0.18432557463221144, 2.3729116703293611e-07), ('mmr', 'brb'): (-0.0065978462980470038, 0.8544984620563465), ('umi', 'mar'): (0.13600243774288176, 0.00014604777352783356), ('rwa', 'arm'): (0.14387347316681087, 5.826388042658121e-05),....
– PAstudilloE, Commented May 24, 2017 at 2:10
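Given that the data starts out as a dictionary keyed by tuples, one possible route is to skip the intermediate two-column frame and build the target layout directly. The sketch below assumes every key and every value is a two-element tuple, as in the sample quoted in the comment; the names `d`, `rows` and `result` are illustrative only.

```python
import pandas as pd

# A few entries from the dictionary quoted in the comment (it is truncated there).
d = {
    ('aut', 'aut'): (1.0, 0.0),
    ('isr', 'pcn'): (0.06210319462108603, 0.084031773412780841),
    ('wlf', 'gum'): (0.0097177836882651521, 0.78708227537249009),
}

# Flatten each (key tuple, value tuple) pair into one four-element row.
rows = [(a, b, v1, v2) for (a, b), (v1, v2) in d.items()]

# One row per dictionary entry, with the column names requested in the question.
result = pd.DataFrame(rows, columns=['iso_a', 'iso_b', 'value_1', 'value_2'])
print(result)
```

Because the values are unpacked as plain Python floats, `value_1` and `value_2` should come out as numeric columns rather than as objects.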

StarkJA · Accepted Answer · 2017-05-24 02:26:24Z

I might do something like this:

def x(col):
    # Return the first element of a tuple
    return col[0]

df['iso_a'] = df[0].apply(x)
df['value_1'] = df[1].apply(x)


def y(col):
    # Return the second element of a tuple
    return col[1]

df['iso_b'] = df[0].apply(y)
df['value_2'] = df[1].apply(y)

And then you can delete your first two columns if you like.

del df[0]
del df[1]

This is a little clumsy (not DRY) but does the job. apply calls x() on every tuple in the column (either df[0] or df[1]) and returns the first element of each tuple, which is stored in the newly assigned column (e.g. df['iso_a']). y() does the same, but returns the second element of each tuple. Does that make sense? Also, this assumes you're using a pandas DataFrame.
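A more compact variant of the same idea, offered as a sketch rather than as the answer's own method: each tuple column can be expanded in one step by turning it into a list of tuples and handing that to the `DataFrame` constructor. This assumes the cells hold real tuples, not their string representations; the small `df` here is a stand-in for the frame in the question.

```python
import pandas as pd

# Stand-in for the frame in the question: columns 0 and 1 hold tuples.
df = pd.DataFrame({
    0: [('aut', 'aut'), ('isr', 'pcn')],
    1: [(1.0, 0.0), (0.0621031946211, 0.0840317734128)],
})

# Expand each tuple column into two columns, reusing the original index.
names = pd.DataFrame(df[0].tolist(), columns=['iso_a', 'iso_b'], index=df.index)
values = pd.DataFrame(df[1].tolist(), columns=['value_1', 'value_2'], index=df.index)

# Place the two pieces side by side.
result = pd.concat([names, values], axis=1)
print(result)
```

This builds a new frame instead of modifying `df`, so there is nothing to delete afterwards.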

Gabriel · Accepted Answer · 2017-05-24 00:49:22Z

Since you give few (no) details about the format the input data should be read from, here's a rudimentary but simple way, assuming the table is saved as text in a file called del.txt:

ls = []
with open('del.txt', 'r') as f:
    for line in f:
        ls.append(line.replace('(', '').replace(')', '').replace(',', '').split())


for l in ls[1:]:
    print(l)

This gives a list with a sub-list for every row, with every element stored as a string:

['0', 'aut', 'aut', '1.0', '0.0']
['1', 'isr', 'pcn', '0.0621031946211', '0.0840317734128']
['2', 'wlf', 'gum', '0.00971778368827', '0.787082275372']
['3', 'lka', 'are', '0.184325574632', '2.37291167033e-07']
['4', 'mmr', 'brb', '-0.00659784629805', '0.854498462056']
['5', 'umi', 'mar', '0.136002437743', '0.000146047773528']
['6', 'rwa', 'arm', '0.143873473167', '5.82638804266e-05']

Here's another way, using the translate method, which produces the same result:

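ls = []
# Python 3: build a table that deletes "(", ")" and ","
# (in Python 2 this was line.translate(None, "(),"))
table = str.maketrans('', '', '(),')
with open('del.txt', 'r') as f:
    for line in f:
        ls.append(line.translate(table).split())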
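Both parsing approaches leave every field as a string. As a hedged follow-up sketch, the parsed rows could be loaded into pandas and the numeric fields converted; the column names (including `idx` for the leading row label) are assumptions taken from the layout requested in the question, and the two rows are copied from the output shown above.

```python
import pandas as pd

# Two of the parsed rows shown above; every field is still a string.
ls = [
    ['0', 'aut', 'aut', '1.0', '0.0'],
    ['3', 'lka', 'are', '0.184325574632', '2.37291167033e-07'],
]

parsed = pd.DataFrame(ls, columns=['idx', 'iso_a', 'iso_b', 'value_1', 'value_2'])

# Use the leading row label as the index, then convert the numeric columns.
parsed = parsed.set_index('idx')
parsed = parsed.astype({'value_1': float, 'value_2': float})
print(parsed.dtypes)
```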

Sorry, I forgot to answer that previously: I'm getting this first dataframe as the result of some operation:

for x in data2.columns:
    for y in data2.columns:
        d[x, y] = pearsonr(data2[x], data2[y])

The result is a dictionary that looks like this: {('aut', 'aut'): (1.0, 0.0), ('isr', 'pcn'): (0.06210319462108603, 0.084031773412780841), ('wlf', 'gum'): (0.0097177836882651521, 0.78708227537249009), ('lka', 'are'): (0.18432557463221144, 2.3729116703293611e-07). What I finally need is a dataframe like the result shown in the original question.

criskross · Accepted Answer · 2017-05-24 00:51:22Z

I am not sure if this is an input file or a multidimensional array. Let's say your input data frame is a multidimensional array where each row holds two elements, each of which is itself an array with two elements.

def getListOfDictionaries(dataFrame):
  newList = list()
  for row in dataFrame:
    newList.append({'iso_a': row[0][0],
                    'iso_b': row[0][1],
                    'value_1': row[1][0],
                    'value_2': row[1][1]})
  return newList

As I said, I don't know in what format we can expect the input data.
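To connect this back to the dataframe the question asks for, here is a minimal sketch under the same assumption about the input (a list of rows, each holding a pair of codes and a pair of values). It applies the same mapping as getListOfDictionaries, written as a comprehension, and passes the resulting list of dictionaries to pandas; `rows`, `records` and `result` are illustrative names.

```python
import pandas as pd

# Assumed input shape: each row is [pair_of_codes, pair_of_values].
rows = [
    [('aut', 'aut'), (1.0, 0.0)],
    [('isr', 'pcn'), (0.0621031946211, 0.0840317734128)],
]

# Same mapping as getListOfDictionaries, written as a comprehension.
records = [
    {'iso_a': codes[0], 'iso_b': codes[1], 'value_1': vals[0], 'value_2': vals[1]}
    for codes, vals in rows
]

# A list of dictionaries becomes one row per dictionary, one column per key.
result = pd.DataFrame(records)
print(result)
```

Note that iterating over a pandas DataFrame directly yields its column labels, not its rows, so if the input is already a DataFrame the rows would first have to be extracted, for example with `df.itertuples(index=False)`.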
