0

As a result of some operations, I am getting a dataframe that looks like this:

                0                                    1
0      (aut, aut)                           (1.0, 0.0)
1      (isr, pcn)   (0.0621031946211, 0.0840317734128)
2      (wlf, gum)   (0.00971778368827, 0.787082275372)
3      (lka, are)  (0.184325574632, 2.37291167033e-07)
4      (mmr, brb)  (-0.00659784629805, 0.854498462056)
5      (umi, mar)  (0.136002437743, 0.000146047773528)
6      (rwa, arm)  (0.143873473167, 5.82638804266e-05)

But I need to split this dataframe into something that looks like this:

      iso_a  iso_b     value_1               value_2
0      aut    aut        1.0                   0.0
1      isr    pcn   0.062103194621     0.0840317734128
2      wlf    gum   0.009717783688     0.787082275372
3      lka    are   0.184325574632     2.37291167033e-07
4      mmr    brb  -0.006597846298     0.854498462056
5      umi    mar   0.136002437743     0.000146047773528
6      rwa    arm   0.143873473167     5.82638804266e-05
3
  • Have you tried anything yet? Commented May 24, 2017 at 0:31
  • Are the elements of your DataFrame tuples (which is what they look like) or string representations of tuples (which they could also be)? Commented May 24, 2017 at 0:53
  • Sorry, the original data is a dictionary, as follows: {('aut', 'aut'): (1.0, 0.0), ('isr', 'pcn'): (0.06210319462108603, 0.084031773412780841), ('wlf', 'gum'): (0.0097177836882651521, 0.78708227537249009), ('lka', 'are'): (0.18432557463221144, 2.3729116703293611e-07), ('mmr', 'brb'): (-0.0065978462980470038, 0.8544984620563465), ('umi', 'mar'): (0.13600243774288176, 0.00014604777352783356), ('rwa', 'arm'): (0.14387347316681087, 5.826388042658121e-05),.... Commented May 24, 2017 at 2:10

3 Answers

1

I might do something like this:

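def x(col):
    # Return the first element of the tuple stored in a cell.
    return col[0]

df['iso_a'] = df[0].apply(x)
df['value_1'] = df[1].apply(x)


def y(col):
    # Return the second element of the tuple stored in a cell.
    return col[1]

df['iso_b'] = df[0].apply(y)
df['value_2'] = df[1].apply(y)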

And then you can delete your first two columns if you like.

del df[0]
del df[1]

This is a little clumsy (not DRY) but does the job. The function x() is applied to a column (either df[0] or df[1]) and returns the first element of the tuple in each row, which is stored in the newly assigned column (e.g. df['iso_a']). The function y() does the same, but returns the second element of each tuple. Does that make sense? Also, this assumes you're using a pandas DataFrame whose cells hold real tuples, not strings.
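
The same split can probably be done without the two helper functions. The following is only a sketch, assuming the cells are real tuples; the sample frame is rebuilt from the first rows of the question, and the names iso, values and result are made up for illustration:

```python
import pandas as pd

# Sample data copied from the question: both columns hold 2-tuples.
df = pd.DataFrame({
    0: [("aut", "aut"), ("isr", "pcn"), ("wlf", "gum")],
    1: [(1.0, 0.0),
        (0.0621031946211, 0.0840317734128),
        (0.00971778368827, 0.787082275372)],
})

# Turn each column of tuples into a list of tuples, then into a two-column frame.
iso = pd.DataFrame(df[0].tolist(), columns=["iso_a", "iso_b"], index=df.index)
values = pd.DataFrame(df[1].tolist(), columns=["value_1", "value_2"], index=df.index)

# Put the four new columns side by side; the original df is left untouched.
result = pd.concat([iso, values], axis=1)
print(result)
```

Because the result is a new frame, there is no need to delete the original columns afterwards.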

1

Since you give few (no) details about the format the input data should be read from, here's a rudimentary but simple way, assuming the printed dataframe is saved in a text file:

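ls = []
with open('del.txt', 'r') as f:
    for line in f:
        # Drop parentheses and commas, then split on whitespace.
        ls.append(line.replace('(', '').replace(')', '').replace(',', '').split())


# Skip the first entry, which is the header line.
for l in ls[1:]:
    print(l)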

This gives a list with a sub-list for every row, with every element stored as a string:

['0', 'aut', 'aut', '1.0', '0.0']
['1', 'isr', 'pcn', '0.0621031946211', '0.0840317734128']
['2', 'wlf', 'gum', '0.00971778368827', '0.787082275372']
['3', 'lka', 'are', '0.184325574632', '2.37291167033e-07']
['4', 'mmr', 'brb', '-0.00659784629805', '0.854498462056']
['5', 'umi', 'mar', '0.136002437743', '0.000146047773528']
['6', 'rwa', 'arm', '0.143873473167', '5.82638804266e-05']
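
Since every field comes back as a string, the numeric columns still need converting. A minimal sketch of that step, assuming each row has exactly the five fields shown above; the rows list is a hand-copied sample and the name records is made up for illustration:

```python
# A few rows as produced by the parsing loop (header line already skipped).
rows = [
    ['0', 'aut', 'aut', '1.0', '0.0'],
    ['1', 'isr', 'pcn', '0.0621031946211', '0.0840317734128'],
    ['3', 'lka', 'are', '0.184325574632', '2.37291167033e-07'],
]

records = []
for _index, iso_a, iso_b, value_1, value_2 in rows:
    records.append({
        'iso_a': iso_a,
        'iso_b': iso_b,
        # float() also accepts scientific notation such as '2.37291167033e-07'.
        'value_1': float(value_1),
        'value_2': float(value_2),
    })

print(records[0])
```

A list of dictionaries like this can then be handed to whatever builds the final table.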

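Here's another way using the translate method, which produces the same result:

ls = []
# Translation table that deletes '(', ')' and ',' (Python 3).
# Python 2 spelled this as line.translate(None, "(),").
table = str.maketrans('', '', '(),')
with open('del.txt', 'r') as f:
    for line in f:
        ls.append(line.translate(table).split())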

1 Comment

Sorry, I forgot to mention this earlier: I get the first dataframe from this operation: for x in data2.columns: for y in data2.columns: d[x, y] = pearsonr(data2[x], data2[y]). The result is a dictionary that looks like this: {('aut', 'aut'): (1.0, 0.0), ('isr', 'pcn'): (0.06210319462108603, 0.084031773412780841), ('wlf', 'gum'): (0.0097177836882651521, 0.78708227537249009), ('lka', 'are'): (0.18432557463221144, 2.3729116703293611e-07), ...}. What I ultimately need is a dataframe like the expected result in the original question.
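
Given that the data starts out as a dictionary, the target dataframe can probably be built from it directly, without going through the tuple-column frame at all. A sketch under that assumption, using the first entries of the dictionary quoted in the comment and assuming every key and every value unpacks into exactly two items; rows and result are illustrative names:

```python
import pandas as pd

# First entries of the dictionary from the comment: (iso_a, iso_b) -> (value_1, value_2).
d = {
    ('aut', 'aut'): (1.0, 0.0),
    ('isr', 'pcn'): (0.06210319462108603, 0.084031773412780841),
    ('wlf', 'gum'): (0.0097177836882651521, 0.78708227537249009),
}

# Flatten each key/value pair into one 4-tuple per row.
rows = [
    (iso_a, iso_b, value_1, value_2)
    for (iso_a, iso_b), (value_1, value_2) in d.items()
]

result = pd.DataFrame(rows, columns=['iso_a', 'iso_b', 'value_1', 'value_2'])
print(result)
```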
0

I am not sure whether this is an input file or a multidimensional array. Let's say your input data frame is a multidimensional array in which each row holds two elements, and each of those is itself an array with two elements.

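def getListOfDictionaries(dataFrame):
  # Expects an iterable of rows shaped like ((iso_a, iso_b), (value_1, value_2)).
  newList = list()
  for row in dataFrame:
    # row[0] is the pair of codes, row[1] is the pair of values.
    newList.append({'iso_a': row[0][0],
                    'iso_b': row[0][1],
                    'value_1': row[1][0],
                    'value_2': row[1][1]})
  return newList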

As I said, I don't know what format to expect for the input data.
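
To make the assumed input shape concrete, here is a small sketch of the same idea written with tuple unpacking. The function name rows_to_dicts and the sample list are hypothetical, and the input is assumed to be a plain list of rows, not a pandas DataFrame (iterating over a DataFrame yields its column labels, not its rows):

```python
def rows_to_dicts(rows):
    """Hypothetical variant of getListOfDictionaries using tuple unpacking."""
    return [
        {'iso_a': iso_a, 'iso_b': iso_b, 'value_1': value_1, 'value_2': value_2}
        for (iso_a, iso_b), (value_1, value_2) in rows
    ]


# Each row is a pair: (pair of codes, pair of values).
data = [
    (('aut', 'aut'), (1.0, 0.0)),
    (('isr', 'pcn'), (0.0621031946211, 0.0840317734128)),
]

records = rows_to_dicts(data)
print(records[1]['iso_b'])  # prints pcn
```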
