I have JSON of the form:

{
    "abc":
      {
        "123":[45600,null,3567],
        "378":[78689,2345,5678],
        "343":[23456,null,null]
      }
    }

I fetched the JSON data from a URL like this:

json_data = json.loads(url.read().decode())

I need to convert it into a pandas DataFrame like this:

ds    y_ds1  y_ds2  y_ds3
123   45600  null   3567
378   78689  2345   5678
343   23456  null   null

I'm trying it this way:

df = pd.read_json(url,orient='columns')

It gives the result in the following form:

          abc
123      [45600,null,3567]
378      [78689,2345,5678]
343      [23456,null,null]

Is there a simple way to split the second column's arrays into one column per array element and rename the headings as shown above?

Edit: In the given JSON, every array has 3 elements, but what if the arrays have 2, 4, or 5 elements? Note: all keys in the JSON will have arrays of the same size, but that size can be anything.

1 Answer

Does this work for you?

import pandas as pd
import numpy as np

null = np.nan

my_json = {
    "abc":
      {
        "123":[45600,null,3567],
        "378":[78689,2345,5678],
        "343":[23456,null,null]
      }
    }

pd.DataFrame(my_json.get('abc')).T.reset_index().rename(columns={'index':'ds',0:'y_ds1',1:'y_ds2',2:'y_ds3'})

    ds    y_ds1   y_ds2   y_ds3
0  123  45600.0     NaN  3567.0
1  343  23456.0     NaN     NaN
2  378  78689.0  2345.0  5678.0

If the index can serve as the ds column, then you can do this:

pd.DataFrame(my_json.get('abc')).T.rename(columns=(lambda x: 'y_ds' + str(x)))

       y_ds0   y_ds1   y_ds2
123  45600.0     NaN  3567.0
343  23456.0     NaN     NaN
378  78689.0  2345.0  5678.0

Edit: Given the DataFrame you presented in your edit, you can convert it like so:

temp = df['abc'].apply(lambda x: pd.Series(x)).rename(columns=(lambda x: 'y_ds' + str(x)))

temp

       y_ds0   y_ds1   y_ds2
123  45600.0     NaN  3567.0
378  78689.0  2345.0  5678.0
343  23456.0     NaN     NaN
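
For the variable-length case raised in the question's edit, here is a minimal sketch of one alternative (not the approach used above): build the frame with `pd.DataFrame.from_dict(..., orient='index')` and generate the column names from the array width. The helper name `arrays_to_frame`, its `key` and `prefix` parameters, and the inline `raw` string standing in for the URL response are assumptions made for illustration only. It also assumes, as the question states, that every array has the same length.

```python
import json

import pandas as pd


def arrays_to_frame(json_data, key="abc", prefix="y_ds"):
    """Turn {key: {id: [v1, v2, ...]}} into a frame with ds, y_ds1, y_ds2, ... columns."""
    # orient="index" makes each inner key a row and each array position a column.
    df = pd.DataFrame.from_dict(json_data[key], orient="index")
    # Name the columns 1-based, however many array positions there are.
    df.columns = [f"{prefix}{i + 1}" for i in range(df.shape[1])]
    # Name the index "ds" and promote it to a regular column.
    return df.rename_axis("ds").reset_index()


# Stand-in for json.loads(url.read().decode()); JSON null is parsed as None.
raw = '{"abc": {"123": [45600, null, 3567], "378": [78689, 2345, 5678], "343": [23456, null, null]}}'
json_data = json.loads(raw)

result = arrays_to_frame(json_data)
print(result)
```

The result has one `ds` column plus one `y_ds` column per array position, so arrays of length 2, 4, or 5 need no code change. Missing values show up as NaN.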

7 Comments

Please check the edit and see if you can suggest something.
I think your last one will work, but the index column with values 123, 378, etc. should be included with the heading ds.
Just do temp.index.name = 'ds'
What I meant is that the index column should be part of the DataFrame with the heading ds. Even if I do temp.index.name = 'ds', the number of columns in the DataFrame is 3, not 4. It should be 4 columns.
OK, just do temp.reset_index() after that and you're all set.