I have a dataframe with the following nested schema:

root
 |-- data: struct (nullable = true)
 |    |-- ac_failure: string (nullable = true)
 |    |-- ac_failure_delayed: string (nullable = true)
 |    |-- alarm_exit_error: boolean (nullable = true)
 |    |-- alarm_has_delay: string (nullable = true)
 |    |-- nodes: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- battery_status: string (nullable = true)
 |    |    |    |-- device_id: long (nullable = true)
 |    |    |    |-- device_manufacture_id: long (nullable = true)
 |    |    |    |-- device_name: string (nullable = true)
 |    |    |    |-- device_product_id: long (nullable = true)
 |    |    |    |-- device_state: string (nullable = true)
 |    |    |    |-- device_status: string (nullable = true)
 |    |    |    |-- device_supported_command_class_list: string (nullable = true)
 |    |    |    |-- device_type: string (nullable = true)
 |    |    |    |-- endpoint_id: long (nullable = true)
 |    |    |    |-- partition_id: long (nullable = true)
 |-- device_id: long (nullable = true)
 |-- device_type: string (nullable = true)
 |-- event: string (nullable = true)
 |-- event_class: string (nullable = true)
 |-- event_timestamp: long (nullable = true)
 |-- event_type: string (nullable = true)
 |-- imei: string (nullable = true)
 |-- partition_id: long (nullable = true)
 |-- source: string (nullable = true)

I want to collect the rows as dictionaries. I tried:

seq = [row.asDict() for row in df2_final.collect()]

What I get with this is (one sample row):

{'data': Row(ac_failure=None, ac_failure_delayed=None, alarm_exit_error=None, alarm_has_delay='true', nodes=None),
 'device_id': 2,
 'device_type': 'panel',
 'event': 'alarm_state',
 'event_class': 'panel_alarm',
 'event_timestamp': 1586921122886,
 'event_type': 'zone_alarm_perimeter',
 'imei': '9900000000000',
 'operation': 'report',
 'partition_id': 0,
 'source': 'panel'}

What can I do to get data as a dict as well? E.g.:

{'data': {'ac_failure': None, 'ac_failure_delayed': None, 'alarm_exit_error': None, 'alarm_has_delay': 'true', 'nodes': None},
     'device_id': 2,
     'device_type': 'panel',
     'event': 'alarm_state',
     'event_class': 'panel_alarm',
     'event_timestamp': 1586921122886,
     'event_type': 'zone_alarm_perimeter',
     'imei': '9900000000000',
     'operation': 'report',
     'partition_id': 0,
     'source': 'panel'}

I would like to have all nested columns as dicts instead of pyspark.sql.Row. TIA

Comments:

  • use row.asDict(recursive=True) (Commented Aug 27, 2020 at 23:07)
  • Thanks @jxc. This works: seq = [row.asDict(recursive=True) for row in df2_final.collect()] (Commented Aug 27, 2020 at 23:15)

1 Answer

Thanks @jxc. This works:

seq = [row.asDict(recursive=True) for row in df2_final.collect()]