2

I need to read some JSON data for processing. I have a single-line file that contains multiple JSON objects. How can I parse it?

I want the output to be a file with a single line per object.

I have tried a brute-force method that calls json.loads repeatedly on ever-longer substrings to check whether each one is valid JSON, but I'm getting different results every time I run the program:

import json

with open('sample.json') as inp:
    s = inp.read()

jsons = []

start, end = s.find('{'), s.find('}')
while True:
    try:
        jsons.append(json.loads(s[start:end + 1]))
        print(jsons)
    except ValueError:
        end = end + 1 + s[end + 1:].find('}')
    else:
        s = s[end + 1:]
        if not s:
            break
        start, end = s.find('{'), s.find('}')

for x in jsons:
    writeToFilee(x)

The JSON format can be seen here: https://pastebin.com/DgbyjAG9

9
  • Paste a sample of your file along with how you'd like to have the output. Commented Apr 9, 2019 at 12:49
  • You want to replace the taxi_group_id with what? Commented Apr 9, 2019 at 12:50
  • I want to split the single line file containing multiple objects to a multiple line file containing an object on each line Commented Apr 9, 2019 at 12:53
  • @Jessica are these objects delimited somehow? Or is it just like {...}{...}? I found only 1 occurrence of "}\s*{" regex in the paste you provided, am I right to assume this file contains 2 different JSON objects, or are there more? Commented Apr 9, 2019 at 13:00
  • How about jsons = s.replace('}{', '}|{').split('|') to create a list of JSON strings? Commented Apr 9, 2019 at 13:07

3 Answers

4

Why not just use the pos attribute of the JSONDecodeError to tell you where one object ends and the next begins?

Something like:

import json

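def json_load_all(buf):
    while True:
        try:
            # succeeds only when buf holds exactly one JSON document
            yield json.loads(buf)
        except json.JSONDecodeError as err:
            # for an "Extra data" error, err.pos is where the next object starts
            yield json.loads(buf[:err.pos])
            buf = buf[err.pos:]
        else:
            break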

This works with your demo data as:

with open('data.json') as fd:
    arr = list(json_load_all(fd.read()))

This gives me exactly two elements, but I presume you have more?

To complete this using only the standard library, writing the output would look something like:

with open('data.json') as inp, open('out.json', 'w') as out:
    for obj in json_load_all(inp.read()):
        json.dump(obj, out)
        print(file=out)

Otherwise, the jsonlines package is good for dealing with this data format.
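
A related way to find the object boundaries, shown here only as a minimal sketch and not as part of the code above, is the standard library's json.JSONDecoder.raw_decode, which parses one value starting at a given index and returns both the value and the index where it stopped. This avoids parsing each object twice. The function name iter_json_objects and the sample string are made up for illustration, and the sketch assumes the whole input fits in memory.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it by hand
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # returns the decoded value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# hypothetical input: three objects on one line, one with '}{' inside a string
sample = '{"a": 1}{"b": {"c": "}{"}}  {"d": [1, 2]}'

# one serialized object per output line
lines = [json.dumps(obj) for obj in iter_json_objects(sample)]
```

Writing "\n".join(lines) to a file then gives one object per line. Input that is not valid JSON still raises json.JSONDecodeError from raw_decode.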


1

The code below worked for me:

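import json

# input_file_path is the path to the single-line input file
with open(input_file_path) as f_in:
    file_data = f_in.read()
    # turn '{...}{...}' into '[{...},{...}]' so it parses as one JSON array;
    # assumes nothing separates the objects and '}{' never occurs inside a string value
    file_data = file_data.replace("}{", "},{")
    file_data = "[" + file_data + "]"
    data = json.loads(file_data)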


0

Following @Chris A's comment, I've prepared this snippet which should work just fine:

import json
import re

with open('my_jsons.file') as file:
    json_string = file.read()

# insert a marker between adjacent objects, then split on it
json_objects = re.sub(r'}\s*{', '}|!|{', json_string).split('|!|')
# replace |!| with whatever suits you best

for json_object in json_objects:
    print(json.loads(json_object))

This example, however, will break as soon as a '}{' sequence (or the separator you chose) appears inside a string value in your JSON, so I strongly recommend using @Sam Mason's solution.
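
To make that failure mode concrete, here is a small sketch of the same split-on-marker idea applied to two made-up inputs. The helper names split_naively and is_valid_json, and both sample strings, are assumptions for illustration only.

```python
import json
import re


def split_naively(text):
    """Split on every '}' + optional whitespace + '{', even inside strings."""
    return re.sub(r'}\s*{', '}|!|{', text).split('|!|')


def is_valid_json(fragment):
    """Return True if the fragment parses as JSON on its own."""
    try:
        json.loads(fragment)
    except json.JSONDecodeError:
        return False
    return True


# two separate objects: the split lands on the real boundary
ok = split_naively('{"a": 1} {"b": 2}')

# a single object whose string value contains '}{': the split cuts it in half
bad = split_naively('{"a": "x}{y"}')

ok_all_valid = all(is_valid_json(part) for part in ok)
bad_all_valid = all(is_valid_json(part) for part in bad)
```

Here ok holds two parseable fragments, while bad holds two fragments that no longer parse, even though the input was one valid object.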
