added 17 characters in body

edited Mar 30, 2017 at 13:31

ChatterOne

Just a few quick considerations:

You have import os twice.
You are not using matplotlib or numpy, so those imports can go.
The line tweet = tweets[0] is useless and can be removed.
You're not closing the files you open; use the with statement so they are closed automatically.
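For reference, a minimal sketch of the with pattern. The file sample.json is made up and created in a temporary directory only so the example runs on its own:

```python
import os
import tempfile

# Hypothetical sample file, created only so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'sample.json')
with open(path, 'w', encoding='utf-8') as out:
    out.write('{"id": 1}\n')

# The with statement closes the file when the block ends,
# even if an exception is raised inside the block
with open(path, 'r', encoding='utf-8') as input_file:
    lines = input_file.readlines()

print(input_file.closed)  # True
```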

Two optimizations:

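I'd remove the print(file). This is probably the single best optimization you can make, because writing output for every file is comparatively slow.
You're already looping once, so why loop another five times? You can collect all five fields in a single pass.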
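To illustrate the single-pass idea in isolation, here is a small sketch on a few in-memory lines (the sample tweets are invented). Each line is parsed once and all five fields are pulled out together; unlike the code below, it uses tweet.get(key), so a missing field becomes None instead of skipping the tweet:

```python
import json

# Hypothetical input: lines of tweet JSON, as they would be read from a file
raw_lines = [
    '{"id": 1, "text": "hi", "lang": "en", "geo": null, "place": null}',
    '{"id": 2, "text": "ciao", "lang": "it", "geo": null, "place": null}',
    'not json',
]
keys = ['id', 'text', 'lang', 'geo', 'place']

# One pass: parse each line once and collect every field in the same loop
columns = {key: [] for key in keys}
for line in raw_lines:
    try:
        tweet = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        continue  # skip lines that are not valid JSON
    for key in keys:
        columns[key].append(tweet.get(key))

print(columns['id'])    # [1, 2]
print(columns['lang'])  # ['en', 'it']
```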

What about something like this (not tested!):

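import json
import os
from collections import defaultdict

import pandas as pd

# Tweet fields to collect ('id', not 'ids', is the key used in the tweet JSON and below)
elements_keys = ['id', 'text', 'lang', 'geo', 'place']
elements = defaultdict(list)

for root, subdirs, files in os.walk('/Users/mymac/Documents/Dir'):
    for filename in files:
        if filename.endswith('.json'):
            # os.walk yields bare file names, so join each one with its directory
            with open(os.path.join(root, filename), 'r', encoding='utf-8') as input_file:
                for line in input_file:
                    try:
                        tweet = json.loads(line)
                        # Read all fields first, so a missing key cannot
                        # leave the lists with different lengths
                        row = [tweet[key] for key in elements_keys]
                    except (ValueError, KeyError, TypeError):
                        # Skip blank or malformed lines and tweets missing a field
                        continue
                    for key, value in zip(elements_keys, row):
                        elements[key].append(value)

df = pd.DataFrame({'Ids': elements['id'],
                   'Text': elements['text'],
                   'Lang': elements['lang'],
                   'Geo': elements['geo'],
                   'Place': elements['place']})
df  # displays the DataFrame when run in a notebook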
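Note that os.walk yields bare file names, so open(file) only works if the script happens to run from inside that directory. A small helper that always joins the name with its directory could look like this (iter_json_paths is a made-up name, and the temporary directory tree exists only for the example):

```python
import os
import tempfile


def iter_json_paths(top):
    """Yield the full path of every .json file under top (hypothetical helper)."""
    for root, _subdirs, files in os.walk(top):
        for filename in files:
            if filename.endswith('.json'):
                # files holds bare names, so join them with the directory being walked
                yield os.path.join(root, filename)


# Usage: a throwaway tree with one nested .json file and one file to ignore
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'sub'))
with open(os.path.join(top, 'sub', 'a.json'), 'w') as f:
    f.write('{}\n')
with open(os.path.join(top, 'notes.txt'), 'w') as f:
    f.write('ignore me\n')

paths = list(iter_json_paths(top))
print([os.path.relpath(p, top) for p in paths])
```

Keeping the directory walk in its own generator also makes it easy to test separately from the JSON parsing.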

answered Mar 30, 2017 at 7:03
