Performance tips:

- `ujson` can bring more speed as a drop-in replacement for `simplejson`
- since both `simplejson` and `xlrd` are pure Python, you may get performance improvements "for free" by switching to PyPy
- you may (or may not) see speed and memory usage improvements by switching to `openpyxl`, especially in "read-only" mode (a sketch follows this list)
- in the `excel_to_json` function, you are accessing the same values from `row_values` by index multiple times. Defining intermediate variables (e.g. defining `name = row_values[6]` and using the `name` variable later on) and avoiding accessing an element by index more than once might have a positive impact (see the sketch after this list)
- I'm not sure I completely understand the inner `for r in range(1, mw.nrows)` loop. Can you `break` once `row_values[0] == rowe[0]` evaluates to `True`?
- are you sure you need the `OrderedDict` and cannot get away with a regular `dict`? (there is a serious overhead for CPython versions prior to 3.6)
- instead of `.dumps()` and a separate function to dump a JSON string to a file, use the `.dump()` method to dump to a file directly
- make sure to use the `with` context manager when opening a file (the last two points are shown together in the final sketch after this list)
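
For the `openpyxl` point, here is a minimal sketch of read-only mode; the file name `data.xlsx` is just a placeholder:

```python
from openpyxl import load_workbook

# read_only=True streams rows lazily instead of loading the whole sheet into memory
workbook = load_workbook("data.xlsx", read_only=True)
sheet = workbook.active

for row in sheet.iter_rows(values_only=True):
    print(row)

workbook.close()  # read-only mode keeps the file handle open until closed
```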
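For the intermediate-variable and early-`break` suggestions, a rough sketch is below. I'm guessing at the surrounding structure, so the parameter names and the result shape are hypothetical; the sheet objects are assumed to be `xlrd` sheets:

```python
def excel_to_json(sheet, main_sheet):
    results = []
    for row_index in range(1, sheet.nrows):
        row_values = sheet.row_values(row_index)
        name = row_values[6]  # read the index once, reuse the variable
        if not name:
            continue

        for main_index in range(1, main_sheet.nrows):
            main_row = main_sheet.row_values(main_index)
            if row_values[0] == main_row[0]:
                results.append({"name": name, "match": main_row})
                break  # stop scanning once the matching row is found
    return results
```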
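And the last two points combined, writing straight to a file with `.dump()` inside a `with` block (the default file name is just an example):

```python
import json

def save_json(data, path="output.json"):
    # json.dump() writes directly to the file object; no intermediate string needed
    with open(path, "w") as output_file:
        json.dump(data, output_file, indent=4)
```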
Code Style notes:
- follow PEP8 guidelines in terms of whitespace usage in expressions and statements
- properly organize imports
- `if row_values[6] == "":` can be simplified to `if not row_values[6]:` (similar for some other `if` conditions later on)
- the `generate_json()` call should be put into the `if __name__ == '__main__':` block to avoid it being executed on import (see the sketch after this list)
- the `excel_to_json()` function is not quite easy to grasp - see if you can add a helpful docstring and/or comments to improve clarity and readability
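
A minimal sketch of the `__main__` guard, assuming `generate_json` takes no arguments:

```python
def generate_json():
    ...  # existing logic

if __name__ == '__main__':
    generate_json()  # runs only when executed as a script, not on import
```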
Other notes:
- improve variable naming. Variables like `sh`, `mw`, `rowe` are very close to being meaningless. I would also replace `wb` with the more explicit `workbook`
- have you considered using `pandas.read_excel()` to read the contents into a dataframe and then dumping it via `.to_json()` (after applying the desired transformations)? A rough sketch follows
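
Here the file names and the `"name"` column are placeholders, since I don't know your actual schema:

```python
import pandas as pd

# read the sheet into a DataFrame in one call
df = pd.read_excel("data.xlsx")

# apply the desired transformations here, e.g. dropping rows with an empty name
df = df[df["name"].notna()]

# dump the result directly to a JSON file
df.to_json("output.json", orient="records", indent=4)
```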