Feedback on a text parser implemented as a generator

Asked by max; migrated from stackoverflow.com.

I often need to parse tab-separated text (usually from a huge file) into records, so I wrote a generator to do that for me. Is there anything in it that could be improved in terms of performance, extensibility, or generality?

def table_parser(in_stream, types = None, sep = '\t', endl = '\n', comment = None):
  header = next(in_stream).rstrip(endl).split(sep)
  # the header is line 1, so data lines are numbered from 2
  for lineno, line in enumerate(in_stream, start=2):
    if line == endl:
      continue # ignore blank lines
    if comment and line.startswith(comment):
      continue # ignore comments (the marker may be longer than one character)
    fields = line.rstrip(endl).split(sep)
    try:
      # could have done this outside the loop instead:
      # if types is None: types = {c : (lambda x : x) for c in header}
      # but it nearly doubles the run-time if types actually is None
      if types is None:
        record = {col : fields[no] for no, col in enumerate(header)}
      else:
        record = {col : types[col](fields[no]) for no, col in enumerate(header)}
    except IndexError:
      print('Insufficient columns in line #{}:\n{}'.format(lineno, line))
      raise
    yield record
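
For context, here is a rough sketch of how the generator might be called. The sample data, the column names, and the converters in `types` are made up for illustration, and the function is repeated in condensed form (without the `IndexError` reporting) only so that the snippet runs on its own.

```python
import io

def table_parser(in_stream, types=None, sep='\t', endl='\n', comment=None):
    # condensed copy of the generator above, without the IndexError reporting
    header = next(in_stream).rstrip(endl).split(sep)
    for line in in_stream:
        if line == endl or line[0] == comment:
            continue  # skip blank lines and comment lines
        fields = line.rstrip(endl).split(sep)
        if types is None:
            yield {col: fields[no] for no, col in enumerate(header)}
        else:
            yield {col: types[col](fields[no]) for no, col in enumerate(header)}

# hypothetical sample input: a header, a comment, a blank line, two records
sample = io.StringIO("name\tage\tscore\n# a comment\n\nalice\t30\t1.5\nbob\t25\t2.0\n")

# hypothetical converters, one callable per column name
types = {'name': str, 'age': int, 'score': float}

records = list(table_parser(sample, types=types, comment='#'))
for record in records:
    print(record)
```

In real use `in_stream` would be an open file rather than an `io.StringIO`, and the records would be consumed one at a time instead of being collected into a list.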