
Tweeted twitter.com/#!/StackCodeReview/status/165260764091916288

occurred Feb 3, 2012 at 2:30

Post Migrated Here from stackoverflow.com (revisions)

occurred Feb 2, 2012 at 14:59


asked Feb 2, 2012 at 9:40

max


feedback on a text parser implemented as a generator

I often need to parse tab-separated text (usually from a huge file) into records. I wrote a generator to do that for me; is there anything that could be improved in it, in terms of performance, extensibility or generality?

def table_parser(in_stream, types = None, sep = '\t', endl = '\n', comment = None):
  header = next(in_stream).rstrip(endl).split(sep)
  for lineno, line in enumerate(in_stream, start=2): # the header was line 1
    if line == endl:
      continue # ignore blank lines
    if line[0] == comment:
      continue # ignore comments
    fields = line.rstrip(endl).split(sep)
    try:
      # could have done this outside the loop instead:
      # if types is None: types = {c : (lambda x : x) for c in header}
      # but it nearly doubles the run-time if types actually is None
      if types is None:
        record = {col : fields[no] for no, col in enumerate(header)}
      else:
        record = {col : types[col](fields[no]) for no, col in enumerate(header)}
    except IndexError:
      print('Insufficient columns in line #{}:\n{}'.format(lineno, line))
      raise
    yield record
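
To make the intended usage concrete, here is a minimal sketch of how the generator might be called. It uses a condensed copy of the parser above (the IndexError reporting is left out) so that the snippet runs on its own. The sample data, the '#' comment character and the column types are assumptions chosen for illustration, not part of the original question.

```python
import io

def table_parser(in_stream, types=None, sep='\t', endl='\n', comment=None):
    # Condensed copy of the parser above; the IndexError reporting is omitted.
    header = next(in_stream).rstrip(endl).split(sep)
    for line in in_stream:
        if line == endl or line[0] == comment:
            continue  # skip blank lines and comment lines
        fields = line.rstrip(endl).split(sep)
        if types is None:
            yield {col: fields[no] for no, col in enumerate(header)}
        else:
            yield {col: types[col](fields[no]) for no, col in enumerate(header)}

# Hypothetical sample: a header row, a comment line, a blank line, two records.
sample = io.StringIO("name\tage\n# a comment\n\nalice\t30\nbob\t25\n")

# Each column name maps to a callable that converts the raw string field.
records = list(table_parser(sample, types={'name': str, 'age': int}, comment='#'))
print(records)
```

Because the function is a generator, nothing is read until the caller iterates. Wrapping it in list() as above is only reasonable for small inputs; for a huge file one would loop over the records directly.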

python parsing generator
