I still do not understand your data format. You only answered about half of my questions, so I'm going to go out on a limb a bit.
Some suggestions for you:
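- Avoid code at global scope; put the entry point behind an `if __name__ == '__main__':` guard
- Give constants UPPER_CASE names, as PEP 8 recommends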
- The standard terminology for the opposite of "header" is "footer", not "trailer"
- Since you did not provide any indication of scale, I offer a simple, pure-Python implementation without a whole lot of regard to performance.
- Parsing the serialized file format is done in its own generator function, separate from loading the data into the dictionary format you've shown
- I have assumed that you want to keep printing the dictionary to stdout, in which case `pprint` is more appropriate. If you want to serialize it to JSON instead, that is trivial with the `json` module.
- I have assumed that when a group is repeated, the last one wins and overwrites any earlier entries for that group
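To illustrate the last two points, here is a minimal sketch of the JSON route, assuming the dictionary has the same shape as the one `load` returns. The values are made-up placeholders, not data from your file. It also shows the "last one wins" assumption in action: assigning to an existing dict key simply replaces the earlier entry.

```python
import json

# Hypothetical data in the shape produced by load(); the values are placeholders.
groups = {}
# Two rows for the same group 'A': plain dict assignment keeps only the last one.
for group, entries in [('A', ['1', '2', '3']), ('A', ['4', '5', '6'])]:
    groups[group] = dict(zip(('A1ValueKey', 'A2ValueKey', 'A3ValueKey'), entries))

d = {
    'header': {'HeaderKey1': 'h1', 'HeaderKey2': 'h2', 'HeaderKey3': 'h3'},
    'footer': {'FootKey1': 'f1', 'FootKey2': 'f2', 'FootKey3': 'f3'},
    'groups': groups,
}

# json.dumps handles nested dicts of strings directly; indent=2 pretty-prints it.
text = json.dumps(d, indent=2)
print(text)
```

To write to a file instead of a string, `json.dump(d, f, indent=2)` takes an open file object.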
The suggested code:
```python
from pprint import pprint
from typing import Iterable, List, Sequence

HEADER_NAMES = ('HeaderKey1', 'HeaderKey2', 'HeaderKey3')
FOOTER_NAMES = ('FootKey1', 'FootKey2', 'FootKey3')
GROUPS = {'A': ('A1ValueKey', 'A2ValueKey', 'A3ValueKey'),
          'B': ('B1ValueKey', 'B2ValueKey', 'B3ValueKey')}


def parse(fn: str) -> Iterable[List[str]]:
    # Yield each line of the file, stripped of trailing whitespace and split on '|'
    with open(fn) as f:
        yield from (
            line.rstrip().split('|')
            for line in f
        )


def load(lines: Iterable[Sequence[str]]) -> dict:
    lines = iter(lines)
    heads = next(lines)      # the first line is the header
    prev_line = next(lines)  # held back until we know whether it is the footer
    groups = {}
    for line in lines:
        # A later line exists, so prev_line must be a group line
        group, *entries = prev_line
        groups[group] = {
            k: e
            for k, e in zip(GROUPS[group], entries)
        }
        prev_line = line
    # After the loop, prev_line holds the last line, i.e. the footer
    return {
        'header': {k: h for k, h in zip(HEADER_NAMES, heads)},
        'footer': {k: f for k, f in zip(FOOTER_NAMES, prev_line)},
        'groups': groups,
    }


if __name__ == '__main__':
    d = load(parse('file1.usr'))
    pprint(d)
```
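Because `load` only needs an iterable of already-split rows, it can be tried without a file at all. The sketch below repeats the constants and `load` from above so that it runs on its own, and feeds it a few hand-written rows. The rows are invented placeholders, since the real contents of `file1.usr` were not given. It also makes the hold-back trick in `load` visible: each row is stored as a group only once the loop has seen the row after it, so whichever row comes last ends up as the footer.

```python
from pprint import pprint
from typing import Iterable, Sequence

HEADER_NAMES = ('HeaderKey1', 'HeaderKey2', 'HeaderKey3')
FOOTER_NAMES = ('FootKey1', 'FootKey2', 'FootKey3')
GROUPS = {'A': ('A1ValueKey', 'A2ValueKey', 'A3ValueKey'),
          'B': ('B1ValueKey', 'B2ValueKey', 'B3ValueKey')}


def load(lines: Iterable[Sequence[str]]) -> dict:
    lines = iter(lines)
    heads = next(lines)      # first row is the header
    prev_line = next(lines)  # held back: it could still turn out to be the footer
    groups = {}
    for line in lines:
        # A later row exists, so prev_line must be a group row
        group, *entries = prev_line
        groups[group] = dict(zip(GROUPS[group], entries))
        prev_line = line
    return {
        'header': dict(zip(HEADER_NAMES, heads)),
        'footer': dict(zip(FOOTER_NAMES, prev_line)),  # the last row
        'groups': groups,
    }


# Hypothetical rows, already split on '|' the way parse() would produce them
rows = [
    ['h1', 'h2', 'h3'],
    ['A', 'a1', 'a2', 'a3'],
    ['B', 'b1', 'b2', 'b3'],
    ['f1', 'f2', 'f3'],
]
result = load(rows)
pprint(result)
```

Replacing `rows` with `parse('file1.usr')` gives back the file-based behaviour, which is the main payoff of keeping parsing and loading separate.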