My goal is to parse indented text in the style of Python and YAML.
For now, the code only finds the parent of each line.
This code seems to do the trick, but I'm not really satisfied with it, and I'd like to know whether you would do it another way.
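As an aside, the step in the code below that turns a line's leading spaces into a level (the `while a[indent] == ' '` loop plus the `% 4` check) can be written with `str.lstrip` instead of indexing character by character. This is a minimal sketch, assuming 4-space indentation; `indent_level` is a hypothetical helper, not part of the original code. Unlike indexing `a[indent]`, it does not raise `IndexError` on an empty or all-space line, although whether to skip such lines is still a separate decision. Note that it returns 0 for top-level lines, whereas the original code adds 1 to reserve level 0 for the root.

```python
def indent_level(line: str, width: int = 4) -> int:
    """Return the nesting level of `line`, counting `width` spaces per level.

    Raises ValueError if the leading spaces are not a multiple of `width`.
    """
    # lstrip(' ') removes only leading spaces, so the length difference is
    # the number of leading spaces; nothing is indexed past the end of the line.
    spaces = len(line) - len(line.lstrip(' '))
    if spaces % width:
        raise ValueError(f"indent of {spaces} spaces is not a multiple of {width}: {line!r}")
    return spaces // width


print(indent_level("animal"))          # 0
print(indent_level("        tiger"))   # 2
```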
```python
raw = """animal
    carnivorous
        tiger
        lion
    vegetarian
        cow
        sheep
plant
    algea
    tree
        leaf
        pine
    fungus
        good
        bad
            evil
            mean
    cactus
        big
        small"""
lines = raw.split('\n')
# indents holds (index, level, name) tuples; level 0 is reserved for the root.
indents = [(0,0,'root')]
for a in lines:
    # Count leading spaces; this raises IndexError on an empty or all-space line.
    indent = 0
    while a[indent] == ' ': indent+=1
    if indent % 4:
        print("not multiple of 4")
        break
    indents.append((len(indents), int(indent/4)+1,a.replace(' ','')))
for a in indents: print(a)
# stack holds the chain of ancestors leading to the current line.
stack=[indents[0]]
entries =[indents[0]]
prev_indent = 0
for item in indents[1:]:
    print("#########################")
    id, indent, name = item
    diff = indent - prev_indent
    print(item)
    print("diff",diff, [a[2] for a in stack])
    if diff>0:
        # One level deeper: the previous line is the parent.
        entries.append(item+(stack[-1][2],))
    elif diff<0:
        # entries.append(item+(stack[-diff][2],))
        # Shallower: pop the previous line plus one entry per level we went up.
        count = -diff
        while count>-1: stack.pop();count-=1
        entries.append(item+(stack[-1][2],))
    elif diff==0:
        # Same level: pop the sibling so its parent is on top.
        stack.pop()
        entries.append(item+(stack[-1][2],))
    stack.append(item)
    prev_indent = entries[-1][1]
    print("result", entries[-1])
    print("########################")
for a in entries:
    if len (a) == 3: continue  # skip the root, which has no parent
    ident, level, name, parent = a
    print(level*' '*4, name, '(', parent, ')')
```
The final loop prints this (the name in parentheses is the parent):
```
     animal ( root )
         carnivorous ( animal )
             tiger ( carnivorous )
             lion ( carnivorous )
         vegetarian ( animal )
             cow ( vegetarian )
             sheep ( vegetarian )
     plant ( root )
         algea ( plant )
         tree ( plant )
             leaf ( tree )
             pine ( tree )
         fungus ( plant )
             good ( fungus )
             bad ( fungus )
                 evil ( bad )
                 mean ( bad )
         cactus ( plant )
             big ( cactus )
             small ( cactus )
```
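The `diff`/`count` bookkeeping in the loop above can be condensed: if the stack holds only names, the parent of a line at level `n` is whatever sits at position `n` once everything deeper has been cut off. This is a minimal sketch of that stack technique, assuming the input has already been turned into `(level, name)` pairs with top-level lines at level 0; `find_parents` is a hypothetical name. Unlike the original, it rejects a line indented more than one level past its predecessor instead of silently attaching it to the previous line.

```python
def find_parents(items, root="root"):
    """Yield (level, name, parent) for each (level, name) pair.

    `items` must be in document order, with top-level entries at level 0.
    A line may be at most one level deeper than the line before it.
    """
    # stack[i] is the most recent name seen at level i - 1;
    # stack[0] is the synthetic root.
    stack = [root]
    for level, name in items:
        if level > len(stack) - 1:
            raise ValueError(f"{name!r} is indented more than one level past its predecessor")
        # Drop every entry at this level or deeper; the top is now the parent.
        del stack[level + 1:]
        yield level, name, stack[-1]
        stack.append(name)


items = [(0, "animal"), (1, "carnivorous"), (2, "tiger"), (2, "lion"),
         (1, "vegetarian"), (0, "plant"), (1, "tree"), (2, "leaf")]
for level, name, parent in find_parents(items):
    print("    " * level + f"{name} ({parent})")
```

Slicing with `del stack[level + 1:]` replaces the separate "deeper", "same level" and "shallower" branches, since all three reduce to truncating the stack to the new line's depth.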
… `re.compile(r'^(?P<indent>(?: {4})*)(?P<name>\S.*)')` you suggest, to get rid of empty lines too? I am using it to read a text file, and this file has empty lines that cause errors in the code you suggest.
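One way to handle empty lines with the pattern quoted above is to skip lines that are empty or contain only whitespace before matching, and to treat any other line the pattern rejects as an indentation error. This is a minimal sketch, not the code from the answer being discussed (which is not shown here); `LINE_RE` and `tokenize` are hypothetical names. It yields `(level, name)` pairs with top-level lines at level 0.

```python
import re

# The pattern quoted above: zero or more 4-space groups, then a name that
# starts with a non-space character.
LINE_RE = re.compile(r'^(?P<indent>(?: {4})*)(?P<name>\S.*)')


def tokenize(text):
    """Yield (level, name) for every non-blank line of `text`."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip empty and whitespace-only lines
        match = LINE_RE.match(line)
        if match is None:
            # Non-blank but unmatched: the indent is not a multiple of 4
            # spaces (or it contains a tab).
            raise ValueError(f"line {lineno}: bad indentation: {line!r}")
        yield len(match.group('indent')) // 4, match.group('name')


sample = "animal\n\n    carnivorous\n        tiger\n   \nplant\n"
print(list(tokenize(sample)))
```

Checking `line.strip()` first keeps the two failure modes apart: a blank line is silently ignored, while a line with 2 or 6 leading spaces still raises instead of being skipped. When reading a file, `text` can be the result of `f.read()`; `splitlines()` also strips a trailing `\r` from Windows line endings.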