Sort a file while grouping indented lines with their parent (multiple level)

Question

Every level should be sorted alphabetically, with each line kept under its parent.

File Example:

first
    apple
    orange
        train
        car
    kiwi
third
    orange
    apple
        plane
second
    lemon

Expected Result:

first
    apple
    kiwi
    orange
        car
        train
second
    lemon
third
    apple
        plane
    orange

I have used the following command, but it works only if the tree in the file has no more than two levels.

sed '/^[^[:blank:]]/h;//!G;s/\(.*\)\n\(.*\)/\2\x02\1/' infile | sort | sed 's/.*\x02//'

How can I sort all the levels correctly?

Thanks in advance

Please format your input content properly (as it actually looks): copy and paste it, then use {} (code sample) on the selected fragment.
– RomanPerekhrest, Commented Jul 3, 2018 at 17:04
Are there spaces before first, second (the 1st-level values)?
– RomanPerekhrest, Commented Jul 3, 2018 at 17:33

RomanPerekhrest · Accepted Answer · 2018-07-03 22:00:22Z

Extended Python solution:

Sample infile contents (4 levels):

first
    apple
    orange
        train
        car
            truck
            automobile
    kiwi
third
    orange
    apple
        plane
second
    lemon

sort_hierarchy.py script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import re

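with open(sys.argv[1], 'rt') as f:
    pat = re.compile(r'^\s+')
    paths = []
    prev_offset = 0
    indent = None   # width of one indentation level, taken from the first indented line

    for line in f:
        offset = pat.match(line)
        item = line.strip()

        if not offset:
            offset = 0
            paths.append(item)
        else:
            offset = offset.span()[1]
            if indent is None:
                indent = offset
            if offset > prev_offset:
                paths.append(paths[-1] + '.' + item)
            else:
                # drop the previous item, plus one component per level climbed
                cut_pos = -((prev_offset - offset) // indent + 1)
                paths.append('.'.join(paths[-1].split('.')[:cut_pos]) + '.' + item)

        prev_offset = offset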

    paths.sort()
    sub_pat = re.compile(r'[^.]+\.')
    for i in paths:
        print(sub_pat.sub(' ' * 4, i))

Usage:

python sort_hierarchy.py path/to/infile

The output:

first
    apple
    kiwi
    orange
        car
            automobile
            truck
        train
second
    lemon
third
    apple
        plane
    orange
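
A variant of the same path idea, sketched here with tuple keys instead of dot-joined strings, so that item names containing "." or spaces cannot disturb the ordering and no indent-width arithmetic is needed. The function name `sort_hierarchy` and the `stack` helper are hypothetical, and the sketch assumes that a child is always indented more deeply than its parent and that blank lines can be dropped.

```python
def sort_hierarchy(lines):
    """Return the lines sorted level by level, children kept under their parent."""
    keyed = []
    stack = []  # (indent, name) of the current line and each of its ancestors
    for line in lines:
        name = line.strip()
        if not name:
            continue  # assumption: blank lines carry no information
        indent = len(line) - len(line.lstrip())
        # Climb back up: drop entries indented as deep as, or deeper than, this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        keyed.append((tuple(n for _, n in stack), line.rstrip("\n")))
    # Tuples compare element by element, so a parent sorts directly before its children.
    keyed.sort(key=lambda pair: pair[0])
    return [text for _, text in keyed]


sample = """first
    apple
    orange
        train
        car
    kiwi
third
    orange
    apple
        plane
second
    lemon
"""
print("\n".join(sort_hierarchy(sample.splitlines())))
```

Run on the question's example file, this should reproduce the expected result from the question. Each line is printed with its original indentation, so nothing has to be re-indented after sorting.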

@nick10, learn about "What should I do when someone answers my question?"
– RomanPerekhrest, Commented Jul 4, 2018 at 10:50

RomanPerekhrest · Accepted Answer · 2018-07-03 22:06:15Z

Awk solution:

Sample infile contents (4 levels):

first
    apple
    orange
        train
        car
            truck
            automobile
    kiwi
third
    orange
    apple
        plane
second
    lemon

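awk '{
         offset = gsub(/ /, "");
         if (offset && !indent) indent = offset;   # width of one indentation level
         if (offset == 0) { items[NR] = $1 }
         else if (offset > prev_ofst) { items[NR] = items[NR-1] "." $1 }
         else {
             prev_item = items[NR-1];
             # drop the previous item, plus one component per level climbed
             n = int((prev_ofst - offset) / indent) + 1;
             gsub("(\\.[^.]+){" n "}$", "", prev_item);
             items[NR] = prev_item "." $1
         }
         prev_ofst = offset;
     }
     END{
         asort(items);   # asort() is a GNU awk extension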
         for (i = 1; i <= NR; i++) {
             gsub(/[^.]+\./, "    ", items[i]);
             print items[i]
         }
     }' infile

The output:

first
    apple
    kiwi
    orange
        car
            automobile
            truck
        train
second
    lemon
third
    apple
        plane
    orange

iruvar · Accepted Answer · 2018-07-04 15:36:45Z

This works for any depth (it assumes each level is indented with one tab):

#!/usr/bin/python3
lines = open('test_file').read().splitlines()

def yield_sorted_lines(lines):
        sorter = []
        for l in lines:
                fields = l.split('\t')
                n = len(fields)
                sorter = sorter[:n-1] + fields[n-1:]
                yield sorter, l


prefixed_lines = yield_sorted_lines(lines)
sorted_lines = sorted(prefixed_lines, key=lambda x: x[0])
for x, y in sorted_lines:
        print(y)

Or as a pipeline:

awk -F'\\t' '{a[NF]=$NF; for (i=1; i<=NF; ++i) printf "%s%s", a[i], i==NF? "\n": "\t"}' file|
sort | awk -F'\\t' -vOFS='\t' '{for (i=1; i<NF; ++i) $i=""; print}'
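
The pipeline is a decorate, sort, undecorate pattern. To make the intermediate form visible, here is a rough Python sketch of what the two awk stages do; the function names `decorate` and `undecorate` are hypothetical, and tab indentation is assumed, as in the pipeline. Python's `sorted` compares code points, whereas `sort` follows the locale, so the ordering may differ for mixed-case or non-ASCII input.

```python
def decorate(lines):
    """Mimic the first awk stage: replace each line by its tab-joined full path."""
    path = []
    out = []
    for line in lines:
        fields = line.split("\t")
        depth = len(fields)
        # Keep the ancestors above this depth, then record this line's own name.
        path = path[:depth - 1] + [fields[-1]]
        out.append("\t".join(path))
    return out


def undecorate(lines):
    """Mimic the last awk stage: blank every field except the last one."""
    return ["\t" * line.count("\t") + line.split("\t")[-1] for line in lines]


sample = ["first", "\torange", "\t\ttrain", "\t\tcar", "\tapple"]
decorated = decorate(sample)
print(decorated)                      # full paths, which is what sort receives
print(undecorate(sorted(decorated)))  # paths stripped back to indented names
```

Because every decorated line begins with its parent's full path, a plain line sort keeps children directly below their parent.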

jthill · Accepted Answer · 2018-07-04 19:40:17Z


sed '/^ /{H;$!d};x;1d;s/\n/\x7/g' | sort | tr \\a \\n

The /continuation/{H;$!d};x;1d (or /firstline/!etc) is a slurp: it falls through only when it has a complete line gaggle in the buffer.

If you might get a single-line gaggle at the end, add ${p;x;/\n/d} to do the double-pump needed for that.
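
For readers less familiar with sed's hold space, the same slurp idea can be sketched in Python. The function name `sort_top_level_blocks` is hypothetical, and the sketch assumes that continuation lines start with a space, as in the `/^ /` address. Note that this technique orders whole top-level groups only: the lines inside a group stay in their original order.

```python
def sort_top_level_blocks(lines):
    """Group each unindented line with the indented lines that follow it,
    then sort the groups as units (their inner order is left untouched)."""
    blocks = []
    for line in lines:
        if line.startswith(" ") and blocks:
            blocks[-1].append(line)   # continuation: attach to the current block
        else:
            blocks.append([line])     # unindented line starts a new block
    # Join with BEL, as the sed script does, so each block compares as one string.
    blocks.sort(key="\a".join)
    return [line for block in blocks for line in block]


sample = ["third", "    orange", "    apple", "first", "    kiwi", "    apple"]
print("\n".join(sort_top_level_blocks(sample)))
```

In this sample the "first" block moves ahead of the "third" block, while "kiwi" still precedes "apple" inside it. Sorting the nested levels as well needs one of the path-based approaches from the other answers.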
