All levels of an indented file should be sorted alphabetically, with each item kept under its parent.

File Example:

first
    apple
    orange
        train
        car
    kiwi
third
    orange
    apple
        plane
second
    lemon

Expected Result:

first
    apple
    kiwi
    orange
        car
        train
second
    lemon
third
    apple
        plane
    orange

I have used the following command, but it works only if the tree in the file has just two levels:

sed '/^[^[:blank:]]/h;//!G;s/\(.*\)\n\(.*\)/\2\x02\1/' infile | sort | sed 's/.*\x02//'
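To illustrate why this breaks at a third level, here is a rough Python sketch of what the command appears to do: each line is keyed by its top-level heading only, sorted, then stripped of the key. The function name and sample data are made up for illustration, and the sketch assumes plain code-point ordering, whereas the real sort depends on the locale.

```python
def sort_with_top_level_key(lines):
    """Decorate each line with its top-level heading only, sort, undecorate."""
    decorated = []
    top = ""
    for line in lines:
        if line[:1].isspace():
            # Indented line: key is (heading, raw line including its indentation).
            decorated.append((top, line, line))
        else:
            # Unindented line: becomes the current heading and sorts before its children.
            top = line
            decorated.append((top, "", line))
    decorated.sort()
    return [original for _, _, original in decorated]


sample = ["first", "    orange", "        train", "        car", "    apple"]
for line in sort_with_top_level_key(sample):
    print(line)
```

With three levels, the deeper lines are compared by their raw indentation rather than by their parent, so car and train end up detached from orange.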

How can I sort all the levels correctly?

Thanks in advance

  • Please format your input as it actually looks: copy and paste it, then apply {} (code sample) to the selected fragment. Commented Jul 3, 2018 at 17:04
  • Could the file have more than 3 levels? Commented Jul 3, 2018 at 17:23
  • 4 levels are possible. Commented Jul 3, 2018 at 17:31
  • Are there spaces before the first-level values (first, second)? Commented Jul 3, 2018 at 17:33
  • No spaces before the first-level values. Commented Jul 3, 2018 at 17:35

4 Answers

Extended Python solution:

Sample infile contents (4 levels):

first
    apple
    orange
        train
        car
            truck
            automobile
    kiwi
third
    orange
    apple
        plane
second
    lemon

sort_hierarchy.py script:

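#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import re

INDENT = 4  # spaces per level, as in the sample file

with open(sys.argv[1], 'rt') as f:
    pat = re.compile(r'^\s+')
    paths = []

    for line in f:
        item = line.strip()
        if not item:
            continue  # skip blank lines

        offset = pat.match(line)
        # depth 0 is the top level; each further level adds INDENT spaces
        depth = offset.span()[1] // INDENT if offset else 0
        # keep the ancestors shared with the previous path, then append this item
        parent = paths[-1].split('.')[:depth] if paths else []
        paths.append('.'.join(parent + [item]))

    # compare component by component so children always stay with their parent
    paths.sort(key=lambda p: p.split('.'))
    # items themselves must not contain '.', since it is the path separator
    sub_pat = re.compile(r'[^.]+\.')
    for i in paths:
        print(sub_pat.sub(' ' * INDENT, i))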

Usage:

python sort_hierarchy.py path/to/infile

The output:

first
    apple
    kiwi
    orange
        car
            automobile
            truck
        train
second
    lemon
third
    apple
        plane
    orange

GNU awk solution (asort() is a gawk extension):

Sample infile contents (4 levels):

first
    apple
    orange
        train
        car
            truck
            automobile
    kiwi
third
    orange
    apple
        plane
second
    lemon

awk '{
         offset = gsub(/ /, "");    # count and strip the spaces (items must not contain any)
         if (offset == 0) { items[NR] = $1 }
         else if (offset > prev_ofst) { items[NR] = items[NR-1] "." $1 }
         else {
             prev_item = items[NR-1];
             # drop the previous sibling plus one component per level climbed (4 spaces each)
             n = int((prev_ofst - offset) / 4) + 1;
             gsub("(\\.[^.]+){" n "}$", "", prev_item);
             items[NR] = prev_item "." $1
         }
         prev_ofst = offset;
     }
     END{
         asort(items);
         for (i = 1; i <= NR; i++) {
             gsub(/[^.]+\./, "    ", items[i]);
             print items[i]
         }
     }' infile

The output:

first
    apple
    kiwi
    orange
        car
            automobile
            truck
        train
second
    lemon
third
    apple
        plane
    orange

This works for any depth, assuming each level is indented with one tab:

#!/usr/bin/python3
lines = open('test_file').read().splitlines()

def yield_sorted_lines(lines):
        sorter = []
        for l in lines:
                fields = l.split('\t')
                n = len(fields)
                sorter = sorter[:n-1] + fields[n-1:]
                yield sorter, l


prefixed_lines = yield_sorted_lines(lines)
sorted_lines = sorted(prefixed_lines, key=lambda x: x[0])
for x, y in sorted_lines:
        print(y)

Or as a pipeline:

awk -F'\\t' '{a[NF]=$NF; for (i=1; i<=NF; ++i) printf "%s%s", a[i], i==NF? "\n": "\t"}' file|
sort | awk -F'\\t' -vOFS='\t' '{for (i=1; i<NF; ++i) $i=""; print}'
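Both the script and the pipeline above seem to rely on the same idea: each line gets a sort key made of all its ancestors plus itself, so a child can never sort away from its parent. Here is a small sketch that prints those keys for a tab-indented sample. The function name ancestor_keys and the sample list are only for illustration.

```python
def ancestor_keys(lines, sep="\t"):
    """Return (key, line) pairs; key is the list of ancestors plus the item itself."""
    key = []
    pairs = []
    for line in lines:
        fields = line.split(sep)
        depth = len(fields)  # one leading tab per level below the top
        # Keep the ancestors above this depth, then append the item.
        key = key[:depth - 1] + fields[depth - 1:]
        pairs.append((key, line))
    return pairs


sample = ["first", "\tapple", "\torange", "\t\ttrain", "\t\tcar", "\tkiwi"]
for key, line in sorted(ancestor_keys(sample), key=lambda pair: pair[0]):
    print(key)
```

For this sample the keys come out as ['first'], ['first', 'apple'], ['first', 'kiwi'], ['first', 'orange'], ['first', 'orange', 'car'], ['first', 'orange', 'train'], which is the wanted order.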
sed solution:

sed '/^ /{H;$!d};x;1d;s/\n/\x7/g' infile | sort | tr \\a \\n

The /continuation/{H;$!d};x;1d (or /firstline/! etc.) part is a slurp: it falls through only once a complete gaggle of lines (a first line plus its continuation lines) is in the buffer.

If you might get a single-line gaggle at the end, add ${p;x;/\n/d} to do the double-pump needed for that.
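As a rough Python sketch of the slurp idea, assuming every line that starts with a space is a continuation line: collect each unindented line together with its continuation lines, sort the groups, and print them back out. The function name is hypothetical. As far as I can tell, this (like the sed command) reorders only the top-level groups and leaves the lines inside each group in their original order.

```python
def sort_top_level_groups(lines):
    """Group each unindented line with its indented followers, then sort the groups."""
    groups = []
    for line in lines:
        if line.startswith(" ") and groups:
            groups[-1].append(line)  # continuation line: joins the current group
        else:
            groups.append([line])    # unindented line: starts a new group
    groups.sort()                    # lists compare by their first line, then the rest
    return [line for group in groups for line in group]


for line in sort_top_level_groups(["b", " y", " x", "a", " z"]):
    print(line)
```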
