How to sort groups of lines?

Question

In the following example, there are 3 elements that have to be sorted:

"[aaa]" and the 4 lines (always 4) below it form a single unit.
"[kkk]" and the 4 lines (always 4) below it form a single unit.
"[zzz]" and the 4 lines (always 4) below it form a single unit.

Only groups of lines following this pattern should be sorted; anything before "[aaa]" and after the 4th line of "[zzz]" must be left intact.

from:

This sentence and everything above it should not be sorted.

[zzz]
some
random
text
here
[aaa]
bla
blo
blu
bli
[kkk]
1
44
2
88

And neither should this one and everything below it.

to:

This sentence and everything above it should not be sorted.

[aaa]
bla
blo
blu
bli
[kkk]
1
44
2
88
[zzz]
some
random
text
here

And neither should this one and everything below it.

What a horrible example. Why do you have identical values for all the sections? Note: I would use awk.
– Karoly Horvath, Commented Nov 23, 2012 at 0:58
Yes, before-and-after illustrations of the data are the way to go. Good luck.
– shellter, Commented Nov 23, 2012 at 3:30

rici · Accepted Answer · 2012-11-23 05:07:10Z

Maybe not the fastest :) [1] but it will do what you want, I believe:

for line in $(grep -n '^\[.*\]$' sections.txt |
              sort -k2 -t: |
              cut -f1 -d:); do
  tail -n +$line sections.txt | head -n 5
done

Here's a better one:

for pos in $(grep -b '^\[.*\]$' sections.txt |
             sort -k2 -t: |
             cut -f1 -d:); do
  tail -c +$((pos+1)) sections.txt | head -n 5
done

[1] The first one is something like O(N^2) in the number of lines in the file, since it has to read all the way to the section for each section. The second one, which can seek immediately to the right character position, should be closer to O(N log N).

[2] This takes you at your word that there are always exactly five lines in each section (header plus four following), hence head -n 5. However, it would be easy to replace that with something that reads up to, but not including, the next line starting with a '[', in case that ever turns out to be necessary.
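As a rough illustration of footnote [2], here is a minimal sketch of such a replacement for head -n 5. The helper name print_section is hypothetical, and using awk to stop at the next header is one possible choice, not something from the answer itself.

```shell
# Hypothetical helper: print the section that starts at byte offset $2 of
# file $1, stopping before the next line that begins with '['.
print_section() (
  file=$1
  pos=$2
  # grep -b offsets are 0-based, while tail -c +N is 1-based, hence pos + 1.
  tail -c +"$((pos + 1))" "$file" |
    awk 'NR > 1 && /^\[/ { exit } { print }'
)
```

Inside the loop it would be called as print_section sections.txt "$pos" instead of the tail | head pipeline. Note that the last section in the file then runs to the end of the file, so a trailing footer would need some other delimiter.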

Preserving start and end requires a bit more work:

# Find all the sections
mapfile indices < <(grep -b '^\[.*\]$' sections.txt)
# Output the prefix
head -c+${indices[0]%%:*} sections.txt
# Output sections, as above
for pos in $(printf %s "${indices[@]}" |
             sort -k2 -t: |
             cut -f1 -d:); do
  tail -c +$((pos+1)) sections.txt | head -n 5
done
# Output the suffix
tail -c+$((1+${indices[-1]%%:*})) sections.txt | tail -n+6

You might want to make a function out of that, or a script file, changing sections.txt to $1 throughout.
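A minimal sketch of what such a function could look like. The name sort_sections is hypothetical, and the sketch stores the grep output in a plain variable instead of using mapfile so that it does not depend on bash; it still assumes grep -b and head -c are available and that every section is exactly five lines.

```shell
# Hypothetical wrapper: print file $1 with its [section] blocks sorted,
# leaving the text before the first block and after the last one untouched.
sort_sections() (
  file=$1
  # Byte offset and text of every header line, e.g. "36:[aaa]".
  # '^\[.*]$' matches the same lines as the '^\[.*\]$' used above.
  offsets=$(grep -b '^\[.*]$' "$file" || true)
  if [ -z "$offsets" ]; then
    cat "$file"                      # no sections: nothing to sort
  else
    first=$(printf '%s\n' "$offsets" | head -n 1 | cut -d: -f1)
    last=$(printf '%s\n' "$offsets" | tail -n 1 | cut -d: -f1)
    head -c "$first" "$file"         # prefix
    for pos in $(printf '%s\n' "$offsets" | sort -t: -k2 | cut -d: -f1); do
      tail -c +"$((pos + 1))" "$file" | head -n 5
    done
    # Suffix: everything after the five lines of the last section in the file.
    tail -c +"$((last + 1))" "$file" | tail -n +6
  fi
)
```

Called as sort_sections sections.txt, it writes the result to standard output and does not modify the file.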

It does the sorting, but wrongly discards the header and footer. They should not be affected by the procedure.
Added one way to keep the header and footer. (Another way would be to strip them first and add them back at the end.)

anishsane · Accepted Answer · 2012-11-23 08:10:58Z

Assuming that other lines do not contain a [ in them:

header=$(grep -n 'This sentence and everything above it should not be sorted.' sortme.txt | cut -d: -f1)
footer=$(grep -n 'And neither should this one and everything below it.' sortme.txt | cut -d: -f1)

head -n $header sortme.txt #print header

head -n $(( footer - 1 )) sortme.txt | tail -n +$(( header + 1 )) | tr '\n[' '[\n' | sort | tr '\n[' '[\n' | grep -v '^\[$' #sort lines between header & footer
#cat sortme.txt | head -n $(( footer - 1 )) | tail -n +$(( header + 1 )) | tr '\n[' '[\n' | sort | tr '\n[' '[\n' | grep -v '^\[$' #sort lines between header & footer

tail -n +$footer sortme.txt #print footer

Serves the purpose.

Note that the sorting itself is done only by the 4th command. The other lines are there to preserve the header and footer.

I am also assuming that there are no other lines between the header and the first "[section]".
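To see why the tr trick works, it may help to run it on a tiny input. The sketch below isolates that part of the pipeline; the function name swap_sort is hypothetical, and LC_ALL=C is an added assumption to make the sort order predictable.

```shell
# Hypothetical demo of the newline/'[' swap used in the pipeline above.
# Assumes '[' appears only at the start of section headers.
swap_sort() {
  # 1. Swap newline and '[': each section collapses onto one line.
  # 2. Sort those lines.
  # 3. Swap back, then drop the stray '[' line left at the end.
  tr '\n[' '[\n' | LC_ALL=C sort | tr '\n[' '[\n' | grep -v '^\[$'
}
```

For example, printf '[b]\n2\n[a]\n1\n' | swap_sort prints the [a] block followed by the [b] block.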

potong · Accepted Answer · 2012-11-23 09:54:15Z

This might work for you (GNU sed & sort):

sed -i.bak '/^\[/!b;N;N;N;N;s/\n/UnIqUeStRiNg/g;w sort_file' file
sort -o sort_file sort_file
sed -i -e '/^\[/!b;R sort_file' -e 'd' file
sed -i 's/UnIqUeStRiNg/\n/g' file

The sorted result ends up in file, and the original is kept in file.bak.

This sorts every line beginning with [ together with the 4 lines that follow it.

UnIqUeStRiNg can be any unique string not containing a newline, e.g. \x00
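The same join, sort, split idea can also be sketched as a stream filter that does not edit anything in place. This version swaps in awk and tr for the sed commands, which is a substitution and not part of the answer. The function name is hypothetical, and the sketch assumes the input consists only of complete 5-line blocks and never contains the control character \001 used as the separator.

```shell
# Hypothetical filter: join every 5 input lines with \001, sort the joined
# lines, then turn the separators back into newlines.
join_sort_split() {
  awk -v sep='\001' '{ printf "%s%s", $0, (NR % 5 ? sep : "\n") }' |
    LC_ALL=C sort |
    tr '\001' '\n'
}
```

It reads the blocks on standard input, so the header and footer would have to be split off beforehand and added back afterwards.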

You forgot about the header and footer... "This sentence and everything above it should not be sorted."
– anishsane
@anishsane With the example data provided, the header and footer are not sorted. However, if those parts of the file may include lines that begin with [...], the sed address can be made more specific, e.g. /^\[\(aaa\|kkk\|zzz\)\]/!b
– potong
