3

In the following example, there are 3 elements that have to be sorted:

  1. "[aaa]" and the 4 lines (always 4) below it form a single unit.
  2. "[kkk]" and the 4 lines (always 4) below it form a single unit.
  3. "[zzz]" and the 4 lines (always 4) below it form a single unit.

Only groups of lines following this pattern should be sorted; anything before "[aaa]" and after the 4th line of "[zzz]" must be left intact.

from:

This sentence and everything above it should not be sorted.

[zzz]
some
random
text
here
[aaa]
bla
blo
blu
bli
[kkk]
1
44
2
88

And neither should this one and everything below it.

to:

This sentence and everything above it should not be sorted.

[aaa]
bla
blo
blu
bli
[kkk]
1
44
2
88
[zzz]
some
random
text
here

And neither should this one and everything below it.
3
  • what a horrible example.. why do you have identical values for all the sections? note: I would use awk. Commented Nov 23, 2012 at 0:58
  • EDIT: it should be easier to understand now. Commented Nov 23, 2012 at 1:05
  • yes, before and after illustration of data are the way to go. Good luck. Commented Nov 23, 2012 at 3:30

3 Answers

1

Maybe not the fastest :) [1] but it will do what you want, I believe:

for line in $(grep -n '^\[.*\]$' sections.txt |
              sort -k2 -t: |
              cut -f1 -d:); do
  tail -n +$line sections.txt | head -n 5
done

Here's a better one:

for pos in $(grep -b '^\[.*\]$' sections.txt |
             sort -k2 -t: |
             cut -f1 -d:); do
  tail -c +$((pos+1)) sections.txt | head -n 5
done

[1] The first one is something like O(N^2) in the number of lines in the file, since it has to read all the way to the section for each section. The second one, which can seek immediately to the right character position, should be closer to O(N log N).

[2] This takes you at your word that there are always exactly five lines in each section (header plus four following), hence head -n 5. However, it would be really easy to replace that with something which read up to but not including the next line starting with a '[', in case that ever turns out to be necessary.
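
As a rough sketch of the replacement mentioned in [2], the function below prints a section starting at a byte offset and stops just before the next line that begins with '['. The name print_section and its two arguments are made up for illustration. One caveat: with no fixed length, the last section in the file has nothing to stop it, so it would also print any trailing text after it.

```shell
# print_section FILE POS (hypothetical helper, not part of the answer above)
# Prints the section whose header starts at byte offset POS in FILE,
# stopping before the next line that begins with '['.
print_section() {
  tail -c +"$(( $2 + 1 ))" "$1" |
    awk 'NR > 1 && /^\[/ { exit } { print }'
}
```

In the loops above it would take the place of the tail ... | head -n 5 pipeline, called as print_section sections.txt "$pos".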


Preserving start and end requires a bit more work:

# Find all the sections (mapfile needs bash 4+; each entry keeps its trailing newline)
mapfile indices < <(grep -b '^\[.*\]$' sections.txt)
# Output the prefix
head -c+${indices[0]%%:*} sections.txt
# Output sections, as above
for pos in $(printf %s "${indices[@]}" |
             sort -k2 -t: |
             cut -f1 -d:); do
  tail -c +$((pos+1)) sections.txt | head -n 5
done
# Output the suffix
tail -c+$((1+${indices[-1]%%:*})) sections.txt | tail -n+6

You might want to make a function out of that, or a script file, changing sections.txt to $1 throughout.
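
One possible shape for such a function is sketched below. The name sort_sections is made up, and this version deliberately swaps mapfile for a plain variable so that it does not depend on bash arrays; it still assumes a grep that supports -b and a head that supports -c (both GNU and BSD do), and exactly five lines per section as in the question.

```shell
# sort_sections FILE (hypothetical wrapper around the steps above)
# The body runs in a subshell so its variables do not leak to the caller.
sort_sections() (
  file=$1
  # Byte offset and text of every section header, one "offset:[name]" per line
  offsets=$(grep -b '^\[.*\]$' "$file")
  if [ -z "$offsets" ]; then
    cat "$file"        # no sections found: pass the file through unchanged
    exit 0
  fi
  first=$(printf '%s\n' "$offsets" | head -n 1 | cut -d: -f1)
  last=$(printf '%s\n' "$offsets" | tail -n 1 | cut -d: -f1)
  # Output the prefix (everything before the first header)
  head -c "$first" "$file"
  # Output the sections, ordered by header text
  for pos in $(printf '%s\n' "$offsets" | sort -t: -k2 | cut -d: -f1); do
    tail -c +"$((pos + 1))" "$file" | head -n 5
  done
  # Output the suffix (everything after the last section in file order)
  tail -c +"$((last + 1))" "$file" | tail -n +6
)
```

It writes the result to standard output, so a call would look like sort_sections sections.txt > sorted.txt.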


2 Comments

It does the sorting, but wrongly discards the header and footer. They should not be affected by the procedure.
One way to keep the header and footer (another way would be to strip them first and add them back at the end.)
1

Assuming that other lines do not contain a [ in them:

header=$(grep -n 'This sentence and everything above it should not be sorted.' sortme.txt | cut -d: -f1)
footer=$(grep -n 'And neither should this one and everything below it.' sortme.txt | cut -d: -f1)

head -n $header sortme.txt #print header

head -n $(( footer - 1 )) sortme.txt | tail -n +$(( header + 1 )) | tr '\n[' '[\n' | sort | tr '\n[' '[\n' | grep -v '^\[$' #sort lines between header & footer
#cat sortme.txt | head -n $(( footer - 1 )) | tail -n +$(( header + 1 )) | tr '\n[' '[\n' | sort | tr '\n[' '[\n' | grep -v '^\[$' #sort lines between header & footer

tail -n +$footer sortme.txt #print footer

Serves the purpose.

Note that the main sort work is done by the 4th command only. The other lines are there to preserve the header & footer.

I am also assuming that, between header & first "[section]" there are no other lines.
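
To make the tr trick easier to try in isolation, here is the sorting pipeline pulled out into a small function; the name sort_blocks is made up for this sketch. Swapping newlines and '[' turns every section into a single line, sort orders those lines, and the second tr swaps the characters back. It relies on the same assumptions as above: no '[' anywhere except at the start of a section header, and no stray lines mixed in with the sections.

```shell
# sort_blocks (hypothetical name): reads sections on stdin, writes them sorted.
sort_blocks() {
  tr '\n[' '[\n' |     # each section becomes one line; '[' stands in for newline
    sort |             # sort whole sections by their header text
    tr '\n[' '[\n' |   # swap the characters back
    grep -v '^\[$'     # drop the lone '[' left over at the very end
}
```

For example, printf '[zzz]\nsome\n[aaa]\nbla\n' | sort_blocks prints the [aaa] section before the [zzz] one.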

2 Comments

Useless use of cat spotted!
^^Sorry, that was while I was testing. Forgot to change it.
0

This might work for you (GNU sed & sort):

sed -i.bak '/^\[/!b;N;N;N;N;s/\n/UnIqUeStRiNg/g;w sort_file' file
sort -o sort_file sort_file
sed -i -e '/^\[/!b;R sort_file' -e 'd' file
sed -i 's/UnIqUeStRiNg/\n/g' file

The sorted result ends up in file, and the original is kept as file.bak.

This outputs every line beginning with [ together with the 4 lines that follow it, in sorted order.

UnIqUeStRiNg can be any unique string not containing a newline, e.g. \x00

2 Comments

You forgot about header & footer... This sentence and everything above it should not be sorted.
@anishsane from the example data you have provided, the header and footer are not sorted. However, if these parts of the file may include lines that begin with [...], then the sed commands can be made more specific, i.e. /^\[\(aaa\|kkk\|zzz\)\]/!b
