Revisions to Sorting numerically a list

added 5 characters in body

edited Mar 22, 2021 at 8:38

Via awk, using the GNU awk feature (!) of defining the array traversal order. Note: this stores the whole file in RAM once, but since you said "over 100 volumes", I assume the file is not incredibly large.

The idea is

Separate records by empty lines (two newlines in a row, assuming the empty line contains no TAB).
Use parentheses as field separators and store each record in an array with the volume number as the index. For that, the number has to be isolated from "volume X" with sub.
Sort the output by that volume index.
Simply replace the leading numbers (293G etc.) of each entry in sorted order.
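The first two steps can be tried in isolation. The following is only a small sketch: the sample records and the function names `sample` and `extract_ids` are made up for illustration, since the real input file is not shown here. It prints the index that would be extracted from each record.

```shell
#!/bin/sh
# Hypothetical sample: two records separated by an empty line,
# each starting with a TAB (an assumption about the input layout).
sample() {
    printf '\t293G (volume 10)\n\tfirst detail line\n\n'
    printf '\t294G (volume 2)\n\tsecond detail line\n'
}

# RS="" (paragraph mode) turns every blank-line-separated block into one
# record; FS="[()]" puts the text between the parentheses into $2.
extract_ids() {
    awk 'BEGIN { RS = "" ; FS = "[()]" }
         { id = $2 ; sub(/volume /, "", id) ; print NR ": " id }'
}

sample | extract_ids
```

With this sample the output should be `1: 10` and `2: 2`, i.e. the bare volume number of each record, which can then be used as a numeric array index.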

Script:

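# Paragraph mode: blank lines separate records; ( and ) separate fields
BEGIN { RS="" ; ORS="\n\n" ; FS="[()]" }

# $2 is "volume X": strip the word and use the number as array index
{ id=$2 ; sub(/volume /,"",id) ; vol[id]=$0 }

# GNU awk only: traverse the array by ascending numeric index
END { PROCINFO["sorted_in"]="@ind_num_asc"
    n=292
    for ( id in vol ) { gsub(/^\t.../,"\t"n++,vol[id]) ; print vol[id] } }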

Run via

awk -f script inputfile
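Since the sorted traversal is specific to GNU awk, here is a rough sketch of how the same result might be obtained with other awk implementations by sorting the indices explicitly. This is not part of the script above: the insertion sort, the sample data and the function names `sample` and `renumber_by_volume` are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical sample input: three records in unsorted volume order,
# each starting with a TAB and a three-character number.
sample() {
    printf '\t293G (volume 10)\n\tdetail a\n\n'
    printf '\t294G (volume 2)\n\tdetail b\n\n'
    printf '\t295G (volume 1)\n\tdetail c\n'
}

renumber_by_volume() {
    awk '
    BEGIN { RS = "" ; ORS = "\n\n" ; FS = "[()]" }
    {
        id = $2 ; sub(/volume /, "", id)
        if (!(id in vol)) ids[++count] = id   # remember each index once
        vol[id] = $0
    }
    END {
        # Insertion sort of the indices by numeric value; this stands in
        # for PROCINFO["sorted_in"] = "@ind_num_asc" in GNU awk.
        for (i = 2; i <= count; i++) {
            key = ids[i]
            for (j = i - 1; j >= 1 && ids[j] + 0 > key + 0; j--)
                ids[j + 1] = ids[j]
            ids[j + 1] = key
        }
        n = 292
        for (i = 1; i <= count; i++) {
            rec = vol[ids[i]]
            sub(/^\t.../, "\t" n, rec)   # replace the leading number
            n++
            print rec
        }
    }'
}

sample | renumber_by_volume
```

For the sample above, the records come out in the order volume 1, volume 2, volume 10, renumbered as 292G, 293G and 294G. An insertion sort is quadratic, which should be acceptable for a few hundred records but is not meant for very large files.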

Post Undeleted by FelixJN

occurred Mar 21, 2021 at 11:12

added 36 characters in body

edited Mar 21, 2021 at 11:12

FelixJN

Via awk, using the GNU awk feature (!) of defining the array traversal order. Note: this stores the whole file in RAM once, but since you said "over 100 volumes", I assume the file is not incredibly large.

The idea is

Separate records by empty lines (two newlines in a row, assuming the empty line contains no TAB).
Use parentheses as field separators and store each record in an array with the volume number as the index. For that, the number has to be isolated from "volume X" with sub.
Sort the output by that volume index.
Simply replace the leading numbers (293G etc.) of each entry in sorted order.

Script:

BEGIN { RS=ORS="\n\n" ; FS="[()]" }

{id=$2 ; sub(/volume /,"",id) ; vol[id]=$0}    

END {PROCINFO["sorted_in"]="@ind_num_asc"
    n=293
    for ( id in vol ) { gsub(/^\t.../,"\t"n++,vol[id]) ; print vol[id] } }

Run via

awk -f script inputfile

Post Deleted by FelixJN

occurred Mar 21, 2021 at 11:04

answered Mar 21, 2021 at 10:51

FelixJN

Via awk, using the GNU awk feature (!) of defining the array traversal order. Note: this stores the whole file in RAM once, but since you said "over 100 volumes", I assume the file is not incredibly large.

The idea is

Separate records by empty lines (two newlines in a row, assuming the empty line contains no TAB).
Use parentheses as field separators and store each record in an array with "volume X" as the index.
Sort the output by the "volume X" index.
Simply replace the leading numbers (293G etc.) of each entry in sorted order.

Script:

BEGIN { RS=ORS="\n\n" ; FS="[()]" }

{vol[$2]=$0}

END {PROCINFO["sorted_in"]="@ind_num_asc"
    j=293
    for ( i in vol ) { gsub(/^\t.../,"\t"j++,vol[i]) ; print vol[i] } }

Run via

awk -f script inputfile
