Revisions to How to sort the list of positional parameters in POSIX sh

added 8 characters in body

edited Dec 3, 2023 at 9:26

code=$(
  awk -v q="'" -- '
    code-from-that-link-above
    BEGIN {
      delete ARGV[0]
      n = qsortArbIndByValue(ARGV, sorted)
      printf "set --"
      for (i = 1; i <= n; i++) {
        s = ARGV[sorted[i]]
        gsub(q, q "\\" q q, s)
        printf " %s", q s q
      }
    }' "$@"
) || exit
eval "$code"

code=$(
  awk -- '
    code-from-that-link-above
    BEGIN {
      delete ARGV[0]
      n = qsortArbIndByValue(ARGV, sorted)
      printf "set --"
      for (i = 1; i <= n; i++)
        printf " \"${%s}\"", sorted[i]
    }' "$@"
) || exit
eval "$code"

added 1006 characters in body

edited Dec 3, 2023 at 9:17

Stéphane Chazelas

Then use (Edit: see further down for a better, safer approach):

Note that to be POSIX compliant, that code should have gsub(q, q "\\\\" q q, s) instead of gsub(q, q "\\" q q, s). However, the latter, even though it yields behaviour left unspecified by POSIX, is more portable in practice: the former doesn't work properly with busybox awk, for instance, nor with gawk unless $POSIXLY_CORRECT is in the environment.

If the data is not guaranteed to be valid text in the user's locale, one can set the locale to C and then the strings will be considered as arrays of bytes and sorted as if by strcmp() (on ASCII-based systems), not as per the user's locale collation order.
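As a small illustration of that byte-wise ordering, here is a sketch that compares two made-up strings under LC_ALL=C. The sample strings and the result variable are only for illustration:

```shell
# Sketch: with LC_ALL=C, awk compares strings byte by byte, so on an
# ASCII-based system every uppercase letter sorts before any lowercase one.
# The two sample strings are illustrative only.
result=$(LC_ALL=C awk 'BEGIN {
  a = "Zebra"; b = "apple"
  if (a < b) print a " sorts before " b
  else       print b " sorts before " a
}')
printf '%s\n' "$result"
```

In the C locale on an ASCII-based system, Z (byte 90) comes before a (byte 97). Many other locales would typically order those two words the other way round.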

Giving eval something that is potentially the result of unspecified behaviour is quite uncomfortable, but on second thought, it should be possible to make it reliable if instead of having awk output something like set -- '3rd argument hoped to be quoted correctly' 'first' 'second', we have it output set -- "${3}" "${1}" "${2}".

That should also be easier to do, shorter and more efficient:

code=$(
  awk -- '
    code-from-that-link-above
    BEGIN {
      delete ARGV[0]
      n = qsortArbIndByValue(ARGV, rank)
      printf "set --"
      for (i = 1; i <= n; i++)
        printf " \"${%s}\"", rank[i]
    }' "$@"
) || exit
eval "$code"
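
As a self-contained sketch of that second approach, the script below swaps in a small insertion sort, sort_indices(), for the Rosetta Code quicksort so that it can run on its own. That function, the sample arguments and the LC_ALL=C setting (used here only to make the order predictable) are assumptions for illustration, not part of the original code:

```shell
# Sketch: sort "$@" by having awk print only indices, never the data itself.
# sort_indices() is a hypothetical insertion sort standing in for the
# Rosetta Code qsortArbIndByValue(); it is not the function from the answer.
set -- 'pear' 'two words' "it's" 'apple'

code=$(
  LC_ALL=C awk -- '
    # Fill rank[1..n] with the indices of arr[1..n], ordered by string value.
    function sort_indices(arr, n, rank,    i, j, t) {
      for (i = 1; i <= n; i++) rank[i] = i
      for (i = 2; i <= n; i++) {
        t = rank[i]
        # Shift larger entries right until the slot for index t is found.
        for (j = i - 1; j >= 1 && (arr[t] "") < (arr[rank[j]] ""); j--)
          rank[j + 1] = rank[j]
        rank[j + 1] = t
      }
    }
    BEGIN {
      n = ARGC - 1          # ARGV[0] is the awk command name, so skip it
      split("", rank)       # make sure rank is an (empty) array
      sort_indices(ARGV, n, rank)
      printf "set --"
      for (i = 1; i <= n; i++)
        printf " \"${%s}\"", rank[i]
    }' "$@"
) || exit

printf '%s\n' "$code"   # the generated command, made of indices only
eval "$code"
printf '<%s>\n' "$@"    # the positional parameters, now in sorted order
```

awk only ever outputs digits and fixed punctuation here, so the data itself never appears in the text passed to eval, whatever characters the arguments contain.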

added 14 characters in body

edited Nov 28, 2023 at 9:20

Stéphane Chazelas

Probably the easiest is to resort to awk, which can do strcoll(), strcmp(), and numeric comparisons (including of floating-point numbers).

To avoid reinventing the wheel, we can use the quicksort awk implementation at https://rosettacode.org/wiki/Sorting_algorithms/Quicksort#AWK (GPLv2).

Then use:

code=$(
  awk -v q="'" -- '
    code-from-that-link-above
    BEGIN {
      delete ARGV[0]
      n = qsortArbIndByValue(ARGV, rank)
      printf "set --"
      for (i = 1; i <= n; i++) {
        s = ARGV[rank[i]]
        gsub(q, q "\\" q q, s)
        printf " %s", q s q
      }
    }' "$@"
) || exit
eval "$code"

That assumes the positional parameters contain valid text in the user's locale and that the list is small enough to fit in command line arguments (and in awk's array size limit).

That uses awk's < operator, which does a numeric comparison if the operands are recognised as numbers and a strcoll() comparison otherwise. You can force strcoll() comparison by changing the comparisons from a < b to a"" < b"" (set the locale to C for strcmp() comparison), and force numeric comparison by changing them to a+0 < b+0 (+a < +b would also be POSIX but is not portable in practice). Alternatively, you can always write a custom compare() awk function to do whatever comparison you want.
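
To make the difference between the two forced comparisons concrete, here is a minimal sketch that compares the same pair of values both ways. The values 10 and 9 and the result variable are just examples, and LC_ALL=C is set only so that the string comparison is deterministic:

```shell
# Sketch: the same two values compared as strings and as numbers.
# LC_ALL=C is set only to make the string comparison deterministic.
result=$(LC_ALL=C awk -- 'BEGIN {
  a = ARGV[1]; b = ARGV[2]
  printf "string: %d\n", ((a "") < (b ""))     # compares "10" with "9"
  printf "numeric: %d\n", ((a + 0) < (b + 0))  # compares 10 with 9
}' 10 9)
printf '%s\n' "$result"
```

As strings, "10" sorts before "9" because the byte for 1 is lower than the byte for 9, so the first line reports 1 (true). As numbers, 10 is not less than 9, so the second reports 0 (false).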

Note that to be POSIX compliant, that code should have gsub(q, q "\\\\" q q, s) instead of gsub(q, q "\\" q q, s). However, the latter, even though it yields behaviour left unspecified by POSIX, is more portable in practice: the former doesn't work properly with busybox awk, for instance, nor with gawk unless $POSIXLY_CORRECT is in the environment.
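
To see what that quoting step produces, here is a minimal round-trip sketch for a single value, using the gsub(q, q "\\" q q, s) form. The variable names value and quoted and the sample string are made up for illustration. Since POSIX leaves the behaviour of that form unspecified, the result is only what the awk implementations mentioned above are expected to give:

```shell
# Sketch: quote one value for the shell the way the awk code above does,
# then check that eval hands the original string back.
# "value" and "quoted" are illustrative names.
value="it's a 'test'"
quoted=$(
  awk -v q="'" -- 'BEGIN {
    s = ARGV[1]
    # Each quote becomes: quote, backslash, quote, quote.
    gsub(q, q "\\" q q, s)
    printf "%s", q s q
  }' "$value"
) || exit
printf '%s\n' "$quoted"   # the shell-quoted form
eval "set -- $quoted"
printf '<%s>\n' "$1"      # should match the original value
```

If the two printed strings do not correspond, the awk in use handles the backslash in the replacement differently, which is exactly the risk the index-based approach avoids.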

Using `\\` rather than `\\\\` is more portable even if not POSIX. `\\\\` doesn't work with busybox awk or `gawk` unless POSIXLY_CORRECT is in the environment.

edited Nov 28, 2023 at 9:15

Stéphane Chazelas

added 19 characters in body

edited Nov 19, 2023 at 20:30

Stéphane Chazelas

answered Nov 19, 2023 at 18:23

Stéphane Chazelas
