Stéphane Chazelas

With zsh:

typeset -U groups=( **/*_*_*.*(Ne['REPLY=${${(s[_])REPLY:t}[2]}']) )
  • typeset -U groups=(...): define groups as an array with Unique members
  • **/*_*_*.*: file names with at least one . and at least two _s before the rightmost ., at or below the current working directory
  • (Ne['code']): glob qualifiers to further qualify the glob
  • N: Nullglob: expand to nothing if there's no match
  • e['code']: evaluate code for each glob expansion¹, here to transform it (the file path is passed in $REPLY, which the code may modify)
  • $REPLY:t: the tail (basename) of the file.
  • ${(s[_])var}: splits on _ (and then we take the second with [2]).
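To make the "take the tail, then the second _-separated field" step concrete, here is a rough equivalent written with plain POSIX parameter expansions, so it can be tried in bash or sh as well. The function name second_field is made up for illustration, and the sketch assumes the name contains at least one _ (which the glob above guarantees). It follows the awk -F _ convention for $2, so two adjacent underscores give an empty field.

```sh
# Hypothetical helper: print the second "_"-separated field of a path's basename.
# The body runs in a subshell ( ... ) so the temporary variables do not leak.
second_field() (
  base=${1##*/}      # tail: strip everything up to the last /
  rest=${base#*_}    # drop the first field and its trailing _
  printf '%s\n' "${rest%%_*}"  # keep what precedes the next _
)

second_field ./data/sample_groupA_001.txt   # prints groupA
```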

With bash (the GNU shell), GNU find and GNU awk, you can do something similar with:

readarray -td '' groups < <(
  LC_ALL=C find . -name '.?*' -prune -o \
    -name '*_*_*.*' -printf '%f\0' |
    gawk -v RS='\0' -v ORS='\0' -F _ '!seen[$2]++ {print $2}'
)

Those make no assumption as to what characters or non-characters may be found between those first two _ characters.

Both skip hidden files and files in hidden directories. To include them, add the D glob qualifier in zsh or remove the -name '.?*' -prune -o in find.

If there is a large number of files, the find-based approach will be more memory-friendly, as it doesn't store the whole list of files in memory. You can take a similar approach in zsh with:

typeset -A seen=()
: **/*_*_*.*(Ne['! seen[${${(s[_])REPLY:t}[2]}]='])
groups=( ${(k)seen} )
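The same "record each key in an associative array as the files stream by" idea can be sketched in bash too. This is only an illustration under some assumptions: bash 4 or newer (for associative arrays), a find that supports -print0, and a made-up function name collect_groups. Keys are stored with a leading _ because bash does not accept an empty associative array subscript; the prefix is stripped again at the end.

```bash
# Hypothetical sketch: fill the global array "groups" with the unique second
# "_"-separated fields, reading file paths one at a time from find.
collect_groups() {
  local dir=${1:-.} path base rest
  local -A seen=()
  while IFS= read -r -d '' path; do
    base=${path##*/}             # basename
    rest=${base#*_}              # drop the first field
    seen["_${rest%%_*}"]=1       # "_" prefix: bash rejects empty keys
  done < <(
    LC_ALL=C find "$dir" -name '.?*' -prune -o -name '*_*_*.*' -print0
  )
  groups=( "${!seen[@]}" )       # the keys are the unique values...
  groups=( "${groups[@]#_}" )    # ...minus the artificial prefix
}
```

As with the zsh variant, the order of the resulting elements is unspecified, so sort them if the order matters.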

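¹ the exit status of that code also determines whether the file is selected or not. In the first command the code always returns true; in the seen variant, the leading ! makes it return false, so no file is selected and the glob expands to nothing.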
