Revisions to find/detect file name groups

added 253 characters in body

edited Nov 5, 2021 at 13:41

With zsh:

typeset -U groups=( **/*_*_*.*(Ne['REPLY=${${(s[_])REPLY:t}[2]}']) )

typeset -U groups=(...): defines groups as an array with Unique members (duplicates are dropped).
**/*_*_*.*: matches file names with at least one . and at least two _s before the rightmost ., at or below the current working directory.
(Ne['code']): glob qualifiers that further qualify the glob:
  N: Nullglob: expand to nothing if there's no match.
  e['code']: evaluates code for each glob expansion¹, which the code can transform by modifying $REPLY:
    $REPLY:t: the tail (basename) of the file.
    ${(s[_])var}: splits var on _ (and then we take the second element with [2]).
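
To make the transformation concrete, here is a minimal sketch of the same per-file computation written with POSIX parameter expansions instead of zsh's `:t` and `(s[_])`. The function name `second_field` and the sample path are hypothetical, chosen only for illustration.

```shell
# second_field PATH: print the second _-separated field of PATH's base name.
# The body runs in a subshell (parentheses) so the helper variables stay local.
second_field() (
  base=${1##*/}                 # like $REPLY:t, strip the directory part
  rest=${base#*_}               # drop everything up to and including the first _
  printf '%s\n' "${rest%%_*}"   # keep what precedes the next _
)

second_field ./reports/2021_sales_q3.csv   # prints: sales
```

The zsh one-liner applies this transformation to every match and lets typeset -U discard the duplicates.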

With bash (the GNU shell), GNU find and GNU awk, you can do something similar with:

readarray -td '' groups < <(
  LC_ALL=C find . -name '.?*' -prune -o \
    -name '*_*_*.*' -printf '%f\0' |
    gawk -v RS='\0' -v ORS='\0' -F _ '!seen[$2]++ {print $2}'
)
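
Where gawk is not available, the deduplication step could be done in bash itself. The following is a sketch of that variant, not the pipeline above: it assumes bash 4.4 or newer and GNU find, and the function name `collect_groups` is hypothetical. It fills the same `groups` array.

```shell
# collect_groups DIR: fill the global array "groups" with the unique second
# _-separated fields of the matching file names found under DIR.
collect_groups() {
  local dir=$1 name field
  local -A seen=()
  groups=()
  while IFS= read -rd '' name; do
    field=${name#*_}     # drop everything through the first _
    field=${field%%_*}   # keep what precedes the next _
    # Keys are prefixed with x because bash rejects an empty array subscript.
    if [[ -z ${seen["x$field"]+set} ]]; then
      seen["x$field"]=1
      groups+=("$field")
    fi
  done < <(
    LC_ALL=C find "$dir" -name '.?*' -prune -o \
      -name '*_*_*.*' -printf '%f\0'
  )
}
```

After calling `collect_groups .`, the result can be listed with `printf '%s\n' "${groups[@]}"`. The order follows find's traversal order, so sort the output if a stable order matters.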

Both approaches work regardless of what characters, or bytes that do not form valid characters, appear between the first two _ characters of the file name.

Both skip hidden files and files in hidden directories. To include them, add the D glob qualifier in zsh or remove the -name '.?*' -prune -o in find.
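
As an illustration of the find side of that choice, here is a small sketch that runs either form of the command. It assumes GNU find; the function name `list_matching` and its `all` argument are hypothetical.

```shell
# list_matching DIR [all]: print the matching base names, one per line.
# Newline-delimited output is for display only; file names may themselves
# contain newlines, which is why the pipeline in the answer uses \0 instead.
list_matching() {
  if [ "${2-}" = all ]; then
    # No -prune: hidden files and files in hidden directories are included.
    LC_ALL=C find "$1" -name '*_*_*.*' -printf '%f\n'
  else
    # Anything whose name starts with a dot is pruned, so it is skipped.
    LC_ALL=C find "$1" -name '.?*' -prune -o -name '*_*_*.*' -printf '%f\n'
  fi
}
```

Running `list_matching .` and `list_matching . all` side by side shows which names the pruning hides.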

If there's a large list of files, the find-based one will be more memory friendly as it doesn't store the whole list in memory. You can take a similar approach in zsh with:

typeset -A seen=()
: **/*_*_*.*(Ne['! seen[${${(s[_])REPLY:t}[2]}]='])
groups=( ${(k)seen} )

¹ The exit status of that code also determines whether the file is selected or not, but here the code always returns true.

added 253 characters in body

edited Nov 5, 2021 at 13:34

Stéphane Chazelas

With zsh:

typeset -U groups=( **/*_*_*.*(Ne['REPLY=${${(s[_])REPLY:t}[2]}']) )

typeset -U groups=(...): defines groups as an array with Unique members (duplicates are dropped).
**/*_*_*.*: matches file names with at least one . and at least two _s before the rightmost ., at or below the current working directory.
(Ne['code']): glob qualifiers that further qualify the glob:
  N: Nullglob: expand to nothing if there's no match.
  e['code']: evaluates code for each glob expansion¹, which the code can transform by modifying $REPLY:
    $REPLY:t: the tail (basename) of the file.
    ${(s[_])var}: splits var on _ (and then we take the second element with [2]).

With bash (the GNU shell), GNU find and GNU awk, you can do something similar with:

readarray -td '' groups < <(
  LC_ALL=C find . -name '.?*' -prune -o \
    -name '*_*_*.*' -printf '%f\0' |
    gawk -v RS='\0' -v ORS='\0' -F _ '!seen[$2]++ {print $2}'
)

Both approaches work regardless of what characters, or bytes that do not form valid characters, appear between the first two _ characters of the file name.

Both skip hidden files and files in hidden directories. To include them, add the D glob qualifier in zsh or remove the -name '.?*' -prune -o in find.

¹ The exit status of that code also determines whether the file is selected or not, but here the code always returns true.

answered Nov 5, 2021 at 13:26

Stéphane Chazelas

With zsh:

typeset -U groups=( **/*_*_*.*(Ne['REPLY=${${(s[_])REPLY:t}[2]}']) )

typeset -U groups=(...): defines groups as an array with Unique members (duplicates are dropped).
**/*_*_*.*: matches file names with at least one . and at least two _s before the rightmost ., at or below the current working directory.
(Ne['code']): glob qualifiers that further qualify the glob:
  N: Nullglob: expand to nothing if there's no match.
  e['code']: evaluates code for each glob expansion¹, which the code can transform by modifying $REPLY:
    $REPLY:t: the tail (basename) of the file.
    ${(s[_])var}: splits var on _ (and then we take the second element with [2]).

¹ The exit status of that code also determines whether the file is selected or not, but here the code always returns true.
