3

On Linux, I have a folder without subfolders that contains many files named according to the scheme below.

I list them with ls -1.

1yBWVnZCx8CoPrGIG.part01.rar
1yBWVnZCx8CoPrGIG.part02.rar
1yBWVnZCx8CoPrGIG.part03.rar
1yBWVnZCx8CoPrGIG.part04.rar
1yBWVnZCx8CoPrGIG.part05.rar
1yBWVnZCx8CoPrGIG.part06.rar
1yBWVnZCx8CoPrGIG.part07.rar
1yBWVnZCx8CoPrGIG.part08.rar
1yBWVnZCx8CoPrGIG.part09.rar
1yBWVnZCx8CoPrGIG.part10.rar
1yBWVnZCx8CoPrGIG.part11.rar
1yBWVnZCx8CoPrGIG.part12.rar
1yBWVnZCx8CoPrGIG.part13.rar
1yBWVnZCx8CoPrGIG.part14.rar
1yBWVnZCx8CoPrGIG.part15.rar
1yBWVnZCx8CoPrGIG.part16.rar
1yBWVnZCx8CoPrGIG.part17.rar
1yBWVnZCx8CoPrGIG.part18.rar
1yBWVnZCx8CoPrGIG.part19.rar
1yBWVnZCx8CoPrGIG.part20.rar
1yBWVnZCx8CoPrGIG.part21.rar
1yBWVnZCx8CoPrGIG.part22.rar
1yBWVnZCx8CoPrGIG.part23.rar
DaHs0QJnJbt.part1.rar
DaHs0QJnJbt.part2.rar
DaHs0QJnJbt.part3.rar
DaHs0QJnJbt.part4.rar
n5oTzoLvG.part1.rar
n5oTzoLvG.part2.rar
n5oTzoLvG.part3.rar
n5oTzoLvG.part4.rar
n5oTzoLvG.part5.rar
n5oTzoLvG.part6.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part1.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
RSmWMPb0vWr8LIEFtR7o.part1.rar
RSmWMPb0vWr8LIEFtR7o.part2.rar
RSmWMPb0vWr8LIEFtR7o.part3.rar
RSmWMPb0vWr8LIEFtR7o.part4.rar
RSmWMPb0vWr8LIEFtR7o.part5.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part01.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part02.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part03.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part04.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part05.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part06.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part07.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part08.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part09.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part10.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
tBJDjsyJtFpY0d3aQ.part1.rar
tBJDjsyJtFpY0d3aQ.part2.rar
tBJDjsyJtFpY0d3aQ.part3.rar
tBJDjsyJtFpY0d3aQ.part4.rar
tBJDjsyJtFpY0d3aQ.part5.rar
tBJDjsyJtFpY0d3aQ.part6.rar
W1Pn8SHf7pbMSf1u99C4f.part1.rar
W1Pn8SHf7pbMSf1u99C4f.part2.rar
XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part1.rar
XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

(In the future, there may also be files named like name.part001.rar, name.part002.rar, ..., name.part123.rar.)

I am looking for a way to get a list with only the last .partN.rar of each archive.

I want to see:

1yBWVnZCx8CoPrGIG.part23.rar
DaHs0QJnJbt.part4.rar
n5oTzoLvG.part6.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
RSmWMPb0vWr8LIEFtR7o.part5.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
tBJDjsyJtFpY0d3aQ.part6.rar
W1Pn8SHf7pbMSf1u99C4f.part2.rar
XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

How can I do it?

7 Answers

11

With zsh:

$ files=( *.part<->.rar(Nn) )
$ typeset -A last
$ for f ($files) last[$f:r:r]=$f
$ print -roC1 -- $last
1yBWVnZCx8CoPrGIG.part23.rar
DaHs0QJnJbt.part4.rar
n5oTzoLvG.part6.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
RSmWMPb0vWr8LIEFtR7o.part5.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
tBJDjsyJtFpY0d3aQ.part6.rar
W1Pn8SHf7pbMSf1u99C4f.part2.rar
XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

Explanations:

# Put a listing of all files that match the "glob" pattern
# ANYTHING.partANYNUMBER.rar
# into the variable called "files"
# without erroring out if nothing matches (N),
# and sorting things numerically (n)
files=( *.part<->.rar(Nn) )

# make an associative array (-A) called "last"
#  (in Python, this would be called a dict,
#   in C++ a std::map<std::string,std::string>)
typeset -A last

# loop over all entries in "files",  and for each strip the last file name suffix
# (as separated by a "."), twice; i.e., remove the .rar, and remove the .partANYNUMBER.
# Store the file name in last[twiceshortened]
for f ($files) last[$f:r:r]=$f
# Note that this overwrites the entry for 1yBWVnZCx8CoPrGIG until the highest-
# numbered part is reached.

# Print the full list:
print -roC1 -- $last
# -r : no fancy escaping (we don't need that and it makes things strange)
# -o : print sorted, in ascending order
# -C1 : print as 1 column
# -- : the stuff to be printed follows after this
# $last : print the content (not the keys!) of the "last" associative array
# You get:
1yBWVnZCx8CoPrGIG.part23.rar
DaHs0QJnJbt.part4.rar
n5oTzoLvG.part6.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
RSmWMPb0vWr8LIEFtR7o.part5.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
tBJDjsyJtFpY0d3aQ.part6.rar
W1Pn8SHf7pbMSf1u99C4f.part2.rar
XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar
1
  • that was nice, and refreshed my memory of the <number range> glob! Thus, I added a bit of an explanation. Wonder whether adding (.) to the glob qualifiers would make sense. Commented Aug 17 at 12:48
10

If you trust the output of ls -1 (that is, if your file names contain no whitespace such as spaces or newlines), you can use:

$ ls -r1 | sort -t'.' -k1,1 -u
1yBWVnZCx8CoPrGIG.part23.rar
DaHs0QJnJbt.part4.rar
n5oTzoLvG.part6.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
RSmWMPb0vWr8LIEFtR7o.part5.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
tBJDjsyJtFpY0d3aQ.part6.rar
W1Pn8SHf7pbMSf1u99C4f.part2.rar
XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

Reverse-sort the list, then run sort with . as the field separator so that it looks only at the first field and prints one line per distinct first field.
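One assumption worth spelling out: reverse lexical order only matches reverse numeric order when the part numbers are zero-padded to the same width. Here is a small sketch with hypothetical foo.* names (not taken from the question), fed through printf instead of ls so it does not depend on the folder contents:

```shell
# Hypothetical sample: part numbers are NOT zero-padded.
sample() {
  printf '%s\n' foo.part1.rar foo.part9.rar foo.part12.rar
}

# Reverse byte-order sort (LC_ALL=C), then look at the first line.
# prints foo.part9.rar, although part12 is the real last part
sample | LC_ALL=C sort -r | head -n 1
```

With names padded like part01 ... part12, the same command would put part12 first.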

2
  • 6
    Worth noting that it assumes file names contain no newline characters, that they contain no other . besides the ones before part and rar, and that the numbers are always 0-padded to the length of the highest part number. Commented Aug 17 at 12:40
  • That would print one of the input file names that begin with 1yBWVnZCx8CoPrGIG, etc., not necessarily the first one - you'd need GNU sort for -s (stable sort) to guarantee output order based on input order. Commented Aug 17 at 18:19
4

Using perl:

$ perl -MFile::Basename -e '
  my @files = glob q(*.rar);

  my %bases;
  foreach (@files) {
    my ($b, $p) = split /\.part/, basename $_, q(.rar);
    $bases{$b} = $p if (!defined $bases{$b} or $p > $bases{$b});
  };

  foreach my $k (sort keys %bases) {
    print "$k.part$bases{$k}.rar\n"
  }'
1yBWVnZCx8CoPrGIG.part23.rar
DaHs0QJnJbt.part4.rar
RSmWMPb0vWr8LIEFtR7o.part5.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
W1Pn8SHf7pbMSf1u99C4f.part2.rar
XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar
n5oTzoLvG.part6.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
tBJDjsyJtFpY0d3aQ.part6.rar

First this gets all the filenames matching the glob *.rar into an array called @files. Then it converts the array into a hash (aka associative array) called %bases where each key is the base filename and the value is the highest part number seen for that basename.

Then it prints out each basename with the part number in the same format as the original filenames: base.partN.rar

This uses the File::Basename module, which is included with perl as part of its standard library.

There are shorter, more obfuscated ways to do this in perl, but this IMO is a nice balance between brevity and readability. It was written to pass the restrictions of use strict (or -Mstrict on the command-line), so it could be re-used as part of a larger script. That's why there are all the my declarations that are usually skipped for one-liners.

3

Using any awk:

$ ls -r1 | awk -F'.' '!seen[$1]++'
tBJDjsyJtFpY0d3aQ.part6.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
n5oTzoLvG.part6.rar
XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar
W1Pn8SHf7pbMSf1u99C4f.part2.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
RSmWMPb0vWr8LIEFtR7o.part5.rar
DaHs0QJnJbt.part4.rar
1yBWVnZCx8CoPrGIG.part23.rar

If you don't want to parse the output of ls then you could alternatively do this with any printf, sort and awk:

printf '%s\n' * | sort -t'.' -r -k1,1 -k2,2 | awk -F'.' '!seen[$1]++'

That assumes your file names do not contain newlines, that the number of digits at the end of each part is consistent for each substring before part, and there is no . before part in any of the strings.
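The last two assumptions can be relaxed by letting awk compare the part numbers numerically and by matching .partN.rar at the end of the name. The following is only a sketch: the function name last_parts is made up for illustration, and it still assumes that file names contain no newlines.

```shell
# Sketch: read one file name per line on stdin and print, for each archive,
# the name with the numerically highest part number.
last_parts() {
  awk '
    # Only consider names that end in .part<digits>.rar
    match($0, /\.part[0-9]+\.rar$/) {
      base = substr($0, 1, RSTART - 1)               # text before ".part"
      num  = substr($0, RSTART + 5, RLENGTH - 9) + 0 # digits as a number
      if (!(base in max) || num > max[base]) {
        max[base]  = num
        name[base] = $0                              # keep original spelling
      }
    }
    END { for (b in name) print name[b] }
  ' | sort   # awk iterates arrays in unspecified order, so sort the result
}
```

It could be used as printf '%s\n' * | last_parts. Because the match is anchored at the end of the name, extra dots in the archive name and unpadded numbers (part9 versus part12) should not confuse it.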

1
  • The reason you don't parse the output of ls (without -l) is that it's newline delimited, while filenames can be made of any number of lines. With '%s\n', you're printing the file names newline delimited just the same. To print list of file paths in a way that can be processed reliably, you want NUL ('%s\0' or GNU ls --zero) instead of NL as the delimiter (and use sort -z, gawk -v RS='\0' etc assuming GNU implementations) Commented Aug 18 at 18:45
3

Using bash that has the loadable builtin kv (most probably 5.3+ version)

#!/usr/bin/env bash

shopt -s nullglob
enable kv || exit

kv -A assoc -s . -d '' < <(
  printf '%s\0' *.rar
)

for i in "${!assoc[@]}"; do
  printf '%s.%s\n' "$i" "${assoc["$i"]}"
done

Output

XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar
DaHs0QJnJbt.part4.rar
n5oTzoLvG.part6.rar
W1Pn8SHf7pbMSf1u99C4f.part2.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
RSmWMPb0vWr8LIEFtR7o.part5.rar
1yBWVnZCx8CoPrGIG.part23.rar

According to help kv

    kv: kv [-A ARRAYNAME] [-s SEPARATORS] [-d RS]
    Read key-value pairs into an associative array.
    
    Read delimiter-terminated records composed of a single key-value pair
    from the standard input and add the key and corresponding value
    to the associative array ARRAYNAME. The key and value are separated
    by a sequence of one or more characters in SEPARATORS. Records are
    terminated by the first character of RS, similar to the read and
    mapfile builtins.
    
    If SEPARATORS is not supplied, $IFS is used to separate the keys
    and values. If RS is not supplied, newlines terminate records.
    If ARRAYNAME is not supplied, "KV" is the default array name.
    
    Returns success if at least one key-value pair is stored in ARRAYNAME.

If sorted output is needed (as in the OP's expected output), one more loadable builtin named asort can be used, from: https://cgit.git.savannah.gnu.org/cgit/bash.git/plain/examples/loadables/asort.c

Something like:

#!/usr/bin/env bash

shopt -s nullglob
enable kv || exit
enable asort || exit

kv -A assoc -s . -d '' < <(
  printf '%s\0' *.rar
)

keys=("${!assoc[@]}")
asort keys

for i in "${keys[@]}"; do
  printf '%s.%s\n' "$i" "${assoc["$i"]}"
done

Output

1yBWVnZCx8CoPrGIG.part23.rar
DaHs0QJnJbt.part4.rar
n5oTzoLvG.part6.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
RSmWMPb0vWr8LIEFtR7o.part5.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
tBJDjsyJtFpY0d3aQ.part6.rar
W1Pn8SHf7pbMSf1u99C4f.part2.rar
XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

According to help asort

asort: asort [-nr] array ...  or  asort [-nr] -i dest source
    Sort arrays in-place.
    
    Options:
      -n  compare according to string numerical value
      -r  reverse the result of comparisons
      -i  sort using indices/keys
    
    If -i is supplied, SOURCE is not sorted in-place, but the indices (or keys
    if associative) of SOURCE, after sorting it by its values, are placed as
    values in the indexed array DEST
    
    Associative arrays may not be sorted in-place.
    
    Exit status:
    Return value is zero unless an error happened (like invalid variable name
    or readonly array).

NOTE: As per Stéphane Chazelas, the separator (in this case a dot/period) must not be part of the archive name. The script does work with the file names given by the OP.

Files like:

(In the future, there may also be files named like name.part001.rar, name.part002.rar, ..., name.part123.rar.)

Cannot be parsed properly by the current script.

2
  • 3
    Worth noting that it assumes file names contain no other . besides the ones before part and rar and that the numbers are always 0-padded to the length of the highest part number (it wouldn't work correctly, for instance, after touch foo.part{1..12}.rar, where it would return part9 instead of part12). Commented Aug 19 at 6:01
  • 1
    ... i.e. you're relying on the order the shell expands that glob pattern into filenames ... which happens alphabetically in fact i.e. 12 is expanded before 9 and that puts them in this order x.part12.rar x.part9.rar ... So 9 gets processed after 12, overwriting its key in the array, and at the end you get x.part9.rar printed where it should have been x.part12.rar instead. Commented Aug 19 at 10:44
1

It can be done in bash:

{
  unset files
  declare -A files
  for f in *.rar
  do
    if [[ "$f" =~ (.*\.part)([[:digit:]]+)\.rar ]]
    then
      if [[ -v "files[${BASH_REMATCH[1]}]" ]]
      then
        (( 10#${BASH_REMATCH[2]} > 10#${files[${BASH_REMATCH[1]}]} )) &&
          files[${BASH_REMATCH[1]}]="${BASH_REMATCH[2]}"
      else
        files[${BASH_REMATCH[1]}]="${BASH_REMATCH[2]}"
      fi
    fi
  done
  for i in "${!files[@]}"
  do
    printf '%s\n' "$i${files[$i]}.rar"
  done
}

... outputs:

tBJDjsyJtFpY0d3aQ.part6.rar
RSmWMPb0vWr8LIEFtR7o.part5.rar
DaHs0QJnJbt.part4.rar
W1Pn8SHf7pbMSf1u99C4f.part2.rar
T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
n5oTzoLvG.part6.rar
XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar
okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
1yBWVnZCx8CoPrGIG.part23.rar

The idea is to arithmetically compare the part numbers in identical filenames (identical except for the part number) and find which number is the greatest.
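The same idea can also be packaged as a function. This is a hedged sketch, not a drop-in replacement: the name last_parts_bash is invented here, it needs bash 4 or later for associative arrays, and it tests for an existing key with the ${array[key]+set} expansion instead of [[ -v ]].

```shell
#!/usr/bin/env bash
# Sketch only: last_parts_bash NAME... prints, for each archive, the name
# carrying the highest part number. Requires bash 4+ (associative arrays).
last_parts_bash() {
  local -A best_num=() best_name=()
  local f base num
  for f in "$@"; do
    # Anchored match: non-empty base, ".part", digits, ".rar".
    [[ $f =~ ^(.+)\.part([0-9]+)\.rar$ ]] || continue
    base=${BASH_REMATCH[1]}
    num=$((10#${BASH_REMATCH[2]}))   # force base 10 so "08" is not read as octal
    # ${best_num[$base]+set} expands to "set" only if the key already exists.
    if [[ -z ${best_num[$base]+set} ]] || (( num > ${best_num[$base]} )); then
      best_num[$base]=$num
      best_name[$base]=$f
    fi
  done
  # Associative arrays are unordered, so sort the result for stable output.
  if (( ${#best_name[@]} )); then
    printf '%s\n' "${best_name[@]}" | sort
  fi
}
```

A call such as last_parts_bash *.rar would then list one name per archive. Storing the full original name alongside the number keeps any zero-padding intact in the output.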

3
  • That [[ -v "files[$...]" ]] is an arbitrary command execution vulnerability, at least with bash 5.1 where I tried it. Try for instance x='x$(uname>&2)' bash -c 'typeset -A files; [[ -v "files[$x]" ]]', which for me outputs Linux on stderr. Commented Aug 20 at 3:47
  • @StéphaneChazelas AFAIK that is fixed in bash 5.2 (released, probably, 3 years ago) and I have just confirmed that on version 5.2.37... exists on 5.1 as you say and 5.0 IIRC and I don't know about prior versions ... Still exists on zsh 5.9 though. Commented Aug 20 at 10:09
  • 1
    Related: How to use associative arrays safely inside arithmetic expressions?. On zsh, you'd use (( $+hash[$key] )) Commented Aug 20 at 10:32
1
FOLDER="myarchivefolder"
(
    cd "$FOLDER" || exit
    find . -type f -iname '*.rar' |
    cut -d '/' -f 2 |
    LC_ALL=C sort -t "." --key 1,1 --key 2.5nr -r |
    sort --merge --unique -t "." --key 1,1
)

Explanation:

  1. cd into the folder inside a subshell, to avoid unexpected dots in the file paths
  2. Use find to list all files in the current folder (add -maxdepth 1 to avoid descending into nested folders)
  3. cut to remove the leading ./ from find output
  4. Sort file names
    • -t "." - dot . as field separator
    • --key 1,1 - sort by the first field, the archive name
    • --key 2.5nr - then sort numerically by part number (starting at the 5th character of the second field), in reverse; the r has to be on the key itself because a key with its own ordering flags does not inherit the global -r
    • -r - reverse the archive-name key as well; together with the r on the part-number key, the highest part number of each archive ends up as its first line
  5. Only output one archive file for each sorted entry
    • sort --merge - skip sorting
    • --unique - subsequent lines for the same archive are omitted as duplicates
    • --key 1,1 - only consider archive name for deduplication
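The two sort steps can be tried on a handful of names without touching the file system. This is a sketch with hypothetical a.* and b.* names and POSIX short options; pick_last is a made-up helper name. Note that the part-number key carries its own r flag, since a key that has ordering flags of its own (the n here) does not inherit the global -r.

```shell
# Sketch: read names on stdin, keep the highest-numbered part per archive.
pick_last() {
  # -k 1,1   : archive name; inherits the global -r
  # -k 2.5nr : part number from the 5th character of field 2, numeric,
  #            reversed explicitly on the key
  # -m -u    : no re-sorting, keep only the first line per archive name
  LC_ALL=C sort -t . -k 1,1 -k 2.5nr -r |
    LC_ALL=C sort -m -u -t . -k 1,1
}

# Hypothetical sample mixing padded and unpadded part numbers.
printf '%s\n' a.part1.rar a.part10.rar a.part9.rar b.part01.rar b.part02.rar |
  pick_last
```

With GNU sort this should keep a.part10.rar and b.part02.rar, listed in reverse order of archive name.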

The good:

  • avoids globbing: the for loop used in other answers may error with "Argument list too long" on large folders
  • correctly handles 3-digit ".part025" numbers starting with zero
  • It looks to be POSIX-compatible, if you replace long options with short ones.

The bad:

  • Expects that the archive name itself will not contain dots (nor newlines). It would've been more robust to count dots from the end of a file path, but it is not possible with sort's key definition.
  • Many process invocations and piping
