3

I have filenames that follow this pattern:

 1.raw_bank_details_211.trg
 2.raw_bank_details_222.trg

I need to use the cut command in Unix to extract 211 and 222 from these strings and echo the values.

I have already used grep -o -E '[0-9]+'; I need an alternative to this.

2
  • a filename OR a string? Commented Apr 25, 2017 at 14:36
  • I advise not to (ab)use the word requirement. This is a pro-bono service. Commented Apr 25, 2017 at 14:42

2 Answers

5

You would be better off using a standard text processing tool instead of a naive tool like cut.

Here are some ways:


With awk, splitting on _ or . and printing the second-to-last field:

awk -F '[_.]' '{print $(NF-1)}' file.txt
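
To see why $(NF-1) lands on the number, it can help to list every field awk produces for one of the sample names. This is only an illustrative sketch; show_fields is a hypothetical helper name, not part of the answer:

```shell
# show_fields (hypothetical name): print each field of a string
# split on "_" or ".", prefixed with its field index.
show_fields() {
  printf '%s\n' "$1" |
    awk -F '[_.]' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

show_fields '1.raw_bank_details_211.trg'
# 1: 1
# 2: raw
# 3: bank
# 4: details
# 5: 211
# 6: trg
```

NF is 6 for this name, so $(NF-1) is field 5, the number, regardless of how many _-separated words precede it.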

grep with PCRE (-P):

grep -Po '\d+(?=[^_]*$)' file.txt
  • -o only gets the matched portion

  • \d+ matches one or more digits

  • The zero-width positive lookahead, (?=[^_]*$), ensures that no _ follows the digits before the end of the line


With sed:

sed -E 's/.*_([[:digit:]]+).*/\1/' file.txt
  • .*_ matches everything up to the last _

  • ([[:digit:]]+) matches the required digits and puts them in a captured group

  • .* matches the rest

  • In the replacement, only the captured group, \1, is used


With perl, using the same logic as the sed version:

perl -pe 's/.*_(\d+).*/$1/' file.txt 

If you must use cut, do it in two steps: first get the _-separated 4th field, then get the .-separated 1st field of that:

cut -d_ -f4 file.txt | cut -d. -f1

This is not recommended, because it requires the field numbers to be hardcoded.
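
One possible way around the hardcoded 4 is to reverse each line so that the last _-separated field becomes the first. The sketch below assumes the rev utility is installed (it ships with util-linux and the BSDs but is not specified by POSIX), and last_field_number is a hypothetical helper name:

```shell
# last_field_number (hypothetical name): reads lines on stdin.
# Assumes rev is available; it is not a POSIX utility.
last_field_number() {
  # Reverse, take what is now the first "_" field, reverse back,
  # then keep only the part before the first ".".
  rev | cut -d_ -f1 | rev | cut -d. -f1
}

printf '%s\n' '1.raw_bank_details_211.trg' '2.raw_bank_details_222.trg' |
  last_field_number
# 211
# 222
```

This still assumes the number sits between the last _ and the following dot, but no longer depends on how many underscores come before it.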


If it were a string, I would do it using shell parameter expansion:

% str='1.raw_bank_details_211.trg'

% str=${str##*_} 

% echo "${str%%.*}"
211

You can still use a while loop that reads each line into a variable and applies the same expansions, but that would be slow for a large file. Alternatively, you could set IFS to _. and pick a hardcoded field (as with cut) if you prefer.
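
A minimal sketch of both loop variants follows. The function names are hypothetical, and the sample names are fed in with printf rather than read from file.txt so that the snippet stands alone:

```shell
# Both function names are hypothetical; each reads one name per line on stdin.

# Variant 1: the same parameter expansions, applied to every line.
numbers_by_expansion() {
  while IFS= read -r line; do
    after=${line##*_}              # drop everything through the last "_": 211.trg
    printf '%s\n' "${after%%.*}"   # drop everything from the first ".":  211
  done
}

# Variant 2: let read split on "_" and "." and take the fifth field.
# Like the cut pipeline, this hardcodes the field position.
numbers_by_ifs() {
  while IFS=_. read -r f1 f2 f3 f4 num rest; do
    printf '%s\n' "$num"
  done
}

printf '%s\n' '1.raw_bank_details_211.trg' '2.raw_bank_details_222.trg' |
  numbers_by_expansion
printf '%s\n' '1.raw_bank_details_211.trg' '2.raw_bank_details_222.trg' |
  numbers_by_ifs
```

Each call prints 211 and 222 for the two sample names. For a real file, redirect it into the function, for example numbers_by_expansion < file.txt.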


Example:

% cat file.txt                          
1.raw_bank_details_211.trg
2.raw_bank_details_222.trg

% awk -F '[_.]' '{print $(NF-1)}' file.txt
211
222

% grep -Po '\d+(?=[^_]*$)' file.txt         
211
222

% sed -E 's/.*_([[:digit:]]+).*/\1/' file.txt
211
222

% perl -pe 's/.*_(\d+).*/$1/' file.txt 
211
222

% cut -d_ -f4 file.txt | cut -d. -f1
211
222

4

cut is the wrong tool for that. To manipulate short strings such as file names, use the shell's string manipulation facilities whenever possible. All sh-type shells¹ (sh, dash, bash, ksh, zsh, …) have some basic string manipulation as part of variable substitution. See e.g. the dash manual under “parameter expansion”. You can remove the shortest/longest prefix/suffix that matches a pattern.

You want the last sequence of digits in the file name, so:

  1. Determine the non-numeric suffix by stripping everything up to the last digit.
  2. Remove that suffix.
  3. Strip everything up to the last non-digit.

filename=1.raw_bank_details_211.trg
suffix="${filename##*[0-9]}"
number="${filename%"$suffix"}"
number="${number##*[!0-9]}"
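
Since the question also asks to echo the value, the three steps can be wrapped in a small function. This is a sketch only: last_digits is a hypothetical name, and the last step uses the bracket expression [!0-9] so that only digits are kept:

```shell
# last_digits (hypothetical name): print the last run of digits in $1.
last_digits() {
  name=$1
  suffix=${name##*[0-9]}       # text after the last digit, e.g. ".trg"
  number=${name%"$suffix"}     # the name with that suffix removed
  number=${number##*[!0-9]}    # strip everything through the last non-digit
  printf '%s\n' "$number"
}

last_digits 1.raw_bank_details_211.trg   # prints 211
last_digits 2.raw_bank_details_222.trg   # prints 222
```

If the name contains no digits at all, the function prints an empty line, so a caller may want to check for an empty result.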

¹ Except some pre-POSIX Bourne shells, but you don't care about those.
