21

I need to find patterns that are 6 digits long, where the first 3 digits are specific digits and the remaining 3 can be any digit. For example, 6-digit strings starting with 123 followed by any 3 digits.

var1="abc,123111,"
var2="abcdefg,123222,"
var3="xyzabc,987111,"

if [[ $var1 == *",123ddd,"* ]] ; then echo "Pattern matched"; fi

Where ddd are any digits. var1 and var2 would match the pattern, but var3 would not. I can't seem to get it just right.

2 Answers

31

Use a character class: [0-9] matches 0, 9, and every character between them in the character set, which - at least in Unicode (e.g. UTF-8) and subset character sets (e.g. US-ASCII, Latin-1) - are the digits 1 through 8. So it matches any one of the 10 Latin digits.

if [[ $var1 == *,123[0-9][0-9][0-9],* ]] ; then echo "Pattern matched"; fi
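To check the glob against all three values from the question, something like the following sketch could be used. The function name matches_glob is made up for illustration; the variables are the ones from the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the argument contains ",123" followed by
# exactly three digits and a comma, using a glob pattern.
matches_glob() {
  [[ $1 == *,123[0-9][0-9][0-9],* ]]
}

var1="abc,123111,"
var2="abcdefg,123222,"
var3="xyzabc,987111,"

for value in "$var1" "$var2" "$var3"; do
  if matches_glob "$value"; then
    echo "Pattern matched: $value"
  else
    echo "No match: $value"
  fi
done
```

With the question's data, var1 and var2 should be reported as matches and var3 should not.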

Using =~ instead of == changes the pattern type from shell standard "glob" patterns to regular expressions ("regexes" for short). You can make an equivalent regex a little shorter:

if [[ $var1 =~ ,123[0-9]{3}, ]] ; then echo "Pattern matched"; fi

The first shortening comes from the fact that [[ =~ ]] only requires the regex to match any part of the string, not the whole thing. Therefore you don't need the equivalent of the leading and trailing *s that you find in the glob pattern.
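A small sketch of the difference between matching anywhere and matching the whole string. The function names are hypothetical, and the anchored variant (using ^ and $) is an addition for contrast, not something the question asked for.

```shell
#!/usr/bin/env bash

# Unanchored: succeeds if the pattern occurs anywhere in the string.
contains_code() {
  local re=',123[0-9]{3},'
  [[ $1 =~ $re ]]
}

# Anchored: succeeds only if the whole string is 123 plus three digits.
is_exactly_code() {
  local re='^123[0-9]{3}$'
  [[ $1 =~ $re ]]
}

contains_code "abc,123111,"   && echo "unanchored regex finds the code inside a longer string"
is_exactly_code "123111"      && echo "anchored regex accepts a bare 6-digit code"
is_exactly_code "abc,123111," || echo "anchored regex rejects the longer string"
```

Keeping the regex in a variable, as done here, avoids quoting surprises inside [[ ]].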

The second length reduction is due to the {n} syntax, which lets you specify a number of repetitions of the previous pattern without actually repeating the pattern itself in the regex. (You can also match any of a range of repetition counts by specifying a minimum and maximum: [0-9]{2,4} will match either two, three, or four digits in a row.)
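To illustrate the {min,max} form, here is a minimal sketch with a hypothetical function that accepts a comma-delimited run of two to four digits. The commas matter: without delimiters or anchors, [0-9]{2,4} would also succeed on a longer run of digits, because the regex only has to match part of the string.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the string contains a comma, then
# two to four digits, then another comma.
has_two_to_four_digits() {
  local re=',[0-9]{2,4},'
  [[ $1 =~ $re ]]
}

for field in ",1," ",12," ",1234," ",12345,"; do
  if has_two_to_four_digits "$field"; then
    echo "match:    $field"
  else
    echo "no match: $field"
  fi
done
```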

It's worth noting that you could also use a named character class to match digits. Depending on your locale, [[:digit:]] may be exactly equivalent to [0-9], or it may include characters from other scripts with the Unicode "Number, Decimal Digit" property.

if [[ $var1 =~ ,123[[:digit:]]{3}, ]] ; then echo "Pattern matched"; fi
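If the locale dependence is a concern, one option is to pin the locale for the duration of the test. The sketch below assumes that setting LC_ALL=C inside a function is acceptable for your script; the function name is hypothetical. In the C locale, [[:digit:]] is limited to the ten ASCII digits.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run the [[:digit:]] regex under the C locale so the
# class covers only the ASCII digits 0 through 9.
matches_ascii_digits() {
  local LC_ALL=C   # assumption: overriding the locale here is acceptable
  local re=',123[[:digit:]]{3},'
  [[ $1 =~ $re ]]
}

if matches_ascii_digits "abc,123111,"; then echo "Pattern matched"; fi
```

The local declaration restores the previous locale setting when the function returns.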

1 Comment

Nicely done. Quibble: [0-9] matches the characters in the range 0 through 9 as defined by the (effective) LC_CTYPE value (see locale). In the now-ubiquitous UTF-8-based locales, this coincides with the ASCII/Latin-1 characters 0 through 9, because these encodings are true subsets of Unicode. By contrast, [[:digit:]] matches not just 0 through 9, but also additional characters in UTF-8, based on what the Unicode standard considers a digit.
2

The Bash glob pattern [0-9] can be used to match a digit:

if [[ $var1 == *,123[0-9][0-9][0-9],* ]] ; then echo "Pattern matched"; fi

Alternatively, you can use regex pattern matching with =~:

if [[ $var1 =~ .*,123[0-9]{3},.* ]] ; then echo "Pattern matched"; fi
