Ed Morton
  • 35.8k
  • 6
  • 25
  • 60

Never put a \ in front of any character that isn't a metacharacter (e.g. the /s in the regexps in your question). A backslash before an ordinary character is undefined behavior per POSIX, so any given tool is free to treat the escaped character as a metacharacter.

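I'm not sure, but I think you may be asking how to protect against both bash globbing (and similar shell constructs) and ERE metachars so you can create a regexp that includes parts to be treated literally. If so, one way is to wrap every character of the literal part except ^ and \ in a bracket expression and backslash-escape those two, for example: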

$ cat tst.sh
#!/usr/bin/env bash

goodUrl='https://www.youtube.com/playlist?list=PLmHVyfmcRKyxvxehq3fiGjKDsEyy6d4Tz'
badUrl='https://wwwxyoutubexcom/playlist?list=PLmHVyfmcRKyxvxehq3fiGjKDsEyy6d4Tz'

echo "goodUrl=$goodUrl"
echo "badUrl=$badUrl"

echo "######################"

orig_yt='www.youtube.com'

regex='^https://'"$orig_yt"'/playlist\?(.+&)?list='

if [[ $goodUrl =~ $regex ]]; then
    echo "goodUrl matched orig_yt: $regex"
else
    echo "goodUrl did not match orig_yt: $regex"
fi

if [[ $badUrl =~ $regex ]]; then
    echo "badUrl matched orig_yt: $regex"
else
    echo "badUrl did not match orig_yt: $regex"
fi

echo "######################"

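# Wrap every char except ^ and \ in a bracket expression, then backslash-escape ^ and \.
# The & (matched text) in the replacement needs patsub_replacement, on by default in bash 5.2+.
sanitized_yt="${orig_yt//[^^\\]/[&]}"
sanitized_yt="${sanitized_yt//[\\^]/\\&}"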

regex='^https://'"$sanitized_yt"'/playlist\?(.+&)?list='

if [[ $goodUrl =~ $regex ]]; then
    echo "goodUrl matched sanitized_yt: $regex"
else
    echo "goodUrl did not match sanitized_yt: $regex"
fi

if [[ $badUrl =~ $regex ]]; then
    echo "badUrl matched sanitized_yt: $regex"
else
    echo "badUrl did not match sanitized_yt: $regex"
fi

$ ./tst.sh
goodUrl=https://www.youtube.com/playlist?list=PLmHVyfmcRKyxvxehq3fiGjKDsEyy6d4Tz
badUrl=https://wwwxyoutubexcom/playlist?list=PLmHVyfmcRKyxvxehq3fiGjKDsEyy6d4Tz
######################
goodUrl matched orig_yt: ^https://www.youtube.com/playlist\?(.+&)?list=
badUrl matched orig_yt: ^https://www.youtube.com/playlist\?(.+&)?list=
######################
goodUrl matched sanitized_yt: ^https://[w][w][w][.][y][o][u][t][u][b][e][.][c][o][m]/playlist\?(.+&)?list=
badUrl did not match sanitized_yt: ^https://[w][w][w][.][y][o][u][t][u][b][e][.][c][o][m]/playlist\?(.+&)?list=
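
The two substitutions above rely on & in the replacement expanding to the matched text, which older bash versions (before 5.2) treat as a literal &. As a rough sketch of the same idea that avoids that feature, the loop below builds the sanitized string one character at a time. The function name ere_quote and the sample URLs are made up for illustration and are not part of the original script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn an arbitrary string into an ERE that matches it literally.
ere_quote() {
    local in=$1 out='' c i
    for (( i = 0; i < ${#in}; i++ )); do
        c=${in:i:1}
        case $c in
            '^' | '\') out+="\\$c" ;;   # these two are awkward inside brackets, so backslash-escape them
            *)         out+="[$c]" ;;   # every other char becomes a one-char bracket expression
        esac
    done
    printf '%s\n' "$out"
}

host_re=$(ere_quote 'www.youtube.com')
regex="^https://${host_re}/playlist\\?"

if [[ 'https://www.youtube.com/playlist?list=x' =~ $regex ]]; then
    echo "literal dots: match"
fi
if ! [[ 'https://wwwxyoutubexcom/playlist?list=x' =~ $regex ]]; then
    echo "x in place of dot: no match"
fi
```

The loop is slower than a parameter expansion for long strings, but for short literals like a hostname that should not matter.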

Regarding the update to your question that includes this:

text='The pizza is 2'\'' and 100$price' # The pizza is 2' and 100$price
...
# if [[ "$text" =~ [0-9]+' and [0-9]+\$price ]]; then echo "this is what I prefer to have -- literal regex, like / /g in js. But this wont even compile"; fi

You simply cannot write such a regexp in-line; you need to store it in a variable to avoid issues with bash metachars. You would normally also need to escape any regexp metachars that you want treated literally, but you're already escaping the $ manually in your regexp, so you can skip that step and just write:

regex='[0-9]+'\'' and [0-9]+\$price'
if [[ "$text" =~ $regex ]]; ...

Note, though, that when you say "do you see I need to escape the \ & $ ?", you need to escape the \, $ and ? because they're regexp metachars, not because they're bash metachars. You do not need to escape a & as it's not a regexp metachar (unless bash regexps support backreferences in the regexp, which I doubt).
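
For reference, here is a small self-contained sketch of the variable-based approach using the text from the question. The second check, with the made-up string 'fish & chips', is only there to illustrate that & needs no escaping in the regexp.

```shell
#!/usr/bin/env bash

text='The pizza is 2'\'' and 100$price'   # The pizza is 2' and 100$price

# Single quotes keep bash from touching the regexp; \$ makes the $ literal for the ERE.
regex='[0-9]+'\'' and [0-9]+\$price'

if [[ $text =~ $regex ]]; then
    matched=${BASH_REMATCH[0]}
    echo "matched: $matched"
fi

# & is an ordinary character in an ERE, so it needs no backslash.
amp_regex='fish & chips'
if [[ 'fish & chips' =~ $amp_regex ]]; then
    echo "& matched literally"
fi
```

Leaving $regex unquoted on the right-hand side of =~ matters here: quoting it would make bash treat the whole thing as a literal string rather than a regexp.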
