slm

Getting started

Whenever I have a project like this I like to approach it in stages. The first thing I like to do is add an echo inside the loop and then run it, to make sure that the loop is giving me what I want.

#! /bin/bash
for (( CON1=10000; CON1<=99999; CON1++ )) ;
do
  echo $CON1
done

Now when I run it I'll use head -5 to just show the first 5 lines it outputs.

$ ./cmd.bash | head -5
10000
10001
10002
10003
10004

OK, so that looks good. Now check the end like this:

$ ./cmd.bash | tail -5
99995
99996
99997
99998
99999
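head and tail only check the two ends of the range. Below is a slightly more thorough sanity check. It counts the lines and compares the whole output against seq, which prints the same range. The gen function is a hypothetical stand-in that repeats the loop from cmd.bash inline, so the snippet is self-contained.

```shell
#!/bin/bash
# gen is a hypothetical stand-in for cmd.bash: the same loop, inlined
gen() {
  for (( CON1=10000; CON1<=99999; CON1++ )); do
    echo "$CON1"
  done
}

# 99999 - 10000 + 1 = 90000 numbers are expected
# ($(( )) strips the leading spaces some wc implementations print)
total=$(gen | wc -l)
echo "lines: $(( total ))"

# Compare line for line against seq, which prints the same range
if [[ "$(gen)" == "$(seq 10000 99999)" ]]; then
  echo "loop output matches seq 10000 99999"
fi
```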

That looks good too. So now let's figure out some ways we could approach the next step: identifying numbers with exactly 2 digits from the set {4,5,6}. My first instinct here is to go for grep. There are also methods for doing this purely in Bash, but I like to use the various tools (grep, awk, and sed) for these types of things, mainly because that's how my mind works.

An approach

So how can we grep for lines that contain 2 digits from the set {4,5,6}? For this you can use set notation (a bracket expression), which is written like this in regex: [456]. You can also specify how many characters from this set you want to match. That's written like this:

[456]{#}

Where # is a number or range of numbers. If we wanted 3, we'd write [456]{3}. If we wanted 2-5 digits, we'd write [456]{2,5}. If you wanted 3 or more, [456]{3,}. Note that these counts refer to consecutive characters from the set.
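To see the quantifiers side by side, here's a small sketch run over four made-up sample numbers. They contain runs of 1, 2, 3 and 4 adjacent digits from the set. The count applies to consecutive characters, because the quantifier repeats the [456] bracket expression directly before it.

```shell
#!/bin/bash
# Made-up samples: runs of 1, 2, 3 and 4 adjacent digits from {4,5,6}
samples=$'41000\n45000\n45600\n45640'

echo '[456]{2}  matches:'
grep -E '[456]{2}' <<< "$samples"     # 45000, 45600, 45640

echo '[456]{3,} matches:'
grep -E '[456]{3,}' <<< "$samples"    # 45600, 45640

echo '[456]{4}  matches:'
grep -E '[456]{4}' <<< "$samples"     # 45640
```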

So for your scenario it's [456]{2}. To use this regex syntax with grep, your version of grep needs to support the -E switch (extended regular expressions), which most standard versions of grep do.

$ echo "45123" | grep -E "[456]{2}"
45123

Seems to work, but if we give it numbers with 3 digits from the set, we start to see an issue:

$ echo "45423" | grep -E "[456]{2}"
45423

That's matching too. This is because grep has no concept of these being digits in a number. It's dumb: it only checks whether some part of the line matches the pattern. We told it to look for 2 consecutive characters from the set, and the string 45423 contains such a substring (45), so the line matches. Nothing in the pattern says there can't be more.
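grep's -o flag prints only the part of each line that matched, which makes this visible. In the sketch below, the first command shows that the substring 45 was enough to satisfy [456]{2}. The second shows that a number like 44145 contains two separate 2-digit runs, and the pattern says nothing about how many runs there are.

```shell
#!/bin/bash
# -o prints each matching substring on its own line instead of the whole line
echo "45423" | grep -oE '[456]{2}'   # prints: 45
echo "44145" | grep -oE '[456]{2}'   # prints: 44 then 45
```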

So is this method usable? It is if we change tactics a bit. If we filter out the strings with 3 or more consecutive characters from our set, and then filter for the strings with 2 consecutive ones, we'll get what we want, at least for numbers where the digits from the set sit next to each other.

Example

$ echo -e "41123\n44123\n44423" | grep -vE "[456]{3,}" | grep -E "[456]{2}"
44123

The above feeds in 3 sample strings. The echo -e "41123\n44123\n44423" just prints 3 numbers from our range, containing 1, 2 and 3 digits from the set respectively.

$ echo -e "41123\n44123\n44423"
41123
44123
44423

The grep -vE "[456]{3,}" will skip lines that contain 3 or more consecutive digits from our set.

$ echo -e "41123\n44123\n44423" | grep -vE "[456]{3,}"
41123
44123

The last grep then keeps only the strings that contain 2 consecutive characters from our set.

So now let's assemble this in your script.

#!/bin/bash
for (( CON1=10000; CON1<=99999; CON1++ )) ;
do
  # keep numbers with a run of 2 digits from the set but no run of 3 or more
  if echo "$CON1" | grep -vE "[456]{3,}" | grep -q -E "[456]{2}"; then
    echo "$CON1"
  fi
done

Using our head & tail trick from above, we can see that it looks like it's working:

$ ./cmd.bash | head -5
10044
10045
10046
10054
10055

$ ./cmd.bash | tail -5
99955
99956
99964
99965
99966
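head and tail only show the ends of the range. As an aside, here is a quick way to spot-check individual numbers regardless of whether their digits are adjacent: count the digits from the set directly. This sketch is a different technique from the grep filter above. tr -cd '456' deletes every character that isn't 4, 5 or 6, and wc -c counts what's left. The count_set_digits name is made up for illustration. It is no faster inside a loop, because it still starts tr and wc for every number.

```shell
#!/bin/bash
# Hypothetical helper: count how many characters of $1 are 4, 5 or 6
count_set_digits() {
  # tr -c complements the set and -d deletes, so only 4s, 5s and 6s survive
  printf '%s' "$1" | tr -cd '456' | wc -c
}

for n in 41123 44123 44423 41412 44145; do
  # $(( )) strips the leading spaces some wc implementations print
  echo "$n has $(( $(count_set_digits "$n") )) digit(s) from {4,5,6}"
done
```

Comparing these counts with the two-grep filter shows its blind spots. 41412 has exactly 2 digits from the set but is rejected because they aren't adjacent. 44145 has 4 but passes because no 3 of them are in a row.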

But this method proves to be dog slow. The problem is all those greps. Each one starts a separate process, which is expensive, and we're running grep 2 times per iteration through the loop, so that's ~180k times!

To improve that, we could move our grep commands outside the loop and run them 1 time, after the list has been generated, like so, using our original version of the script that just echoed the numbers out:

$ ./cmd.bash | grep -vE "[456]{3,}" | grep -E "[456]{2}"

NOTE: We could drop the for loop entirely and use the command line tool seq, which generates the same sequence of numbers: seq 10000 99999.
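As a quick sketch of that note, here is the same two-grep filter fed from seq instead of cmd.bash. Since seq prints the identical range, the first few lines come out the same as before.

```shell
#!/bin/bash
# seq 10000 99999 prints the same numbers as the for loop, one per line
seq 10000 99999 | grep -vE '[456]{3,}' | grep -E '[456]{2}' | head -5
# 10044, 10045, 10046, 10054, 10055
```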

One liner?

A fancy way to do this would be to take the sequence of numbers from the above command and pipe it to the paste command, which joins them into one line with a + between each number. Then run that output through the command line calculator, bc.

$ ./cmd.bash | grep -vE "[456]{3,}" | grep -E "[456]{2}" | paste -s -d"+"
10044+10045+10046+10054+10055+10056+10064+10065+10066+10144+10145+...

$ ./cmd.bash | grep -vE "[456]{3,}" | grep -E "[456]{2}" | paste -s -d"+" | bc
1057438485
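Since awk came up earlier, here's a sketch that swaps in a different technique and does the filtering and the summing in a single awk process. Instead of matching runs with a regex, it uses the return value of gsub (the number of replacements it made) to count digits from {4,5,6} anywhere in the number. It therefore keeps exactly the numbers with 2 such digits, adjacent or not. Because that is a different set of numbers than the two-grep pipeline selects, its total is not expected to equal the bc result above.

```shell
#!/bin/bash
# Sum every number from 10000 to 99999 with exactly 2 digits from {4,5,6}.
# gsub() returns how many substitutions it made, so running it on a scratch
# copy of the line turns it into a counter; $0 itself is left untouched.
seq 10000 99999 | awk '
  {
    copy = $0
    if (gsub(/[456]/, "", copy) == 2)
      total += $0
  }
  END { print total }
'
# prints 1554335685
```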

But that's a completely different way to solve this problem, so let's get back to the for loop.

Using pure Bash

So we need some method within Bash for testing whether a number has exactly 2 digits from the set, one that isn't as expensive as calling grep on every iteration. Modern versions of Bash include the =~ operator inside [[ ]], which matches a string against a POSIX extended regular expression, the same flavor grep -E uses. Let's take a look at that next.

#!/bin/bash
for (( CON1=10000; CON1<=99999; CON1++ )) ;
do
  # the regex must be left unquoted for =~ to treat it as a regex
  if [[ $CON1 =~ [456]{2} ]]; then
    echo $CON1
  fi
done
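One gotcha with =~ is worth knowing. The right-hand side has to be left unquoted, or stored in a variable and expanded unquoted, to be treated as a regex. In Bash 3.2 and later, quoting it turns it into a literal string match. The sketch below uses an arbitrary sample number and also shows BASH_REMATCH, the array Bash fills with the text that matched.

```shell
#!/bin/bash
re='[456]{2}'
n=45123

# Unquoted variable: treated as an extended regular expression
if [[ $n =~ $re ]]; then
  echo "regex match, matched text: ${BASH_REMATCH[0]}"   # 45
fi

# Quoted: treated as the literal string "[456]{2}", which 45123 doesn't contain
if [[ $n =~ "$re" ]]; then
  echo "literal match"
else
  echo "no literal match"
fi
```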

Running this appears to do what we want.

$ ./cmd1.bash  | head -5
10044
10045
10046
10054
10055

$ ./cmd1.bash  | tail -5
99955
99956
99964
99965
99966

The =~ operator makes use of the same regex that we developed earlier, using the set notation, [456]{2}. This works, but suffers from the same issue as grep, namely that it also lets through numbers with 3 or more consecutive digits from the set.

$ ./cmd1.bash  | grep -E "[456]{3,}" | head -5
10444
10445
10446
10454
10455

So we can use the same trick here:

#! /bin/bash
for (( CON1=10000; CON1<=99999; CON1++ )) ;
do
    if [[ $CON1 =~ [456]{2} && ! $CON1 =~ [456]{3,} ]]; then
            echo $CON1
    fi
done

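In the above if statement, we're looking for strings that contain 2 consecutive digits from [456] and do not contain 3 or more. In the notation && ! $CON1 =~ [456]{3,}, the && means AND, the exclamation point means NOT, and $CON1 =~ [456]{3,} matches all the strings that contain 3 or more consecutive digits from [456].

Still has a problem?

Yes. This will only catch strings where the digits from [456] are adjacent, so a string like 41511 is never printed. Check it:

$ ./cmd1.bash | grep 41511
$

It also still lets through strings like 44145, which contain 4 digits from the set but never 3 in a row.

How can we fix this? Well, one approach is to change the regex so that it describes the whole number, anchored at both ends:

if [[ $CON1 =~ ^[^456]*([456][^456]*){2}$ ]]; then
        echo $CON1
fi

Checking it shows that it works with 41511 now:

$ ./cmd1.bash | grep 41511
41511

How does this regex work? The ^ and $ anchor the match to the start and end of the string. [^456]* matches zero or more characters that are not 4, 5 or 6. ([456][^456]*){2} matches exactly 2 groups, each made of one digit from the set followed by zero or more digits that aren't. Because the pattern has to account for every character, a string with a third digit from the set, such as 44423 or 44145, can't match. Without the anchors, the same regex would match any string with 2 or more digits from the set. The anchored regex works with grep -E as well, so it can replace both greps in the earlier pipelines.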
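For comparison, here is a sketch of a pure-Bash version that also totals the results, so no process outside Bash is started at all. It assumes the goal is numbers with exactly 2 digits from {4,5,6} in any position. It uses an anchored regex, ^[^456]*([456][^456]*){2}$, which only matches when the whole number is made of non-set digits surrounding exactly two set digits. As a cross-check, it also counts the set digits of every number with parameter expansion and stops if the two methods ever disagree. ${n//[^456]/} deletes every character that isn't 4, 5 or 6, and ${#only} is the length of what's left.

```shell
#!/bin/bash
# Assumption: "exactly 2 digits from {4,5,6}" means in any position, adjacent or not.
# The anchors force the regex to cover the whole number.
re='^[^456]*([456][^456]*){2}$'
count=0
total=0

for (( n=10000; n<=99999; n++ )); do
  # Method 1: anchored regex (the variable is expanded unquoted on purpose)
  [[ $n =~ $re ]] && by_regex=1 || by_regex=0

  # Method 2: delete every character that isn't 4, 5 or 6, then measure the rest
  only=${n//[^456]/}
  (( ${#only} == 2 )) && by_count=1 || by_count=0

  # The two methods should always agree
  if (( by_regex != by_count )); then
    echo "methods disagree at $n" >&2
    exit 1
  fi

  if (( by_regex )); then
    count=$(( count + 1 ))
    total=$(( total + n ))
  fi
done

echo "count: $count"   # 28224
echo "sum:   $total"   # 1554335685
```

Storing the regex in a variable sidesteps the quoting rules of [[ ]]. The arithmetic $(( )) replaces the paste and bc steps from the one-liner.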

References