Revisions to Why doesn't this one-line sed command work as I thought it should?

Typo

edited Sep 17, 2021 at 10:18


There are two problems with your RegEx, which may stem from a misunderstanding of how RegExes work.

You have used the "extended" regular expression syntax, in which ( and ) are special characters that delimit capture groups. Apart from that, they do not affect what the pattern matches. Since you don't make use of the capture group, your RegEx amounts to
```
^.*content=\"\">$
```
which expects content="" (an empty quoted string) immediately followed by the closing > at the end of the line. This doesn't occur in your input, so the pattern never matches and sed leaves the input unchanged. (By the way, you don't need to escape the " characters: they are not special in RegExes, and your program is in single quotes, so the shell won't misinterpret them either.)
Even if you correct that, as in
```
^.*content="[^"]*".*>$
```
you replace the matched part of the line, which because of the anchors is the entire line, with the empty string, so in your case nothing of the line would remain.
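
Both failure modes can be reproduced on a made-up input line. The `<meta>` line and the variable names below are assumptions for illustration (the question's actual input is not reproduced here); the two patterns are the ones discussed above, each used with an empty replacement:

```shell
# Hypothetical sample input; the real input from the question is not shown here.
line='<meta name="description" content="Hello world">'

# Problem 1: the pattern demands content="" directly before the final >,
# so it does not match and sed prints the line unchanged.
unmatched=$(printf '%s\n' "$line" | sed -E 's;^.*content="">$;;')
printf 'no match:   [%s]\n' "$unmatched"

# Problem 2: the corrected pattern matches, but it matches the entire line,
# so replacing the match with the empty string leaves an empty line.
emptied=$(printf '%s\n' "$line" | sed -E 's;^.*content="[^"]*".*>$;;')
printf 'whole line: [%s]\n' "$emptied"
```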

To fix this, you need to return to the original idea of using a capture group, but make it enclose the relevant part of the line, and then replace the entire line with the content of the capture group, as in:

sed -E 's;^.*content="([^"]*)".*$;\1;'

This defines the text between the opening " after the content= attribute name and the next " as the capture group, while the pattern as a whole still matches the entire line. The \1 in the replacement then replaces the entire line with only the content of the capture group.
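
A quick way to check the command is to run it on a made-up `<meta>` line. The sample line and the variable names are assumptions, since the question's input is not reproduced here:

```shell
# Hypothetical sample input, for illustration only.
line='<meta name="description" content="Hello world">'

# The capture group holds only the attribute value;
# \1 replaces the whole matched line with that value.
value=$(printf '%s\n' "$line" | sed -E 's;^.*content="([^"]*)".*$;\1;')
printf '%s\n' "$value"   # prints: Hello world
```

Lines that contain no content="..." attribute are not matched and are printed unchanged. If only the extracted values are wanted, one option is to suppress the default output and print on successful substitution, as in `sed -nE 's;^.*content="([^"]*)".*$;\1;p'`.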

Clarify explanation

edited Sep 17, 2021 at 10:11

AdminBee


There are two problems with your RegEx. It may be due to a misunderstanding on how RegExes work.

You have used the "extended" regular expression syntax, which makes () special characters used to denote capture groups. However, they do otherwise not interfere with the matching mechanism itself. Since you don't make use of capture group, your RegEx amounts to
```
^.*content=\"\">$
```
which expects the pattern content="" with and empty quoted string, and immediately followed by the closing >. This doesn't occur in your input, so sed does nothing as no match is achieved. (By the way, you don't need to escape the " - they are not special in RegExes, and your program is in single-quotes, so the shell won't misinterpret the " either.)
Even if you correct that, as in
```
^.*content="[^"]*".*>$
```
you replace the matched part of the line, which due to the anchors is the entire line, with the empty string, so in your case, nothing would remain.

To alleviate, you will need to recur to the original idea of using a capture group, but use that to contain the relevant part of the line and then replace the entire line with the content of the capture group, as in:

sed -E 's;^.*content="([^"]*)".*$;\1;'

This will define the content between the first and next " after the content= attribute name as capture group, but otherwise match the entire line. It will then replace the entire line with only the content of the capture group via the \1 expression.

answered Sep 17, 2021 at 10:06

AdminBee


There are two problems with your RegEx. It may be due to a misunderstanding on how RegExes work.

You have used the "extended" regular expression syntax, which makes () special characters used to denote capture groups. However, they do otherwise not interfere with the matching mechanism itself. Since you don't make use of capture group, your RegEx amounts to
```
^.*content=\"\">$
```
which expects the pattern content="" with and empty quoted string, and immediately followed by the closing >. This doesn't occur in your input, so sed does nothing as no match is achieved.
Even if you correct that, as in
```
^.*content="[^"]".*>$
```
you replace the matched part of the line with the empty string, so in your case, nothing would remain.

To alleviate, you will need to recur to the original idea of using a capture group, but use that to contain the relevant part of the line and then replace the entire line with the content of the capture group, as in:

sed -E 's;^.*content="([^"]*)".*$;\1;'

This will define the content between the first and next " after the content= attribute name as capture group, but otherwise match the entire line. It will then replace the entire line with only the content of the capture group via the \1 expression.
