Revisions to How to remove markdown links from headers with sed?

added 295 characters in body

edited Jul 18 at 12:42


sed -i 's/^\(\#*\) *\[\([^\]]*\)\].*/\1 \2/'

Using -i like that does "inplace" editing in GNU sed, but other sed versions do different things with it. Even BSD sed, which also supports inplace editing, requires a backup file suffix argument after -i. You don't tell us what problem you're experiencing when running your script, but maybe that's it?
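If -i portability turns out to be the problem, one minimal sketch that avoids -i entirely is to write to a temporary file and move it over the original. The file name file and its one-line content are assumptions taken from the question's sample, and the sed command is the corrected BRE one from further down this answer:

```shell
#!/bin/sh
# Hypothetical sample input matching the question's example line.
printf '%s\n' '## [Some title](#some-title)' > file

# Portable "inplace" edit: write to a temp file, then replace the original.
# The && means mv only runs if sed succeeded, so a failed sed can't clobber file.
sed 's/^\(##*\)  *\[\([^]]*\)].*/\1 \2/' file > file.tmp && mv file.tmp file

cat file    # ## Some title
```

Alternatively, attaching the suffix directly to the option, as in sed -i.bak '...' file, is accepted by both GNU and BSD sed and leaves a backup copy in file.bak.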

Beyond that, in the first regexp segment \(\#*\):

In the separating spaces part <blank>*:

You're using <blank>* which matches zero or more <blank>s when you wanted 1 or more which is <blank><blank>* or <blank>\{1,\} in a BRE (or <blank>+ if you were using an ERE).

deleted 3 characters in body

edited Jul 18 at 12:32

Ed Morton

Based on what you've told us so far ("I'm trying to ... leave just the title:") and the sample input you provided (## [Some title](#some-title)), this might be what you're trying to do, using any awk:

$ awk -F'[][]' '{print $2}' file
Some title

or any sed:

$ sed 's/.*\[\([^]]*\)].*/\1/' file
Some title

but without more truly representative sample input and expected output that's just a guess.
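To see why $2 is the title, and why these are only guesses: -F'[][]' makes the field separator a bracket expression matching either ] or [, so awk splits the line at every bracket, and neither one-liner checks that a line is actually a header. A sketch with a hypothetical three-line input (the last two lines are made up for illustration):

```shell
printf '%s\n' '## [Some title](#some-title)' '# Intro' 'See [docs](x) here' > file

# Print every field awk produces for the first line.
head -n 1 file | awk -F'[][]' '{ for (i = 1; i <= NF; i++) printf "$%d=<%s>\n", i, $i }'
# $1=<## >
# $2=<Some title>
# $3=<(#some-title)>

# Both guesses also act on non-header lines, and they differ on lines with no brackets:
awk -F'[][]' '{print $2}' file       # Some title, (empty line), docs
sed 's/.*\[\([^]]*\)].*/\1/' file    # Some title, # Intro, docs
```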

As for what's wrong with your sed script:

sed 's/^\(\#*\) *\[\([^\]]*\)\].*/\1 \2/'

In the first regexp segment \(\#*\):

You're escaping the literal char # as \#, which is undefined behavior per POSIX, when you wanted just #.
You're using #*, which matches zero or more #s, when you wanted 1 or more, which is ##* or #\{1,\} in a BRE as sed uses by default (or #+ if you were using an ERE).
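To see the zero-or-more problem in isolation (with \# already replaced by #), here's a sketch comparing #* and ##* on a hypothetical line that has a link but no #:

```shell
printf '%s\n' '[x](y)' | sed 's/^\(#*\) *\[\([^]]*\)].*/\1 \2/'
# " x"     : #* happily matched zero #s, so a non-header line got rewritten

printf '%s\n' '[x](y)' | sed 's/^\(##*\) *\[\([^]]*\)].*/\1 \2/'
# "[x](y)" : ##* requires at least one #, so the line is left alone
```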

In the separating spaces part <blank>*:

You're using <blank>* which matches zero or more <blank>s when you wanted 1 or more, which is <blank><blank>* or <blank>\{1,\} in a BRE (or <blank>+ if you were using an ERE).
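The same idea for the blanks, sketched on a hypothetical line with no space between the #s and the [ (which CommonMark wouldn't treat as a header anyway, since it requires a space or tab after the #s):

```shell
printf '%s\n' '##[x](y)' | sed 's/^\(##*\) *\[\([^]]*\)].*/\1 \2/'
# "## x"     : <blank>* matched zero blanks

printf '%s\n' '##[x](y)' | sed 's/^\(##*\)  *\[\([^]]*\)].*/\1 \2/'
# "##[x](y)" : <blank><blank>* needs at least one blank, so no match
```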

In the last regexp segment \[\([^\]]*\)\].*:

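You're using [^\]] as if it escaped ], but a backslash isn't special inside a bracket expression, so that's actually the bracket expression [^\] (any char except a backslash) followed by ]* (zero or more literal ]s), when you wanted just [^]].
You're using \] at the end, which is undefined behavior per POSIX since there's no open bracket expression for that ] to close, when you wanted just ].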
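Here's a sketch of what your original script actually does, assuming GNU sed (which in practice treats the undefined \# and \] as a literal # and ]; other seds may differ). Since [^\]]* is parsed as [^\] (one char that isn't a backslash) followed by ]*, a multi-character title like Some title can never match, while a one-character title like x does:

```shell
# Your original script, unchanged (GNU sed assumed).
orig='s/^\(\#*\) *\[\([^\]]*\)\].*/\1 \2/'

printf '%s\n' '## [Some title](#some-title)' | sed "$orig"
# ## [Some title](#some-title)   (no match, so the line is printed unchanged)

printf '%s\n' '## [x](y)' | sed "$orig"
# ## x                           ([^\] takes "x", ]* takes nothing, \] takes "]")
```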

If you fixed all of those issues you'd get:

$ sed 's/^\(##*\)  *\[\([^]]*\)].*/\1 \2/' file
## Some title

or

$ sed 's/^\(#\{1,\}\) \{1,\}\[\([^]]*\)].*/\1 \2/' file
## Some title

and since you're using GNU sed which supports EREs you could write that as:

$ sed -E 's/^(#+) +\[([^]]*)].*/\1 \2/' file
## Some title
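A quick sanity check, on a hypothetical multi-line input, that the two BRE spellings and the ERE spelling agree and only rewrite lines that start with #s, blanks, and a link:

```shell
printf '%s\n' '## [Some title](#some-title)' '###   [Deep](#deep)' '# Intro' 'See [docs](x) here' > file

sed 's/^\(##*\)  *\[\([^]]*\)].*/\1 \2/' file > bre1.out
sed 's/^\(#\{1,\}\) \{1,\}\[\([^]]*\)].*/\1 \2/' file > bre2.out
sed -E 's/^(#+) +\[([^]]*)].*/\1 \2/' file > ere.out

# cmp is silent and succeeds only if the files are identical.
cmp bre1.out bre2.out && cmp bre1.out ere.out && cat ere.out
# ## Some title
# ### Deep
# # Intro
# See [docs](x) here
```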

And then, to leave just the title as you said you wanted, you just need to remove the first capture group (so the title becomes \1):

$ sed 's/^##*  *\[\([^]]*\)].*/\1/' file
Some title

$ sed 's/^#\{1,\} \{1,\}\[\([^]]*\)].*/\1/' file
Some title

$ sed -E 's/^#+ +\[([^]]*)].*/\1/' file
Some title
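The title-only versions pass non-matching lines through unchanged. If you'd rather drop those lines (an assumption about what you want), -n suppresses sed's default output and the p flag prints only lines where the substitution succeeded. A sketch on a hypothetical two-line input:

```shell
printf '%s\n' '## [Some title](#some-title)' '# Intro' > file

sed -E 's/^#+ +\[([^]]*)].*/\1/' file
# Some title
# # Intro

sed -n 's/^##*  *\[\([^]]*\)].*/\1/p' file
# Some title
```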



deleted 3 characters in body

edited Jul 18 at 12:27

Ed Morton




added 1321 characters in body

edited Jul 18 at 12:20

Ed Morton

answered Jul 18 at 11:46

Ed Morton