Revisions to sed with numerical quantifiers - how to?

added 65 characters in body

edited Oct 2, 2020 at 12:43

Your sed substitutions will not work as expected because you'll never be able to match a newline in the input data. This is because sed reads your file line by line, i.e. with the newlines as delimiters, and the expression(s) are applied to the lines individually, without the delimiting newlines.
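To see this for yourself, here is a small sketch (the sample data and the variable names are made up for illustration). The pattern tries to match across a line break, but since each line is processed without its newline, nothing matches and the input should come out unchanged:

```shell
# Hypothetical two-line sample: a header line and a sequence line.
input=$(printf '%s\n' '>seq1/1' 'ACGT')

# The \n in the pattern can only match a newline inside the pattern space.
# With plain line-by-line processing there is none, so no substitution happens.
no_match=$(printf '%s\n' "$input" | sed 's;/1\nACGT;;')

printf '%s\n' "$no_match"
```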

Instead, changing your code slightly:

for fasta in ./*.fa; do
    sed 's;^\(>.*\)/.\{0,10\}$;\1;' "$fasta"
done

The few changes I've made are:

Use ; as the delimiter for the s/// command instead of the default /. This allows us to not escape the / in the pattern. Almost any character may be used as the delimiter, but one should probably pick one that does not occur in the pattern or in the replacement text.
Use only the standard basic regular expression syntax. In your pattern, (...) is extended regular expression syntax and \{...\} is basic regular expression syntax. I settled on using the basic syntax for portability. This also means dropping the -r option which enables the extended syntax in GNU sed.
Anchor the pattern to the start and end of the line with ^ and $ respectively.
Don't try to insert a newline in the replacement bit.
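As a quick check of the command above, the following sketch runs it on some made-up data rather than on real .fa files (the sample lines and variable names are assumptions for illustration only). Only the header with a short bit after the / should be changed:

```shell
# Hypothetical sample: a header with a short suffix, a sequence line that
# happens to contain a /, and a header whose suffix is longer than 10 characters.
sample=$(printf '%s\n' '>seq1/1' 'ACGT/AC' '>seq2/abcdefghijklmnop')

# Same substitution as in the loop, reading from standard input instead of a file.
result=$(printf '%s\n' "$sample" | sed 's;^\(>.*\)/.\{0,10\}$;\1;')

printf '%s\n' "$result"
```

The sequence line is left alone because it does not start with >, and the last header is left alone because the bit after its / is too long for \{0,10\}.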

An alternative and shorter sed expression would be

sed '/^>/s;/.\{0,10\}$;;'

This applies a substitution to all lines that start with the > character (/^>/ acts as the "address" for the subsequent s/// command). The substitution simply deletes the / and the bit after it to the end of the line if that bit is at most 10 characters long.
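A small sketch comparing the two expressions on made-up data (the sample lines and variable names are assumptions, not taken from the question). On headers with a single / they should give the same result. On a header with several / characters they may differ, since the shorter expression matches from the first / that is followed by at most 10 characters, while the longer one only removes the part after the last /:

```shell
# Hypothetical sample with one / per header line.
sample=$(printf '%s\n' '>seq1/1' 'ACGT/AC' '>seq2/abcdefghijklmnop')

# Shorter form: /^>/ selects header lines, then the s command deletes the suffix.
short=$(printf '%s\n' "$sample" | sed '/^>/s;/.\{0,10\}$;;')
# Longer form, for comparison.
long=$(printf '%s\n' "$sample" | sed 's;^\(>.*\)/.\{0,10\}$;\1;')

# A header with two slashes, where the two forms give different results.
multi_short=$(printf '%s\n' '>a/b/c' | sed '/^>/s;/.\{0,10\}$;;')
multi_long=$(printf '%s\n' '>a/b/c' | sed 's;^\(>.*\)/.\{0,10\}$;\1;')

printf '%s\n' "$short" "$multi_short" "$multi_long"
```

If your headers can contain more than one /, pick the expression whose behaviour matches what you want to keep.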

deleted 1 character in body

edited Oct 2, 2020 at 12:37

Kusalananda ♦

Your sed substitutions will not work as expected because you'll never be able to match a newline in the input data. This is because sed reads your file line by line, i.e. with the newlines as delimiters, and the expression(s) are applied to the lines individually, without the delimiting newlines.

Instead, changing your code slightly:

for fasta in ./*.fa; do
    sed 's;^\(>.*\)/.\{0,10\}$;\1;' "$fasta"
done

The few changes I've done are:

Use ; as the delimiter for the s/// command instead of the default /. This allows us to not escape the / in the pattern. Almost any character may be used as the delimiter, but one should probably pick one that does not occur in the pattern or in the replacement text.
Use only the standard basic regular expression syntax. In your pattern, (...) is extended regular expression syntax and \{...\} is basic regular expression syntax. I settled on using the basic syntax for portability. This also means dropping the -r option which enables the extended syntax in GNU sed.
Anchor the pattern to the start and end of the line with ^ and $ respectively.
Don't try to insert a newline in the replacement bit.

An alternative and shorter sed expression would be

sed '/^>/s;/.\{0,10\}$;;'

This applies a substitution to all lines that start with the > character. The substitution simply deletes the / and the bit after it to the end of the line if that bit is 10 characters or fewer long.

answered Oct 2, 2020 at 12:30

Kusalananda ♦

Your sed substitutions will not work as expected because you'll never be able to match a newline in the input data. This is because sed reads your file line by line, i.e. with the newlines as delimiters, and the expressions are applied to the lines individually, without the delimiting newlines.

Instead, changing your code slightly:

for fasta in ./*.fa; do
    sed 's;^\(>.*\)/.\{0,10\}$;\1;' "$fasta"
done

The few changes I've done are:

Use ; as the delimiter for the s/// command instead of the default /. This allows us to not escape the / in the pattern.
Use only the standard basic regular expression syntax. In your pattern, (...) is extended regular expression syntax and \{...\} is basic regular expression syntax. I settled on using the basic syntax for portability.
Anchor the pattern to the start and end of the line with ^ and $ respectively.
Don't try to insert a newline in the replacement bit.

An alternative and shorter sed expression would be

sed '/^>/s;/.\{0,10\}$;;'

This applies a substitution to all lines that start with the > character. The substitution simply deletes the / and the bit after it to the end of the line if that bit is less than 10 characters long.
