
I have an issue I am trying to solve with sed. My goal is to put quotes around the value after content= if it is not already quoted.

Here is the concrete example:

<meta name="ProgId" content=Word.Document>
<meta name="Generator" content="Microsoft Word 15">

I would like to add quotes around Word.Document so that the result is:

<meta name="ProgId" content="Word.Document">
<meta name="Generator" content="Microsoft Word 15">

I tried:

sed -i 's@content="\(.*\)"@content="\1"/@g' "$1"

However, this does not work.

Thank you.

  • There is no " after content=, so why match it? Commented Jul 9, 2020 at 8:44
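To see concretely what the attempt in the question does, here is a minimal sketch that runs the same substitution on the two sample lines, reading standard input instead of editing a file with -i. The helper name original_attempt is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch: the question's substitution, minus -i, as a stdin -> stdout filter.
# original_attempt is a hypothetical helper name.
original_attempt() {
  sed 's@content="\(.*\)"@content="\1"/@g'
}

printf '%s\n' \
  '<meta name="ProgId" content=Word.Document>' \
  '<meta name="Generator" content="Microsoft Word 15">' |
  original_attempt
# The unquoted line is left alone because the pattern requires content=",
# and the already quoted line gains a stray / before the closing >.
```
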

2 Answers


There is no " after content= in the input, so you should not match one there. Instead, match a first character that is not a quote, followed by everything up to a space or >.

sed 's@content=\([^"][^ >]*\)@content="\1"@'
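A minimal sketch of this command applied to the two sample lines, assuming a POSIX sh and a sed that reads standard input. The helper name quote_content is hypothetical, not part of the answer.

```shell
#!/bin/sh
# Sketch: the answer's substitution wrapped in a hypothetical helper.
quote_content() {
  # [^"]   : the first character after content= must not be a quote
  # [^ >]* : then take everything up to the next space or >
  sed 's@content=\([^"][^ >]*\)@content="\1"@'
}

printf '%s\n' \
  '<meta name="ProgId" content=Word.Document>' \
  '<meta name="Generator" content="Microsoft Word 15">' |
  quote_content
```

Only the first line changes; the second starts with " right after content=, so [^"] fails and sed leaves it as-is.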

Note that you should use XML-aware tools to parse XML documents.


3 Comments

Thanks! This partly works. When the value after content= is already quoted, the sed command adds another pair of quotes, which is not the goal. Sorry I didn't mention this in my question; I will rephrase it.
Well, then match anything that is not a quote. Fixed.
@Miloš you should be able to modify the suggested answer for that case. sed 's@content=\([^" >]*\)>@content="\1">@' is one way to do it. By the way, there is an extra / left over before the last @ in your command.
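The variant from the last comment can be checked the same way. This sketch (the helper name quote_content_gt is hypothetical) also runs the substitution a second time on its own output, to illustrate that an already quoted value is not quoted again.

```shell
#!/bin/sh
# Sketch: the comment's variant as a hypothetical stdin -> stdout helper.
quote_content_gt() {
  # [^" >]* : an unquoted value containing no quote, space or >
  # >       : the value must be followed immediately by the closing >
  sed 's@content=\([^" >]*\)>@content="\1">@'
}

# Running it twice gives the same line as running it once, because after the
# first pass the character right after content= is ", not part of [^" >]*
# and not the required >.
once=$(printf '%s\n' '<meta name="ProgId" content=Word.Document>' | quote_content_gt)
twice=$(printf '%s\n' "$once" | quote_content_gt)
printf '%s\n' "$once" "$twice"
```
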

This should work:

sed -E 's/content=([^">]+)/content="\1"/'

Explanation:

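This tells sed to substitute everything after content= and before >, but only if it does not start with ": [^">]+ must match at least one character that is neither " nor >, so an already quoted value never matches. The capture group (...) keeps the matched value, and \1 writes it back surrounded by ".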

Input:

<meta name="ProgId" content=Word.Document>
<meta name="Generator" content="Microsoft Word 15">

Output:

<meta name="ProgId" content="Word.Document">
<meta name="Generator" content="Microsoft Word 15">
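Since the question's script edits the file named by "$1", here is a sketch of applying this answer's substitution to a file. The helper quote_content_file is hypothetical; it writes to a temporary file instead of using -i, because -i takes its backup-suffix argument differently in GNU sed and BSD/macOS sed.

```shell
#!/bin/sh
# Sketch: quote unquoted content= values in a file, in place.
# quote_content_file is a hypothetical helper name.
quote_content_file() {
  tmp=$(mktemp) || return 1
  # [^">]+ needs at least one character that is neither " nor >,
  # so a value that already starts with " is never matched.
  if sed -E 's/content=([^">]+)/content="\1"/' "$1" > "$tmp"; then
    # Copy back with cat (not mv) so the original file keeps its permissions.
    cat "$tmp" > "$1"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# In a script that receives the file name as its first argument:
# quote_content_file "$1"
```

Writing the result back with cat rewrites the existing file rather than replacing it, so its mode is preserved; mv would leave the file with the restrictive permissions mktemp gives the temporary file.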
