How can I delete html tags from a file using sed?

Question

I have a file that mixes normal text, which I need, with HTML tags. I know that a regular expression can match HTML tags and that sed can replace them with an empty string, but I do not know how to apply this concretely.

Would it be possible to provide some samples? Are you looking to extract information from an XML file or clean up some XML content within another file? Your choice of tool depends on what you are trying to achieve.
– rahul, Commented Feb 16, 2015 at 13:18

Sreeraj · Accepted Answer · 2015-02-16 13:38:20Z

7

If you do not insist on sed, the best tool for this would be lynx.

lynx --dump <filename>.html

This prints the content of the HTML file as text, laid out the way the HTML code was intended to be displayed. The only condition is that the filename must have a .html or .htm extension.
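If the file does not already carry one of those extensions, one possible workaround is to dump a temporary copy that does. The sketch below assumes lynx and mktemp are installed; dump_as_text is a hypothetical helper name, not something lynx provides.

```shell
# dump_as_text FILE
# Hypothetical helper: copies FILE to a temporary name ending in .html
# so that lynx treats it as HTML, then prints the rendered text.
dump_as_text() {
    src=$1
    tmp=$(mktemp -d) || return 1
    if ! cp -- "$src" "$tmp/page.html"; then
        rm -rf -- "$tmp"
        return 1
    fi
    lynx --dump "$tmp/page.html"
    status=$?
    # Clean up the temporary copy whether or not lynx succeeded.
    rm -rf -- "$tmp"
    return "$status"
}
```

It would be called as dump_as_text notes.txt, where notes.txt stands for any file containing HTML.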


unxnut · Accepted Answer · 2015-02-16 13:24:04Z

3

As long as each HTML tag starts and ends on the same line, the following will work:

sed 's/<[^>]*>//g'
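To make the single-line caveat concrete, here is a small sketch. The function names and sample inputs are made up for illustration. The second function is a common workaround rather than part of this answer: it reads the whole input into sed's pattern space before substituting, so a tag may span lines.

```shell
# strip_tags: hypothetical wrapper around the command from the answer.
# sed applies the expression to one line at a time.
strip_tags() {
    sed 's/<[^>]*>//g'
}

# strip_tags_multiline: hypothetical variant (an assumption, not from
# the answer). It appends every following line to the pattern space
# first, then runs the same substitution once over the whole input.
strip_tags_multiline() {
    sed -e ':a' -e '$!{N;ba' -e '}' -e 's/<[^>]*>//g'
}

# Tags that open and close on the same line are removed.
printf '%s\n' '<p>Hello <b>world</b></p>' | strip_tags

# A tag split across two lines is only partly handled line by line:
# the opening half has no closing ">" on its line, so it stays.
printf '%s\n' '<a href="x"' 'class="y">link</a>' | strip_tags

# With the whole input in the pattern space, the split tag is matched.
printf '%s\n' '<a href="x"' 'class="y">link</a>' | strip_tags_multiline
```

The first call prints Hello world. The second leaves the fragments `<a href="x"` and `class="y">link` behind, and the third prints link. Holding the entire file in memory is fine for small files but may not suit very large ones.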


8

What will happen with <tag attribute="legal use of >" foo=bar>? html.spec.whatwg.org/multipage/…

– goldilocks, Commented Feb 16, 2015 at 13:31

4

This will not handle comments correctly. Example: .

– Thom Smith, Commented Feb 16, 2015 at 18:38
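Both objections can be checked directly against the command from the answer. In this sketch the first input is the tag quoted in goldilocks' comment; the second is an assumed comment example chosen for illustration, not necessarily the one Thom Smith had in mind. The function name is likewise made up.

```shell
# strip_tags: hypothetical wrapper around the command from the answer.
strip_tags() {
    sed 's/<[^>]*>//g'
}

# A ">" inside a quoted attribute value ends the match too early,
# so the rest of the tag leaks into the output.
printf '%s\n' '<tag attribute="legal use of >" foo=bar>text' | strip_tags

# A ">" inside an HTML comment does the same: the match stops at the
# first ">", and the tail of the comment is left behind.
printf '%s\n' 'x<!-- 1 > 0 -->y' | strip_tags
```

The first command prints `" foo=bar>text` and the second prints `x 0 -->y`, so in both cases markup survives in the output.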


zwol · Accepted Answer · 2015-02-16 15:58:09Z

3

I strongly recommend the use of either of the programs named html2text (1) (2) instead. Parsing HTML is much harder than it looks.
