
I want to delete everything but the message. For example, if we have the following:

<p class="TweetTextSize  js-tweet-text tweet-text" lang="en" data-aria-label-part="0">.<a href="/TuckerCarlson" class="twitter-atreply pretty-link js-nav" dir="ltr" data-mentioned-user-id="22703645" ><s>@</s><b>TuckerCarlson</b></a>: &quot;Massive demographic change has political consequences.&quot; <a href="/hashtag/Tucker?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr" ><s>#</s><b>Tucker</b></a><a href="https://t.co/PKqNgaihMQ" class="twitter-timeline-link u-hidden" data-pre-embedded="true" dir="ltr" >pic.twitter.com/PKqNgaihMQ</a></p>

The result after using the command should look like this:

Massive demographic change has political consequences.

My attempt so far:

sed -n '/<p class="TweetTextSize  js-tweet-text tweet-text" lang="en" data-aria-label-part="0">/,/<\/p>/p'

What I am trying to do is delete everything inside the <...> </...> tags between <p> and </p> and keep the rest. I know it does not seem easy, but I would still appreciate any help.

  • Use an XML parser. Commented Apr 18, 2017 at 2:37
  • While in certain cases you can get sed or awk to handle XML, they are usually not the best tools for the job. As @Wildcard said, get a proper XML parser. I personally would use Python, but that is just me. See posts like unix.stackexchange.com/questions/295896/… for other suggestions. Commented Apr 18, 2017 at 2:43

1 Answer


A solution using the xmlstarlet tool:

xmlstarlet sel -t -v "/p/text()[2]" -n file | sed 's/.*"\(.*\)"/\1/'

The output:

Massive demographic change has political consequences.

sel (or select) - select data or query XML document(s) (XPath, etc.)

-t (or --template) - starts a template; the options that follow it define the output

-v (or --value-of) - print the value of an XPath expression

-n (or --nl) - print a newline

/p/text()[2] - XPath expression; selects the second text node of the paragraph (the first text node is the leading ".")

sed 's/.*"\(.*\)"/\1/' - extracts the message between the double quotes
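For readers who prefer the Python route mentioned in the comments, the same two steps (pick the second text node of the paragraph, then keep what is between the double quotes) can be sketched with the standard library. This is only an illustration, not part of the xmlstarlet answer: the function name extract_message and the SAMPLE constant are made up here, and it assumes the input is a single well-formed <p> element like the one in the question.

```python
import re
import xml.etree.ElementTree as ET

# The paragraph from the question, stored in a hypothetical constant.
SAMPLE = (
    '<p class="TweetTextSize  js-tweet-text tweet-text" lang="en" '
    'data-aria-label-part="0">.<a href="/TuckerCarlson" '
    'class="twitter-atreply pretty-link js-nav" dir="ltr" '
    'data-mentioned-user-id="22703645" ><s>@</s><b>TuckerCarlson</b></a>: '
    '&quot;Massive demographic change has political consequences.&quot; '
    '<a href="/hashtag/Tucker?src=hash" data-query-source="hashtag_click" '
    'class="twitter-hashtag pretty-link js-nav" dir="ltr" ><s>#</s>'
    '<b>Tucker</b></a><a href="https://t.co/PKqNgaihMQ" '
    'class="twitter-timeline-link u-hidden" data-pre-embedded="true" '
    'dir="ltr" >pic.twitter.com/PKqNgaihMQ</a></p>'
)


def extract_message(xml_text):
    """Return the quoted message from the second text node of <p>."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps the direct text of <p> in root.text and in the
    # .tail of each child element; empty or missing pieces are skipped.
    text_nodes = [root.text] + [child.tail for child in root]
    text_nodes = [t for t in text_nodes if t]
    # Rough equivalent of /p/text()[2]; raises IndexError if there are
    # fewer than two text nodes.
    second = text_nodes[1]
    # Same idea as the sed step: keep what lies between the first and
    # last double quote.
    match = re.search(r'"(.*)"', second)
    return match.group(1) if match else second.strip()


print(extract_message(SAMPLE))
```

The parser decodes &quot; to a literal double quote, which is why the regular expression can match on " directly, just as the sed expression does on the xmlstarlet output.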
