How to extract a string between start and end patterns with sed or awk?

Question

I have an HTML file and want to extract the string between two patterns. The file looks like this:

<span>aghahan.com</span>
<span>pouyamannequin.com</span>

I need the domains inside the span tags: aghahan.com, pouyamannequin.com

I tried this command:

sed -e 's/>!\(.*\)>.com<\/span>/\1/' domain.txt

but I get the wrong result. I would be grateful for any help.

See stackoverflow.com/a/1732454/1081936 and the xmllint and xmlstarlet answers at unix.stackexchange.com/q/83385/133219.
– Ed Morton, commented Mar 27, 2020 at 4:38

Nasir Riley · Accepted Answer · 2020-03-26 12:32:07Z

1

As each line begins with <span> and ends with </span>:

sed 's|<span>\(.*\)</span>|\1|' domain.txt

You can also do it with awk by setting the field separator to either < or > and printing the third field:

awk -F '[<>]' '{print $3}' domain.txt

Output:

aghahan.com
pouyamannequin.com
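
To see why the third field is the one to print, it can help to dump every field awk produces for a sample line. The following is an illustrative sketch only; the function name show_fields is made up for this example and is not part of the answer.

```sh
#!/bin/sh
# show_fields (hypothetical helper): print each field awk sees when
# both < and > act as field separators.
show_fields() {
  awk -F '[<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

# Field 1 is the empty string before the first "<", field 2 is the tag
# name "span", and field 3 is the text between the tags.
printf '%s\n' '<span>aghahan.com</span>' | show_fields
```

For the sample line this lists five fields, with aghahan.com as field 3, which is why the answer prints $3.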

These are the simplest ways to do it, and both also work if the lines have trailing whitespace.
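
If the file may also contain lines without a span, or extra text around the tags, a slightly stricter sed variant can print only the lines where the substitution succeeded. This is a sketch, assuming at most one span per line; extract_span is a hypothetical name chosen for the example.

```sh
#!/bin/sh
# extract_span (hypothetical helper): read HTML lines on stdin and print
# only the text between <span> and </span>.
# -n suppresses the default output; the trailing p prints a line only
# when the substitution matched. The .* on both sides drops surrounding
# text, including leading or trailing whitespace.
extract_span() {
  sed -n 's|.*<span>\(.*\)</span>.*|\1|p'
}

printf '%s\n' '<span>aghahan.com</span>  ' '<p>no span here</p>' | extract_span
```

Because the leading .* is greedy, a line holding several spans would yield only the last one, so this fits the one-span-per-line layout shown in the question.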

answered Mar 26, 2020 at 12:32

zorbax · Accepted Answer · 2020-03-26 10:50:21Z

0

With sed:

sed 's/\(.*\)>\(.*\)<\(.*\)/\2/g' domain.txt

answered Mar 26, 2020 at 10:50

pLumo · Accepted Answer · 2020-03-26 11:39:00Z

0

With python and BeautifulSoup:

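python -c '
from bs4 import BeautifulSoup

# Parse the file with the html.parser backend; "with" closes it afterwards
with open("domain.txt", "r") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

# Print the text of every <span>, with surrounding whitespace stripped
for span in soup.find_all("span"):
    print(span.get_text(strip=True))
'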

This may be overkill for such a simple task, but it is more robust and will be easier to extend to harder cases, e.g. if the HTML looks like this:

<span>
 aghahan.com
</span>
<span>
 pouyamannequin.com
</span>
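
Where Python is not available, the multi-line layout above can also be approximated in awk by splitting records on "<" instead of on newlines. This is only a sketch, assuming bare <span> tags with no attributes and no nested tags; a real HTML parser remains the more robust choice. The name extract_multiline_spans is hypothetical.

```sh
#!/bin/sh
# extract_multiline_spans (hypothetical helper): treat "<" as the record
# separator so each tag starts a new record, even across line breaks.
extract_multiline_spans() {
  awk 'BEGIN { RS = "<" }
       /^span>/ {
         sub(/^span>/, "")          # drop the rest of the opening tag
         sub(/^[ \t\r\n]+/, "")     # trim leading whitespace and newlines
         sub(/[ \t\r\n]+$/, "")     # trim trailing whitespace and newlines
         if ($0 != "") print
       }'
}

printf '%s\n' '<span>' ' aghahan.com' '</span>' | extract_multiline_spans
```

The same function also handles the single-line form from the question, since the record that starts with "span>" then simply contains the domain on one line.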

edited Mar 26, 2020 at 11:39

answered Mar 26, 2020 at 11:02

Praveen Kumar BS · Accepted Answer · 2020-03-26 20:02:03Z

0

awk -F ">" '{print $2}' filename | sed "s/<.*//g"

Output:

aghahan.com
pouyamannequin.com

answered Mar 26, 2020 at 20:02
