I have been working on a simple bash script that parses specific data from web pages. I use tr '\r\n' ' ' <file1.txt >file2.txt so that all the data extracted from a page (file1.txt) ends up on a single line (in file2.txt). Next, I need to match every string between <th>...</th> tags on that line and either delete it or replace it with a space.
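A minimal sketch of the flattening step, assuming GNU coreutils tr and a page saved with Windows (CRLF) line endings; the sample content below is made up for illustration. Because the second set (' ') is shorter than the first ('\r\n'), GNU tr pads it by repeating its last character, so both CR and LF become spaces and each CRLF turns into two spaces.

```shell
#!/usr/bin/env bash
# Hypothetical sample page with CRLF line endings, written to file1.txt.
printf '<tr>\r\n<th>Package</th>\r\n<td>flm 10x400 mg</td>\r\n' > file1.txt

# Map every CR and every LF to a space (GNU tr pads ' ' to match '\r\n'),
# so the whole page ends up on a single line in file2.txt.
tr '\r\n' ' ' < file1.txt > file2.txt

cat file2.txt
```

Note that file2.txt has no trailing newline at all, which some line-oriented tools treat as an incomplete last line.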
Here is some example input:
<td>Abaktal hm</td> </tr> <tr> <th>Package</th> <td>flm 10x400 mg</td> <th>Indesit</th>
I tried sed with something like this:
sed -i 's/\<th\>.*?\<\/th\>/ /g' output.txt
But it didn't work. I think the problem is the ? sign: the non-greedy .*? works in other regular expression flavors, but apparently not in sed.
Is there a sed command to solve this problem? I just need to match all of those strings at once.
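A possible sketch, not a definitive fix: sed's POSIX regular expressions have no non-greedy .*? quantifier, and in GNU sed \< and \> are word-boundary anchors, not literal angle brackets. A common workaround is to replace .*? with [^<]*, which cannot run past the next tag. This assumes that no <th> element contains another tag and that the <th> tags carry no attributes.

```shell
#!/usr/bin/env bash
# Single-line input taken from the question.
line='<td>Abaktal hm</td> </tr> <tr> <th>Package</th> <td>flm 10x400 mg</td> <th>Indesit</th>'

# [^<]* matches "anything except <", so each <th>...</th> pair is matched
# separately instead of greedily spanning from the first <th> to the last </th>.
# Plain < and > are literal characters here.
result=$(printf '%s\n' "$line" | sed 's/<th>[^<]*<\/th>/ /g')

printf '%s\n' "$result"
```

To edit output.txt in place as in the question, the same expression works with GNU sed's -i option (BSD/macOS sed needs -i ''). If <th> contents may contain other tags, perl supports the lazy quantifier directly: perl -pe 's/<th>.*?<\/th>/ /g'.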