Timeline for Trying to grep url from html source in .txt file using sed
Current License: CC BY-SA 3.0
6 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Oct 9, 2015 at 12:37 | vote | accept | Lewandajo | ||
| Oct 9, 2015 at 12:37 | |||||
| Oct 9, 2015 at 12:33 | vote | accept | Lewandajo | ||
| Oct 9, 2015 at 12:33 | |||||
| Oct 9, 2015 at 12:30 | vote | accept | Lewandajo | ||
| Oct 9, 2015 at 12:31 | |||||
| Oct 9, 2015 at 9:23 | comment | added | Lewandajo | Thanks @MirosławZalewski. As you said, I am having a problem with the shell toolset; a better way of handling HTML seems to be DOMXPath. | |
| Oct 8, 2015 at 18:43 | comment | added | Mirek Długosz | Using XPath is the only correct approach, but I'm afraid that the requirement of "fixing" the HTML content will not cut it. Adding <html> and </html> around the content provided by the OP is doable, but that is probably not the source of the entire page, so you first have to extract that <div>. You probably can't use xmlstarlet, because some other parts might require "fixing". On the other hand, a simple grep will not handle the variable number of lines that the <div> might span. This is a kind of chicken-and-egg problem, and it shows that the shell toolset is not very suitable for handling HTML content. | |
| Oct 8, 2015 at 18:10 | history | answered | Cyrus | CC BY-SA 3.0 |
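
The chicken-and-egg problem described in the comment above (the page is not well-formed XML, yet the target `<div>` may span any number of lines) can be sidestepped with a lenient HTML parser. The following is a minimal sketch, not the approach from the answer and not XPath: it uses Python's standard-library `html.parser` instead. The class name `download`, the sample markup, and the helper names `DivLinkExtractor` and `extract_links` are all hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class DivLinkExtractor(HTMLParser):
    """Collect href values of <a> tags nested inside a target <div>."""

    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class  # hypothetical class name of the target <div>
        self.depth = 0              # <div> nesting depth inside the target; 0 = outside
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                # A nested <div> inside the target: track it so we know when we leave.
                self.depth += 1
            elif self.div_class in (attrs.get("class") or "").split():
                self.depth = 1
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def extract_links(html_text, div_class):
    """Return all hrefs found inside <div class="div_class"> in html_text."""
    parser = DivLinkExtractor(div_class)
    parser.feed(html_text)
    parser.close()
    return parser.links


# Made-up fragment: not well-formed XML (unclosed <p>), and the <a> tag
# is split across lines, which a line-based grep would not match.
SAMPLE = """<p>unclosed paragraph
<div class="download">
  <a
     href="http://example.com/file.zip">get it</a>
</div>
<a href="http://example.com/other">outside</a>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE, "download"))
```

Because the parser works on a token stream rather than on lines, it does not matter how many lines the `<div>` spans, and the fragment needs no "fixing" beforehand. A real page may need a different selection rule (for example an `id` instead of a class).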