  • I wouldn't try to use space/non-space as the determiner here (what happens when you get to plan</span>, for example, which is all non-space?). Instead, consider using lookarounds, e.g. grep -oP '(?<=<span>).*(?=</span>)' or even just grep -oP '(?<=>).*(?=<)' Commented Feb 21, 2021 at 13:00
  • Are the tags always <span> </span>, or are they ever something else? Commented Feb 21, 2021 at 13:33
  • @NasirRiley, this is actually from an HTML file, so there are tons of HTML tags in it. The text that I'm looking for always starts with Section \d: Commented Feb 21, 2021 at 14:40
  • For grep, it does not matter whether there are HTML tags if you search for strings like Section 1:. To get rid of the HTML tags, I would use a program that strips them (w3m, html2text) after grep has found the text. You could also strip the HTML tags first and then search for your strings. Commented Feb 21, 2021 at 16:01
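A minimal sketch combining the lookaround suggestion with the fact that the wanted text always starts with Section \d:. It assumes GNU grep built with PCRE support (-P), and the sample HTML below is hypothetical, standing in for the real file.

```shell
#!/bin/sh
# Hypothetical sample input standing in for the real HTML file.
html='<p><span>Section 1: Introduction</span> <span>other</span></p>
<div><span>Section 2: plan</span></div>'

# -o prints only the matched part, -P enables PCRE lookarounds (GNU grep).
# (?<=<span>) and (?=</span>) anchor the match without printing the tags;
# the lazy .*? stops at the first </span> instead of the last one on the line.
printf '%s\n' "$html" | grep -oP '(?<=<span>)Section \d:.*?(?=</span>)'
```

With this input it prints Section 1: Introduction and Section 2: plan, one per line. A greedy .* would instead run to the last </span> on the first line and print Section 1: Introduction</span> <span>other.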