I searched around and found these two topics. However, they're different: in them the number of spaces is fixed, while my sample doesn't have a fixed number of spaces.
https://stackoverflow.com/questions/47428445/i-want-grep-to-grep-one-word-which-is-having-spaces-it
https://askubuntu.com/questions/949326/how-to-include-a-space-character-with-grep
Sample text:
<span>Section 1: Plan your day, write out your plan</span>
Desired Output:
Section 1: Plan your day, write out your plan
I would like to grep only the text, not the HTML tags. Here is my attempt.
wolf@linux:~$ cat file.txt
<span>Section 1: Plan your day, write out your plan</span>
wolf@linux:~$
wolf@linux:~$ grep -oP 'S\S+ \d: \S+' file.txt
Section 1: Plan
wolf@linux:~$
wolf@linux:~$ grep -oP 'S\S+ \d: \S+ \S+' file.txt
Section 1: Plan your
wolf@linux:~$
Is there a better solution than adding \S+ one at a time, given that the length of the text varies?
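One possible approach, sketched under the assumption that the wanted text never contains a literal `<`: keep the `S\S+ \d: ` prefix from the attempt above, but replace the repeated `\S+` groups with a single negated character class `[^<]*`. That class matches words and spaces alike and stops where the closing tag begins, so the word count no longer matters. The function name `extract_section` is hypothetical, and GNU grep built with PCRE support (`-P`) is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "Section N: ..." up to (not including) the next '<'.
# Assumes GNU grep with PCRE support (-P) and no literal '<' in the wanted text.
extract_section() {
  # [^<]* consumes any run of non-'<' characters, spaces included,
  # so it does not matter how many words follow "Section N: ".
  grep -oP 'S\S+ \d: [^<]*' "$@"
}

# Demo on the sample line from the question, read from stdin instead of file.txt.
printf '%s\n' '<span>Section 1: Plan your day, write out your plan</span>' | extract_section
```

Against the file from the question, the equivalent one-liner would be `grep -oP 'S\S+ \d: [^<]*' file.txt`.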
plan</span> for example, which is all non-space?). Instead, consider using lookarounds, e.g. grep -oP '(?<=<span>).*(?=</span>)' or even just grep -oP '(?<=>).*(?=<)'
<span> </span> or are they ever something else?
Section \d: Section 1:
Regarding HTML tags, I would use a program for stripping HTML tags (w3m, html2text) after grep has found the text. You could also strip the HTML tags first and then search for your strings.
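The ideas from the comments can be sketched as three small functions, assuming GNU grep with `-P`. The first two use the lookaround patterns suggested above: the lookbehind and lookahead require the tags to be present but leave them out of the `-o` output. The third illustrates "strip the tags first, then search"; it swaps in a crude `sed` substitution as a stand-in for w3m or html2text, which is not a real HTML parser and only handles simple one-line markup like the sample. All three function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketches only; assumes GNU grep with PCRE support (-P) and GNU-compatible sed.

# 1) Text between <span> and </span>; the tags are asserted, not printed.
between_span() {
  grep -oP '(?<=<span>).*(?=</span>)' "$@"
}

# 2) Looser variant: text after a '>' and before a '<'.
#    .* is greedy, so it runs from the first '>' to the last '<' on the line.
between_tags() {
  grep -oP '(?<=>).*(?=<)' "$@"
}

# 3) Strip anything that looks like a tag, then keep lines starting with "Section N:".
#    The sed regex is a crude stand-in for an HTML-aware tool such as html2text.
strip_then_grep() {
  sed 's/<[^>]*>//g' "$@" | grep -P '^Section \d+:'
}

sample='<span>Section 1: Plan your day, write out your plan</span>'
for f in between_span between_tags strip_then_grep; do
  printf '%s\n' "$sample" | "$f"
done
```

Because `.*` is greedy, a line holding several `<span>` elements would make the first two patterns span from the first opening tag to the last closing one; a lazy `.*?` would stop at the nearest closing tag instead.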