Revisions to grep to extract a substring from a huge string

Post Undeleted by Stéphane Chazelas

occurred Feb 8, 2014 at 12:39

added 280 characters in body

edited Feb 8, 2014 at 12:39

Stéphane Chazelas

If the file were well-formed XML, you could use an XML parsing tool.

Otherwise, if there were no other (nested) div section inside that section, you could have done:

pcregrep -Mo '(?s)<div[^>]*id="id1".*?</div>' the-file.html

Here, you could try something like:

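awk -v RS='<' '
  # RS="<" makes every record start right after a "<", i.e. at a tag name.
  inside || /^div[^>]*id="id1"/ {
    inside = 1
    if (/^div/)
      n++                # opening div: one level deeper
    else if (/^\/div>/ && !--n) {
      # closing tag that balances the id1 div: drop trailing text and stop
      $0="/div>\n"
      inside=0
    }
    printf "<%s", $0     # put back the "<" consumed as the record separator
  }' the-file.html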
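
To see how the depth counter behaves, here is a minimal sketch that runs the same awk program on a small made-up input. The file name sample.html, its contents, and the wrapper function extract_div are assumptions for illustration, not part of the question.

```shell
# Hypothetical sample: the target div contains a nested div.
cat > sample.html <<'EOF'
<html><body><div class="a">before</div>
<div id="id1"><p>one</p><div>nested</div><p>two</p></div>
<div>after</div></body></html>
EOF

# Hypothetical wrapper around the awk program above; $1 is the file to read.
extract_div() {
  awk -v RS='<' '
    inside || /^div[^>]*id="id1"/ {
      inside = 1
      if (/^div/)
        n++                      # nested opening div
      else if (/^\/div>/ && !--n) {
        $0 = "/div>\n"           # balancing close: cut what follows it
        inside = 0
      }
      printf "<%s", $0
    }' "$1"
}

extract_div sample.html
# prints: <div id="id1"><p>one</p><div>nested</div><p>two</p></div>
```

On this sample the nested div raises n to 2, so its closing tag only brings the counter back to 1 and the output continues up to the closing tag that matches the id1 div. The lazy .*?</div> pattern, by contrast, would stop at the first </div>, which is why it only suits the case without nested div sections.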

Post Deleted by Stéphane Chazelas

occurred Feb 8, 2014 at 12:10

added 7 characters in body

edited Feb 8, 2014 at 11:53

Stéphane Chazelas

If the file were well-formed XML, you could use an XML parsing tool.

Otherwise, if there were no other (nested) div section inside that section, you could have done:

pcregrep -Mo '(?s)<div[^>]*id="id1".*?</div>' the-file.html

answered Feb 8, 2014 at 11:45

Stéphane Chazelas

If the file were well-formed XML, you could use an XML parsing tool. Otherwise, assuming there's no other (nested) div section inside that section, you could do:

pcregrep -Mo '(?s)<div[^>]*id="id1".*?</div>' the-file.html
