Stéphane Chazelas

If the file were well-formed XML, you could use an XML parsing tool.

Otherwise, if there were no other div element nested inside that section, you could have used a non-greedy match that stops at the first closing tag:

pcregrep -Mo '(?s)<div[^>]*id="id1".*?</div>' the-file.html

Here, since the section can contain nested div elements, you could try something like this, which counts opening and closing div tags:

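awk -v RS='<' '
  # Records are split on "<", so a record for a tag starts with the tag name.
  inside || /^div[^>]*id="id1"/ {
    inside = 1
    if (/^div/)
      n++                      # opening div: one level deeper
    else if (/^\/div>/ && !--n) {
      $0 = "/div>\n"           # matching close: drop the text that follows it
      inside = 0
    }
    printf "<%s", $0           # restore the "<" consumed as separator
  }' the-file.html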
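
As a rough check of the depth counting, here is a minimal sketch that runs the same awk program on a small made-up sample. The sample contents and the `extract_div` function name are assumptions for illustration, not taken from the question:

```shell
# Hypothetical helper wrapping the awk program above; reads HTML on stdin.
extract_div() {
  awk -v RS='<' '
    inside || /^div[^>]*id="id1"/ {
      inside = 1
      if (/^div/)
        n++
      else if (/^\/div>/ && !--n) {
        $0 = "/div>\n"
        inside = 0
      }
      printf "<%s", $0
    }'
}

# Made-up sample input with a div nested inside the target section.
sample='<body><div id="id0">before</div>
<div class="x" id="id1">outer <div>inner</div> tail</div>
<p>after</p></body>'

printf '%s\n' "$sample" | extract_div
# prints: <div class="x" id="id1">outer <div>inner</div> tail</div>
```

The inner closing tag only brings the counter back to 1, so output continues until the closing tag that brings it to 0.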
