Parse XML into HTML using Java org.w3c.dom

Question

Through an API I receive an XML file, which I'm trying to parse with org.w3c.dom and XPath. Part of the XML file contains content that should be rendered as HTML:

<Para>Since 2001, state and local health departments in the US have accelerated efforts to prepare for bioterrorism and other high-impact public health emergencies. These activities have been spurred by federal funding and guidance from the US Centers for Disease Control and Prevention (CDC) and the Health Resources and Services Administration (HRSA) 
     <CitationRef CitationID="B1">1</CitationRef>  
     <CitationRef CitationID="B2">2</CitationRef> . Over time, the emphasis of this guidance has expanded from bioterrorism to include "terrorism and non-terrorism events, including infectious disease, environmental and occupational related emergencies" 
     <CitationRef CitationID="B4">4</CitationRef> as well as pandemic influenza.
</Para>

This should become something like:

<p>Since 2001, state and local health departments in the US have accelerated efforts to prepare for bioterrorism and other high-impact public health emergencies. These activities have been spurred by federal funding and guidance from the US Centers for Disease Control and Prevention (CDC) and the Health Resources and Services Administration (HRSA) 
     <a href="link/B1">1</a>  
     <a href="link/B2">2</a> . Over time, the emphasis of this guidance has expanded from bioterrorism to include "terrorism and non-terrorism events, including infectious disease, environmental and occupational related emergencies" 
     <a href="link/B4">4</a> as well as pandemic influenza.
</p>

Any suggestions on how I can accomplish this? The main issue is to retrieve the tags and replace them while keeping their location.

This sounds like a perfect job for XSLT, a language for transforming XML input into some other XML format or into HTML. If you need help with the XSLT code, add the XSLT tag to your question.
– Martin Honnen, Commented Apr 13, 2012 at 9:35

Martin Honnen · Accepted Answer · 2012-04-13 10:05:52Z

Here is how you could do that with XSLT:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="Para">
  <p>
    <xsl:apply-templates select="@* | node()"/>
  </p>
</xsl:template>

<xsl:template match="CitationRef[@CitationID]">
  <a href="link/{@CitationID}">
    <xsl:apply-templates/>
  </a>
</xsl:template>

</xsl:stylesheet>

3 Comments

user485659 Over a year ago

Thanks for your response. I'm looking into XSLT (rgagnon.com/javadetails/java-0407.html). Is there a way to have the XSL you provided, the XML to be transformed, and the output all as strings rather than files?

Martin Honnen Over a year ago

I am pretty sure that having the input, stylesheet, and result as strings is possible with JAXP; it is just a question of using the right source (docs.oracle.com/javase/6/docs/api/javax/xml/transform/stream/…) and result types (e.g. a StreamSource over a StringReader). I will leave that to people more familiar with the Java APIs than I am.
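
For readers who want the string-only variant described in this comment, here is a minimal sketch using the standard JAXP classes. The class name XsltStringTransform and the method name transform are made up for illustration; the stylesheet is the one from the answer, embedded as a string constant. It assumes the JDK's built-in XSLT 1.0 processor is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of the original answer.
public class XsltStringTransform {

    // The stylesheet from the answer, kept in memory instead of in a file.
    static final String XSLT =
          "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:template match=\"@* | node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@* | node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"Para\">"
        + "<p><xsl:apply-templates select=\"@* | node()\"/></p>"
        + "</xsl:template>"
        + "<xsl:template match=\"CitationRef[@CitationID]\">"
        + "<a href=\"link/{@CitationID}\"><xsl:apply-templates/></a>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet string to the XML string and returns the result as a string.
    static String transform(String xml, String xslt) throws TransformerException {
        // StreamSource over StringReader: no files involved on the input side.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        // Leave out the <?xml ...?> declaration so the result can be embedded in a page.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // StreamResult over StringWriter: the output is collected in memory.
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<Para>See <CitationRef CitationID=\"B1\">1</CitationRef> for details.</Para>";
        System.out.println(transform(xml, XSLT));
    }
}
```

A Transformer is not thread-safe, so if the same stylesheet is applied many times it may be worth compiling it once with TransformerFactory.newTemplates and creating a new Transformer from the resulting Templates object for each call.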

user485659 Over a year ago

Thanks for your tips, I got it working! For the input XML I used the following code: Node nl = (Node) xpath.evaluate("//expression/here", doc, XPathConstants.NODE); DOMSource source = new DOMSource(nl);
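
The approach in this last comment, selecting one node with XPath and passing it to the transformer as a DOMSource, could look roughly like the sketch below. The class name DomNodeTransform, the method name transformNode, and the sample XPath expression //Para are assumptions for illustration; the commenter's actual expression is not shown in the thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original thread.
public class DomNodeTransform {

    // Same templates as in the answer, embedded as a string.
    static final String XSLT =
          "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:template match=\"@* | node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@* | node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"Para\">"
        + "<p><xsl:apply-templates select=\"@* | node()\"/></p>"
        + "</xsl:template>"
        + "<xsl:template match=\"CitationRef[@CitationID]\">"
        + "<a href=\"link/{@CitationID}\"><xsl:apply-templates/></a>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Parses the XML, selects a single node with XPath, and transforms only that node.
    static String transformNode(String xml, String xpathExpr, String xslt) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT processors generally expect a namespace-aware DOM
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate(xpathExpr, doc, XPathConstants.NODE);
        if (node == null) {
            throw new IllegalArgumentException("No node matches: " + xpathExpr);
        }

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // The selected node, rather than the whole document, is the transformation input.
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Article><Body><Para>See "
                + "<CitationRef CitationID=\"B4\">4</CitationRef>.</Para></Body></Article>";
        System.out.println(transformNode(xml, "//Para", XSLT));
    }
}
```

XPathConstants.NODE returns only the first matching node. To convert every paragraph, one option is to evaluate with XPathConstants.NODESET and run the transformation once per node in the returned NodeList.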