
An API returns an XML file that I am trying to parse with org.w3c.dom and XPath. Part of the XML file describes HTML-like content:

<Para>Since 2001, state and local health departments in the US have accelerated efforts to prepare for bioterrorism and other high-impact public health emergencies. These activities have been spurred by federal funding and guidance from the US Centers for Disease Control and Prevention (CDC) and the Health Resources and Services Administration (HRSA) 
     <CitationRef CitationID="B1">1</CitationRef>  
     <CitationRef CitationID="B2">2</CitationRef> . Over time, the emphasis of this guidance has expanded from bioterrorism to include "terrorism and non-terrorism events, including infectious disease, environmental and occupational related emergencies" 
     <CitationRef CitationID="B4">4</CitationRef> as well as pandemic influenza.
</Para>

This should become something like:

<p>Since 2001, state and local health departments in the US have accelerated efforts to prepare for bioterrorism and other high-impact public health emergencies. These activities have been spurred by federal funding and guidance from the US Centers for Disease Control and Prevention (CDC) and the Health Resources and Services Administration (HRSA) 
     <a href="link/B1">1</a>  
     <a href="link/B2">2</a> . Over time, the emphasis of this guidance has expanded from bioterrorism to include "terrorism and non-terrorism events, including infectious disease, environmental and occupational related emergencies" 
     <a href="link/B4">4</a> as well as pandemic influenza.
</p>

Any suggestions on how I can accomplish this? The main difficulty is replacing the tags while keeping each one at its original position within the surrounding text.

  • This sounds like a perfect job for XSLT as it is a language to transform XML input into some other XML format or into HTML. If you need help with the XSLT code then add the XSLT tag to your question. Commented Apr 13, 2012 at 9:35

1 Answer

Here is how you could do that with XSLT:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="Para">
  <p>
    <xsl:apply-templates select="@* | node()"/>
  </p>
</xsl:template>

<xsl:template match="CitationRef[@CitationID]">
  <a href="link/{@CitationID}">
    <xsl:apply-templates/>
  </a>
</xsl:template>

</xsl:stylesheet>

3 Comments

Thanks for your response. I'm looking into XSLT (rgagnon.com/javadetails/java-0407.html). Is there a way to have the XSL you provided, the XML that needs to be transformed, and the output all as strings rather than files?
I am pretty sure that having the input, stylesheet, and result as strings is possible with JAXP; it is just a question of using the right source (docs.oracle.com/javase/6/docs/api/javax/xml/transform/stream/…) and result types (e.g. StreamSource over StringReader). I will leave that to people more familiar with the Java APIs than I am.
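The string-based approach suggested above might look roughly like the following. This is a minimal sketch, not the commenter's own code: the class name StringTransform, the method transform, and the constant STYLESHEET are hypothetical names chosen for illustration. It wraps both the stylesheet and the input in a StreamSource over a StringReader and collects the output with a StreamResult over a StringWriter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; the names here are illustrative only.
class StringTransform {

    // The stylesheet from the answer, inlined as a string.
    public static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
      + "<xsl:template match=\"@* | node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@* | node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"Para\">"
      + "<p><xsl:apply-templates select=\"@* | node()\"/></p>"
      + "</xsl:template>"
      + "<xsl:template match=\"CitationRef[@CitationID]\">"
      + "<a href=\"link/{@CitationID}\"><xsl:apply-templates/></a>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a String.
    public static String transform(String xml, String xslt) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet from an in-memory string instead of a file.
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        // Leave out the <?xml ...?> declaration so the result can be embedded in a page.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<Para>See <CitationRef CitationID=\"B1\">1</CitationRef>.</Para>";
        System.out.println(transform(xml, STYLESHEET));
    }
}
```

Because the Transformer only sees Source and Result objects, the same method would work unchanged if either side were later switched back to a file or a DOM node.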
Thanks for your tips, I got it working! For the input XML I used the following code: nl = (Node) xpath.evaluate("//expression/here", doc, XPathConstants.NODE); DOMSource source = new DOMSource(nl);
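For context, the DOMSource fragment in the last comment could be expanded into something like the sketch below. It is an assumption about how the pieces fit together, not the commenter's actual program: the class name NodeTransform, the method transformNode, and the sample XML are hypothetical. It parses the XML string into a DOM, selects one node with XPath, and hands only that node to the transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names here are illustrative only.
class NodeTransform {

    // Selects a single node with XPath and transforms just that subtree.
    public static String transformNode(String xml, String expression, String xslt)
            throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT processors expect a namespace-aware DOM
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
        if (node == null) {
            throw new IllegalArgumentException("No node matches " + expression);
        }

        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        // The selected node, not the whole document, is the transformation input.
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xslt =
            "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
          + "<xsl:template match=\"@* | node()\">"
          + "<xsl:copy><xsl:apply-templates select=\"@* | node()\"/></xsl:copy>"
          + "</xsl:template>"
          + "<xsl:template match=\"Para\">"
          + "<p><xsl:apply-templates select=\"@* | node()\"/></p>"
          + "</xsl:template>"
          + "<xsl:template match=\"CitationRef[@CitationID]\">"
          + "<a href=\"link/{@CitationID}\"><xsl:apply-templates/></a>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        // Made-up wrapper elements around the Para from the question.
        String xml = "<Article><Title>T</Title>"
          + "<Para>See <CitationRef CitationID=\"B2\">2</CitationRef>.</Para></Article>";
        System.out.println(transformNode(xml, "//Para", xslt));
    }
}
```

Passing a DOMSource avoids serializing the already parsed document back to text, which is convenient when the rest of the program works with org.w3c.dom and XPath anyway.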
