I have an XML file with the content below. How can I extract the username and password values from the Resource tag using a shell script? The commented-out Resource tag must be excluded, and the values must come from the uncommented Resource tag. I tried, but my attempt fetched the values from the latest tag. Can someone help me skip the commented-out tags and fetch the values from the XML?

<?xml version='1.0' encoding='utf-8'?>
<!-- The contents of this file will be loaded for each web application -->
<!--
 <Resource name="jdbcSource" auth="Container"
type="javax.sql.DataSource"
 username="demo"
    password="test"
        driverClassName="driverclassname"
        url="driver@host"
    maxActive="20"
    maxIdle="10"
     />

-->

<Resource auth="Container"
driverClassName="driverclassname" maxActive="100" maxIdle="30" maxWait="10000"
name="jdbcSource" password="test" type="javax.sql.DataSource"
url="driver@host"
username="demo"/>

</Context>

4 Answers

2

Firstly, my answer assumes that you have actual well-formed source XML. The example you've provided isn't well-formed XML as it doesn't have an opening root element, namely <Context> - but I'll assume there is one anyway.


Bash features by themselves are not very well suited to parsing XML.

This Bash FAQ states the following:

Do not attempt [to extract data from an XML file] with sed, awk, grep, and so on (it leads to undesired results)

If you must use a shell script then utilize an XML specific command line tool, such as XMLStarlet (there are other similar tools available). See download info here - if you don't already have XML Starlet installed.

Solution:

Using XML Starlet you can run the following commands:

uname=$(xml sel -t -v "/Context/Resource/@username" path/to/file.xml)
pword=$(xml sel -t -v "/Context/Resource/@password" path/to/file.xml)

echo "$uname $pword" # --> demo test

Explanation

  • uname=$(...)

    Here we utilize command substitution to assign the output of the XML Starlet command to a variable named uname (i.e. the username).

  • xml sel -t -v "/Context/Resource/@username"

    This command breaks down as follows:

    • xml - invoke the XML Starlet command.
    • sel - select data or query XML document(s).
    • -t - the template option.
    • -v - print the value of XPATH expression.
    • "/Context/Resource/@username" - the expression to select the value of the username attribute of the Resource tag/element.
  • path/to/file.xml

    This part should be replaced with the real path to your .xml file.

Likewise, we utilize a similar command for obtaining the value of the password attribute, whereby we assign the output of the command to a variable named pword, and change the XPATH expression.
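
One point worth spelling out for the original question: an XPath expression such as /Context/Resource/@username only matches element nodes, so a Resource that sits inside <!-- ... --> is never selected. The sketch below tries to show that on a throwaway file. The sample content and the "olduser"/"oldpass" values are made up for illustration, and the lookup of the binary name is an assumption about packaging (some systems install the tool as xmlstarlet rather than xml).

```shell
# Sketch only: the sample file and the "olduser"/"oldpass" values are made up.

# The binary may be installed as `xmlstarlet` or as `xml`, depending on packaging.
if command -v xmlstarlet >/dev/null 2>&1; then
  xml_cmd=xmlstarlet
elif command -v xml >/dev/null 2>&1; then
  xml_cmd=xml
else
  xml_cmd=
fi

# Write a small well-formed sample with one commented-out Resource.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version='1.0' encoding='utf-8'?>
<Context>
<!--
 <Resource name="jdbcSource" username="olduser" password="oldpass"/>
-->
<Resource name="jdbcSource" password="test"
username="demo"/>
</Context>
EOF

if [ -n "$xml_cmd" ]; then
  # Only the uncommented Resource element is visible to the XPath query.
  uname=$("$xml_cmd" sel -t -v "/Context/Resource/@username" "$sample")
  pword=$("$xml_cmd" sel -t -v "/Context/Resource/@password" "$sample")
  echo "$uname $pword"
else
  echo "XMLStarlet not found; install it to run this sketch" >&2
fi
rm -f "$sample"
```

If XMLStarlet is available, the values printed should come from the uncommented element only.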


Edit 1: A more efficient command

As per Charles Duffy's first comment below... you can also extract both attribute values more efficiently using the following command instead:

{ IFS= read -r uname && IFS= read -r pword; } < <(xml sel -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" path/to/file.xml)

echo "$uname $pword" # --> demo test

The main benefit here is that the source XML file is only read once.
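
To see how the read part of that command behaves on its own, here is a minimal Bash sketch that swaps the XML Starlet call for a stand-in function. fake_query is a hypothetical helper, not part of any tool; it just prints two values, one per line, which is the shape of output the -n option produces between the two -v values.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the XML Starlet call: prints two values, one per line.
fake_query() {
  printf '%s\n' 'demo' 'p@ss word '
}

# Process substitution keeps `read` in the current shell, so the variables
# are still set afterwards (a plain pipe would set them in a subshell).
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
{ IFS= read -r uname && IFS= read -r pword; } < <(fake_query)

printf 'uname=[%s] pword=[%s]\n' "$uname" "$pword"
```

The second value deliberately contains a space and a trailing blank to show that IFS= read -r leaves the line untouched.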


Edit 2: Using XML Starlet to generate an XSLT template that can then be run on any system with xsltproc, including hosts that don't have XML Starlet installed:

As per Charles Duffy's second comment below...

It's also possible to utilize XML Starlet to generate an XSLT template which is derived from the XML Starlet query shown previously. The .xsl file which is generated can then be run on any system which has xsltproc available (including hosts that don't have XML Starlet installed).

The following steps demonstrate how to achieve this:

  1. Firstly run the following XML Starlet command to generate the .xsl file:

    xml sel -C -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" path/to/file.xml > path/to/resultant/my-template.xsl
    

    This command is very similar to the previously shown XML Starlet command. The notable differences are:

    • The additional -C option between sel and -t
    • The redirection operator > and a file path. This specifies the location at which to save the output, (i.e. the generated XSLT template/stylesheet).

      Note the path/to/resultant/my-template.xsl part should be changed as necessary.

    The contents of the generated XSLT stylesheet will be something like the following:

    my-template.xsl

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
      <xsl:output omit-xml-declaration="yes" indent="no"/>
      <xsl:template match="/">
        <xsl:call-template name="value-of-template">
          <xsl:with-param name="select" select="/Context/Resource/@username"/>
        </xsl:call-template>
        <xsl:value-of select="'&#10;'"/>
        <xsl:call-template name="value-of-template">
          <xsl:with-param name="select" select="/Context/Resource/@password"/>
        </xsl:call-template>
      </xsl:template>
      <xsl:template name="value-of-template">
        <xsl:param name="select"/>
        <xsl:value-of select="$select"/>
        <xsl:for-each select="exslt:node-set($select)[position()&gt;1]">
          <xsl:value-of select="'&#10;'"/>
          <xsl:value-of select="."/>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
    
  2. Next, run the following command which utilizes xsltproc to transform the source .xml file. This ultimately assigns the result of the transformation to the two variables, i.e. uname and pword:

    { IFS= read -r uname && IFS= read -r pword; } < <(xsltproc path/to/resultant/my-template.xsl path/to/file.xml)
    
    echo "$uname $pword" # --> demo test
    

    Note the parts reading path/to/resultant/my-template.xsl and path/to/file.xml should be changed as necessary.


4 Comments

You could extract both in just one run. { IFS= read -r uname && IFS= read -r pword; } < <(xmlstarlet ... -v foo -n -v bar -n) -- more efficient that way.
It might also be valuable to show how to tell XMLStarlet to generate an XSLT template that can then be run on any system with xsltproc, including hosts that don't have XMLStarlet installed.
@CharlesDuffy - Done... edits to my answer now demonstrate both suggestions mentioned in your comments - Thank you !
If I could give you a second +1 I would. :)
1

You can't reliably parse XML with RegEx or native Bash tools. Please use a dedicated XML parser like Xidel instead.

Assuming the opening root element <Context> is added to make it valid XML, you could of course manually set the variables by calling Xidel twice...

uname=$(xidel -s "input.xml" -e '//Resource/@username')
pword=$(xidel -s "input.xml" -e '//Resource/@password')

...but Xidel also has its own way to export (multiple) variables:

$ xidel -s "input.xml" \
  -e '//Resource/(uname:=@username,pword:=@password)' \
  --output-format=bash
uname='demo'
pword='test'

$ eval "$(
  xidel -s "input.xml" \
    -e '//Resource/(uname:=@username,pword:=@password)' \
    --output-format=bash
)"

$ printf '%s\n' "$uname" "$pword"
demo
test
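
To see what the eval step does without installing Xidel, the sketch below replaces the Xidel call with a stand-in. fake_xidel_output is a hypothetical helper that only prints assignment lines in the same shape as the --output-format=bash output shown above.

```shell
# Hypothetical stand-in for `xidel ... --output-format=bash`:
# prints shell assignment lines like the ones shown above.
fake_xidel_output() {
  printf '%s\n' "uname='demo'" "pword='test'"
}

# eval executes the printed assignments in the current shell,
# which is what makes $uname and $pword available afterwards.
eval "$(fake_xidel_output)"

printf '%s\n' "$uname" "$pword"
```

Since eval runs whatever text it is given, this pattern is best kept to output from a tool that quotes the values for the shell, as the Xidel output above does.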

Comments

0

With a Perl one-liner:

perl -n0777E '
    # remove comments
    s/<!--.*?-->//gs;

    # match username and password with lookaheads and display in custom way
    say "user:$1\tpass:$2" while /<Resource(?=[^>]*\susername="([^"]*)")(?=[^>]*\spassword="([^"]*)")[^>]*>/g
' < file.xml

5 Comments

Thanks, but I am looking for a shell script only, not Perl. Anyway, I tried your code but it didn't work in my environment.
@Mahesh Define "shell script". Do you mean you are unwilling to use sed, awk, grep, and any other command that is not a shell builtin? That is overly restrictive, and IMO completely defeats the point of the shell. Using perl is perfectly valid in a shell script.
...to showcase some other concrete corner cases -- < Resource is valid XML, but won't be found by the code here. Moving the username onto a different line from the Resource is also valid, and I'm not sure that's honored here. And there are entities -- consider if someone has a password with a literal quote; it would become &quot; -- this and other entities would need to be decoded to parse the value robustly.
@CharlesDuffy, this is why it uses lookaheads: the order doesn't matter. There is no problem with newlines either, because of the 0777 option and because [^>] also matches newlines. The only issue left to handle may be false positives in CDATA, which can be removed like comments: s/<\!\[CDATA\[.*?]]>//gs
Entity decoding in the output is still work that needs to happen but isn't currently implemented. (And to actually implement the full letter of the standard, something would need to support entities added in the individual document's DTD.)
0

I did it as follows:

Created yourxmlfile.xml

<Context>
    <Resource auth="Container"
    driverClassName="driverclassname" maxActive="100" maxIdle="30" maxWait="10000"
    name="jdbcSource" password="test" type="javax.sql.DataSource"
    url="driver@host"
    username="demo"/>
</Context>

sed -n 's/.* password="\([^"]*\)".*/\1/p' yourxmlfile.xml

  test

Comments
