I have a function, Parse_XML, shown below:

    Parse_XML()
    {

    TDIR=$1
    _VERSION=
    _REVISION=
    _FILENAME=
    _COMPONENT=
    _DESCRIPT=
    _ISITOA=0
    _NOLOG=0
    _OAVERSION=

    local TMP=/tmp/tmpfile.txt-$$
    local JUNK

    # find the cpq_package XML file and assign it to file
    local file=
    for xmlfile in *.xml
    do
        if [ -n "$(head ${xmlfile} | grep '<cpq_package')" ] ; then
            file="${xmlfile}"
            break
        fi
    done


    if [ -z "${file}" ] || [ ! -f "${file}" ]
    then
        _NOLOG=1
        return
    fi

    ${echo} `grep \<version $file|awk -F = '{print $2}'|awk '{print $1}'|tr -d '"'` > $TMP
    read _VERSION JUNK < $TMP
    ${echo} `grep \<version $file|awk -F '=' '{print $3}'|awk '{print $1}'|tr -d '"'` > $TMP
    read _REVISION JUNK < $TMP

    _OAVERSION=${_VERSION}
    _VERSION=${_VERSION}${_REVISION}

The version and revision are read from lines like these in the XML file:

<version value="GPK5" revision="B" type_of_change="1"/>
<version value="GPK5" revision="" type_of_change="1"/>

Some of the revisions are empty strings and some are a single character, so the command

 grep \<version CP057761.xml|awk -F = '{print $2}'|awk '{print $1}'|tr -d '"'

fetches every version value in the XML file and stores them in the TMP file, and the command

grep \<version CP057761.xml|awk -F '=' '{print $3}'|awk '{print $1}'|tr -d '"'

fetches the revisions of all the version elements in the XML file, whatever their version.

As a result, the revision of a different version is sometimes picked up and appended to a version whose own revision is empty.
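To make the failure concrete, here is a minimal reproduction. It is only a sketch: the sample file and the GPK4 entry are made up for illustration, and plain echo stands in for ${echo}. The unquoted backticks drop the empty line produced by the empty revision, so read picks up the next non-empty revision instead.

```shell
# Hypothetical sample: the first <version> has an empty revision, the second has "B".
sample=$(mktemp)
cat > "$sample" <<'EOF'
<version value="GPK5" revision="" type_of_change="1"/>
<version value="GPK4" revision="B" type_of_change="1"/>
EOF

# Same pipeline as in Parse_XML (assumption: ${echo} behaves like plain echo).
# The pipeline prints an empty line followed by "B"; the unquoted backticks
# collapse that to the single word "B".
echo `grep \<version "$sample" | awk -F '=' '{print $3}' | awk '{print $1}' | tr -d '"'` > "$sample.rev"
read _REVISION JUNK < "$sample.rev"

echo "revision picked up: '$_REVISION'"   # prints: revision picked up: 'B'
rm -f "$sample" "$sample.rev"
```

The first element's revision is empty, yet _REVISION ends up as B, which is the mix-up described above.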

How can I modify these commands

    ${echo} `grep \<version $file|awk -F = '{print $2}'|awk '{print $1}'|tr -d '"'` > $TMP
    read _VERSION JUNK < $TMP
    ${echo} `grep \<version $file|awk -F '=' '{print $3}'|awk '{print $1}'|tr -d '"'` > $TMP
    read _REVISION JUNK < $TMP

    _OAVERSION=${_VERSION}
    _VERSION=${_VERSION}${_REVISION}

so that they look up only the value held in _VERSION and fetch the revision belonging to that particular version? When that version has a revision, _VERSION should be GPK5B; when the revision is empty, it should be GPK5.

I fixed the issue by grepping for $_VERSION instead of \<version when extracting the revision. That returned only the revisions belonging to that particular version, and read _REVISION JUNK < $TMP then gave me the revision. What I wanted was just the latest revision together with its version. I apologize for not making that clear in my original question.
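One way to keep the version and its revision paired is to take both from the same line. The following is a rough sketch rather than the exact fix described above: get_version is a hypothetical helper, it assumes the first <version .../> element is the one wanted, and it assumes each element sits on a single line with value before revision.

```shell
# Hypothetical helper, not part of the original script.
# Assumption: the first <version .../> line in the file is the one wanted.
get_version() {
    local file=$1 line value revision
    # Take a single line so value and revision always come from the same element.
    line=$(grep -m 1 '<version ' "$file") || return 1
    # Capture the text between the quotes; an empty revision stays empty.
    value=$(printf '%s\n' "$line" | sed -n 's/.*value="\([^"]*\)".*/\1/p')
    revision=$(printf '%s\n' "$line" | sed -n 's/.* revision="\([^"]*\)".*/\1/p')
    printf '%s%s\n' "$value" "$revision"
}
```

Inside Parse_XML this could be used as _VERSION=$(get_version "$file"), which also avoids the temporary file.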

2 Answers

Use an XML parser to parse XML data. xmlstarlet is one.

Given file.xml containing

<root>
<version value="GPK5" revision="B" type_of_change="1"/>
<version value="GPK5" revision="" type_of_change="1"/>
</root>

Then

xmlstarlet sel -t -m '//version' -v '@value' -v '@revision' -n file.xml

Outputs

GPK5B
GPK5
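To feed this back into the variables used by Parse_XML, something along these lines might work. It is a sketch under assumptions: read_first_version is a hypothetical name, xmlstarlet must be installed, and the first <version> element in the document is taken to be the one wanted.

```shell
# Hypothetical helper; assumes xmlstarlet is installed and that the first
# <version> element in the document is the one wanted.
read_first_version() {
    local file=$1
    # (//version)[1] selects the first <version> element in document order.
    _OAVERSION=$(xmlstarlet sel -t -v '(//version)[1]/@value' "$file")
    _REVISION=$(xmlstarlet sel -t -v '(//version)[1]/@revision' "$file")
    # An empty revision attribute simply contributes nothing here.
    _VERSION=${_OAVERSION}${_REVISION}
}
```

Because both attributes are read from the same element, an empty revision can no longer be replaced by the revision of another element.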

Don't use sed or regular expressions to parse HTML/XML. Structured text such as XML or HTML cannot, and must not, be parsed with tools designed to process raw text lines. If you need to process XML/HTML, use an XML/HTML parser. The great majority of languages have built-in support for parsing XML, and there are dedicated tools like xidel, xmlstarlet or xmllint if you need a quick shot from the command line. Never accept a job if you don't have access to proper tools.

xidel is the most advanced command-line XML/HTML parser out there.

Its syntax is more intuitive than that of xmlstarlet and xmllint once you know the XPath query language:

xidel -e '//version/(@value||""||@revision)' -s file.xml
GPK5B
GPK5
