
I want to extract the value of the code attribute from each c tag in the XML below.

random.xml

<a>
    <b>
        <c id="123" code="abc" date="12-12-2022"/>
        <c id="123" code="efg" date="12-12-2022"/>
        <c id="123" date="12-12-2022"/>
    </b>
</a>

Currently the logic is:

cat random.xml | egrep "<c.*/>" | awk -F1 ' /code=/ {f=NR} f&&NR-1==f' RS='"'

How does the above logic work to get the values of code from tag c?

It gives the expected output:

abc
efg
  • Just for clarification: the question boils down to this: in awk you have a field containing a string of the form foo="bar", and you want to extract bar from it. Is this understanding correct? Commented Jan 10, 2023 at 12:00
  • Yes, your understanding of the issue is correct @user1934428 Commented Jan 10, 2023 at 14:10
  • In this case, it is solely a question about awk programming and the XML context is irrelevant. May I suggest that you edit the question so that it reflects only the problem at hand, dropping the whole XML part? Having said this: as you can see from the awk man page, the function split, or alternatively substr (perhaps combined with index), would be an obvious solution. Did you consider using one of them? Commented Jan 10, 2023 at 14:33
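
A minimal sketch of the split approach suggested in the last comment, assuming the input really is a bare string of the form foo="bar". The function name attr_value is a hypothetical helper introduced only for this illustration:

```shell
# attr_value is a hypothetical helper: read strings of the form name="value"
# and print the part between the double quotes.
attr_value() {
    # split on the double quote: parts[1] is name=, parts[2] is the value
    awk '{ split($0, parts, "\""); print parts[2] }'
}

printf '%s\n' 'code="abc"' 'code="efg"' | attr_value
# abc
# efg
```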

3 Answers

2

Firstly observe that

cat random.xml | egrep "<c.*/>" | awk -F1 ' /code=/ {f=NR} f&&NR-1==f'  RS='"'

is of dubious quality, as

  • egrep does not require standard input; it can read the file itself, so this is a useless use of cat
  • the pattern passed to egrep is simple and works equally well in plain grep, so there is no need for extended grep
  • -F1 sets 1 as the field separator in awk, but the code never uses fields

After fixing these issues, the code looks like this:

grep "<c.*/>" random.xml | awk ' /code=/ {f=NR} f&&NR-1==f'  RS='"'

How it works: grep selects the lines that contain <c followed by zero or more characters followed by />. awk is then told that records are separated by double quotes ("). When a record contains code=, the variable f is set to that record's number (NR). A record is printed when f is non-zero and the current record number minus one equals f, that is, when the record comes directly after one containing code=. Because the attribute value sits between the quotes that follow code=, that next record is exactly the value.
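
To make this concrete, it can help to print the records awk sees when the record separator is a double quote. The following is only a sketch using one line of the sample input; show_records is a hypothetical helper name, not part of the original pipeline:

```shell
# show_records is a hypothetical helper: number every record awk reads
# when the record separator is a double quote.
show_records() {
    awk '{ printf "%d: [%s]\n", NR, $0 }' RS='"'
}

# printf '%s' avoids a trailing newline ending up inside the last record.
printf '%s' '<c id="123" code="abc" date="12-12-2022"/>' | show_records
# 1: [<c id=]
# 2: [123]
# 3: [ code=]
# 4: [abc]
# 5: [ date=]
# 6: [12-12-2022]
# 7: [/>]
```

Record 3 matches /code=/, so f becomes 3; at record 4 the test NR-1==f holds and abc is printed.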

Note that GNU AWK is poorly suited to working with XML, and using regular expressions against XML is a very poor idea, because XML is not a regular (Chomsky Type 3) language.

If possible, use proper tools for working with XML data. For example, hxselect can be used as follows. Let the content of file.xml be

<a>
    <b>
        <c id="123" code="abc" date="12-12-2022"/>
        <c id="123" code="efg" date="12-12-2022"/>
        <c id="123" date="12-12-2022"/>
    </b>
</a>

then

hxselect -c -s '\n' 'c[code]::attr(code)' < file.xml

gives output

abc
efg

Explanation: -c outputs just the value rather than the name and value; -s '\n' separates the results with a newline, so each value is on its own line; c[code] is a CSS3 selector meaning any c tag with a code attribute; ::attr(code) is an hxselect feature meaning get the attribute named code. This solution is more robust than the cat-egrep-awk pipeline because it is immune to, for example, different whitespace usage in the file (whitespace outside tags in XML is optional).


2

This might be an awk question but parsing XML should be done with XML tools.

Here's an example with Xidel (available for several operating systems) and a standard XPath expression:

xidel --xpath '//c[@code]/@code' random.xml

note: //c[@code] selects the c nodes that have a code attribute, and .../@code outputs the value of the code attribute.

Output
abc
efg


0

If your input always looks like the sample XML, then you can make the code attribute itself a field separator, and < the record separator, so that you can easily extract the value as the second field when the first field is the tag name c:

awk -F' .*code="|" ' -vRS='<' '$1=="c"{print $2}'

Demo: https://awk.js.org/?snippet=Lz6yx7
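
To see how that field separator behaves, here is a small sketch that prints the first two fields of each c record, one with a code attribute and one without. show_fields is a hypothetical helper introduced only for this illustration, and the input is a shortened version of the sample:

```shell
# show_fields is a hypothetical helper: using the same FS and RS as above,
# print the first two fields of every record that starts with "c ".
show_fields() {
    awk -F' .*code="|" ' -v RS='<' '/^c / { printf "$1=[%s] $2=[%s]\n", $1, $2 }'
}

# Two c tags on one line: the first has a code attribute, the second does not.
printf '%s' '<c id="123" code="abc" date="12-12-2022"/><c id="123" date="12-12-2022"/>' | show_fields
# $1=[c] $2=[abc]
# $1=[c id="123] $2=[date="12-12-2022"/>]
```

Only the record with a code attribute ends up with $1 equal to c, which is why the $1=="c" test skips tags that lack the attribute.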
