
I need to extract the values of two attributes, "estimated" and "fullSign", for every arrival in this result set.

RESULT SET:

<?xml version="1.0" encoding="UTF-8"?>
<resultSet xmlns="urn:trimet:arrivals" queryTime="1469138325745">
  <location desc="Morrison/SW 3rd Ave MAX Station" dir="Westbound" lat="45.5181811277907" lng="-122.675385866199" locid="8381" />
  <arrival block="9007" departed="true" dir="1" status="estimated" estimated="1469138452000" fullSign="MAX  Blue Line to Hillsboro" piece="1" route="100" scheduled="1469138250000" shortSign="Blue to Hillsboro" locid="8381" detour="false">
    <blockPosition feet="1901" at="1469138300978" heading="201" lat="45.5214364" lng="-122.6716177">
      <trip desc="Hatfield Government Center" dir="1" route="100" tripNum="6557314" destDist="77046" pattern="54" progress="75145" />
    </blockPosition>
  </arrival>
  <arrival block="9050" departed="true" dir="1" status="estimated" estimated="1469138664000" fullSign="MAX  Red Line to City Center &amp; Beaverton" piece="1" route="90" scheduled="1469138670000" shortSign="Red Line to Beaverton" locid="8381" detour="false">
    <blockPosition feet="4552" at="1469138313683" heading="237" lat="45.5277621" lng="-122.6687878">
      <trip desc="Beaverton TC Pocket" dir="1" route="90" tripNum="6556307" destDist="66321" pattern="15" progress="61769" />
    </blockPosition>
  </arrival>
  <arrival block="9018" departed="true" dir="1" status="estimated" estimated="1469139140000" fullSign="MAX  Blue Line to Hillsboro" piece="1" route="100" scheduled="1469139150000" shortSign="Blue to Hillsboro" locid="8381" detour="false">
    <blockPosition feet="13687" at="1469138320005" heading="239" lat="45.5309688" lng="-122.6350333">
      <trip desc="Hatfield Government Center" dir="1" route="100" tripNum="6557315" destDist="77046" pattern="54" progress="63359" />
    </blockPosition>
  </arrival>
  <arrival block="9043" departed="true" dir="1" status="estimated" estimated="1469139577000" fullSign="MAX  Red Line to City Center &amp; Beaverton" piece="1" route="90" scheduled="1469139570000" shortSign="Red Line to Beaverton" locid="8381" detour="false">
    <blockPosition feet="31909" at="1469138310486" heading="285" lat="45.5320383" lng="-122.5738342">
      <trip desc="Beaverton TC Pocket" dir="1" route="90" tripNum="6556308" destDist="66321" pattern="15" progress="34412" />
    </blockPosition>
  </arrival>
</resultSet>

Expected result:

1469138452000 MAX  Blue Line to Hillsboro
1469138664000 MAX  Red Line to City Center &amp; Beaverton
1469139140000 MAX  Blue Line to Hillsboro
1469139577000 MAX  Red Line to City Center &amp; Beaverton

What is a good way for me to extract this data?

  • Start by searching for xmlstarlet here on U&L. Commented Jul 21, 2016 at 23:38
  • Thanks @Roaima. I tried using xmlstarlet, but maybe my XPath expression is incorrect; I'm still unable to fetch the estimated and fullSign values: '/usr/bin/xmlstarlet sel -t -v "/arrival/@estimated" -nl filename.xml' Commented Jul 22, 2016 at 18:22
  • When you post XML, please try to make it readable with xmltidy, xml_pp, xmlstarlet fo, or one of many other similar tools. Commented Jul 23, 2016 at 6:41
  • For a simple extraction, I'd use xml2 to convert it to a line-oriented format so I could use awk, perl, or other standard line-oriented text utilities. Commented Jul 23, 2016 at 6:42
  • Sure, I will do that from now on. Commented Jul 25, 2016 at 20:56

2 Answers

This uses XMLStarlet together with paste (the XMLStarlet command is installed as xml on some systems and as xmlstarlet on others). It can probably be done in a single call to XMLStarlet, but I'm no wizard:

$ paste <(xml sel -T -t -v '//@estimated' data.xml) \
        <(xml sel -T -t -v '//@fullSign' data.xml)
1469138452000   MAX Blue Line to Hillsboro
1469138664000   MAX Red Line to City Center & Beaverton
1469139140000   MAX Blue Line to Hillsboro
1469139577000   MAX Red Line to City Center & Beaverton
$ xml2 < sunnx.xml | awk -F= '
   $1 ~ /@fullSign/  { fs=$2 ; sub(/&/,"&amp;",fs) };
   $1 ~ /@estimated/ { est=$2 };
   fs && est         { printf "%s %s\n", est, fs; fs=est="" }'
1469138452000 MAX  Blue Line to Hillsboro
1469138664000 MAX  Red Line to City Center &amp; Beaverton
1469139140000 MAX  Blue Line to Hillsboro
1469139577000 MAX  Red Line to City Center &amp; Beaverton

xml2 decodes encoded entities such as &amp; for you, so I added the sub() call to turn & back into &amp; and match your requested output. If you want a literal & instead, remove the sub() call.
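The sub() trick works because, in awk's sub() and gsub(), a bare & in the replacement string stands for the text that matched, so "&amp;" puts the matched & back and appends amp; after it. sub() changes only the first match in its target, so if a fullSign value could ever contain more than one &, gsub() would be the safer choice. A quick check on a made-up string (not taken from the question's data):

```shell
#!/bin/sh
# In the replacement, a bare & means "the text that matched", so "&amp;"
# turns each matched "&" into "&amp;".
out=$(printf '%s\n' 'A & B & C' | awk '{
    s1 = $0; sub(/&/, "&amp;", s1)    # sub(): first match only
    s2 = $0; gsub(/&/, "&amp;", s2)   # gsub(): every match
    print s1
    print s2
}')
printf '%s\n' "$out"
```

The first output line still contains one plain &, the second has both ampersands encoded.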

Without the sub(), the output looks like this:

1469138452000 MAX  Blue Line to Hillsboro
1469138664000 MAX  Red Line to City Center & Beaverton
1469139140000 MAX  Blue Line to Hillsboro
1469139577000 MAX  Red Line to City Center & Beaverton