I need to identify all empty XML files and write their names to a text file for reporting purposes. Here, "empty" means the XML file contains only the usual declaration <?xml version="1.0" encoding="UTF-8"?> followed by an open and close tag with nothing between them.

Sample files:

1)

<?xml version="1.0" encoding="UTF-8"?>
<STBTests>
</STBTests>

2)

<?xml version="1.0" encoding="UTF-8"?>
<UMTTests>
</UMTTests>

There is no data in the XML files apart from this. Any suggestions on how to approach this would be great.

  • Can you be specific about how you define an "empty" file? Is it (at most) one tag on the first line, followed by pairs of open and close tags with nothing but whitespace between them? Commented Mar 15, 2018 at 20:56
  • How about file size? It could be a good tell if non-empty files are bigger. Use find /path -size -128c to find files smaller than 128 bytes. Commented Mar 15, 2018 at 21:06
  • Another option is to count the lines: for F in *.xml; do if [ "$(wc -l < "$F")" -lt 4 ]; then echo "$F"; fi; done Commented Mar 15, 2018 at 21:11
  • @Dalvenjia: very bad idea. Think of an inline node like <foo>xxxxxxxxxxxxxxxxxxxxxxxx</foo> Commented Mar 15, 2018 at 21:44
  • What about comments and processing instructions? Commented Mar 15, 2018 at 23:07

2 Answers

Try this, using xmllint with an XPath expression:

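#!/bin/sh

for xml in *.xml; do
    # Prints "true" when the file has exactly one element whose only
    # content is a single character (the newline between the tags).
    bool=$(xmllint --xpath 'count(//*)=1 and string-length(//*[1])=1' "$xml")
    # Quote $bool so the test does not break when xmllint prints nothing.
    if [ "$bool" = true ]; then
        echo "$xml" >> xml_list_files
    fi
done

cat xml_list_files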

The expression tests that the file has only one element and that this element has no text content other than the single newline between its tags. In that case, the command prints true.
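The string-length(...)=1 check depends on the element holding exactly one character, so it would report false for <STBTests/>, <STBTests></STBTests>, or a closing tag that is indented. Here is a minimal sketch of a whitespace-tolerant variant, assuming an xmllint build that supports --xpath. The function name is_empty_xml is hypothetical and not part of the answer above.

```shell
#!/bin/sh

# is_empty_xml FILE  (hypothetical helper, not from the original answer)
# Succeeds when FILE parses as XML, contains exactly one element, and that
# element holds nothing but whitespace.
is_empty_xml() {
    # normalize-space() trims and collapses whitespace, so an element that
    # contains only newlines, spaces, or nothing at all yields "".
    result=$(xmllint --xpath 'count(//*)=1 and normalize-space(/*)=""' "$1" 2>/dev/null)
    # xmllint prints "true" or "false" for a boolean XPath result; a file
    # that fails to parse prints nothing, so it is treated as not empty.
    [ "$result" = true ]
}
```

In the loop above, the condition could then be written as if is_empty_xml "$xml"; then ... fi, with the rest unchanged.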

  • This is exactly what I was looking for, works like a charm! Thanks a lot bud! Really appreciate the quick response. Commented Mar 16, 2018 at 0:19
  • Thanks. Don't forget to accept/vote up if it fits your needs :) Commented Mar 16, 2018 at 0:22
  • I had fun finding the solution, not that easy at first glance Commented Mar 16, 2018 at 0:24
  • Awesome! I know!!! You are a genius, sir! I have been breaking my head over this with the very limited Unix knowledge I've got. Am glad I asked here. This is the first time I'm on this website, and definitely not the last. Am gonna try and help others in need like this :) Cheers! Commented Mar 16, 2018 at 0:28
  • Thanks for your clear, well-formatted question. If it's your first question, it's quite good ;) Commented Mar 16, 2018 at 0:30

to identify and write all xml file names which are empty to a text file for reporting purpose

find + xmlstarlet solution:

find . -type f -name "*.xml" -exec bash -c \
'v=$(xmlstarlet sel -t -i "count(//*)=1 and //*[1][not(normalize-space())]" -o 1 -b "$1");
[[ -n "$v" ]] && echo "$1" >> "empty_xml.txt"' _ {} \;

The empty_xml.txt file should then contain the paths of all matching files.
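Where xmlstarlet is not installed, the same recursive search can be sketched with find and xmllint instead. This is an illustration under assumptions rather than this answer's method: the function name list_empty_xml is hypothetical, and xmllint must support --xpath.

```shell
#!/bin/sh

# list_empty_xml DIR  (hypothetical helper, not from the original answer)
# Prints the path of every *.xml file under DIR that contains a single
# element with only whitespace inside it.
list_empty_xml() {
    # "{} +" hands the files to the inner shell in batches, so one sh
    # process handles many files instead of one process per file.
    find "$1" -type f -name '*.xml' -exec sh -c '
        for f in "$@"; do
            r=$(xmllint --xpath "count(//*)=1 and not(normalize-space(/*))" "$f" 2>/dev/null)
            if [ "$r" = true ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}
```

Running list_empty_xml . > empty_xml.txt would write the report in one step. Because the report name does not end in .xml, find does not pick it up on a later run.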

  • Thanks a lot for the response, bud. Unfortunately I was not able to use/test it as I don't have xmlstarlet installed. Am gonna try and get it installed and get back to you. Thanks again. Commented Mar 16, 2018 at 0:22
