Revisions to How to find out the content of a XML file using Unix Sed/Awk? [duplicate]

replaced http://stackoverflow.com/ with https://stackoverflow.com/


edited May 23, 2017 at 12:40


sed and awk are line-oriented tools built around regular expressions. See this answer on Stack Overflow for why parsing HTML/XML with regular expressions is a bad idea.

For XML you really need to parse the document into a tree (a DOM) and then look up your information in it. There are command-line tools like xmlstarlet that let you extract information from XML documents.

But do not try using sed/awk to parse XML.
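As a rough sketch of the tool-based route: the snippet below assumes xmlstarlet is installed (the binary is called `xml` on some systems). It uses a made-up sample document, because the file from the question is not reproduced here. The element names `config`, `server` and `port` are hypothetical.

```shell
# Hypothetical sample document; substitute your own file.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<?xml version="1.0"?>
<config>
  <server><name>alpha</name><port>8080</port></server>
</config>
EOF

# Locate the tool: "xmlstarlet" on most systems, "xml" on some others.
xml_tool=$(command -v xmlstarlet || command -v xml || true)

port=""
if [ -n "$xml_tool" ]; then
  # sel = select mode, -t = start a template, -v = print the value of an XPath.
  port=$("$xml_tool" sel -t -v '/config/server/port' "$xml_file")
  echo "port: $port"
else
  echo "xmlstarlet not found; install it to run this example" >&2
fi

rm -f "$xml_file"
```

The XPath expression addresses the element by its position in the document tree, so the result does not depend on how the file is broken into lines.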

PS: Of course, you might be able to write a simple regular expression that extracts the information you need from the files you happen to encounter in real life. For example, the following prints the 5th line of the document, which (in your example) holds the relevant information:

# stupid and naive approach: delete every line except line 5
sed '5!d' MyXML.xml

But this makes an assumption about the line layout of the file, which has nothing to do with XML. It might work for the output of one very specific generator, but it is not guaranteed to work with every XML file that has the same structure (and structured data is what XML is all about).
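To illustrate the layout problem, here is a small sketch with two made-up files (`pretty.xml` and `compact.xml`, both hypothetical) that carry the same XML data but are broken into lines differently. The same `sed '5!d'` command is run on each:

```shell
# Work in a scratch directory so nothing else is touched.
tmpdir=$(mktemp -d)

# Layout A: one element per line; the <port> value sits on line 5.
cat > "$tmpdir/pretty.xml" <<'EOF'
<?xml version="1.0"?>
<config>
  <server>
    <name>alpha</name>
    <port>8080</port>
  </server>
</config>
EOF

# Layout B: the same data, but the children of <server> share one line.
cat > "$tmpdir/compact.xml" <<'EOF'
<?xml version="1.0"?>
<config>
  <server><name>alpha</name><port>8080</port></server>
</config>
EOF

# Identical command, identical XML content, different line layout.
pretty_line=$(sed '5!d' "$tmpdir/pretty.xml")
compact_line=$(sed '5!d' "$tmpdir/compact.xml")

printf 'pretty:  [%s]\n' "$pretty_line"
printf 'compact: [%s]\n' "$compact_line"

rm -rf "$tmpdir"
```

The first file yields the `<port>` line. The second has only four lines, so sed prints nothing, even though an XML parser would treat both documents as equivalent.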

bold warning to not use sed/awk


edited Sep 19, 2013 at 21:05

umläute


answered Sep 19, 2013 at 12:47

umläute

sed and awk are line-oriented tools built around regular expressions. See this answer on Stack Overflow for why parsing HTML/XML with regular expressions is a bad idea.

For XML you really need to parse the document into a tree (a DOM) and then look up your information in it. There are command-line tools like xmlstarlet that let you extract information from XML documents.

But don't try using sed/awk.

PS: Of course, you might be able to write a simple regular expression that extracts the information you need from the files you happen to encounter in real life. For example, the following prints the 5th line of the document, which (in your example) holds the relevant information:

sed '5!d' MyXML.xml

But this makes an assumption about the line layout of the file, which has nothing to do with XML.
