XML parsing vs regex

space_monk
May 13, 2013

in reply to XML parsing vs regex

Umm, yes. What happens if someone puts a few attributes in your parent node or the node you need and screws up your regex search as a result? Or a space? Regexes for XML will bite you on the bum when you least expect it. Parsing is slower, and can have its own issues, but is generally more predictable.
# your regex would fail if
<parentNode id="1234"> # would fail because node now has attributes
<parentNode > # just one space is all it takes
# or this...
<nodeINeed><!-- regex this comment, sucka! -->12345</nodeINeed>

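To make those failure cases concrete, here is a minimal sketch in Python, used only for illustration (the thread itself is about Perl's XML::LibXML and regexes). The `NAIVE` pattern, the wrapper documents in `VARIANTS`, and the helper names are all assumptions made up for this example, not anyone's actual code. It runs a strict pattern and a standard-library parser over the three variations above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: assumes bare tags, no attributes, no comments.
NAIVE = re.compile(r"<parentNode><nodeINeed>(\d+)</nodeINeed></parentNode>")

# Made-up documents wrapping the variations from the post above.
VARIANTS = {
    "plain": "<parentNode><nodeINeed>12345</nodeINeed></parentNode>",
    "attribute": '<parentNode id="1234"><nodeINeed>12345</nodeINeed></parentNode>',
    "space": "<parentNode ><nodeINeed>12345</nodeINeed></parentNode>",
    "comment": "<parentNode><nodeINeed><!-- regex this comment, sucka! -->"
               "12345</nodeINeed></parentNode>",
}


def by_regex(xml):
    """Return the captured digits, or None when the pattern does not match."""
    match = NAIVE.search(xml)
    return match.group(1) if match else None


def by_parser(xml):
    """Parse the document and return the text content of <nodeINeed>."""
    node = ET.fromstring(xml).find("nodeINeed")
    return "".join(node.itertext()).strip()


for name, xml in VARIANTS.items():
    print(name, by_regex(xml), by_parser(xml))
```

A looser pattern would survive some of these variations, but each one has to be anticipated by hand, whereas the parser treats all four documents as the same element with the same text.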
XML parsing vs regex
Your Mother on May 13, 2013

    What makes you say parsing is slower? I would expect XML::LibXML to be faster than manual file handling + regular expressions. While I have no benchmarks, neither have I made any assertions. :P

      It's an assumption, I grant you, but I think I'm on safe ground in saying that building a DOM tree out of a document, followed by an XPath search, is very likely to be more time-consuming than a single regex pass. ;-)

      I would be curious to see how close the various approaches get, though, so if anyone is willing to benchmark, say, XML::LibXML, XML::Twig and a regex, I would like to see the results.
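A benchmark along those lines could be sketched as follows. This is a Python stand-in using the standard library's `timeit`, `re`, and `xml.etree.ElementTree`, not a measurement of XML::LibXML or XML::Twig. The sample document `DOC`, its size, and the function names are assumptions chosen for illustration, and no timings are claimed here, since results depend heavily on the document and the machine.

```python
import re
import timeit
import xml.etree.ElementTree as ET

# Hypothetical sample document: 1000 filler items, then the node we want.
DOC = (
    "<parentNode>"
    + "".join(f"<item>{i}</item>" for i in range(1000))
    + "<nodeINeed>12345</nodeINeed></parentNode>"
)

PATTERN = re.compile(r"<nodeINeed>(\d+)</nodeINeed>")


def with_regex(doc=DOC):
    """Single regex pass over the raw text."""
    return PATTERN.search(doc).group(1)


def with_parser(doc=DOC):
    """Build the full element tree, then look up the child element."""
    return ET.fromstring(doc).findtext("nodeINeed")


def bench(runs=200):
    """Return total seconds per approach for the given number of runs."""
    return {
        "regex": timeit.timeit(with_regex, number=runs),
        "parser": timeit.timeit(with_parser, number=runs),
    }


if __name__ == "__main__":
    for name, seconds in bench().items():
        print(f"{name}: {seconds:.4f}s")
```

Both functions are checked against the same document so the comparison is at least like for like. A fair Perl benchmark would also need to include file reading and module load time, which this sketch leaves out.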

XML parsing vs regex
Laurent_R on May 13, 2013

    Well, if the format of the data input changes, then there is a chance that you have to change your program.

    Maybe the XML::LibXML program will have to be changed, or maybe the regex program will have to be changed (or neither or both). Nobody can know for sure which one; it depends on the nature of the change.
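One kind of change that can hit the parser-based program rather than the regex is the producer adding a default namespace. The sketch below is in Python for illustration only. The namespace URI, the sample documents, and the helper names are made up, and the exact behaviour of XML::LibXML in the same situation is not shown here.

```python
import re
import xml.etree.ElementTree as ET

BEFORE = "<parentNode><nodeINeed>12345</nodeINeed></parentNode>"
# Hypothetical format change: a default namespace appears on the root.
AFTER = (
    '<parentNode xmlns="http://example.com/ns">'
    "<nodeINeed>12345</nodeINeed></parentNode>"
)

PATTERN = re.compile(r"<nodeINeed>(\d+)</nodeINeed>")


def by_regex(xml):
    """Return the captured digits, or None when the pattern does not match."""
    match = PATTERN.search(xml)
    return match.group(1) if match else None


def by_parser(xml):
    """Look up the child by its unqualified name; None when it is not found."""
    return ET.fromstring(xml).findtext("nodeINeed")


for label, xml in (("before", BEFORE), ("after", AFTER)):
    print(label, by_regex(xml), by_parser(xml))
```

Here the regex keeps working because the literal tag text is unchanged, while the unqualified lookup stops finding the element until the query is made namespace-aware. Other changes, such as the added attributes or comments mentioned earlier in the thread, cut the other way.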

    I know some people will probably shout at me for that, but in such a simple case, I would probably go for a regex. You don't need (and don't want) a cruise missile with an H-bomb in it to kill a mosquito on your arm.

