Re: XML parsing vs regex

Umm, yes: what happens if someone puts a few attributes in your parent node, or in the node you need, and screws up your regex search as a result? Or a space? Regexes for XML will bite you on the bum when you least expect it. Parsing is slower, and can have its own issues, but is generally more predictable.

# your regex would fail if
<parentNode id="1234"> # would fail because node now has attributes

<parentNode > # just one space is all it takes

# or this...
<nodeINeed><!-- regex this comment, sucka! -->12345</nodeINeed>
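
To make the three failure cases above concrete, here is a minimal sketch. It is in Python rather than Perl, using only the standard library's re and xml.etree.ElementTree, and it is not the code from the original question. The NAIVE pattern, the SAMPLES documents and the helper names are assumptions made up for illustration; the element names come from the snippets above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naive pattern: assumes bare tags and digits-only content.
NAIVE = re.compile(r"<parentNode><nodeINeed>(\d+)</nodeINeed></parentNode>")

# Illustrative inputs: a baseline plus the three variations listed above.
SAMPLES = {
    "baseline": "<parentNode><nodeINeed>12345</nodeINeed></parentNode>",
    "attribute": '<parentNode id="1234"><nodeINeed>12345</nodeINeed></parentNode>',
    "space": "<parentNode ><nodeINeed>12345</nodeINeed></parentNode>",
    "comment": "<parentNode><nodeINeed><!-- regex this comment, sucka! -->"
               "12345</nodeINeed></parentNode>",
}

def by_regex(xml_text):
    """Return the captured value, or None when the pattern does not match."""
    match = NAIVE.search(xml_text)
    return match.group(1) if match else None

def by_parser(xml_text):
    """Parse the document and return the text content of <nodeINeed>."""
    root = ET.fromstring(xml_text)
    node = root.find("nodeINeed")
    # itertext() gathers the element's text whether or not the parser
    # keeps the comment as a child node.
    return "".join(node.itertext()).strip()

if __name__ == "__main__":
    for name, doc in SAMPLES.items():
        print(name, by_regex(doc), by_parser(doc))
```

A more forgiving regex would cope with some of these variations, but each new variation needs another patch to the pattern, whereas the parser-based lookup is unchanged.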

If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)


Replies are listed 'Best First'.
Re^2: XML parsing vs regex by Your Mother (Archbishop) on May 13, 2013 at 22:07 UTC
What makes you say parsing is slower? I would expect XML::LibXML to be faster than manual file handling + regular expressions. While I have no benchmarks, neither have I made any assertions. :P
Re^3: XML parsing vs regex by space_monk (Chaplain) on May 14, 2013 at 05:35 UTC
It's an assumption, I grant you, but I think I'm on safe ground in saying that building a DOM tree out of a document and then running an XPath search is very likely to be more time-consuming than a single regex pass. ;-) I would be curious to see how close the various approaches get, though, so if anyone is willing to benchmark, say, XML::LibXML, XML::Twig and a regex, I would like to see the results.
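
A rough harness for that kind of comparison might look like the sketch below. It is in Python with the standard library only (re, timeit, xml.etree.ElementTree), so it says nothing about how XML::LibXML or XML::Twig would perform. The synthetic DOC, its size and the function names are assumptions for illustration. No timings are quoted here because they depend on the machine and the input.

```python
import re
import timeit
import xml.etree.ElementTree as ET

# Hypothetical test document: 1000 records, one value of interest in each.
DOC = "<root>" + "".join(
    f"<parentNode><nodeINeed>{i}</nodeINeed></parentNode>" for i in range(1000)
) + "</root>"

PATTERN = re.compile(r"<nodeINeed>(\d+)</nodeINeed>")

def extract_regex(text):
    """Single regex pass over the raw text."""
    return PATTERN.findall(text)

def extract_parser(text):
    """Build the full tree, then walk it for the wanted elements."""
    root = ET.fromstring(text)
    return [node.text for node in root.iter("nodeINeed")]

def compare(repeat=50):
    """Time both approaches on the same input; returns seconds per approach."""
    return {
        "regex": timeit.timeit(lambda: extract_regex(DOC), number=repeat),
        "parser": timeit.timeit(lambda: extract_parser(DOC), number=repeat),
    }

if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f}s")
```

Both functions return the same list on this well-behaved input, so the comparison measures speed only. It does not measure how either one copes with attributes, stray whitespace or comments.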
Re^2: XML parsing vs regex by Laurent_R (Canon) on May 13, 2013 at 18:59 UTC
Well, if the format of the input data changes, there is a chance that you will have to change your program. Maybe the XML::LibXML program will have to change, or maybe the regex program will (or neither, or both). Nobody can know for sure which one; it depends on the nature of the change. I know some people will probably shout at me for this, but in such a simple case I would probably go for a regex. You don't need (and don't want) a cruise missile with an H-bomb in it to kill a mosquito on your arm.

