PerlMonks  

Re: XML parsing vs regex

by space_monk (Chaplain)
on May 13, 2013 at 18:21 UTC ([id://1033325])


in reply to XML parsing vs regex

Umm yes, what happens if someone puts a few attributes in your parent node, or in the node you need, and screws up your regex search as a result? Or adds a space? Regexes for XML will bite you on the bum when you least expect it. Parsing is slower and can have its own issues, but it is generally more predictable.
    # your regex would fail if
    <parentNode id="1234">    # would fail because node now has attributes
    <parentNode >             # just one space is all it takes
    # or this...
    <nodeINeed><!-- regex this comment, sucka! -->12345</nodeINeed>
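To make those failure modes concrete, here is a minimal sketch in Python rather than Perl. The standard-library `re` module stands in for a Perl regex and `xml.etree.ElementTree` stands in for a parser such as XML::LibXML. The naive pattern and the sample documents (a `<nodeINeed>` sitting directly inside a `<parentNode>` root) are assumptions for illustration, not the original poster's actual code or data.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naive pattern: expects the literal text "<parentNode><nodeINeed>".
NAIVE = re.compile(r"<parentNode><nodeINeed>(\d+)</nodeINeed>")

# Hypothetical sample documents: the plain case plus the three variations above.
SAMPLES = {
    "plain": "<parentNode><nodeINeed>12345</nodeINeed></parentNode>",
    "attribute": '<parentNode id="1234"><nodeINeed>12345</nodeINeed></parentNode>',
    "space": "<parentNode ><nodeINeed>12345</nodeINeed></parentNode>",
    "comment": "<parentNode><nodeINeed><!-- regex this comment, sucka! -->"
               "12345</nodeINeed></parentNode>",
}


def regex_value(xml_text):
    """Return the number captured by the naive regex, or None if it misses."""
    match = NAIVE.search(xml_text)
    return match.group(1) if match else None


def parser_value(xml_text):
    """Return the text of <nodeINeed> directly under the <parentNode> root."""
    root = ET.fromstring(xml_text)
    # The default parser drops the comment, so only "12345" is left as text.
    return root.find("nodeINeed").text


for name, doc in SAMPLES.items():
    print(f"{name:<10} regex={regex_value(doc)!r:<9} parser={parser_value(doc)!r}")
```

Only the plain sample matches the naive regex, while the parser returns `'12345'` for all four, because the attributes, the stray space and the comment have already been dealt with before you look at the text.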
If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)

Re^2: XML parsing vs regex
by Your Mother (Archbishop) on May 13, 2013 at 22:07 UTC

    What makes you say parsing is slower? I would expect XML::LibXML to be faster than manual file handling + regular expressions. While I have no benchmarks, neither have I made any assertions. :P

      It's an assumption, I grant you, but I think I'm on safe ground in saying that building a DOM tree out of a document, followed by an XPath search, is very likely to be more time-consuming than a single regex pass. ;-)

      I would be curious to see how close the various approaches get, though, so if anyone is willing to benchmark, say, XML::LibXML, XML::Twig and a regex, I would like to see the results.
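For anyone tempted to take up that challenge, here is a minimal harness sketched in Python with `timeit` instead of Perl's `Benchmark` module. `xml.etree.ElementTree` (whose `find` supports only a limited XPath subset) stands in for XML::LibXML, there is no XML::Twig-style streaming contender, and the test document is made up. It only prints timings for your machine; it makes no claim about which approach wins.

```python
import re
import timeit
import xml.etree.ElementTree as ET

# Hypothetical test document: one <nodeINeed> after many sibling elements.
DOC = (
    "<parentNode>"
    + "".join(f"<other>{i}</other>" for i in range(1000))
    + "<nodeINeed>12345</nodeINeed>"
    + "</parentNode>"
)

PATTERN = re.compile(r"<nodeINeed>(\d+)</nodeINeed>")


def with_regex(doc=DOC):
    # Single pass over the raw text; no tree is built.
    return PATTERN.search(doc).group(1)


def with_parser(doc=DOC):
    # Build the whole element tree first, then search it.
    return ET.fromstring(doc).find("nodeINeed").text


def bench(number=200):
    """Return the average seconds per call for each approach."""
    results = {}
    for name, fn in (("regex", with_regex), ("parser", with_parser)):
        results[name] = timeit.timeit(fn, number=number) / number
    return results


if __name__ == "__main__":
    # Sanity check: both approaches must find the same value before timing them.
    assert with_regex() == with_parser() == "12345"
    for name, per_call in bench().items():
        print(f"{name:<7} {per_call * 1e6:10.1f} us per call")
```

A real answer would run the same comparison in Perl (for example with `Benchmark`'s `cmpthese`) over documents of realistic size, since the gap between a single regex pass and a full tree build is likely to depend on how large the input is.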

Re^2: XML parsing vs regex
by Laurent_R (Canon) on May 13, 2013 at 18:59 UTC

    Well, if the format of the data input changes, then there is a chance that you have to change your program.

    Maybe the XML::LibXML program will have to be changed, or maybe the regex program will (or neither, or both). Nobody can know for sure which one; it depends on the nature of the change.

    I know some people will probably shout at me for that, but in such a simple case, I would probably go for a regex. You don't need (and don't want) a cruise missile with an H-bomb in it to kill a mosquito on your arm.
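As a sketch of what such a "simple case" regex might look like when made a little less brittle, here is a hypothetical pattern (in Python, not from the original thread) that tolerates attributes and whitespace inside the tags. It still assumes there are no comments, CDATA sections, entities or `>` characters inside attribute values in the way, which is exactly the kind of assumption the rest of this thread argues about.

```python
import re

# Hypothetical "good enough" pattern: allows attributes and whitespace in the
# start tags, but still assumes no comments, CDATA sections or entities.
TOLERANT = re.compile(
    r"<parentNode\b[^>]*>\s*<nodeINeed\b[^>]*>\s*(\d+)\s*</nodeINeed\s*>"
)


def grab(xml_text):
    """Return the digits inside <nodeINeed>, or None if the pattern misses."""
    match = TOLERANT.search(xml_text)
    return match.group(1) if match else None


SAMPLES = [
    '<parentNode id="1234"><nodeINeed>12345</nodeINeed></parentNode>',
    "<parentNode >\n  <nodeINeed>12345</nodeINeed>\n</parentNode>",
    "<parentNode><nodeINeed><!-- a comment -->12345</nodeINeed></parentNode>",
]

for sample in SAMPLES:
    print(repr(grab(sample)))
```

The first two samples yield `'12345'`; the one with the comment still returns `None`.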
