http://www.perlmonks.org?node_id=243908


in reply to Re: Are you looking at XML processing the right way? (merge)
in thread is XML too hard?

For example, "ways to rome" just does it wrong. Of course you shouldn't do it that way!

By any means, send an alternate way. No kidding. If the article can show a safer way to do it with regexps, then it should.

As a matter of fact I use regexps to process XML in very specific cases. First, when I am processing "not-quite-XML", which would make a parser choke, I use regexps to turn it into real XML. Then when I need to do things that XML modules (even XML::Twig!) can't do AND when I have created the XML myself. That is, it uses no entities/comment/weird stuff at all, it has no recursive tags and I know the order of attributes in tags. Then I use regexps (or I upgrade XML::Twig, see the new wrap_children method ;--)

The problem is people who don't know XML besides "it's HTML with my own tags" (and very few people actually know HTML that well), who use regexps for recuring processes, where the likelyhood of the XML changing in the future in ways that will break their code is quite high (<peet_peeve>mind you, the DOM has very similar issues</peet_peeve>). It is not a problem of regexps=bad, it's just a problem of knowing your tools, knowing the problem space (and there are indications that Tim Bray knows the problem space ;--) and knowing the limitations of the tools in the context in which you are using them. When you don't quite know the environment, better to play it safe and to use a parser than regexp.