In reply to: XML parsing vs regex
Here are a few rudimentary points that summarize my personal take on the classic XML parser versus regular expressions debate.
- Perl is a general-purpose scripting language that is especially well-suited for text processing using arbitrarily complex regular expression patterns.
- XML is plain text. Its inventors chose this simple format intentionally. (At least one of its inventors was a Perl hacker.)
- All the XML I've ever had to work with has been data-oriented rather than document-oriented. It has been generated by stable software, so its format was uniform, constant and predictable. For as long as I've had to work with any particular XML data structure, its format has never changed.
- With Perl, I've rarely had to do more than two things with XML data: make small changes to XML files, or extract small amounts of specific data from them.
- I know Perl regular expressions well because I use them all the time, for all kinds of applications. I don't know any of the various XML parsing modules very well (XML::Parser, XML::LibXML, XML::Twig, etc.) because I rarely have to use them.
- If the XML does change over time, it seems to me most likely to change in ways that would require the Perl script that parses it to be updated regardless of how it parses the XML, whether with a proper XML parser such as XML::LibXML or with regular expression patterns.
- If you need to parse a whole XML data structure into a whole Perl data structure, don't try to write your own XML parser in Perl, silly! That would be senseless and foolhardy.
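To make the two everyday tasks from the list concrete (pulling out a few values, and making one small change), here is a minimal sketch using nothing but core Perl. Everything in it is made up for illustration: the sample XML, the element and attribute names (inventory, item, sku, qty) and the two helper subs are assumptions, not anything from a real system. It only works because the input is assumed to be uniform, machine-generated XML: double-quoted attributes, sku as the only attribute on item, every item carrying a qty, and no comments or CDATA sections.

```perl
use strict;
use warnings;

# Hypothetical sample of uniform, data-oriented XML.
my $xml = <<'XML';
<inventory>
  <item sku="A100"><name>Widget</name><qty>12</qty></item>
  <item sku="B200"><name>Gadget</name><qty>7</qty></item>
</inventory>
XML

# Extract a small amount of specific data: a hash of sku => qty.
# Assumes every <item> contains a <qty>; otherwise the lazy .*? would
# run on into the next item.
sub extract_quantities {
    my ($text) = @_;
    my %qty;
    while ($text =~ m{<item\s+sku="([^"]+)">.*?<qty>(\d+)</qty>}gs) {
        $qty{$1} = $2;
    }
    return \%qty;
}

# Make a small change: replace the qty of one item and return the new text.
sub set_quantity {
    my ($text, $sku, $new_qty) = @_;
    # \Q...\E quotes any regex metacharacters in the sku.
    $text =~ s{(<item\s+sku="\Q$sku\E">.*?<qty>)\d+(</qty>)}{${1}${new_qty}${2}}s;
    return $text;
}

my $qty = extract_quantities($xml);
print "$_ => $qty->{$_}\n" for sort keys %$qty;

print set_quantity($xml, 'B200', 9);
```

The moment any of those assumptions stops holding (attributes get reordered, single quotes show up, an item goes missing its qty), patterns like these will quietly match the wrong thing, and that is the point at which a real parser earns its keep.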