PerlMonks  


Here are a few rudimentary points that summarize my personal take on the classic XML parser versus regular expressions debate.

  • Perl is a general-purpose scripting language that is especially well-suited for text processing using arbitrarily complex regular expression patterns.
  • XML is plain text. Its inventors chose this simple format intentionally. (At least one of its inventors was a Perl hacker.)
  • All the XML I've ever had to work with has been data-oriented rather than document-oriented. It has been generated by stable software, so its format was uniform, constant and predictable. For as long as I've had to work with any particular XML data structure, its format has never changed.
  • I've mostly had to do just two things with XML data using Perl: make small changes to XML files, or extract small amounts of specific data from them.
  • I know Perl regular expressions well because I use them all the time, for all kinds of applications. I don't know any of the various XML parsing technologies (XML::Parser, XML::LibXML, XML::Twig, etc.) very well because I rarely have to use them.
  • If the XML does change over time, it seems to me most likely to change in ways that would require the Perl script that processes it to be updated anyway, whether the script uses a proper XML parser such as XML::LibXML or regular expression patterns.
  • If you need to parse a whole XML data structure into a whole Perl data structure, don't try to write your own XML parser in Perl, silly! That would be senseless and foolhardy.
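
To make the two tasks in the list above concrete (a small in-place change and a small extraction), here is a minimal sketch using only core Perl. It is an illustration under stated assumptions, not code from the original post: the sample XML, its element and attribute names, and the variables $xml, %price_for and $updated are all invented. It also assumes the XML is as uniform as the list describes, with every item carrying exactly one sku attribute and one price element, in that order.

```perl
use strict;
use warnings;

# Hypothetical sample data: uniform, machine-generated, data-oriented XML.
my $xml = <<'XML';
<inventory>
  <item sku="A100"><name>Widget</name><price>9.99</price></item>
  <item sku="B200"><name>Gadget</name><price>24.50</price></item>
</inventory>
XML

# Task 1: extract a small amount of specific data (sku => price).
# /x allows whitespace in the pattern for readability; /s lets "." span
# newlines. This relies on every <item> having a <price> child.
my %price_for;
while ($xml =~ m{<item \s+ sku="([^"]+)"> .*? <price>([^<]+)</price>}gxs) {
    $price_for{$1} = $2;
}

# Task 2: make a small change (reprice one item) and leave everything
# else in the text byte-for-byte untouched.
(my $updated = $xml)
    =~ s{(<item \s+ sku="B200"> .*? <price>)[^<]+(</price>)}{${1}19.99$2}xs;

print "$_ => $price_for{$_}\n" for sort keys %price_for;
print $updated;
```

The patterns are tied to this exact layout. If attribute order, quoting style, or nesting varied, or if the data contained comments or CDATA sections, they would miss or mis-match, and that is the point at which a proper parser such as XML::LibXML becomes the safer choice.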

Jim


In reply to Re: XML parsing vs regex by Jim
in thread XML parsing vs regex by derekstucki
