PerlMonks  

Re: xml parsing without using cpan modules

by tomhukins (Curate)
on Aug 10, 2004 at 14:14 UTC (#381592)


in reply to xml parsing without using cpan modules

Tim Bray parses XML with Perl and regular expressions, but he knows XML much better than most of us.

Replies are listed 'Best First'.
Re^2: xml parsing without using cpan modules
by Aristotle (Chancellor) on Aug 10, 2004 at 15:30 UTC

    And more importantly, he knows the XML generators whose output he is dealing with, so he doesn't have to account for every plausible case, only for the features those generators actually use.

    If you want to deal with XML in the general case, then you do have to parse, no way around it.

    Makeshifts last the longest.

      It is possible to write an XML parser using regular expressions. Check out "REX: XML Shallow Parsing with Regular Expressions", http://www.cs.sfu.ca/~cameron/REX.html. It even has Perl code for doing so.

      It effectively splits the XML into a list of strings on logical boundaries by repeating a regular expression that matches XML markup. It is fairly easy to find the type of each chunk by looking at the first couple of characters.
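      A rough sketch of the idea, for illustration only. The expression below is a deliberately simplified stand-in and is NOT the REX expression from the paper; the names shallow_parse and chunk_type are made up for this example, and the pattern assumes no '>' appears inside attribute values.

```perl
use strict;
use warnings;

# Simplified stand-in for a shallow-parsing expression (not the REX regex).
# Alternatives are tried in order, so the specific markup forms come first.
my $chunk_re = qr{
      <!--.*?-->              # comment
    | <!\[CDATA\[.*?\]\]>     # CDATA section
    | <\?.*?\?>               # processing instruction
    | <[^>]*>                 # any other markup (naive about quoted attribute values)
    | [^<]+                   # character data
}xs;

# Split a document into a flat list of markup and text chunks.
sub shallow_parse {
    my ($xml) = @_;
    return $xml =~ /$chunk_re/g;
}

# Classify a chunk by looking only at its first few characters.
sub chunk_type {
    my ($chunk) = @_;
    return 'comment'     if $chunk =~ /\A<!--/;
    return 'cdata'       if $chunk =~ /\A<!\[CDATA\[/;
    return 'pi'          if $chunk =~ /\A<\?/;
    return 'end_tag'     if $chunk =~ m{\A</};
    return 'declaration' if $chunk =~ /\A<!/;
    return 'start_tag'   if $chunk =~ /\A</;   # also covers empty-element tags
    return 'text';
}

my @chunks = shallow_parse('<a href="x">hi<!-- note --><b/></a>');
print chunk_type($_), ": $_\n" for @chunks;
```

      Joining the chunks back together reproduces the input, which is a handy sanity check that the split did not drop anything.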

        Of course you can parse using regular expressions. You just shouldn't grope around in a string representing an XML document using regular expressions, because you have to be certain about the context in which any match occurred. That means you have to scan the string strictly front-to-back, probably using the /gc options and the \G anchor to make sure you don't miss anything. Simply picking matches out of the middle of the string is very likely to be a broken approach unless you are dealing with a known subset of XML syntax.
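        A minimal sketch of that front-to-back style, assuming a tiny subset of XML (comments, simple tags, text). The helper name tokenize and the token layout are invented for this example, and the tag patterns are naive about '>' inside attribute values. Every match is anchored with \G at the current position, /c keeps that position when an alternative fails, and anything unrecognised is a hard error rather than being skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the string strictly front to back.
sub tokenize {
    my ($xml) = @_;
    my @tokens;
    pos($xml) = 0;
    while (pos($xml) < length($xml)) {
        if ($xml =~ /\G<!--(.*?)-->/gcs) {
            push @tokens, [ comment => $1 ];
        }
        elsif ($xml =~ m{\G</([\w:.-]+)\s*>}gc) {
            push @tokens, [ end => $1 ];
        }
        elsif ($xml =~ m{\G<([\w:.-]+)[^<>]*>}gc) {
            push @tokens, [ start => $1 ];
        }
        elsif ($xml =~ /\G([^<]+)/gc) {
            push @tokens, [ text => $1 ];
        }
        else {
            # Nothing matched at this exact offset: refuse to guess.
            die "Unrecognised input at offset " . pos($xml) . "\n";
        }
    }
    return @tokens;
}

my $doc = '<a><!-- <b> is not a tag --></a>';

# Context-aware scan: the <b> inside the comment stays part of the comment.
print "$_->[0]: $_->[1]\n" for tokenize($doc);

# Context-free grab from the middle of the string: also reports "b".
my @naive = $doc =~ /<(\w+)>/g;
print "naive: @naive\n";
```

        The naive match at the end reports an element that does not exist in the document, which is the failure mode described above.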

        Makeshifts last the longest.

Re^2: xml parsing without using cpan modules
by tbone1 (Monsignor) on Aug 10, 2004 at 15:46 UTC
    I've done it as well, but I've been using Perl to parse HTML since 1994 and XML for four years. In those cases, I also have the file creator within spitting distance, "and he was a poor spitter, lacking both distance and control"(*), so I could literally beat any of them over the head if I wanted to. If I didn't have a lot of experience I'd never do it, and if the file provider isn't within strangling distance, I go with CPAN modules.

    In short, you can do it, but you probably shouldn't do it.

    (*) - P.G. Wodehouse, Money in the Bank

    --
    tbone1, YAPS (Yet Another Perl Schlub)
    And remember, if he succeeds, so what.
    - Chick McGee
