PerlMonks  

Re^3: XML::Simple cannot parse Simple XML file

by wolv (Pilgrim)
on Jun 13, 2006 at 23:55 UTC


in reply to Re^2: XML::Simple cannot parse Simple XML file
in thread XML::Simple cannot parse Simple XML file

And just in case you have to process XML that you don't produce and is invalid, I recommend XML::Liberal. For example, Plagger uses it to parse broken feeds.

Re^4: XML::Simple cannot parse Simple XML file
by davorg (Chancellor) on Jun 14, 2006 at 05:37 UTC
    And just in case you have to process XML that you don't produce and is invalid

    Terminology is important here. There is no such thing as "invalid XML". If your data doesn't follow the XML specs, then it is not XML.

    When presented with data that is supposed to be XML but isn't, the best action is to insist that it be replaced by something that is XML. Remember how the web was taken over by invalid HTML because browsers were too lenient? Let's not allow XML to go the same way.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      There is no such thing as "invalid XML".

      So if you make a typo that's a syntax error in Perl, it's no longer Perl? Although I understand what you're saying, I don't agree with how you said it.

      "He's not a patient person." = "He's an impatient person."
      "This is not valid XML." = "This is invalid XML."

      It's like this: say you define a "table" as "a thing for setting stuff on that has four legs". Then you find another thing that you could set stuff on, except that one of its legs is missing. Wouldn't you say that it's a broken table, rather than saying that broken tables don't exist?

        Of course you make typos when you're developing stuff, whether it's Perl, XML or anything else. That's not the issue. The issue is people who publish documents that claim to be XML but which don't conform to the specification. And here, we're not talking about a typo. We're talking about a tag name that starts with a disallowed character. That's a pretty major problem in my opinion, and people shouldn't work around it.


Re^4: XML::Simple cannot parse Simple XML file
by gellyfish (Monsignor) on Jun 14, 2006 at 07:49 UTC

    I'm not convinced that the name of the module is very sensible. If it can parse 'invalid XML' without an error, then it isn't a compliant XML processor. The specification is quite clear about this:

    Validating and non-validating processors alike MUST report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read.

    The emphasis is from the specification. Obviously it's a bit late now, but the module would be better called something like 'Text::XMLish' or 'Text::Tagged::AngleBrackets' to make this clear. There is no such thing as a 'Liberal' XML parser in terms of the well-formedness constraints.

    /J\
