Is a file XML?

by ajt (Prior)
on Sep 25, 2001 at 15:10 UTC (#114505=perlquestion)
ajt has asked for the wisdom of the Perl Monks concerning the following question:

I have a requirement to quickly check whether a document uploaded to the server is an XML file. It should be, but you never know....

I would like my script (built around CGI) to give the user an error if the file isn't well-formed XML. I've already checked that the file is more than zero bytes.

I don't want to check for validity (I can't anyway), I just want to quickly be able to emit an error message to the user's browser, and die.

I assume that if I simply open up the file in an XML parser, and it doesn't die when I parse it, then it's XML and well-formed, which should be enough to pass the file on to another process.

Q1 Is this a sensible approach?

Q2 If it is, which module is fastest and simplest? I don't plan to actually do anything with the file, just do an HTTP POST to another server.

As ever, many thanks in advance.

Re: Is a file XML?
by davorg (Chancellor) on Sep 25, 2001 at 15:23 UTC

    That's pretty much how I'd do it. Using code like this:

    use XML::Parser;

    my $p    = XML::Parser->new;
    my $file = 'whatever.xml';

    eval { $p->parsefile($file) };

    if ($@) {
        # XML is not well formed
    } else {
        # XML is well formed
    }
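
    For the CGI case in the question, the same eval check can be wrapped so that the parser's message is available for the error page. This is only a minimal sketch: xml_error and error_page are hypothetical helper names, not part of XML::Parser, and the 400 status and plain-text body are assumptions about what the script should send.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns '' if $file parses as well-formed XML,
# otherwise the message the parser died with.
sub xml_error {
    my ($file) = @_;
    my $parser = XML::Parser->new;
    return eval { $parser->parsefile($file); 1 }
        ? ''
        : ( $@ || 'unknown parse error' );
}

# Hypothetical helper: builds a complete CGI response (headers plus body)
# reporting the failure. The status code is an assumption.
sub error_page {
    my ($message) = @_;
    return "Status: 400 Bad Request\r\n"
         . "Content-Type: text/plain; charset=utf-8\r\n\r\n"
         . "The uploaded file is not well-formed XML:\n$message\n";
}
```

    In the script itself this might be used as: my $err = xml_error($file); if ($err) { print error_page($err); exit; }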
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you don't talk about Perl club."

Re: Is a file XML?
by Caillte (Friar) on Sep 25, 2001 at 15:26 UTC

    Open up the file and look for the declaration line.

    $isxml = 0;
    while (<HANDLE>) {
        if ($_ =~ /.*xml\s+version/i) {
            $isxml = 1;
            last;
        }
    }

    if ($isxml) {
        # xml file stuff here
    } else {
        # not xml file stuff here
    }

    This was written in a hurry and not tested (I need to run to a meeting ;)) but it should be enough to start with

    Update: Rereading this post-meeting I see I missed the question altogether.... davorg gave the right answer ;)

    $japh->{'Caillte'} = $me;

      Just because a file has an XML declaration on the first line doesn't mean that it's a well-formed XML file.
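
      A small illustration of the difference, assuming XML::Parser is installed. The document string is made up for the example: it carries a declaration, but its item element is never closed.

```perl
use strict;
use warnings;
use XML::Parser;

# Made-up sample: has an XML declaration, but <item> is never closed.
my $doc = qq{<?xml version="1.0"?>\n<root><item></root>\n};

# The declaration check only looks for the "<?xml version" text.
my $has_decl = $doc =~ /<\?xml\s+version/ ? 1 : 0;

# The parser check dies on the mismatched tag, so eval returns false.
my $well_formed = eval { XML::Parser->new->parse($doc); 1 } ? 1 : 0;

print "declaration found: $has_decl, well formed: $well_formed\n";
# prints "declaration found: 1, well formed: 0"
```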

      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you don't talk about Perl club."

Re: Is a file XML?
by mirod (Canon) on Sep 25, 2001 at 15:40 UTC

    Just to add to davorg's answer:

    • If you want an error message that's a little more descriptive, you can pass ErrorContext => 2 when creating the XML::Parser object, to get the lines around the error.
    • One of the most common problems I have found with XML-files-that-are-not-really-XML is that they don't include an encoding declaration, even though they contain latin1 (accented) characters. If this happens often, you can at least diagnose it better by doing a second run, using ProtocolEncoding => 'ISO-8859-1' when you create the parser object. Otherwise the error message is quite random, depending on what the parser makes of the character.
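
    The two suggestions above could be combined roughly as follows. This is a sketch, not tested against real uploads: diagnose_xml and its three result labels are hypothetical, and only ErrorContext and ProtocolEncoding come from XML::Parser itself.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper. Returns a (label, message) pair:
#   ('ok', '')           the file is well formed as it stands
#   ('encoding', $msg)   it only parses when read as ISO-8859-1
#   ('malformed', $msg)  it fails either way
sub diagnose_xml {
    my ($file) = @_;

    # First run: normal parse, with two lines of context in any error.
    my $strict = XML::Parser->new( ErrorContext => 2 );
    return ( 'ok', '' ) if eval { $strict->parsefile($file); 1 };
    my $first_error = $@;

    # Second run: force latin1 to see whether a missing encoding
    # declaration is the real problem.
    my $latin1 = XML::Parser->new( ProtocolEncoding => 'ISO-8859-1' );
    if ( eval { $latin1->parsefile($file); 1 } ) {
        return ( 'encoding',
            'parses only as ISO-8859-1; an encoding declaration is probably missing' );
    }

    return ( 'malformed', $first_error );
}
```

    A file that comes back as 'encoding' is still not well formed as uploaded, so it should still be rejected; the second run only makes the error message more useful.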
