PerlMonks  

Is a file XML?

by ajt (Prior)
on Sep 25, 2001 at 15:10 UTC ( #114505=perlquestion )
ajt has asked for the wisdom of the Perl Monks concerning the following question:

I have a requirement to quickly check whether a document uploaded to the server is an XML file. It should be, but you never know...

I would like my script (built around CGI) to give the user an error if the file isn't well-formed XML. I've already checked that the file is more than zero bytes.

I don't want to check for validity (I can't anyway); I just want to be able to quickly emit an error message to the user's browser and die.

I assume that if I simply open up the file in an XML parser, and it doesn't die when I parse it, then it's XML and well-formed, which should be enough to pass the file on to another process.

Q1 Is this a sensible approach?

Q2 If it is, which module is fastest and simplest? I don't plan to do anything with the file itself, just send it to another server with an HTTP POST.

As ever, many thanks in advance.

Replies are listed 'Best First'.
Re: Is a file XML?
by davorg (Chancellor) on Sep 25, 2001 at 15:23 UTC

    That's pretty much how I'd do it. Using code like this:

    use XML::Parser;

    my $p = XML::Parser->new;
    my $file = 'whatever.xml';

    eval { $p->parsefile($file) };

    if ($@) {
        # XML is not well formed
    } else {
        # XML is well formed
    }
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you don't talk about Perl club."
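
For the CGI case in the question, the same eval-and-check idea could be wrapped in a small helper that hands back the parser's message. This is only a sketch: the name `xml_error` is hypothetical and not from the thread, and it assumes XML::Parser is installed.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper (name is an assumption, not from the thread):
# returns an empty string if $file parses as well-formed XML,
# otherwise the error message raised by the parser.
sub xml_error {
    my ($file) = @_;
    my $parser = XML::Parser->new;

    # parsefile() dies on the first well-formedness error, so trap it.
    # The trailing 1 makes the eval return true only on success.
    my $ok = eval { $parser->parsefile($file); 1 };

    return $ok ? '' : ($@ || 'unknown parse error');
}
```

A CGI script could call `xml_error($path)` on the uploaded file and, if the result is non-empty, print it in an error page and exit before the POST to the other server is attempted.
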

Re: Is a file XML?
by mirod (Canon) on Sep 25, 2001 at 15:40 UTC

    Just to add to davorg's answer:

    • If you want an error message that's a little more descriptive, you can pass ErrorContext => 2 when creating the XML::Parser object to get the lines around the error.
    • One of the most common problems I've found with XML-files-that-are-not-really-XML is that they don't include an encoding declaration even though they contain Latin-1 (accented) characters. If this happens often, you can at least diagnose it better by doing a second run with ProtocolEncoding => 'ISO-8859-1' when you create the parser object. Otherwise the error message is quite random, depending on what the parser makes of the character.
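
The two suggestions above might be combined roughly as follows. This is a sketch under assumptions: the helper name `diagnose_xml` and its status strings are made up for illustration, and only the ErrorContext and ProtocolEncoding options come from the post.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper (name and status strings are assumptions):
# parse $file, and on failure retry while forcing ISO-8859-1.
# Returns a list: (status, error message from the first pass).
sub diagnose_xml {
    my ($file) = @_;

    # First pass: default encoding handling; ErrorContext => 2 adds
    # two lines of context around the error to the message.
    my $default = XML::Parser->new(ErrorContext => 2);
    return ('ok', '') if eval { $default->parsefile($file); 1 };
    my $first_error = $@;

    # Second pass: tell the parser to treat the input as Latin-1.
    my $latin1 = XML::Parser->new(
        ErrorContext     => 2,
        ProtocolEncoding => 'ISO-8859-1',
    );
    return ('probably-latin1-without-declaration', $first_error)
        if eval { $latin1->parsefile($file); 1 };

    # Both passes failed, so the problem is not just the encoding.
    return ('not-well-formed', $first_error);
}
```

If the second pass succeeds where the first failed, the error shown to the user can say that an encoding declaration is probably missing, rather than repeating the parser's less helpful first message.
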
Re: Is a file XML?
by Caillte (Friar) on Sep 25, 2001 at 15:26 UTC

    Open up the file and look for the declaration line.

    $isxml = 0;
    while (<HANDLE>) {
        if ($_ =~ /.*xml\s+version/i) {
            $isxml = 1;
            last;
        }
    }

    if ($isxml) {
        # xml file stuff here
    } else {
        # not xml file stuff here
    }

    This was written in a hurry and not tested (I need to run to a meeting ;)), but it should be enough to start with.

    Update: Rereading this post-meeting I see I missed the question altogether.... davorg gave the right answer ;)

    $japh->{'Caillte'} = $me;

      Just because a file has an XML declaration on the first line doesn't mean that it's a well-formed XML file.

      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you don't talk about Perl club."
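
To illustrate the point, here is a small sketch that runs both checks on the same input. The sample document is made up for illustration; it has an XML declaration but a mismatched closing tag.

```perl
use strict;
use warnings;
use XML::Parser;

# Made-up sample: the declaration is present, but <item> is never closed.
my $doc = qq{<?xml version="1.0"?>\n<order><item>widget</order>\n};

# The declaration check from the reply above, applied to the whole string.
my $has_declaration = $doc =~ /xml\s+version/i ? 1 : 0;

# The parser check: parse() accepts a string and dies on the first error.
my $well_formed = eval { XML::Parser->new->parse($doc); 1 } ? 1 : 0;

print "declaration found: $has_declaration\n";
print "well-formed:       $well_formed\n";
```

The declaration test passes while the parser rejects the document, which is why the parser-based check is the safer one here.
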
