Is a file XML?

by ajt (Prior)
on Sep 25, 2001 at 15:10 UTC (#114505=perlquestion)
ajt has asked for the wisdom of the Perl Monks concerning the following question:

I have a requirement to quickly check whether a document uploaded to the server is an XML file. It should be, but you never know....

I would like my script (built around CGI) to give the user an error if the file isn't well-formed XML. I've already checked that the file is more than zero bytes.

I don't want to check for validity (I can't anyway), I just want to quickly be able to emit an error message to the user's browser, and die.

I assume that if I simply open up the file in an XML parser, and it doesn't die when I parse it, then it's XML and well-formed, which should be enough to pass the file on to another process.

Q1 Is this a sensible approach?

Q2 If it is, which module is fastest and simplest? I don't plan to actually do anything with the file, just do an HTTP POST to another server.

As ever, many thanks in advance.

Replies are listed 'Best First'.
Re: Is a file XML?
by davorg (Chancellor) on Sep 25, 2001 at 15:23 UTC

    That's pretty much how I'd do it. Using code like this:

    use XML::Parser;

    my $p = XML::Parser->new;
    my $file = 'whatever.xml';

    eval { $p->parsefile($file) };

    if ($@) {
        # XML is not well formed
    } else {
        # XML is well formed
    }

    "The first rule of Perl club is you don't talk about Perl club."

Re: Is a file XML?
by mirod (Canon) on Sep 25, 2001 at 15:40 UTC

    Just to add to davorg's answer:

    • If you want a more descriptive error message, pass ErrorContext => 2 when creating the XML::Parser object to get the lines around the error.
    • One of the most common problems I have found with XML-files-that-are-not-really-XML is that they don't include an encoding declaration, even though they contain Latin-1 (accented) characters. If this happens often, you can at least diagnose it better by doing a second run, using ProtocolEncoding => 'ISO-8859-1' when you create the parser object. Otherwise the error message is quite random, depending on what the parser makes of the character.
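
A rough sketch of the two-pass diagnosis described above, assuming XML::Parser is installed. The helper name check_xml and its return values are made up for illustration; ErrorContext and ProtocolEncoding are the constructor options named in the reply.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns (1, '') if the file parses, otherwise
# (0, $message), with a hint when a Latin-1 retry succeeds.
sub check_xml {
    my ($file) = @_;

    # First pass: default encoding handling, with 2 lines of context
    # around any error included in the message.
    my $p = XML::Parser->new(ErrorContext => 2);
    return (1, '') if eval { $p->parsefile($file); 1 };
    my $err = $@;

    # Second pass: tell the parser to treat the input as ISO-8859-1.
    # If this succeeds, a missing encoding declaration is the likely cause.
    my $latin1 = XML::Parser->new(
        ErrorContext     => 2,
        ProtocolEncoding => 'ISO-8859-1',
    );
    if (eval { $latin1->parsefile($file); 1 }) {
        return (0, "Parses only as ISO-8859-1; encoding declaration probably missing.\n$err");
    }

    return (0, $err);
}
```

A CGI script could call check_xml on the uploaded file's path and print the returned message to the browser before exiting. A fresh parser object is created for the second pass rather than reusing the one that just failed.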
Re: Is a file XML?
by Caillte (Friar) on Sep 25, 2001 at 15:26 UTC

    Open up the file and look for the declaration line.

    # HANDLE is assumed to be a filehandle already opened on the uploaded file
    $isxml = 0;
    while (<HANDLE>) {
        if ($_ =~ /.*xml\s+version/i) {
            $isxml = 1;
            last;
        }
    }

    if ($isxml) {
        # xml file stuff here
    } else {
        # not xml file stuff here
    }

    This was written in a hurry and not tested (I need to run to a meeting ;)) but it should be enough to start with

    Update: Rereading this post-meeting I see I missed the question altogether.... davorg gave the right answer ;)

    $japh->{'Caillte'} = $me;

      Just because a file has an XML declaration on the first line doesn't mean that it's a well-formed XML file.
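
A small illustration of that point, assuming XML::Parser is installed. The sample document is invented for the example: it carries a declaration but never closes its <b> element.

```perl
use strict;
use warnings;
use XML::Parser;

# Illustrative input: has an XML declaration, but <b> is never closed.
my $doc = qq{<?xml version="1.0"?>\n<a><b></a>\n};

# Declaration check, in the spirit of the regex approach.
my $has_declaration = $doc =~ /xml\s+version/i ? 1 : 0;

# Well-formedness check: parse() dies on the first error it finds.
my $well_formed = eval { XML::Parser->new->parse($doc); 1 } ? 1 : 0;

print "declaration: $has_declaration, well-formed: $well_formed\n";
```

The declaration test passes while the parse fails, so only the parser-based check would stop this file from being passed on.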


      "The first rule of Perl club is you don't talk about Perl club."
