http://www.perlmonks.org?node_id=576127

eXile has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a problem reading an XML file that I want to include in a __DATA__ section. My first try was to pass the DATA filehandle directly:
my $cr = XMLin( *DATA );
After reading the XML::Simple docs, I saw that it requires an IO::Handle object, so I tried this:
my $xmldata = IO::Handle->new->fdopen(fileno(DATA),'r');
# print $xmldata->getline();
my $cr = XMLin( $xmldata );
which results in the error:

Unable to recognise encoding of this document at /usr/local/lib/perl5/site_perl/5.8.7/XML/SAX/PurePerl/EncodingDetect.pm line 96.
Document requires an element Ln: 1, Col: 0

This leads me to believe that the $xmldata IO::Handle doesn't return any lines. However, if I uncomment the getline() line in the code above, it prints a line starting at character 12199 of my XML file.

I have two questions about this:

How can I read an XML file from a __DATA__ section (preferably with XML::Simple)?

What magic happens with the DATA filehandle? Is some predefined number of characters read in from it at compilation time?

UPDATE: never mind question number 1, this solves my problem:

my $cr = XMLin( do { local $/ ; <DATA> } );
I'm still interested in what happened, though.
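
For readers who want to try the slurp approach from the update, here is a minimal sketch. The sample XML and the `host` element are made up for illustration. XML::Simple is not a core module, so the sketch loads it only if it is installed and otherwise just reports how much data was slurped.

```perl
use strict;
use warnings;

# Undefine the input record separator locally so <DATA> returns
# everything after __DATA__ as a single string.
my $xml = do { local $/; <DATA> };

# XML::Simple may not be installed; load it conditionally.
if (eval { require XML::Simple; 1 }) {
    # XMLin accepts a string of XML as well as a filename or handle.
    my $cr = XML::Simple::XMLin($xml);
    print "host = $cr->{host}\n";
} else {
    print "slurped ", length($xml), " bytes of XML\n";
}
__DATA__
<config>
  <host>localhost</host>
</config>
```

Because the XML reaches the parser as a plain string, the position of any underlying file descriptor no longer matters.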

Re: __DATA__ in XML::Simple and/or IO::Handle
by grantm (Parson) on Oct 03, 2006 at 20:37 UTC

    As runrig said, XMLin(\*DATA) is probably what you want (although this node describes why you really need a couple of extra options).

    The "Unable to recognise encoding..." message is actually an informational warning from XML::SAX::PurePerl. Because your XML started with some whitespace (perhaps a blank line), the parser was unable to autodetect the encoding by examining the byte(s) used to represent the initial '<' character. In the absence of this information, the parser assumes utf-8 encoding and continues.

    It's not a particularly helpful message and none of the other XML parser modules emit warnings in the same situation.

    You should be aware that the PurePerl parser (version 0.14 of XML::SAX) has a bug in its handling of character entities (eg: '&amp;') in attributes. This bug caused the XML::Simple test suite to fail so I'm guessing you must have forced the install. I've supplied patches to fix the bug and remove the encoding warning so I'm hopeful both issues will be fixed in the next release.

    FYI: I released a new version of XML::Simple yesterday which includes improved documentation around passing a filehandle to XMLin.

Re: __DATA__ in XML::Simple and/or IO::Handle
by runrig (Abbot) on Oct 03, 2006 at 17:20 UTC
    This works fine for me:
    my $xml = XMLin(\*DATA);
    update: My guess is that you are reading from the start of the file that the DATA handle points to. That file is the program file itself, which probably starts with '#!/usr/bin/perl' or some such, and that is not valid XML.
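
A small sketch of the point that DATA is a handle on the program file itself, assuming the script is run from a regular, seekable file. The variable names and the sample XML are illustrative only.

```perl
use strict;
use warnings;

# Remember where the data section begins (an offset within this script file).
my $data_start = tell(DATA);

# Rewind to the very start of the file: this is program source, not XML.
seek(DATA, 0, 0) or die "cannot rewind DATA: $!";
my $first_line = <DATA>;
print "first line of file: $first_line";

# Go back to the data section and slurp the XML.
seek(DATA, $data_start, 0) or die "cannot seek DATA: $!";
my $xml = do { local $/; <DATA> };
print "data section: $xml";
__DATA__
<config><host>localhost</host></config>
```

Any reader that starts at offset 0 of this handle sees Perl source first, which an XML parser will reject.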

      When you do an fdopen on the existing descriptor, does it start at offset 0 or wherever perl left the pointer (which is usually the start of the data right after __DATA__)? That's the key. If there's something special about this fdopen that I don't know about, you could continue to use it as long as you seek the handle ahead to the location of the original *DATA handle.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
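
To make the offset question concrete, here is a hedged sketch that compares the two positions involved. As far as I understand it, perl reads the source file in buffered blocks, so the OS-level descriptor is usually further along than the point just after __DATA__. A handle created with fdopen on the same descriptor gets a fresh buffer and starts from that OS-level position. The variable names are illustrative, and the exact numbers depend on the file size and buffer size.

```perl
use strict;
use warnings;
use IO::Handle;

# Position as seen through Perl's buffered DATA handle:
# the byte just after the __DATA__ line.
my $logical = tell(DATA);

# Position of the underlying OS descriptor (whence 1 = SEEK_CUR, offset 0
# only queries). It reflects how much of the file perl has already read.
my $physical = sysseek(DATA, 0, 1);

printf "logical offset: %d, physical offset: %d\n", $logical, $physical;

# A new handle on the same descriptor starts at the physical offset,
# so seek it back to the logical start of the data before reading.
my $dup = IO::Handle->new->fdopen(fileno(DATA), 'r')
    or die "fdopen failed: $!";
seek($dup, $logical, 0) or die "seek failed: $!";
my $line = $dup->getline;
print "first data line via fdopen'd handle: $line";
__DATA__
<config/>
```

In a short script the whole file typically fits in one read, so without the seek the new handle would be at end of file and return nothing. In a long one it would begin partway through the data, which matches the symptom in the original question.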