
encoding failure with XML::Twig

by chastevens (Initiate)
on Nov 19, 2009 at 18:01 UTC (#808220=perlquestion)
chastevens has asked for the wisdom of the Perl Monks concerning the following question:

Hey all,

I'm trying to pretty-print an XML file using XML::Twig and the code snippet below

    open (TMP,">".$gPFile.".pp");
    my $ppml_twig = new XML::Twig(keep_encoding => 1);
    $ppml_twig->parsefile($gPFile);
    $ppml_twig->set_pretty_print('indented');
    $ppml_twig->print( \*TMP);
    close(TMP);

This works fine in my development environment, but when I create an exe file using pp I keep getting this error:

Couldn't open encmap windows-1252.enc no such file or directory

I've tried manually adding the .enc files to the package, but I get the same error. I've updated XML::Twig and XML::Parser to the latest versions, and I've also tried keep_encoding => 1 to avoid the encoding step, but it fails the same way. Any idea where XML::Parser::Expat looks for this file, or how I can stop it from doing this encoding? (The executable seems to work on other Windows platforms; it fails on Windows Server 2008.)

Replies are listed 'Best First'.
Re: encoding failure with XML::Twig
by almut (Canon) on Nov 19, 2009 at 19:29 UTC
    I've tried manually adding the .enc files to the package

    What exact pp command did you use, i.e. where did you add them to?

    XML::Parser tries to load non-built-in encodings (such as CP1252) from a list of paths @XML::Parser::Expat::Encoding_Path, so you might want to try adding (prepending) the appropriate path to that array ($ENV{PAR_TEMP} should hold the temporary/cache directory that PAR unpacks stuff to, so it should presumably be some path relative to PAR_TEMP).
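A minimal sketch of that suggestion, not a confirmed fix. The two directory layouts under PAR_TEMP are assumptions (the `inc/lib/...` form is where PAR usually unpacks library files; check where_is_it.txt for your version), and `enc_dir_candidates` is a hypothetical helper name:

```perl
use strict;
use warnings;
use File::Spec;

# Return the candidate encoding directories that actually exist under the
# given PAR cache directory. Both layouts below are assumptions.
sub enc_dir_candidates {
    my ($par_temp) = @_;
    return () unless defined $par_temp && length $par_temp;
    my @candidates = (
        File::Spec->catdir($par_temp, qw(inc lib XML Parser Encodings)),
        File::Spec->catdir($par_temp, qw(XML Parser Encodings)),
    );
    return grep { -d $_ } @candidates;
}

# Only touch the search path when XML::Parser::Expat can be loaded.
if (eval { require XML::Parser::Expat; 1 }) {
    no warnings 'once';
    # Prepend, so the unpacked directories are searched first.
    unshift @XML::Parser::Expat::Encoding_Path,
        enc_dir_candidates($ENV{PAR_TEMP});
}
```

Outside a packed executable PAR_TEMP is normally unset, so the helper returns an empty list and the search path is left unchanged.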

    See also ENCODINGS, PAR_TEMP and the (very informative!) where_is_it.txt (from the contrib/docs directory).

    That said, I'm not really sure why XML::Parser doesn't find its encoding files by default. They should be under XML/Parser/Encodings/ (which - as it's part of the module - should've been included in the package anyway), and @XML::Parser::Expat::Encoding_Path is being initialised from @INC, which I presume PAR adds its own cache/temp directory to...

Re: encoding failure with XML::Twig
by ikegami (Pope) on Nov 19, 2009 at 19:40 UTC

    OHHHH! I completely missed the fact that you're using a packager. I thought pp was referring to "pretty print" (which is what your code does). What can I say? I use xml_pp quite often.

    @XML::Parser::Expat::Encoding_Path is built from @INC. I bet that the problem is that your packager puts a callback function in @INC instead of the path to the files.
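To check that guess, something like the following could be run inside the packed executable. It only approximates how an encoding search path can be derived from @INC; the real module's code may differ, and `encoding_path_from` is a hypothetical name. The sketch skips hook entries explicitly to make the point visible:

```perl
use strict;
use warnings;
use File::Spec;

# Keep only @INC-style entries that are real directories containing
# XML/Parser/Encodings. Hooks (code refs, objects) can never qualify.
sub encoding_path_from {
    my @inc = @_;
    return grep { -d $_ }
           map  { File::Spec->catdir($_, qw(XML Parser Encodings)) }
           grep { !ref $_ }
           @inc;
}

# Show what kind of entries @INC currently holds.
for my $entry (@INC) {
    printf "%-5s %s\n", (ref $entry ? 'hook' : 'dir'), $entry;
}
print "usable: $_\n" for encoding_path_from(@INC);
```

If the output lists hooks but no "usable" line, the encoding maps are not reachable through @INC and the path has to be supplied by hand.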

    Assuming pp actually extracts the files to some directory, you can do:

        BEGIN {
            my $path_to_enc_files = ...;
            $path_to_enc_files =~ s{/XML/Parser/Encodings/?\z}{};
            local @INC = ( @INC, $path_to_enc_files );
            require XML::Parser::Expat;
        }

    According to almut's post, that boils down to

        BEGIN {
            local @INC = ( @INC, $ENV{PAR_TEMP} );
            require XML::Parser::Expat;
        }

    Place this before the first use XML::Parser::Expat;

      Thank you both, you're saints. I'll try to explain better in future posts, that was a bit confusing.

      That solved the issue in my example, but I'm still having the problem in my main script. Should that go before all use statements, or just directly before use XML::Twig?

      Apparently Expat has 4 embedded encoding formats and if it's not one of those 4 it looks at the Encodings directory. windows-1252 isn't one of the 4 embedded. Parser doesn't include this directory by default.

        Should that go before all use statements, or just directly before use XML::Twig?

        As early as you want. It has to be before any direct or indirect call to use XML::Parser::Expat;, so the earlier the safer.

        I tried installing pp to do some debugging, but it's crashing. Sorry, I can't help you more than this.

Re: encoding failure with XML::Twig
by ikegami (Pope) on Nov 19, 2009 at 18:41 UTC

    Could you give something that reproduces the problem? I wasn't able to.

        $ cat
        use XML::Twig;
        my $gPFile = 'test.xml';
        open (TMP,">".$gPFile.".pp") or die;
        my $ppml_twig = new XML::Twig(keep_encoding => 1);
        $ppml_twig->parsefile($gPFile);
        $ppml_twig->set_pretty_print('indented');
        $ppml_twig->print( \*TMP);
        close(TMP);

        $ perl -pe'use open ":std", ":locale"; BEGIN { binmode STDIN, ":raw:encoding(Windows-1252)" }' test.xml
        <?xml version="1.0" encoding="Windows-1252"?>
        <root>
          <element>
            <child>foo</child>
            <child>bar</child>
            <child></child>
          </element>
        </root>

        $ perl

        $ perl -pe'use open ":std", ":locale"; BEGIN { binmode STDIN, ":raw:encoding(Windows-1252)" }' test.xml.pp
        <?xml version="1.0" encoding="Windows-1252"?>
        <root>
          <element>
            <child>foo</child>
            <child>bar</child>
            <child></child>
          </element>
        </root>
Re: encoding failure with XML::Twig
by ikegami (Pope) on Nov 19, 2009 at 19:03 UTC

    Looking into it some more, XML::Parser::Expat has its own decoding maps. These must be in a directory listed in @XML::Parser::Expat::Encoding_Path. (See the XML::Parser::Expat source for how it's initialised.)

    On my system, the decoding maps are located in /usr/lib/perl5/XML/Parser/Encodings/, which is relative to the directory where XML::Parser is installed (/usr/lib/perl5/XML/

    They are part of the XML-Parser distribution (along with XML::Parser and XML::Parser::Expat).
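A rough diagnostic for this, assuming the map is looked up as the lowercased encoding name plus `.enc` (which matches the file name in the error message). `find_enc_map` is a hypothetical helper, not part of XML::Parser:

```perl
use strict;
use warnings;
use File::Spec;

# Return the first existing "<name>.enc" file found in the given
# directories, or undef when none of them contains it.
sub find_enc_map {
    my ($encoding, @dirs) = @_;
    my $file = lc($encoding) . '.enc';
    for my $dir (@dirs) {
        next if ref $dir;    # skip anything that is not a plain path
        my $candidate = File::Spec->catfile($dir, $file);
        return $candidate if -f $candidate;
    }
    return;
}

# Report what the current process would see. The array is empty unless
# XML::Parser::Expat has already been loaded.
no warnings 'once';
my @path = @XML::Parser::Expat::Encoding_Path;
my $hit  = find_enc_map('windows-1252', @path);
print defined $hit
    ? "found $hit\n"
    : "windows-1252.enc not found in: @path\n";
```

Running this after `use XML::Twig;` in both the plain script and the packed executable should show whether the two see different search paths.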

      I'm aware of the XML::Parser::Expat encoding files, and the script works fine when run as a Perl script. It's when I try to create an executable using pp that it fails. I tried to manually add the Encodings directory to my package and I still get the failures.

      This script reproduces the problem

          use XML::Twig;
          my $gPFile = "input_test.xml";
          pretty_print();
          exit(0);

          ######################
          sub pretty_print {
          ######################
              print "Converting ".$gPFile." to pretty print file ".$gPFile.".pp\n";
              open (TMP,">".$gPFile.".pp");
              my $ppml_twig = new XML::Twig(keep_encoding => 0);
              $ppml_twig->parsefile($gPFile);
              $ppml_twig->set_pretty_print('indented');
              $ppml_twig->print( \*TMP);
              close(TMP);
          }

      With this as an input file (input_test.xml)

          <?xml version="1.0" encoding="windows-1252"?>
          <JOB Label="input_test" >
            <SUBJOB>
            </SUBJOB>
          </JOB>

      I used the following command to create the executable:

          pp -l libexpat.dll -a "C:/Strawberry/perl/site/lib/XML/Parser/Encodings;lib/XML/Parser/Encodings" -o pp_only.exe

      I think this may be more of a pp issue than XML::Twig or Parser, but I'm not sure if XML::Twig can be configured to bypass the encoding.
