Re^3: UTF-8 and XML::Parser

by remiah (Hermit)
on Oct 14, 2012 at 04:26 UTC (#998914)

in reply to Re^2: UTF-8 and XML::Parser
in thread UTF-8 and XML::Parser

Maybe you saved your script with UTF-8 encoding. If you save the script as ISO-8859-1, you will get ISO-8859-1 output.

Below, the first command reads the script saved as UTF-8, and the second reads 082-1, the script saved as ISO-8859-1. "ü" is "c3 bc" in UTF-8 and "fc" in ISO-8859-1.

    >cat |perl -ne 'print $1 if m!<word>(.*?)</word>!' | hd
    00000000  4d c3 bc 6c 6c 65 72  |M..ller|
    00000007
    >cat |perl -ne 'print $1 if m!<word>(.*?)</word>!' | hd
    00000000  4d fc 6c 6c 65 72  |M.ller|
    00000006
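The same byte difference can be reproduced without saving two copies of a script. The following is only a minimal sketch using the core Encode module; the helper name `hex_bytes` is made up for illustration and is not part of the original post.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string and return its bytes
# as space-separated lowercase hex pairs.
sub hex_bytes {
    my ($string, $encoding) = @_;
    return join ' ', unpack '(H2)*', encode($encoding, $string);
}

my $word = "M\x{fc}ller";    # U+00FC is the character "ü"

print hex_bytes($word, 'UTF-8'), "\n";         # 4d c3 bc 6c 6c 65 72
print hex_bytes($word, 'ISO-8859-1'), "\n";    # 4d fc 6c 6c 65 72
```

The two printed lines correspond to the 7-byte and 6-byte dumps above: the only difference is how the single character "ü" is written to disk.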

Re^4: UTF-8 and XML::Parser
by Anonymous Monk on Oct 14, 2012 at 05:49 UTC
    I benchmarked the binmode variant against the 'use open' variant shown below. I made an XML file with 100 lines and 32000 non-ASCII characters (UTF-8) in each line ((P)CDATA). The script below did it in 0.20 seconds, while the 'use utf8; / binmode' method takes about 17.5 seconds.

    Unfortunately, perl crashes when I give a filehandle to the parser while using the 'use open qw/:std :utf8/;' method once the file gets big. The 'use utf8; / binmode' method takes about 35 seconds when I pass the filehandle to the parser.

    Output was redirected to /dev/null.

    #!/usr/bin/perl
    use XML::Parser;
    #use utf8;
    use open qw/:std :utf8/;

    $ch = sub {
        my ($p, $w) = @_;
        # binmode STDOUT, ":encoding(UTF-8)";
        print "$w\n";
    };

    $p = XML::Parser->new(ProtocolEncoding => 'UTF-8');
    $p->setHandlers('Char' => $ch);

    my $xml = "";
    open(F, '< x.xml');
    while(<F>) { $xml .= $_; }
    $p->parse($xml);
    #$p->parse(*F);
    close(F);
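    For anyone who wants to repeat the timing, a test file of roughly the shape described above could be generated as follows. This is only a sketch under assumptions: the post does not show the actual file, so the `<doc>` root element, the use of "ü" as the repeated character, and the helper name `build_test_xml` are all guesses made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build an XML document with $lines <word> elements,
# each holding $count copies of $char. The <doc> root is an assumption.
sub build_test_xml {
    my ($lines, $count, $char) = @_;
    my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n<doc>\n};
    $xml .= '<word>' . ($char x $count) . "</word>\n" for 1 .. $lines;
    return $xml . "</doc>\n";
}

# 100 content lines with 32000 characters each; "ü" (U+00FC) is assumed.
my $xml = build_test_xml(100, 32000, "\x{fc}");

# Write through an explicit encoding layer so each "ü" is stored as c3 bc.
open my $out, '>:encoding(UTF-8)', 'x.xml' or die "Cannot write x.xml: $!";
print {$out} $xml;
close $out or die "Cannot close x.xml: $!";
```

    Running each variant of the script against the resulting x.xml with output sent to /dev/null (for example under `time`) should give comparable measurements, though the absolute numbers will depend on the machine and the Perl version.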

      parsefile() has some trouble?
