Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re^3: UTF-8 and XML::Parser

by remiah (Hermit)
on Oct 14, 2012 at 04:26 UTC ( #998914=note: print w/ replies, xml ) Need Help??


in reply to Re^2: UTF-8 and XML::Parser
in thread UTF-8 and XML::Parser

Maybe, you saved your script with utf-8 encoding. If you save the script as iso-8859-1, you will get iso-8859-1 result.

Below, 082.pl is utf-8 saved script and 082-1 is iso-8859-1 saved script."" is "c3 bc" in utf-8. "fc" in iso-8859-1.

>cat 082.pl |perl -ne 'print $1 if m!<word>(.*?)</word>!' | hd 00000000 4d c3 bc 6c 6c 65 72 |M..ller| 00000007 >cat 082-1.pl |perl -ne 'print $1 if m!<word>(.*?)</word>!' | hd 00000000 4d fc 6c 6c 65 72 |M.ller| 00000006 >


Comment on Re^3: UTF-8 and XML::Parser
Download Code
Re^4: UTF-8 and XML::Parser
by Anonymous Monk on Oct 14, 2012 at 05:49 UTC
    i benchmarked the binmode variant against the utf8 open variant down here. i made an xml file with 100 lines and 32000 's (utf8) in each line ((P)CDATA). the below script did it in 0.20 seconds while the 'use utf8; / binmode' method take about 17.5 seconds.

    unfortunately perl crashes when i give a filehande to the parser while using the 'use open qw/:std :utf8/;' method when the file gets big. the 'use utf8; / binmode' method takes about 35 seconds when i pass the filehandle to the parser.

    output got redirected to /dev/null

    #!/usr/bin/perl use XML::Parser; #use utf8; use open qw/:std :utf8/; $ch = sub { my ($p, $w) = @_; # binmode STDOUT, ":encoding(UTF-8)"; print "$w\n"; }; $p = XML::Parser->new(ProtocolEncoding => 'UTF-8'); $p->setHandlers('Char' => $ch); my $xml = ""; open(F, '< x.xml'); while(<F>) { $xml .= $_; } $p->parse($xml); #$p->parse(*F); close(F);

      parsefile() has some trouble?

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://998914]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (5)
As of 2015-07-04 10:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (59 votes), past polls