http://www.perlmonks.org?node_id=998922


in reply to Re^3: UTF-8 and XML::Parser
in thread UTF-8 and XML::Parser

i benchmarked the binmode variant against the 'use open' variant shown below. i made an xml file with 100 lines, each holding 32000 ü's (utf8) as (P)CDATA. the script below parsed it in 0.20 seconds, while the 'use utf8; / binmode' method takes about 17.5 seconds.
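for anyone who wants to reproduce the test, here is a minimal sketch of how such a file could be generated. the file name x.xml matches the script below; the XML declaration and the <doc> root element are assumptions, since the post only describes the shape of the data (100 lines of 32000 ü's each).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout: the post only says "100 lines and 32000 ü's in each line".
# The <doc> root element and the XML declaration are illustrative choices.
my $file  = 'x.xml';
my $lines = 100;
my $count = 32_000;

# The :encoding(UTF-8) layer turns each U+00FC (ü) into the two bytes C3 BC.
open(my $out, '>:encoding(UTF-8)', $file) or die "cannot write $file: $!";
print {$out} qq{<?xml version="1.0" encoding="UTF-8"?>\n<doc>\n};
print {$out} "\x{FC}" x $count, "\n" for 1 .. $lines;
print {$out} "</doc>\n";
close($out) or die "cannot close $file: $!";
```

"\x{FC}" is used instead of a literal ü so the generator does not depend on the encoding of its own source file.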

unfortunately perl crashes when i give a filehandle to the parser while using the 'use open qw/:std :utf8/;' method once the file gets big. the 'use utf8; / binmode' method takes about 35 seconds when i pass the filehandle to the parser.

output got redirected to /dev/null

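#!/usr/bin/perl
use XML::Parser;
#use utf8;
use open qw/:std :utf8/;

# Char handler: print each chunk of character data on its own line
$ch = sub {
    my ($p, $w) = @_;
    # binmode STDOUT, ":encoding(UTF-8)";
    print "$w\n";
};

$p = XML::Parser->new(ProtocolEncoding => 'UTF-8');
$p->setHandlers('Char' => $ch);

# slurp the whole file into a string, then parse the string
my $xml = "";
open(F, '< x.xml') or die "cannot open x.xml: $!";
while(<F>) {
    $xml .= $_;
}
$p->parse($xml);
#$p->parse(*F);    # filehandle variant
close(F);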
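the post does not say how the times were taken. one possible way to time only the parse step from inside the script is sketched here with the core Time::HiRes module. time_it is a hypothetical helper name, not something from XML::Parser.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code reference once and return the elapsed
# wall-clock time in seconds (floating point).
sub time_it {
    my ($code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    return tv_interval($t0);    # interval from $t0 until now
}

# Usage with the parser object from the script above (needs XML::Parser):
#   my $secs = time_it(sub { $p->parse($xml) });
#   printf STDERR "parse took %.2f s\n", $secs;
```

the result goes to STDERR so it stays visible when STDOUT is redirected to /dev/null.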

Re^5: UTF-8 and XML::Parser
by remiah (Hermit) on Oct 14, 2012 at 06:33 UTC

    parsefile() has some trouble?