http://www.perlmonks.org?node_id=1006825

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am using Win32::IE::Mechanize to access a web page that is encoded in UTF-8.


However, when I try to read data from the DOM that contains Unicode characters, those characters are returned as question marks (hex 3F).


Any help would be very much appreciated. Sample code is below:


    use strict;
    use warnings;
    use File::BOM;
    use Win32::IE::Mechanize;
    use Time::HiRes qw( usleep gettimeofday tv_interval stat );
    use utf8;

    # create Win32::IE::Mechanize object
    my $mech = Win32::IE::Mechanize->new(visible => 1);

    # open the URL
    $mech->get('http://kr.yahoo.com/');
    sleep (10);

    # get the DOM document
    my $doc = $mech->{agent}->Document;

    # get the webpage title
    my $title = $doc->title;

    # create a utf-8 text file
    open DEBUGFILE, ">:via(File::BOM):encoding(UTF-8)", "debug.txt" or die $!;

    # write the title to file
    print DEBUGFILE "Title:" . $title . "\n";

    # write the title length to the file
    print DEBUGFILE "Title Length:" . length ($title) . "\n";

    # write the hex byte string of the title to the file
    print DEBUGFILE "Title Hex Byte String:" . unpack("H48", $title) . "\n";

Code output is:


    Title:??! ???
    Title Length:7
    Title Hex Byte String:3f3f21203f3f3f
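
For comparison, the same pattern (seven characters, every non-ASCII one replaced by 3F) is what a lossy conversion to a single-byte code page produces. The sketch below is only an illustration of that effect using the core Encode module. The sample string and the lossy_hex helper are assumptions made up for this example (a hypothetical title with the same shape as the output), and cp1252 is just a stand-in for whatever code page might be involved. It does not show where in Win32::IE::Mechanize the conversion would happen.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper: encode a string to a single-byte code page and
# return the resulting bytes as a hex string.
sub lossy_hex {
    my ($text) = @_;
    # With no CHECK argument, encode() substitutes the encoding's
    # replacement character ("?" for cp1252) for unmappable characters.
    my $bytes = encode('cp1252', $text);
    return unpack("H*", $bytes);
}

# Assumed sample: 2 Hangul characters, "!", a space, 3 Hangul characters
my $sample = "야후! 코리아";

print "Length: ", length($sample), "\n";
print "Hex: ", lossy_hex($sample), "\n";   # 3f3f21203f3f3f
```

If the title really has this shape, the length of 7 suggests the characters were already replaced before the string reached the print statements, since a UTF-8 byte string of the same text would be much longer.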