PerlMonks  

Re^2: dealing with cyrillic characters

by Aldebaran (Curate)
on Jun 22, 2018 at 17:43 UTC (id://1217236)


in reply to Re: dealing with cyrillic characters
in thread dealing with cyrillic characters

Wow, thanks, it was that simple a fix. I got everything I wanted by passing binmode => ':utf8' to File::Slurp's read_file. I looked in gedit to see what encoding the underlying text files might have and was unable to ascertain it. That I can read the Cyrillic makes me think it is indeed UTF-8. Relevant code:

use 5.010;
use strict;
use warnings;
use File::Slurp;       # read_file
use HTML::FromText;    # text2html
use Path::Class;       # file

sub get_rus_text {
    my $rvars = shift;
    my %vars  = %$rvars;
    my %content;

    # the wrapper fragments are the same for every caption, so read them once;
    # decode them like the captions so that every piece is a character string
    my $oitop    = read_file($vars{"oitop"},    binmode => ':utf8');
    my $oibottom = read_file($vars{"oibottom"}, binmode => ':utf8');

    opendir my $eh, $vars{"rus_captions"} or die "dead $!\n";
    while (defined(my $name = readdir $eh)) {
        next if $name =~ m/~$/;            # skip editor backup files
        my $file = file($vars{"rus_captions"}, $name);
        next if -d $file;                  # test the full path, not the bare name
        ### revision for better russian use 7/18
        # set binmode for File::Slurp
        # run cyrillic through HTML::FromText
        next unless $name =~ m/\.txt$/;
        my $string = read_file($file, binmode => ':utf8');
        #say "string is $string";
        my $temp = text2html(
            $string,
            urls  => 1,
            email => 1,
            paras => 1,
        );
        # surround by divs
        my $text = $oitop . $temp . $oibottom;
        #say "text is $text";
        $content{$name} = $text;
    }
    closedir $eh;

    # important to sort; the caller should set binmode STDOUT, ':encoding(UTF-8)'
    # once, to avoid "Wide character in print" warnings
    my @return;
    foreach my $key (sort keys %content) {
        print $content{$key} . "\n";
        push @return, $content{$key};
    }
    return \@return;
}
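To see why passing binmode => ':utf8' was the whole fix, here is a minimal sketch in core Perl only (no File::Slurp; File::Temp stands in for the caption directory, and read_with_layer is a hypothetical helper) that reads the same Cyrillic word with and without a decoding layer:

```perl
use strict;
use warnings;
use utf8;                      # this source file contains Cyrillic literals
use File::Temp qw(tempfile);

# Write a short Cyrillic word to a temporary file as UTF-8 bytes.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':encoding(UTF-8)';
print {$tmp_fh} "спасибо";
close $tmp_fh or die "close: $!";

# Hypothetical helper: slurp a file through the given PerlIO layer(s).
sub read_with_layer {
    my ($path, $layer) = @_;
    open my $fh, "<$layer", $path or die "$path: $!";
    local $/;                  # slurp mode: read the whole file at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

my $bytes = read_with_layer($tmp_name, ':raw');                   # undecoded
my $chars = read_with_layer($tmp_name, ':raw:encoding(UTF-8)');   # decoded

printf "without decoding: %d units\n", length $bytes;   # 14 (bytes)
printf "with decoding:    %d units\n", length $chars;   # 7 (characters)
```

Without a decoding layer Perl works on 14 bytes rather than 7 characters, so any code that processes the text character by character sees each Cyrillic letter as two unrelated characters.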

The page is improved. I had budgeted all day to figure this out, so I'm gonna go form some concrete. Many thanks again!
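On not being able to tell the files' encoding from gedit: one rough check, sketched here with the core Encode module (looks_like_utf8 and file_looks_like_utf8 are hypothetical helper names), is to attempt a strict UTF-8 decode of the raw bytes. Passing only shows that the bytes are well-formed UTF-8 (pure ASCII passes too), but Russian text saved in a single-byte encoding such as CP1251 is almost always rejected:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: apply the same check to a file's raw bytes.
sub file_looks_like_utf8 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "$path: $!";
    local $/;             # slurp mode
    my $bytes = <$fh>;
    close $fh;
    return looks_like_utf8($bytes // '');
}

# Usage with byte strings (no "use utf8" here, so these literals are raw bytes).
my $utf8_bytes   = "\xD1\x81\xD0\xBF\xD0\xB0\xD1\x81\xD0\xB8\xD0\xB1\xD0\xBE";  # "спасибо" in UTF-8
my $cp1251_bytes = "\xF1\xEF\xE0\xF1\xE8\xE1\xEE";                              # "спасибо" in CP1251
for my $sample ($utf8_bytes, $cp1251_bytes) {
    my $verdict = looks_like_utf8($sample) ? 'UTF-8' : 'not UTF-8';
    print "$verdict\n";
}
```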

Re^3: dealing with cyrillic characters
by haukex (Archbishop) on Jun 23, 2018 at 08:50 UTC

    The AM (Anonymous Monk) already provided a link to "File::Slurp is broken and wrong". I suggest you use this instead (as just discussed here):

    my $string = do { open my $fh, '<:raw:encoding(UTF-8)', $file or die "$file: $!"; local $/; <$fh> };
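    The idiom above can be wrapped in a pair of small subs. This is a sketch using only core Perl, not a drop-in replacement for every File::Slurp option; slurp_utf8 and spew_utf8 are hypothetical names:

```perl
use strict;
use warnings;
use utf8;                       # this source file contains Cyrillic literals
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: read a whole UTF-8 file into a character string.
sub slurp_utf8 {
    my ($file) = @_;
    my $string = do {
        open my $fh, '<:raw:encoding(UTF-8)', $file or die "$file: $!";
        local $/;               # undefine the record separator to read everything
        <$fh>;
    };
    return $string;
}

# Hypothetical helper: write a character string to a file as UTF-8.
sub spew_utf8 {
    my ($file, $string) = @_;
    open my $fh, '>:raw:encoding(UTF-8)', $file or die "$file: $!";
    print {$fh} $string;
    close $fh or die "$file: $!";
    return;
}

# Usage: round-trip a Cyrillic caption through a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'caption.txt');
spew_utf8($file, "большое спасибо\n");
my $caption = slurp_utf8($file);

binmode STDOUT, ':encoding(UTF-8)';   # avoid "Wide character in print" warnings
print $caption;
```

    The :raw layer removes any default CRLF translation (relevant on Windows) before decoding, and :encoding(UTF-8), unlike the looser :utf8 layer, checks the input as it decodes and warns on malformed bytes.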
