Re^2: dealing with cyrillic characters

Wow, thanks, it was that simple a fix. I got everything I wanted by passing binmode => ':utf8' to File::Slurp's read_file. I looked in gedit to see what encoding the underlying text files might have, but was unable to ascertain it. That I can now read the Cyrillic makes me think it is indeed UTF-8. Relevant code:

sub get_rus_text {
    use 5.010;
    use File::Basename;
    use Cwd;
    use HTML::FromText;
    use File::Slurp;
    use Path::Class;

    my $rvars = shift;
    my %vars  = %$rvars;
    my %content;
    my $refc = \%content;
    opendir my $eh, $vars{"rus_captions"} or die "dead  $!\n";
    while ( defined( $_ = readdir($eh) ) ) {
        next if m/~$/;    # skip editor backup files
        # readdir returns bare names, so test the full path, not just $_
        next if -d "$vars{rus_captions}/$_";
        ### revision for better russian use 7/18
        # set binmode for File::Slurp
        # run cyrillic through HTML::FromText
        if (m/\.txt$/) {
            my $file   = file( $vars{"rus_captions"}, $_ );
            my $string = read_file( $file, binmode => ':utf8' );
            #say "string is $string";
            my $temp = text2html(
                $string,
                urls  => 1,
                email => 1,
                paras => 1,
            );
            # surround by divs
            my $oitop    = read_file( $vars{"oitop"} );
            my $oibottom = read_file( $vars{"oibottom"} );
            my $text     = $oitop . $temp . $oibottom;
            #say "text is $text";
            $content{$_} = $text;
        }
    }
    closedir $eh;
    # important to sort
    # (printing decoded text warns "Wide character" unless STDOUT has a UTF-8 layer)
    my @return;
    foreach my $key ( sort keys %content ) {
        print $content{$key} . "\n";
        push @return, $content{$key};
    }
    return \@return;
}
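On the open question of which encoding the caption files actually use: one rough check is to read the raw bytes and see whether they decode as strict UTF-8. The sketch below assumes a hypothetical helper name, looks_like_utf8, and relies only on the core Encode module. A file that passes is not guaranteed to be UTF-8, but Cyrillic text saved in a single-byte encoding such as CP1251 will almost always fail.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper (not part of the original post):
# returns 1 if the raw bytes of $path decode as strict UTF-8, else 0.
sub looks_like_utf8 {
    my ($path) = @_;

    # Read the file as raw bytes, with no decoding layer.
    open my $fh, '<:raw', $path or die "$path: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    $bytes = '' unless defined $bytes;

    # decode() with FB_CROAK dies on malformed input. It may also
    # modify its argument, so work on a copy.
    my $copy = $bytes;
    my $ok = eval { decode( 'UTF-8', $copy, FB_CROAK ); 1 };
    return $ok ? 1 : 0;
}
```

Calling looks_like_utf8 on each .txt file in the captions directory before slurping it would flag any file that needs a different layer.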

Improved page. I budgeted all day to figure this out, so I'm gonna go form some concrete. Большое спасибо снова (many thanks again).

Re^3: dealing with cyrillic characters by haukex (Archbishop) on Jun 23, 2018 at 08:50 UTC
The AM already provided a link to "File::Slurp is broken and wrong". I suggest you use this instead (as just discussed here): `my $string = do { open my $fh, '<:raw:encoding(UTF-8)', $file or die "$file: $!"; local $/; <$fh> };`
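As a sketch of how that one-liner could slot into get_rus_text, it can be wrapped in a small sub and called wherever read_file was used. The name slurp_utf8 is hypothetical, chosen here for illustration. Unlike the bare :utf8 layer, :encoding(UTF-8) validates the input while decoding.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the suggested open/slurp idiom.
# Returns the whole file as a decoded character string.
sub slurp_utf8 {
    my ($file) = @_;

    # :raw drops any default layers; :encoding(UTF-8) decodes strictly.
    open my $fh, '<:raw:encoding(UTF-8)', $file or die "$file: $!";
    local $/;    # slurp mode: read to end of file in one go
    my $text = <$fh>;
    close $fh;
    return $text;
}
```

In the posted sub, the File::Slurp call would then become `my $string = slurp_utf8($file);`.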
