Wow, thanks, it was that simple a fix. I got everything I wanted by passing binmode => ':utf8' to File::Slurp's read_file. I opened the underlying text files in gedit to see what encoding they have, but couldn't determine it. That the Cyrillic now reads correctly makes me think they are indeed UTF-8. Relevant code:
sub get_rus_text {
    use 5.010;
    use File::Basename;
    use Cwd;
    use HTML::FromText;
    use File::Slurp;
    use Path::Class;
    my $rvars = shift;
    my %vars  = %$rvars;
    my %content;
    my $refc = \%content;
    opendir my $eh, $vars{"rus_captions"} or die "dead $!\n";
    while ( defined( $_ = readdir($eh) ) ) {
        next if m/~$/;
        # readdir returns bare names, so test against the full path
        next if -d "$vars{rus_captions}/$_";
        ### revision for better russian use 7/18
        # set binmode for File::Slurp
        # run cyrillic through HTML::FromText
        if (m/txt$/) {
            my $file   = file( $vars{"rus_captions"}, $_ );
            my $string = read_file( $file, binmode => ':utf8' );
            #say "string is $string";
            my $temp = text2html(
                $string,
                urls  => 1,
                email => 1,
                paras => 1,
            );
            # surround by divs
            my $oitop    = read_file( $vars{"oitop"} );
            my $oibottom = read_file( $vars{"oibottom"} );
            my $text     = $oitop . $temp . $oibottom;
            #say "text is $text";
            $content{$_} = $text;
        }
    }
    closedir $eh;
    # important to sort
    my @return;
    foreach my $key ( sort keys %content ) {
        print $content{$key} . "\n";
        push @return, $content{$key};
    }
    return \@return;
}
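On the question of what encoding the caption files really have: rather than guessing from gedit, one option is to check whether the raw bytes decode as strict UTF-8. The following is only a sketch; looks_like_utf8 is a hypothetical helper name, not something from File::Slurp or the code above. Note that a pure-ASCII file also passes, since ASCII is a subset of UTF-8, whereas Cyrillic text saved in a single-byte encoding such as Windows-1251 would normally fail.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of the original code): returns 1 if the
# file's raw bytes decode as strict UTF-8, 0 otherwise.
sub looks_like_utf8 {
    my ($path) = @_;

    # Read raw bytes; no decoding layer, so we see exactly what is on disk.
    open my $fh, '<:raw', $path or die "cannot open $path: $!\n";
    my $bytes = do { local $/; <$fh> };
    close $fh;

    # FB_CROAK makes decode() die on the first malformed sequence.
    return eval { decode( 'UTF-8', $bytes, Encode::FB_CROAK ); 1 } ? 1 : 0;
}
```

It could be called as looks_like_utf8($file) just before the read_file call, skipping or warning about any file that returns 0.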
improved page

I budgeted all day to figure this out, so I'm gonna go form some concrete. Many thanks again.