in reply to regex help with unicode and [Path::Tiny]

tl;dr

File names are encoded (nowadays, usually in UTF-8).

#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use open OUT => ':encoding(UTF-8)', ':std';

use Encode;
use Path::Tiny;

my $cyrillic_utf8 = shift;
my $cyrillic = decode('UTF-8', $cyrillic_utf8);

my $out = path("$cyrillic.txt");
$out->spew($cyrillic);

my ($in) = path('.')->children(qr/$cyrillic_utf8\.txt/);
my $string = $in->slurp_utf8;
print $string, "\n";

Running with дом as a parameter outputs дом.
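
To make the bytes-versus-characters distinction concrete, here is a minimal sketch. It assumes the shell passes the argument as UTF-8 bytes, so the bytes for дом are written out as hex escapes instead of being read from @ARGV. The variable names are only illustrative.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Encode qw(decode);

# Assumption: these are the UTF-8 bytes a shell would typically
# hand over in @ARGV for the word "дом".
my $bytes = "\xD0\xB4\xD0\xBE\xD0\xBC";

# Decoding turns the six bytes into three characters.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length $bytes, length $chars;
```

The directory listing hands back names in the encoded (byte) form, which is why the regex above is built from the undecoded argument.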


Re^2: regex help with unicode and [Path::Tiny]
by Aldebaran (Chaplain) on Aug 31, 2018 at 23:39 UTC

    It was a super long post. I wasn't trying to break anyone's scroll finger, but you seemed to address exactly what I was reaching for. I have posted simplified, shorter versions on GitHub. The clone script and the main script now use only Encode. What I had to do was decode all the paths before they were used to build new paths.

    Many of the changes were similar to this:

    foreach (@ARGV) {
        say "before decode is $_";
        $_ = decode( 'UTF-8', $_ );
        say "after decode is $_";
    }
    my ( $from, $to, $pop ) = @ARGV;

    I have also now chopped out all instances of File::Slurp, File::Basename, Path::Class, and 2 others that either were not needed or would have needed encoding/decoding. The result seems to look right: created html page

    It seems to render correctly, but when I run it through the CSS checker, it says that the file does not exist. I wonder if this is where I was supposed to use the URI escaping module.

    Anyways, many thanks for your response and the code you posted, which modeled the way forward for Unicode use.
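
On the URI escaping question: a rough sketch of what percent-encoding a non-ASCII file name looks like, in case the checker is being given a raw Cyrillic URL. Whether that is actually what trips the checker is a guess. escape_segment is a hypothetical helper written with Encode only; the URI::Escape module offers uri_escape_utf8 for the same job.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Encode qw(encode);

# Hypothetical helper: percent-encode one decoded path segment.
sub escape_segment {
    my ($chars) = @_;

    # URLs carry bytes, so encode the characters to UTF-8 first.
    my $bytes = encode('UTF-8', $chars);

    # Keep the RFC 3986 unreserved characters, escape every other byte.
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $bytes;
}

# "дом.html" written with code point escapes, so no "use utf8" is needed.
print escape_segment("\x{434}\x{43E}\x{43C}.html"), "\n";
```

Note that this takes the decoded string, so it would go after the decode step shown above, not before it.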