http://www.perlmonks.org?node_id=1202952


in reply to Connect SQLite with unicode directory

Filesystems in general don't know about encoding.

Unixish filesystems (and the APIs) usually expose the filename as a binary blob, which matches well with using UTF-8 encoded filenames.
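In Perl terms, that means encoding the character string to UTF-8 octets before handing it to anything that touches the filesystem. A minimal sketch using the core Encode module; the path is a made-up example, not the one from the question:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical path containing U+00FC (u with diaeresis).
my $name = "\x{FC}/data";

# Character string -> UTF-8 octets; the filesystem just stores these bytes.
my $bytes = encode('UTF-8', $name);

printf "%d characters, %d bytes\n", length($name), length($bytes);

# Names coming back from the OS (readdir, glob) are octets as well,
# so they need decoding before being treated as text.
my $round_trip = decode('UTF-8', $bytes);
```

The character count and the byte count differ by one here because the umlaut takes two bytes in UTF-8.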

Windows filesystems (and the APIs) usually expose the filename as wide characters (UTF-16), so if you get the filename as UTF-8, you need to translate it to wide characters, and you also need to use the wide APIs (CreateFileW etc.) to access such files.
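The translation step can be sketched with the core Encode module. utf8_to_wide is a hypothetical helper name, and the sketch only builds the buffer; it does not call any Windows API:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: UTF-8 octets -> NUL-terminated UTF-16LE buffer,
# which is the layout the wide (...W) functions take.
sub utf8_to_wide {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);   # octets -> characters
    return encode('UTF-16LE', $chars . "\0");   # characters -> UTF-16LE
}

# "\xC3\xBC" is the UTF-8 encoding of U+00FC.
my $wide = utf8_to_wide("\xC3\xBC/data");

# prints: FC 00 2F 00 64 00 61 00 74 00 61 00 00 00
print join(' ', map { sprintf '%02X', $_ } unpack 'C*', $wide), "\n";
```

Every character ends up as two bytes, low byte first, and the trailing 00 00 is the wide NUL terminator.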

As a workaround to these issues, I am a fan of Text::Unidecode (and Text::CleanFragment) to transliterate characters to ASCII.

Personally, I try to avoid non-ASCII characters in the functional parts of programs and instead use named characters (\N{...}):

my $PathCorpusDB2 = "\N{LATIN SMALL LETTER U WITH DIAERESIS}/databaseTest2.db";
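The name inside \N{...} has to match the official Unicode character name exactly, otherwise Perl rejects it at compile time. A quick way to check a name, sketched with the core charnames module:

```perl
use strict;
use warnings;
use charnames ':full';

# The official Unicode name for the lowercase u-umlaut.
my $u_umlaut = "\N{LATIN SMALL LETTER U WITH DIAERESIS}";

# Code point the name resolved to.
printf "U+%04X\n", ord $u_umlaut;        # prints U+00FC

# Reverse lookup: code point -> official name.
print charnames::viacode(0xFC), "\n";    # prints LATIN SMALL LETTER U WITH DIAERESIS
```

Note that this gives you a character string; it still has to be encoded for the platform before it is used as a filename.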

Spelling the character by name still won't solve your problem with umlauts in the path, though. I think that converting the UTF-8 encoded filename to wide characters and passing it to CreateFileW() should work, but I don't know how to tell SQLite to do that.