
Re: Connect SQLite with unicode directory

by Corion (Pope)
on Nov 08, 2017 at 10:15 UTC (#1202952)

in reply to Connect SQLite with unicode directory

Filesystems in general don't know about encoding.

Unixish filesystems (and the APIs) usually expose the filename as a binary blob, which matches well with using UTF-8 encoded filenames.

Windows filesystems (and the APIs) usually expose the filename as Wide Characters (UTF-16), so if you get the filename as UTF-8, you need to translate it to Wide Characters, and you also need to use the Wide APIs (CreateFileW etc.) to access such files.
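To make the translation step concrete, here is a minimal sketch using only the core Encode module. The filename is a made-up example, and the sketch only builds the NUL-terminated UTF-16LE buffer that a Wide API would expect; it does not call any Windows function itself.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: a filename that arrived as UTF-8 encoded bytes
# (0xC3 0xBC is the UTF-8 encoding of U+00FC, "u with diaeresis").
my $utf8_name = "\xC3\xBC/example.db";

# Step 1: decode the bytes into Perl characters.
my $chars = decode('UTF-8', $utf8_name);

# Step 2: encode the characters as UTF-16LE and append a NUL terminator,
# which is the layout the Windows Wide APIs work with.
my $wide = encode('UTF-16LE', $chars . "\0");

printf "%d characters, %d bytes as UTF-16LE\n", length($chars), length($wide);
# prints "12 characters, 26 bytes as UTF-16LE"
```

How the resulting buffer gets handed to CreateFileW depends on the binding you use, so that part is left out here.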

As a workaround to these issues, I am a fan of Text::Unidecode (and Text::CleanFragment) to transliterate characters down to ASCII.
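As a rough illustration of the idea, here is a core-only approximation. Note that this is not Text::Unidecode itself: it decomposes characters with Unicode::Normalize and drops the combining marks, which covers accented Latin letters but not things like "ß" or non-Latin scripts that Text::Unidecode handles. The function name ascii_fold and the sample path are made up for this sketch.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: strip diacritics by decomposing each character
# (e.g. U+00FC becomes "u" followed by U+0308) and removing the marks.
sub ascii_fold {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}+//g;    # \p{Mn} matches nonspacing combining marks
    return $decomposed;
}

my $folded = ascii_fold("\x{FC}ber/caf\x{E9}.db");
print "$folded\n";    # prints "uber/cafe.db"
```

With Text::Unidecode installed, calling its unidecode() function on the same string would be the more thorough option.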

Personally, I try to avoid non-ASCII characters in the functional parts of programs and instead use named characters (\N{...}):

my $PathCorpusDB2 = "\N{LATIN SMALL LETTER U WITH DIAERESIS}/databaseTest2.db";

This still won't solve your problem with umlauts in the filename though. I think that calling CreateFileW() with the filename converted from UTF-8 to Wide Characters should work, but I don't know how to tell SQLite to do that.
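If you want to check what the named-character form actually puts into the string, a small sketch like the following may help. The path is a made-up example; the official Unicode name for "ü" is LATIN SMALL LETTER U WITH DIAERESIS.

```perl
use strict;
use warnings;
use charnames ':full';
use Encode qw(encode);

# Hypothetical path, written with a named character instead of a literal "ü".
my $path = "\N{LATIN SMALL LETTER U WITH DIAERESIS}/example.db";

# The first character is the single code point U+00FC.
my $code_point = ord(substr($path, 0, 1));

# Encoded as UTF-8, that one character becomes two bytes.
my $utf8_bytes = encode('UTF-8', $path);
my $hex = join ' ', map { sprintf '%02X', ord } split //, substr($utf8_bytes, 0, 2);

printf "U+%04X -> %s\n", $code_point, $hex;    # prints "U+00FC -> C3 BC"
```

This only shows that the source stays pure ASCII while the string still contains the umlaut; which encoding the filesystem or SQLite wants is a separate question.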
