Printing Unicode on the Windows Console and the importance of I/O layers
by nikosv (Chaplain)
on Nov 17, 2010 at 07:23 UTC
I wanted to look at how Unicode is printed on the Windows console through the Win32 API, and to check how it is done in other languages, rather than going directly through Perl, which hides a lot of the details.
The problem when printing to the console:
The need arose when I wanted to print an old-style DOS box on the console with the cp437 box-drawing characters, addressing them by their Unicode code points rather than by their 8-bit cp437 ("extended ASCII") codes. The output came out mangled, with characters overlapping.
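To make the difference between the two representations concrete, here is a small sketch (not the code from this post) that uses the core Encode module to map the cp437 bytes of the double-line box characters to their Unicode code points. The hash keys are names made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# cp437 ("extended ASCII") byte values of the double-line box-drawing characters
my %cp437 = (
    top_left     => "\xC9",
    top_right    => "\xBB",
    bottom_left  => "\xC8",
    bottom_right => "\xBC",
    horizontal   => "\xCD",
    vertical     => "\xBA",
);

# Decode each byte as cp437 to obtain the corresponding Unicode character
my %unicode = map { $_ => decode('cp437', $cp437{$_}) } keys %cp437;

# Print only hex values, so no output encoding layer is needed here
for my $name (sort keys %unicode) {
    printf "%-12s cp437 0x%02X -> U+%04X\n",
        $name, ord($cp437{$name}), ord($unicode{$name});
}
```

The point is that "\x{2554}" and the cp437 byte 0xC9 denote the same glyph, but only the former is Unicode; what reaches the console depends on how the string is encoded on the way out.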
Take a look at this picture of the output to get a clear view of the problem.
The code that generated the incorrect result is:
Since C++ talks to the Win32 API far more directly than Perl does, I researched how the console is manipulated in C++ and then applied the underlying concepts in Perl.
Fortunately I came across the illegalargumentexception blog, which has a fantastic tutorial on the subject with examples in several languages and also explains various Unicode issues. Great stuff, highly recommended.
So the equivalent Perl code would be:
The trick is to use high-level console I/O (WriteFile) rather than low-level console I/O (WriteConsoleOutput). There is also no need for the WideCharToMultiByte function: Perl uses UTF-8 natively, while C++ uses 16-bit wide characters, which have to be converted to a multibyte encoding first. Note that Windows treats wide characters (UTF-16) as "real" Unicode, while it treats UTF-8 as just another multibyte encoding, the same way it treats the legacy ANSI/OEM code pages.
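To see why the C++ version needs WideCharToMultiByte and the Perl version does not, compare the two encodings of a single box-drawing character. A small sketch using the core Encode module; it only inspects bytes and does not touch the console.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+2554 BOX DRAWINGS DOUBLE DOWN AND RIGHT
my $char = "\x{2554}";

# What a C++ program on Windows holds in a wchar_t: one 16-bit unit (UTF-16LE)
my $utf16 = encode('UTF-16LE', $char);

# What WriteFile is handed when the console expects UTF-8: a multibyte sequence
my $utf8 = encode('UTF-8', $char);

printf "UTF-16LE: %s (%d bytes)\n", unpack('H*', $utf16), length $utf16;
printf "UTF-8:    %s (%d bytes)\n", unpack('H*', $utf8),  length $utf8;
```

The UTF-16 form is a single code unit, while the UTF-8 form is three bytes (E2 95 94), which is exactly the "multibyte" treatment Windows applies to UTF-8.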
Also note that, for the example to work, the console's actual code page does not play a role, but the console font must be set to Lucida Console. Lucida Console does not cover the whole Unicode range, however, so not every Unicode glyph can be displayed. That leaves one issue: how to set the font on the user's console programmatically. Unfortunately, this can only be done on Windows Vista and later, with the SetCurrentConsoleFontEx API function.
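Setting the font from Perl means filling in a CONSOLE_FONT_INFOEX structure and passing it to SetCurrentConsoleFontEx. The following is a rough sketch, assuming the CPAN Win32::API module and its C-prototype constructor. The struct layout follows the Windows SDK declaration, while the helper name console_font_info_ex and the 16-pixel height are arbitrary choices for illustration. Only the structure packing runs on non-Windows systems; the API calls are guarded by $^O and should be treated as a starting point rather than a verified recipe.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: pack a CONSOLE_FONT_INFOEX (84 bytes):
#   ULONG cbSize; DWORD nFont; COORD dwFontSize (SHORT X, SHORT Y);
#   UINT FontFamily; UINT FontWeight; WCHAR FaceName[32];
sub console_font_info_ex {
    my ($face, $height) = @_;
    my $face_utf16 = encode('UTF-16LE', $face);    # WCHAR is UTF-16LE on Windows
    die "face name too long\n" if length($face_utf16) > 62;  # leave room for NUL
    return pack('L L s s L L a64',
        84,             # cbSize: size of the structure in bytes
        0,              # nFont: ignored when setting
        0, $height,     # dwFontSize: width 0, height in pixels (arbitrary)
        0,              # FontFamily: FF_DONTCARE
        400,            # FontWeight: FW_NORMAL
        $face_utf16);   # FaceName, NUL-padded to 32 WCHARs
}

my $info = console_font_info_ex('Lucida Console', 16);

if ($^O eq 'MSWin32') {
    # Assumes Win32::API is installed; SetCurrentConsoleFontEx needs Vista or later
    require Win32::API;
    my $get_std = Win32::API->new('kernel32',
        'HANDLE GetStdHandle(DWORD nStdHandle)')
        or die "GetStdHandle not available: $^E\n";
    my $set_font = Win32::API->new('kernel32',
        'BOOL SetCurrentConsoleFontEx(HANDLE h, BOOL bMax, LPVOID info)')
        or die "SetCurrentConsoleFontEx not available: $^E\n";
    my $handle = $get_std->Call(-11);    # STD_OUTPUT_HANDLE is (DWORD)-11
    $set_font->Call($handle, 0, $info)
        or warn "SetCurrentConsoleFontEx failed: $^E\n";
}
```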
Ultimately, in pure Perl code without any Win32 API calls (apart from SetConsoleOutputCP, which we still need), we turn PerlIO buffering off by using the :unix layer, so that it doesn't mess with the console buffer:
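A minimal sketch of this approach, not necessarily identical to the original listing, assuming the CPAN Win32::Console module is available on Windows (its OutputCP function sets the console output code page, i.e. SetConsoleOutputCP). The draw_box helper and the box contents are made up for illustration; off Windows the code-page step is skipped and the box is simply written as UTF-8.

```perl
use strict;
use warnings;

# On Windows, switch the console output code page to UTF-8 (65001)
if ($^O eq 'MSWin32') {
    require Win32::Console;
    Win32::Console::OutputCP(65001);
}

# Push :unix on top of STDOUT so prints bypass the PerlIO buffering layer,
# and flag the handle as UTF-8 so wide characters go out as UTF-8 bytes
binmode STDOUT, ':unix:utf8';

# Hypothetical helper: build a double-line box from Unicode code points
sub draw_box {
    my ($width, @lines) = @_;
    my @out = ("\x{2554}" . ("\x{2550}" x $width) . "\x{2557}");    # top edge
    push @out, "\x{2551}" . sprintf('%-*s', $width, $_) . "\x{2551}"
        for @lines;                                                  # body rows
    push @out, "\x{255A}" . ("\x{2550}" x $width) . "\x{255D}";     # bottom edge
    return map { "$_\n" } @out;
}

print draw_box(12, ' Hello', ' Unicode box');
```

Each string holds real Unicode code points; the layers on STDOUT decide how they become bytes, which is why the choice of I/O layers matters more here than the string contents.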
Compare this little Perl example with the complexity other languages have to go through to get the same result, and appreciate Perl's power. Magic.