UTF8 hash key downgraded when assigned
by gibus (Acolyte) on Nov 30, 2018 at 23:51 UTC
gibus has asked for the wisdom of the Perl Monks concerning the following question:
I've stumbled on a strange behaviour with hash keys, which happens on every Perl version I could test, from 5.16 to 5.26.
It was asked some years ago on Stack Overflow, but no answer settled whether it is an optimization bug or expected behaviour.
The issue is that if you initialize a hash with a key containing non-ASCII characters (e.g. from ISO-8859-1), the key is properly encoded in UTF-8 (with the UTF8 flag on). But if you then assign a value to the hash element for that key, the key is downgraded (probably to ISO-8859-1). You can imagine the consequences if you have to process this key while expecting it to be UTF-8 encoded…
Here's a script showing the issue:
with the following output:
As this code shows, the issue can be solved by upgrading the key to UTF-8. But I would never have thought of doing that before stumbling on this issue, and I have not read anything in perldoc explaining this behaviour. Do you think it is expected for some reason? Thanks!
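A minimal sketch of how this kind of check can be written, offered as an illustration rather than as the script from the post. It assumes the key is "é" (U+00E9) and relies only on the core functions `utf8::upgrade` and `utf8::is_utf8`; the helper `key_flags` and the variable names are made up for the example. It prints whether the key returned by `keys` carries the UTF8 flag after each assignment, so any flag change should be visible on whichever Perl version you run it with.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether each key returned by keys()
# carries the internal UTF8 flag.
sub key_flags {
    my ($hash) = @_;
    return join ",", map { utf8::is_utf8($_) ? "utf8" : "not utf8" } keys %$hash;
}

my $upgraded = "\x{e9}";     # "é" (U+00E9); illustrative key, an assumption
utf8::upgrade($upgraded);    # same string, internal UTF8 flag now on

my $plain = "\x{e9}";        # same character, UTF8 flag off

my %h = ($upgraded => 1);
print "after init:            ", key_flags(\%h), "\n";

$h{$plain} = 2;              # assign through the non-upgraded copy
print "after plain assign:    ", key_flags(\%h), "\n";

$h{$upgraded} = 3;           # assign through the upgraded copy
print "after upgraded assign: ", key_flags(\%h), "\n";

# Both scalars compare equal with eq, so they address a single entry.
print "entries: ", scalar(keys %h), "\n";
```

Both scalars hold the same string as far as `eq` is concerned, so they address the same hash entry; only the internal representation of the stored key can differ between the steps.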