Re: storable and utf8

by graff (Chancellor)
on Mar 06, 2014 at 03:28 UTC

in reply to storable and utf8

Just for grins, you might want to try out the script I posted here: unichist -- count/summarize characters in data. Run it on each file to see whether you have any characters outside the (7-bit) ASCII range; if so, whether they are properly encoded as utf8; and if they are, which range of code points you have.

Given that you have an error message that explicitly mentions "\xDD", it seems clear that the file in question has non-ASCII content (i.e. bytes with the 8th bit set, whether or not they also happen to be valid utf8).
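If you just want a quick yes/no answer before running a full character histogram, something like the following might do. It is only a minimal sketch, not the unichist script: the sub name classify_bytes is my own invention, and it assumes you can afford to slurp each file into memory. It reports whether the raw bytes are pure ASCII, valid UTF-8, or contain high-bit bytes that are not valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of unichist): classify a raw byte string
# as 'ascii', 'utf8', or 'other'.
sub classify_bytes {
    my ($bytes) = @_;

    # No byte with the 8th bit set means plain 7-bit ASCII.
    return 'ascii' unless $bytes =~ /[\x80-\xFF]/;

    # A strict UTF-8 decode croaks on malformed input; LEAVE_SRC keeps
    # decode() from modifying our copy of the data.
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 'utf8' : 'other';
}

# Usage: pass the file names to check on the command line.
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $data = do { local $/; <$fh> };    # slurp the whole file as bytes
    close $fh;
    printf "%s: %s\n", $file, classify_bytes($data);
}
```

A file that comes back as 'other' has 8-bit data that is not utf8, so it would need to be decoded from whatever its real encoding is before being treated as character data.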

Also, it seems odd that when your code says use Storable, you get v2.20, but one of your error messages is saying something about v2.7 - what's up with that? How much do you know about the origins or provenance of these files?
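One way to look into the version question is to compare the version of the Storable module you are running against the format version recorded in each file's header. Storable keeps those as two separate numbers, so a mismatch between them is not necessarily a problem in itself, but I am only guessing that this is what your error message refers to. The sketch below assumes your Storable is recent enough to provide Storable::file_magic; the sub name describe_storable_file is made up for illustration.

```perl
use strict;
use warnings;
use Storable ();

# Hypothetical helper: summarize what Storable's header says about a file.
sub describe_storable_file {
    my ($file) = @_;

    # file_magic() returns a hash ref for a Storable file, undef if the
    # file is readable but not recognized, and dies if it cannot be opened.
    my $info = Storable::file_magic($file);
    return "$file: not recognized as a Storable file" unless $info;

    return sprintf "%s: format version %s, network order: %s",
        $file, $info->{version}, ($info->{netorder} ? 'yes' : 'no');
}

print "Storable module in use: $Storable::VERSION\n";
print describe_storable_file($_), "\n" for @ARGV;
```

Running this over all of the files should at least show whether they were written with the same format version, which may give a hint about where they came from.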

