Re^4: Windows-1252 characters from \x{0080} thru \x{009f}

by Jim (Curate)
on Apr 26, 2012 at 02:16 UTC

in reply to Re^3: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)
in thread Windows-1252 characters from \x{0080} thru \x{009f}

So, a highly limited Latin only encoding seems modern/uncrufty to you in 2012?

The Windows‑1252 character set isn't "highly limited." I'm an English-speaking monoglot—or, more to the point, an English-writing monoglot—so I can use the Windows‑1252 character set for all my writing. And I can likely continue to use it for a very long time, either until I learn another language that uses a writing system other than Latin, or until I drop dead. Saying Windows‑1252 is highly limited because it can't be used to write Chinese or Hebrew is like saying my Toyota Corolla is highly limited because it can't fly in the sky or sail the seas.

The Windows‑1252 and ISO 8859‑1 (Latin 1) character sets are still very commonly used today for digital text. For example, in my industry, e-discovery and litigation support in the United States, text and data are much more often Windows‑1252 than Unicode (UTF‑8). This is just how it is.

So, no, Windows‑1252 and Latin 1 don't seem especially unmodern or crufty to me. They're just older, single-byte encodings, not Unicode, that's all.

By the way, I'm a proponent of Unicode and I support and encourage its adoption. I'm a member of the Unicode Consortium. My name is proudly displayed on its Members page. ☺ (I confess I'm not an active member; I just pay to belong.) I've attended several Unicode Conferences and have had the good fortune to rub elbows with the Unicode cognoscenti. My keen interest in Unicode dovetails nicely with my love of Perl, whose Unicode support is excellent.


Re^5: Windows-1252 characters from \x{0080} thru \x{009f}
by grantm (Parson) on May 24, 2012 at 01:55 UTC
    I can use the Windows‑1252 character set for all my writing ... ☺

    I find it ironic that you claim to be able to get by with only the Windows‑1252 character set and then, a few paragraphs later, you use a character that's not in it. Sure, you can enter that character using the HTML numeric character entity form &#9786;, but then the same is true of any non-ASCII character. So I don't really see why 1252 is so appealing to you. Given the more obvious choices of ASCII or Unicode, why choose an encoding that is neither one thing nor the other?
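
    The point about the smiley can be checked from Perl. What follows is only a minimal sketch, assuming the core Encode module and its cp1252 alias for Windows‑1252. The helper names fits_cp1252 and to_cp1252_with_entities are made up for illustration and come from neither post.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if every character in $text has a
# Windows-1252 mapping.
sub fits_cp1252 {
    my ($text) = @_;
    # FB_CROAK makes encode() die on the first unmappable character.
    # $text is a private copy, so it is fine if encode() alters it.
    return eval { encode('cp1252', $text, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# Hypothetical helper: encode to Windows-1252, writing each unmappable
# character as an HTML numeric character reference of the form &#NNNN;.
sub to_cp1252_with_entities {
    my ($text) = @_;
    return encode('cp1252', $text, Encode::FB_HTMLCREF());
}

my $smiley = "\x{263A}";    # WHITE SMILING FACE, not in Windows-1252
my $euro   = "\x{20AC}";    # EURO SIGN, byte 0x80 in Windows-1252

printf "smiley fits: %d\n", fits_cp1252($smiley);
printf "euro fits:   %d\n", fits_cp1252($euro);
print  "smiley as cp1252 bytes: ", to_cp1252_with_entities($smiley), "\n";
```

    The smiley fails the check and only survives as the ASCII text of a numeric reference, while the euro sign fits as a single byte.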

