Re: Re: Case-preserving substitutions by petral (Curate)
on Jan 18, 2002 at 23:54 UTC
Another try at explaining this:
When dealing with 7-bit ASCII, the uppercase letters begin at 65 and the lowercase letters at 97, which is 32 higher.   Since 32 is a power of two, represented by bit 5 of the character, a letter is lc if this bit is set and Uc if it is unset.
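To make the bit-5 claim concrete, here is a small Python sketch (the thread itself is about Perl; the names `CASE_BIT` and `is_lower_ascii_letter` are just made up for illustration). It only looks at plain ASCII letters:

```python
# Bit 5 (value 32, 0x20) is the only difference between an ASCII
# uppercase letter and its lowercase counterpart.
CASE_BIT = 0x20  # hypothetical name, for illustration only


def is_lower_ascii_letter(ch: str) -> bool:
    """True if ch is an ASCII letter with bit 5 set, i.e. lowercase."""
    return ch.isascii() and ch.isalpha() and bool(ord(ch) & CASE_BIT)


print(ord("A"), ord("a"), ord("a") - ord("A"))  # 65 97 32
print(format(ord("A"), "07b"))  # 1000001
print(format(ord("a"), "07b"))  # 1100001
```

The two binary strings differ only in the third digit from the left, which is bit 5.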
XOR the original with its lowercased self, and bit 5 of the result will be set only if the original was uppercase.   Since XORing something with itself is always 0, that is the only bit which can be set.   The lc of the replacement will have that bit set, because that's what makes it lc, with the other bits set to determine which letter.
So bit 5 is set in the XOR of the original with its lc self only if the original is Uc (the opposite of the bit's usual meaning!), and it is always set in the lc replacement.   If both are set, XORing them clears the bit in the result: hence Uc; if only the replacement's bit is set, XOR leaves it set: hence lc.
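The same arithmetic, spelled out character by character in Python as a rough sketch. This is not the code from the thread; `preserve_case` is a hypothetical helper, and it assumes both strings are plain ASCII letters of the same length (`zip` silently stops at the shorter one):

```python
def preserve_case(original: str, replacement: str) -> str:
    """Copy the per-character case of `original` onto `replacement`.

    Assumption: ASCII letters only, and equal lengths.
    """
    out = []
    for o, r in zip(original, replacement):
        # 0x20 if o is uppercase, 0 if it is already lowercase.
        mask = ord(o) ^ ord(o.lower())
        # The lowercased replacement has bit 5 set; XOR with the mask
        # clears it (uppercase) or leaves it alone (lowercase).
        out.append(chr(mask ^ ord(r.lower())))
    return "".join(out)


print(preserve_case("Hello", "jelly"))  # Jelly
print(preserve_case("HELLO", "jelly"))  # JELLY
print(preserve_case("hello", "JELLY"))  # jelly
```

Non-letters in the replacement would get mangled wherever the original is uppercase, which is one reason the trick is only safe on letters.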
I think at this point I should exclaim "QED" and run.   It seemed clear enough before I started trying to explain it in this little box!
update:   But note that jryan's answer above will work with any locale!
reupdate:   IO points out (and I should've checked) that capitalizing-by-resetting-bit-5 also works for the 8-bit characters in the standard ISO8859-1 ("latin-1") character set.