PerlMonks  

Re: Printing the first letter of the Hebrew alphabet (U05D0) kills script?

by ELISHEVA (Prior)
on Mar 08, 2011 at 22:37 UTC (#892096=note)


in reply to Printing the first letter of the Hebrew alphabet (U05D0) kills script?

Well, it looks like I'm beginning to piece together an explanation. Since all three replies (ikegami, BrowserUK and kennethk) are converging in the same direction, I'm going to summarize what I know so far in a new comment.

  1. The strange behavior seems to be a result of xterm treating certain byte sequences as terminal control sequences (see Re^4: Printing the first letter of the Hebrew alphabet (U05D0) kills script? for details), but none of us are sure which ones, because at first glance the character sequence doesn't fit the normal 7-bit escape sequences that begin with ESC [.

  2. However, some terminals support 8-bit control characters as an alternative to ESC [ (see http://rtfm.etla.org/xterm/ctlseq.html).

  3. It so happens that one of those 8-bit control characters is 0x90 (Device Control String). It also so happens that U+05D0 is encoded in UTF-8 as the two bytes 0xd7 0x90. Perhaps xterm sees the 0x90 and, instead of recognizing it as the second byte of a multibyte character, treats it as the first byte of a device control string? In that case all of the output from Perl would be interpreted as some sort of device command until the next 8-bit control character shows up. That would explain why U+05D0 (d7 90) stops output and a subsequent U+05D1 (d7 91) or U+05D2 (d7 92) resumes it. The 8-bit control characters fall in the range 0x84-0x9f.

  4. In theory this shouldn't be happening on a UTF-8 terminal (xterm -u8). Xterm should know not to pluck 8-bit control characters out of the middle of multibyte Unicode characters. That makes me think that what I'm seeing on xterm version 235 is either (a) a bug in Unicode parsing or (b) a bug in xterm's configuration validation that allows two incompatible properties to coexist (UTF-8 and 8-bit control sequence indicators). Interestingly, ikegami ran my test script on a later version of xterm and could not reproduce the strange behavior. This suggests a bug that was found and fixed, but it could also mean that we simply have different xterm configurations.
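The byte-level hypothesis in points 3 and 4 can be illustrated with a toy model. The Python sketch below is hypothetical: the function names are invented for illustration, and the rule it implements (after a 0x90 byte, discard everything up to and including the next byte in 0x84-0x9f) is the speculation above, not a description of xterm's actual parser. The naive model works byte by byte. The UTF-8-aware model decodes first, so the 0x90 inside U+05D0 never looks like a control character.

```python
# Toy model of the hypothesis above. This is NOT xterm's real parser.
# Assumption: a byte-oriented terminal treats 0x90 (DCS) as the start of a
# device control string and swallows output until the next byte in the
# 0x84-0x9f range cited in the post.

C1_LOW, C1_HIGH = 0x84, 0x9F  # 8-bit control range quoted above
DCS = 0x90                    # Device Control String


def utf8_bytes(ch: str) -> str:
    """Return the UTF-8 encoding of ch as lowercase hex, e.g. 'd7 90'."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


def naive_terminal(data: bytes) -> bytes:
    """Byte-at-a-time model: 0x90 starts swallowing, the next C1 byte ends it."""
    shown = bytearray()
    swallowing = False
    for b in data:
        if swallowing:
            if C1_LOW <= b <= C1_HIGH:
                swallowing = False  # the terminating control byte is consumed too
            continue
        if b == DCS:
            swallowing = True
        else:
            shown.append(b)
    return bytes(shown)


def utf8_aware_terminal(data: bytes) -> str:
    """Decode UTF-8 first, so a 0x90 continuation byte is never a control."""
    text = data.decode("utf-8")
    # Only whole code points U+0080..U+009F could be C1 controls at this level.
    return "".join(c for c in text if not 0x80 <= ord(c) <= 0x9F)


if __name__ == "__main__":
    for ch in "\u05d0\u05d1\u05d2":
        print(f"U+{ord(ch):04X} ->", utf8_bytes(ch))

    data = "A\u05d0B\u05d1C".encode("utf-8")
    print(naive_terminal(data))                                # b'A\xd7C'
    print(utf8_aware_terminal(data) == "A\u05d0B\u05d1C")      # True
```

In the naive model, the 0x90 inside U+05D0 swallows the following "B" and the first byte of U+05D1, and the 0x91 from U+05D1 ends the swallowing, which mirrors the stop/resume pattern described in point 3. The decoding model passes the text through intact, which is the behavior point 4 expects from a UTF-8 terminal.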

There are still details to iron out, in particular explaining the specific behavior I noted for each key combination, but I'm fairly satisfied that this is in the right ballpark, and relieved that this is likely a temporary, version-specific problem and not a fundamental fact of life about Hebrew Unicode and xterm.

I'd like to point out that every single piece of this was in some way suggested by one of the three people responding to this thread. To kennethk I owe thanks for making me look more closely at the behavior of other code points in the same vicinity as U+05D0. BrowserUK put the final nail in xterm's coffin by giving me yet another way to prove that the symptoms were linked to the destination of the output and not to its generation within Perl. His comment about terminal parity got me looking more closely at the individual bytes that make up a multibyte character. ikegami's testing made it clear that at least one later version of xterm was well behaved even when wide-character mode was off, so any bad behavior could fairly be viewed as a bug rather than a necessary evil.

What I really like about this thread is the way we've all been speculating, and yet that speculation has led to a proposed explanation.

Update: While I was writing my reply here, ikegami was coming to similar conclusions. See Re^5: Printing the first letter of the Hebrew alphabet (U05D0) kills script?.
