PerlMonks  

Re^2: Printing the first letter of the Hebrew alphabet (U05D0) kills script?

by ELISHEVA (Prior)
on Mar 08, 2011 at 20:19 UTC ( #892071=note )


in reply to Re: Printing the first letter of the Hebrew alphabet (U05D0) kills script?
in thread Printing the first letter of the Hebrew alphabet (U05D0) kills script?

What happens when you run the script with wide characters explicitly turned off, i.e. in an xterm launched with xterm -u8 +wc? For me, wide-character xterm windows show all output. It could be that you aren't seeing the same results because your system is configured by default to have wide characters turned on, and you need to explicitly turn them off to get the results I'm getting. I'm only seeing STDOUT and STDERR disappear when wide characters are turned off (see sample output in OP).

It could be a bug, but I'm beginning to think that it may in fact be a "feature" left over from the pre-Unicode days of the computing world. xterm, at least in version 235, seems to think certain byte sequences are escape sequences meant to control the terminal. See my latest reply to kennethk.


Re^3: Printing the first letter of the Hebrew alphabet (U05D0) kills script?
by ikegami (Pope) on Mar 08, 2011 at 20:43 UTC

    What happens when you run the script with wide-characters explicitly turned off, i.e. in an xterm launched with xterm -u8 +wc?

    No difference whatsoever.

    seems to think certain byte sequences are escape sequences meant to control the terminal.

    That may be.

    The thing is, those normally start with ESCape (^[). UTF-8 doesn't produce anything that contains ESC except for ESC itself. Other control characters respected by terminals are also found in the ASCII range and thus not produced by UTF-8.
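As a rough check of that claim, here is a minimal Perl sketch. The helper name encoded_bytes is made up for illustration, and the scan covers only the Basic Multilingual Plane, which is an arbitrary choice to keep it quick. It encodes every code point above U+007F and counts those whose UTF-8 form contains a byte below 0x80, the range where ESC (0x1B) and the other ASCII controls live. The count should come out as zero.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 bytes of one code point as integers.
sub encoded_bytes {
    my ($cp) = @_;
    my $s = chr $cp;
    utf8::encode($s);            # in-place conversion to UTF-8 bytes
    return unpack 'C*', $s;
}

# Count non-ASCII code points whose encoding contains an ASCII-range byte.
my $offenders = 0;
for my $cp (0x80 .. 0xFFFF) {
    next if $cp >= 0xD800 && $cp <= 0xDFFF;   # skip surrogates, not real characters
    $offenders++ if grep { $_ < 0x80 } encoded_bytes($cp);
}

print "code points with an ASCII-range byte: $offenders\n";
```

Note that this only rules out the 7-bit controls. It says nothing about bytes in the 0x80-0x9F range, which do occur as trailing bytes in UTF-8.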

    I don't know much about terminals, and less about xterm. I didn't even have xterm installed until this came up.

    You mentioned something about an "Xemacs shell". Is that a variable that can be eliminated?

      You mentioned something about an "Xemacs shell". Is that a variable that can be eliminated?

      In this case the xemacs shell was being used as a control rather than something to be eliminated. One way of assuring myself that this was xterm-specific behavior was to run the script in an alternative shell and see what happened. As it turned out, there was no disappearing output in the xemacs shell (which is really just a file buffer pretending to be a shell). I also dumped the output to a file instead of the terminal (as suggested above by BrowserUK), and not surprisingly it was all there: no disappearing output. This really does seem to be an xterm problem.

      those normally start with ESCape (^[).

      Like you, that was my first assumption too, but googling around I see that there does appear to be some overlap between UTF-8 and xterm escape sequences. For example,

      Under normal mouse mode, positions outside (160,94) result in byte pairs which can be interpreted as a single UTF-8 character; applications which do treat their input as UTF-8 will almost certainly be confused unless extended mouse mode is active. Source: http://invisible-island.net/xterm/ctlseqs/ctlseqs.html#Mouse%20Tracking

      I'm not sure how that explains what I'm seeing, but that may not be the only case of overlap.

      I don't know much about terminals, and less about xterm. I didn't even have xterm installed until this came up.

      Wow. Many thanks for the effort you have put into this!

        Under normal mouse mode, positions outside (160,94) result in byte pairs which can be interpreted as a single UTF-8 character;

        For there to be an issue, a sequence of UTF-8 characters has to be interpreted as an escape sequence, not the other way around.

        From higher up in that linked document comes this:

        The xterm program recognizes both 8-bit and 7-bit control characters. It generates 7-bit controls (by default) or 8-bit if S8C1T is enabled.

        It proceeds to say 0x9B and ESC [ are equivalent, for example.

        More relevant, it says 0x90 and ESC P are equivalent. U+05D0 is 0xD7 0x90 in UTF-8.

        Are these equivalent for you?

        perl -e'print "\x1B[31m", "foo", "\x1B[0m", "bar", "\n";'
        perl -e'print "\x9B31m", "foo", "\x9B0m", "bar", "\n";'

        Perhaps you can tell xterm to stop recognising the "8-bit" codes.
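The byte-level observation above (U+05D0 is 0xD7 0x90 in UTF-8, and 0x90 is the 8-bit form of ESC P) can be checked without a terminal. This is only a sketch: the helper names utf8_bytes and c1_bytes are made up for illustration, and it shows which bytes are emitted, not how any particular xterm build treats them.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 bytes of one code point as integers.
sub utf8_bytes {
    my ($cp) = @_;
    my $s = chr $cp;
    utf8::encode($s);            # in-place conversion to UTF-8 bytes
    return unpack 'C*', $s;
}

# Hypothetical helper: keep only bytes in 0x80-0x9F, the 8-bit (C1) control
# range. 0x90 is the byte the xterm documentation pairs with ESC P.
sub c1_bytes {
    return grep { $_ >= 0x80 && $_ <= 0x9F } @_;
}

my @bytes = utf8_bytes(0x05D0);
printf "U+05D0 => %s\n", join ' ', map { sprintf '0x%02X', $_ } @bytes;
printf "C1-range bytes => %s\n", join ' ', map { sprintf '0x%02X', $_ } c1_bytes(@bytes);
```

If the terminal reads its input as raw 8-bit bytes rather than decoding UTF-8 first, that trailing 0x90 is the byte that could be mistaken for a control code.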
