Perl 6 and Unicode Operators

by mugwumpjism (Hermit)
on May 31, 2005 at 22:07 UTC ( [id://462246] : perlmeditation )

When I posted sample code from the Perl 6 Set module to the perl6-language list, some may have balked at the following:

    my $low = set( 0..4 );
    my $odd = set( (0..4).map:{ $_ * 2 + 1 } );
    say $low ∋ 4;                            # true
    say ($low ∖ $odd).members.join(",");     # 0,2,4
    say ( $low ∪ $odd ) ⊇ $low;              # true

(PerlMonks seems to have eaten the Unicode characters in the above, so go to the post link to see the original)
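For anyone who cannot see the glyphs at all, here is a rough ASCII-only rendering of the same operations. It is a sketch in Python using the built-in `set` type, not the Perl 6 Set module itself; the variable names are mine and only mirror the sample above.

```python
# ASCII-only spelling of the same set operations, using Python's built-in set.
low = set(range(0, 5))                      # {0, 1, 2, 3, 4}
odd = {n * 2 + 1 for n in range(0, 5)}      # {1, 3, 5, 7, 9}

contains_four = 4 in low                    # $low ∋ 4
difference = sorted(low - odd)              # $low ∖ $odd
is_superset = (low | odd) >= low            # ($low ∪ $odd) ⊇ $low

print(contains_four)
print(",".join(str(n) for n in difference))
print(is_superset)
```

The comments show which Unicode operator each ASCII spelling stands in for.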

As Perl 6 allows custom operators to be defined, and there were specific Unicode characters in the mathematical section for the precise operations being performed, I thought that they were a good fit.

But to what extent should this principle be carried on, and how can we ease the difficulties of switching back to the "right" symbols for things?

To further explore this idea, I updated the Unicode quickref with the operations available in the Set module, and took a cursory glance across various sections of Unicode for likely candidates for further use as operators.

Now, whether or not you agree that the longer-standing conventions used by other fields of science, such as Mathematics, Philosophy and Logic, should take precedence over the now-standard conventions of Computer Science, these Unicode characters have a clear purpose, and I'd like to use them! So, this may end up in a non-core module, but this discussion will still be relevant.

There are lots of issues here:

  • Many of these Unicode characters closely resemble other characters in many commonly available font sets (for instance, the Set difference character and the ASCII backslash), and in fact are often completely indistinguishable from their non-Unicode counterpart. This is not actually a new problem; it's just the Birthday Paradox making it a more obvious problem.
  • Some systems do not have the fontsets or capability to display and edit Unicode, though Perl will still be able to handle it intact on such systems.
  • People working on the same code base may have different preferences about seeing ASCII vs. Unicode, and it would be good to avoid this becoming a flamewar topic.
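When two glyphs look identical on screen, the code point and official character name still tell them apart. A minimal sketch in Python, assuming only the standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# ASCII backslash, then the set operators used in the sample code:
# set difference, membership, union, superset-or-equal.
for ch in ["\\", "\u2216", "\u220B", "\u222A", "\u2287"]:
    print(describe(ch))
```

The first two lines of output are the lookalike pair mentioned above: the backslash is REVERSE SOLIDUS (U+005C), while the set difference operator is SET MINUS (U+2216).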

This is a constructive meditation, so keep the Green Hats on please. In particular, I'd like to avoid discussion about whether or not this is a good idea. You might have heard of APL and its use of strange glyphs for every control structure there is, but if you look at the Unicode Sliderule page 23, characters \x{2336} - \x{237a}, you'll see that APL's characters were in their own class of awfulness. That is a mistake I don't think anyone wants to repeat.

In addition to the above question, any other suggestions for appropriate Unicode characters to add to the quickref for review or humour would certainly help explore this topic!

$h=$ENV{HOME};my@q=split/\n\n/,`cat $h/.quotes`;$s="$h/." ."signature";$t=`cat $s`;print$t,"\n",$q[rand($#q)],"\n";

Replies are listed 'Best First'.
Re: Perl 6 and Unicode Operators
by BrowserUk (Patriarch) on Jun 01, 2005 at 03:25 UTC

    Enabling the language to handle Unicode, and allowing the use of Unicode at the source code level, are two quite distinct decisions, each with their own sets of implications and caveats.

  • The former has implications for performance and complexity within the implementation, but does at least allow those who are either not ready or have no need for unicode to ignore it.
  • The latter has implications that go way beyond the language and its implementation(s).

    If some aspects of the language require the programmer to use Unicode symbols to use it, it has the potential to prevent those without Unicode enabled tools from using those features. This goes beyond the immediate problem of how to type the code in your favourite editor. How does grep (or cvs or PM's super search or ... ) handle Unicode?

    There is also a pretty fundamental conflict between a world that still typically wants shared code to respect an 80 (or 72) character line width and eschews the use of the tab character for all the various reasons, and a world that allows or requires the use of Unicode for writing source code.
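One concrete form of that conflict: a tool that counts bytes and a tool that counts characters will disagree about how long a line is. A small sketch in Python, assuming the source file is stored as UTF-8 and reusing one line of the Set sample from the root node:

```python
# One line from the Set example; \u2216 is the SET MINUS operator.
line = 'say ($low \u2216 $odd).members.join(",");'

chars = len(line)                       # what a Unicode-aware editor counts
utf8_bytes = len(line.encode("utf-8"))  # what a byte-oriented tool counts

print(chars, utf8_bytes)
```

SET MINUS takes three bytes in UTF-8, so the byte count is two higher than the character count for this line. A width check that measures bytes can therefore reject a line that looks well within the limit.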

  • How will diff, patch, cvs, svn, etc. view %hash<<key>> and %hash«key>> and the other similar combinations?
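To make the diff question concrete, here is a sketch in Python using the standard `difflib` module. The fallback table and the `asciify` helper are hypothetical, and the sample lines are only illustrative. A plain text diff reports the two spellings as a change; the change disappears only if both sides are first normalized to one spelling.

```python
import difflib

# Hypothetical table mapping Unicode quote-word brackets to ASCII digraphs.
ASCII_FALLBACKS = {"\u00AB": "<<", "\u00BB": ">>"}

def asciify(source):
    """Replace each known Unicode glyph with its ASCII fallback."""
    for glyph, fallback in ASCII_FALLBACKS.items():
        source = source.replace(glyph, fallback)
    return source

old = ["my $v = %hash<<key>>;"]
new = ["my $v = %hash\u00ABkey\u00BB;"]

# A plain textual diff sees two different lines.
raw_diff = list(difflib.unified_diff(old, new, lineterm=""))

# After normalizing both sides, there is nothing left to report.
normalized_diff = list(difflib.unified_diff(
    [asciify(l) for l in old], [asciify(l) for l in new], lineterm=""))

print(len(raw_diff) > 0)
print(normalized_diff)
```

Existing tools do no such normalization, so every mixed-spelling edit would show up as a spurious change.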

    It seems to me that being the first language that allows, much less requires, Unicode symbols in source code is fraught with problems. Until the toolsets available to deal with source code catch up, it creates a big problem for the interchange and manipulation of that code.

    Of course, the toolsets won't become available until there is a need for them!

    Maybe fully Unicode enabled Perl6 and Parrot are just the things needed to allow the tools to be written, but does anyone foresee the existing tools being rewritten? Or the world at large adopting them if they were?

    When IBM gave APL to the world, they also had the wherewithal/muscle to produce special keyboards (pic: look for the red symbols. It doesn't do justice to the real things, which were huge, heavy, and a dream to type on. The first time I saw one it reminded me of nothing more than the ZX-81 keyboard! :).

    I don't think Perl6 could hope to do the same. Whilst emacs/vi configurations to enable unicode compositions can be made available, that still leaves a lot of tools with problems.

    The UK "went metric" back in 1972 and yet road signs are still in miles, beer is sold in pints, land is measured in acres not hectares and even many of those things which must now be sold in metric units (for legal reasons. Traders have been prosecuted for selling bananas by the pound!), are still the old Imperial things measured in crudely rounded metric units. 2" x 1" timber is sold as 50mmx25mm. 1/2" plaster board is sold as 13mm.

    As of 2000, the only non-metric units allowed were:

    • The mile, yard, foot and inch for road signs and related uses,
    • the acre for land registration,
    • the troy ounce for trading in precious metals,
    • the foot for aircraft altitudes,
    • the nautical mile and knots for sea and air traffic,
    • the pint for draught beer and cider and for milk when sold in returnable containers.

    33 years on and we're still in the mix and match world of Imperial and metric units.

    The current (latest, greatest) aim is to be fully metrified by 2009 (except for road signs!).

    I think that Unicode is going to take a similar amount of time to become pervasive. Despite the speed, and it is ever increasing, with which things change in the world of IT and computers, human beings have an innate reluctance to change. I think that it will take at least one "working generation" (say 35 or so years) for the greater majority of those in the industry at the levels of influence needed to impose such changes, to arrive at a consensus that Unicode is both desirable and necessary.

    Personally, I wish it were otherwise. I do however see considerable problems with using Unicode: the difficulty of typing it; the problems in trying to convey Unicode symbols verbally; comparing Unicode symbols and their ascii-ized, n-graph equivalents; displaying them; printing them; interchanging code containing them.

    I guess the question comes down to whether Perl6 can stand the additional levels of 'reluctance to adopt' that these problems will create, on top of those already murmuring in the background?

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re: Perl 6 and Unicode Operators
by mrborisguy (Hermit) on May 31, 2005 at 23:00 UTC

    On the one hand, I'd say what's the point of having Unicode if we don't use it? I would say the mathematical set class would be the intended use of unicode almost! Plus, most people who will use the Set class will have mathematical training, one would think, and that would be the reason they are using the set class, so I would think it would be a good thing!

    On the other hand, I don't think my CLI can read unicode, and I tend to program in vi, so I could see how it'd be a bad thing too.


      Depending on some variables, that may not be hard to fix. What is hard to fix is creating easy ways to input the chars. On my keyboard, even something as simple as << and >> requires three keys for the unicode version, but only two for the ascii version. (Compose + < + < = «). Not only that, but I have to explicitly enable my compose key.

      Almost paradoxically, it is much harder to enable unicode in something like PM, as the original poster discovered, and as I discovered when I hit preview on this node.

      Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

        This is very true. I was actually thinking about the << problem the other day. I'm almost positive I can set up vi to automatically replace << with «, and I would bet that other major editors would be able to do the same. But yes, the problem comes when you add unicode characters, and suddenly I would have to add more substitutions to my vim files.


Re: Perl 6 and Unicode Operators
by dragonchild (Archbishop) on Jun 01, 2005 at 13:26 UTC
    While adding the way to avoid the Unicode characters is helpful, I was trying to point out that no-one has listed how to create those characters in vi or emacs. The Vim helpdocs are difficult at best on the topic, so docs/quickref/unicode should list the Compose-keys for both vi and emacs.

    Additionally, it would be helpful to list how to enable your vim/emacs version for Unicode, given that the Vim helpdocs are difficult at best.

    • In general, if you think something isn't in Perl, try it out, because it usually is. :-)
    • "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"

      I have the following code in my .emacs file

          (set-input-method "TeX")
          (quail-define-rules ((append . t))
            ("\\\\" ?\\)
            ("<<" ?«)
            (">>" ?»)
            ("\\``" ?“)
            ("\\''" ?”))
          (toggle-input-method)

      Then C-\ will switch between a TeX input method (\subset --> ⊂, ...), and standard input method. This is an emacs-only solution though. One can use the latin-ltx.el file that defines the TeX mode to set up SCIM (Linux only) which will allow you to use the TeX input method globally.

      Update: In emacs you can see a list of possible input methods by typing: M-x list-input-methods

      Good Day,

Re: Perl 6 and Unicode Operators
by Anonymous Monk on Jun 02, 2005 at 01:14 UTC
    Many of these unicode characters closely resemble other characters in many commonly available font sets (for instance, the Set difference character and the ASCII backslash), and in fact are often completely indistinguishable from their non-unicode counterpart.

    This was one of my objections to Perl6: it gives would-be clever people yet more rope to hang the maintenance coder with. TMTOWTDI is a *bad* principle for writing maintainable code; and playing tricks with Unicode can only make things worse.

    I predicted that sooner or later, somebody would decide to use a bunch of confusing Unicode function names, in some obscure font, and I'd have to maintain the mess. Sure enough, Perl6 isn't even out yet, and people are already considering it. :-(

    Don't use Unicode for function names. That's my suggestion.

      Oh, I absolutely agree. All forms of technological advancement are bad and should be avoided.

      Humbug McFogey
      President, Buggy Whip Manufacturer's Assoc. of America

Node Type: perlmeditation [id://462246]
Approved by thor