PerlMonks  

Curious about Perl's strengths in 2018

by Crosis (Beadle)
on Apr 12, 2018 at 06:57 UTC (#1212716=perlmeditation)

First of all, forgive me if this isn't quite the right place. I am a new user and am not entirely sure about the rubric for this area of the site, though I'm pretty sure this is the right place.

About a decade ago, when I was in my late teens and early twenties, I was very proficient in and eager to use Perl. Though it was a little idiosyncratic, it was certainly much less tedious to get things done in than C, which I had used earlier. Gradually I drifted away towards Python and now I use it for most things. I've since forgotten virtually everything I knew about Perl. I know that Python will still be obviously superior for, for example, most aspects of scientific computing (possibly excepting bioinformatics?) and machine learning, but where does Perl really shine these days? That goes equally for the more conventional Perl 5 as well as the newer Perl 6. Also, what are hot items on CPAN these days?

Replies are listed 'Best First'.
Re: Curious about Perl's strengths in 2018
by hippo (Canon) on Apr 12, 2018 at 08:08 UTC
    I know that Python will still be obviously superior for, for example, most aspects of scientific computing

    That's quite a bold assertion, my friend. I have to say that Python's superiority for that particular field is not obvious to me, if such superiority even exists. I'd be very interested to read how you know this to be true.

    I suppose we should also pre-empt the inevitable descent and point out that this comparison has been discussed at some length here before:

      That's quite a bold assertion, my friend. I have to say that Python's superiority for that particular field is not obvious to me, if such superiority even exists. I'd be very interested to read how you know this to be true.

      First of all, I was sloppy with terminology. I shouldn't have said "scientific computing" because that has a more restricted meaning than what I intended to refer to. I intended to refer to any domain where computers are extensively involved in gathering empirical knowledge, which is basically all over the place. The intended meaning was what scientific computing really means as well as statistics, machine learning, etc.

      Anyway, no other very high-level language has anything really comparable to all the advantages offered by numpy, scipy, scikit-learn, TensorFlow, Sage etc. in this area. There's still a lot of stuff I would only do in R (as much as I really don't want to) and Matlab is still pretty widely used, though I don't know much about it. A lot of big data stuff appears to be done in Java and fellow JVM language Scala. Raw C, C++ and Fortran are still relevant if you have a real need for speed. But in many respects Python has become the de facto standard for these things when special requirements don't need to be met. For example, I'm enrolled in the Coursera course Data-Driven Astronomy and it's all in Python. This is equally true of a number of other courses on said website. Results from Google are also indicative of Python's preeminence in data science / machine learning, more than I expected really. In academics, Python is replacing other languages for introductory programming courses and has replaced Common Lisp in the leading AI textbook Artificial Intelligence: A Modern Approach. And so it goes.

        You are quite right of course.

        The criterion for relevance is not just technology as such.

        Probably (I don't really know to be honest) you could do a lot of things that people use python's numpy & pandas for with perl's PDL, but then EVERYBODY uses numpy & pandas and if I want to learn about it I can even choose among several books if I want, while there is not a single book about PDL.

        So on a pure technical level perl may be in contention, but in reality it isn't.

        Python seems to be eating R's lunch these days and Perl is not even a contender anymore.

        It was not inevitable that things would play out like this; to me this is (in a way) Perl vs PHP all over again.

      The OP asks about Perl 6. Only the first of the four links does the same but that was back in 2001. I don't know much Python but perhaps more than 15 years later someone could give a better answer about Perl 6.

      Update:

      Thanks to Laurent_R for coming up with an answer: Re: Curious about Perl's strengths in 2018

      Ron

        There are only two or three real users here; probably more detractors; the majority being like me: I'm still interested in it and hope it becomes what it might but in my view it's a curiosity until it's much faster, a little less buggy, and the "6PAN" has grown. My lack of speaking about it is the intersection of general ignorance and politeness. :P

Re: Curious about Perl's strengths in 2018
by haukex (Canon) on Apr 12, 2018 at 12:49 UTC

    Perl is IMO still very powerful in the areas it has always been powerful in: text processing (regexes etc.), system administration, web development, and so on. Glancing over at *NIX help forums today shows that people are still getting a lot done with sed, awk, and perl one-liners. It's also come into use in the bioinformatics world, AFAIK for its power in handling text files. But Perl has also grown a significant amount:

    what are hot items on CPAN these days?

    Just to name a few "modern" ones:

    Although not exactly new, there are some other nice frameworks/libraries that IMO make Perl more "modern":

    Probably other monks can point out some of the "big ones" I've missed - see also Task::Kensho for even more modules. In general, I think even many of the modules that have been around for a really long time have matured to the point where they are more robust, have good test suites, etc. (e.g. Template, just to name one of many) - either that, or, in some cases, are now generally recommended against or deprecated.

    Various minor edits, and updates as indicated.

      Great recommendations.

Re: Curious about Perl's strengths in 2018
by haj (Friar) on Apr 12, 2018 at 20:00 UTC

    For me, the strength of Perl 5 in 2018 is that the code I am writing today looks quite different from the Perl 5 code I've written a decade ago, but the code I've written back then still works. The language evolves, and I can follow - in my own time. I don't have to convert my old classes to Moose but I can, and quite often I do because the code is so much more readable. I don't have to convert CGI programs to PSGI or to one of the web frameworks, but I can, and occasionally I do. And so on - other monks have provided plenty of examples of modern Perl 5.

    With Perl 6, we have Inline::Perl5, and maybe soon-ish a Perl 6-aware IDE, so I can use all my old stuff in new programs where I exploit those Perl 6 features which are somewhat cumbersome in Perl 5 (notably grammars and concurrency). So the journey just goes on. Modern Perl 5 has been heavily influenced by Perl 6, so the chasm that existed between the two languages ten years ago is now a gap that can be jumped over. And again, I can jump if I want, I don't have to.

    This makes me absolutely relaxed with regard to selecting a programming language, and I like that.

      For me, the strength of Perl 5 in 2018 is that the code I am writing today looks quite different from the Perl 5 code I've written a decade ago, but the code I've written back then still works.

      This might never have occurred to me but it is the truth and a nice way to think about it. I wrote my first production script after 2 weeks of dabbling with Perl almost 20 years ago. It was strict-free 100% code smell but it worked exactly as it was supposed to—through brute force meets trial and error—and was something a (mostly) beginner could do.

Re: Curious about Perl's strengths in 2018
by Discipulus (Monsignor) on Apr 12, 2018 at 07:10 UTC
    Hello Crosis and welcome to the monastery and (back) to the wonderful world of Perl!

    Perl shines as always. While I'm not the best monk around here to compare it with other languages, I can say for sure that one of Perl's strengths is immutable and will persist for decades: it's a glue language.

    The job, using Perl, is quickly done. This means that you have a working thing very soon and you can grow it up as you want.

    Perl still rules in system administration: it is present on virtually all Linux systems and is easy to install on win32 systems; Strawberry Perl is almost as easy to deal with as perl on Linux.

    Don't forget the lovely community! See Why does it seem as though Perl has the only community of friendly, non patronizing or demeaning, programmers? What is with every one else?

    You may also be interested in Some Help for a Report About Perl and this dev.to article.

    As for new things on CPAN, you can look at Plack/PSGI (the new Perl web standard) and the newer web frameworks Mojolicious and Dancer2.

    I also suggest you look at MCE, the Many-Core Engine for Perl.



    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      I don't deal with Windows very much but the point about system administration is well-taken. Maybe the situation is a bit analogous to being able to use vanilla vi? (I prefer to use Vim of course.) In the Unix world, at least, no matter where you go, vi is there. The same is true of Perl. I was also glad to hear that there's nice new shiny stuff on CPAN these days.

      However, not to be antagonistic, but the strength of "whipupitude" is something that Python and other, similar languages (at this point most prominently Ruby and JavaScript) certainly have to about the same degree as Perl. It's why I asked about more unique strengths of Perl in the present time.

      Community is certainly a more intangible but no less vital asset. I've seen "what is with everyone else?" and have occasionally been guilty of it. But for the most part I try not to be too much of a dillhole. When I'm dealing with intelligent people, as in a place like this, that goal becomes much easier to achieve.

        > unique strengths of Perl in the present time

        I wonder about the meaning of this: probably there are no unique strengths in Perl that are not present in other languages. Perl surely has the richest regex system, but almost all other languages have something that works.

        Perl makes a big effort to support Unicode but is, iirc, not the best language in this respect.

        Perl lets you program in an OO style but does not force you to do so. Perl allows parallelism...

        As I've already said, I'm not the best monk to compare programming languages. In fact I took up Perl many years ago and never felt the need to use something else: Perl lets me do almost everything I imagine (and I have a big imagination) and do it the way I like. Probably this is the best thing I perceive about the language: freedom.

        Maybe this freedom was born with Perl itself in a golden age, before computer science was swallowed up by the new economy. See this interesting post



        L*

Re: Curious about Perl's strengths in 2018
by stevieb (Abbot) on Apr 12, 2018 at 13:46 UTC

    Other Monks have already stated numerous positive aspects of Perl, but one thing I didn't see (hopefully I didn't overlook it!) is that Perl is, imho, by far the easiest and best language for unit testing your applications and libraries. Hands down, out of the half-dozen languages I frequently write in (including Python at $work), there's no comparison there. Considering I do Test Driven Development, that's a key aspect of a language for me.

    What's hot on CPAN? Raspberry Pi! ;)

      ++ I should have said this because I agree and it's something I work with every day. I guess I just take it for granted. :P

        Yeah, that's totally understandable.

        Even in berrybrew, which is a C# application and library, all of the unit tests are actually written in Perl :D

      I'm thinking about getting a Pi but idk what I'd do with it.

        I turned mine into a retro gaming console. Got it working, fired it up, and realized that the games of yesteryear are terrible compared to what's out there now (IMHO) and it's been sitting since.

        My next idea is to run DNS and a web server on it. I'd like to set it up to just get DNS from Google and OpenDNS for most queries, but for known ad servers, to redirect those queries to a local webserver and have it set up to just return a 1x1 pixel transparent GIF to any and all requests. I'm thinking of doing it mainly just as an exercise, and partly to see how the sites that pop a "You're using an ad-blocker!" react to that.

        I built a full-blown indoor grow room environment controller automation system, a couple pan/tilt mechanisms for my RPi wildlife infrared cameras, various other small things, but the majority of my Pis are in my lab in various states of testing new ICs, sensors and random project ideas.

        I also have one that acts as my media centre, as well as the media centre client systems, and one for retro gaming (I still love the old 8-bit NES games).

        can confirm ... bought a Pi 6 months ago, still in the box...

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Curious about Perl's strengths in 2018
by LanX (Archbishop) on Apr 12, 2018 at 07:25 UTC
    Perl is more "meta" than Python.

    It embeds different elements and paradigms of other languages; Python is certainly not tolerant here. (E.g. lambdas* allow only one statement, WTF?)
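A minimal Python sketch of the lambda restriction mentioned above, for readers who have not run into it. Strictly speaking a Python lambda body is limited to a single expression rather than a single statement; the names `square`, `describe` and `lambda_accepts_statement` are made up for this illustration.

```python
# A lambda body must be a single expression.
square = lambda x: x * x

# Statements such as `return` are rejected when the code is compiled.
try:
    compile("f = lambda x: return x", "<demo>", "exec")
    lambda_accepts_statement = True
except SyntaxError:
    lambda_accepts_statement = False

# Anything that needs several statements has to become a named function.
def describe(n):
    label = "even" if n % 2 == 0 else "odd"
    return f"{n} is {label}"

print(square(4), lambda_accepts_statement, describe(3))
```

A Perl anonymous sub, by contrast, can hold any number of statements, which is the difference the footnote is pointing at.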

    It's also more DWIM and has a far better package system. And strict is a real advantage for debugging.

    I can often judge bad code by the look, while Python code always looks the same. (I call this the LaTeX effect.)

    CPAN is vast, the hottest thing is the fact that it includes everything.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Wikisyntax for the Monastery

    *) anonymous subs in Perl.

      Since I mentioned one-liners, I will say that this is definitely one area where Perl outshines Python. Python technically can do one-liners but it's really not at all as capable as Perl would be in the same setting. Python's regular expression library is very good but it's annoying to write little scripts whenever you need to use it in the context in question. I used sed for the last instance where I needed a one-liner but sed isn't Turing-complete (that I know of—if it is, not in any way I want to use). awk is Turing-complete but much less capable than Perl otherwise.

        Yep, I like to compare Perl to an old song "I'm every woman it's all in me".²

        It combines most aspects of bash, lisp and C.*

        Unfortunately it's° badly managed: in the early 2000s it should have invaded the ecosystem of Bash, responded to the DSL needs of Ruby folks and incorporated a fast OOP system à la Moo.

        That's probably the downside of having a very tolerant user base and giving a say to everyone.

        Pythonistas are in my experience not that tolerant, I had numerous encounters where they kept mobbing other languages and in the end it turned out I even knew their "own" language better.

        Cheers Rolf

        *) and heavily influenced other now mainstream languages like PHP, JS and Ruby. Especially the latter is mostly Perl with "nicer" syntax and OO system.

        Roughly Ruby := Perl - Bash + Smalltalk

        °) in retrospect

        ²) Chaka not Whitney

      lambdas in Python are a particularly weak aspect of the language; I will readily agree with that. But overall Python is pretty multi-paradigm. In addition to the object-oriented features that are more well-known there are several standard modules that assist with functional programming: functools, itertools and operator. There's third-party stuff, too, but I haven't really looked into it.
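As a rough illustration of the three standard modules named above, here is a short sketch of what functional-style Python tends to look like with them. The variable names and sample data are invented for the example; only standard library calls are used.

```python
from functools import partial, reduce
from itertools import accumulate, count, takewhile
from operator import add, mul

# Fold a list with an operator function instead of a hand-written lambda.
total = reduce(add, [1, 2, 3, 4])

# Running product: each element is the product of everything so far.
running_product = list(accumulate([1, 2, 3, 4], mul))

# Partial application: fix the first argument of mul.
double = partial(mul, 2)

# Lazily walk an infinite sequence of squares, stopping at the first >= 30.
small_squares = list(takewhile(lambda n: n < 30, (i * i for i in count(1))))

print(total, running_product, double(21), small_squares)
```

The `operator` functions stand in for the small lambdas one would otherwise write, which partly works around the weak lambda syntax.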

      Perl strict is useful. I seem to remember always using both strict and warnings for anything other than one-liners in a pipeline. But it's much like ES6 strict in that the core language tends to be kind of lax and this needs to be addressed. It's not an issue all languages have.

      I don't know what CPAN as a package system per se is like these days. I will say that it annoys me when pip, the de facto Python package manager, overrides packages that are a part of the Ubuntu repository that get regular updates, because pip doesn't do automatic updates, but I don't know if that would be an issue with CPAN now.

      Re: code appearance, I never minded that Python requires very regular indentation. It's something everyone should do. But you may be right about code smells being more apparent if this structure is not required.

      I appreciate that CPAN remains very active and alive given the relatively diminished size of the Perl user base. That's a good thing. Unfortunately, for the purposes of machine learning, which is something I am very interested and involved in, it's not really there.

        I'm not that proficient in Python° and have some questions:

        • Is there anything remotely comparable to metacpan.org in Python? PyPi looks rather "simplistic".
        • Have you seen the ease of cpanm ?
        • Is there any community site comparable to perlmonks?

        Cheers Rolf

        °) which might be our biggest problem answering your questions...

Re: Curious about Perl's strengths in 2018
by BrowserUk (Pope) on Apr 12, 2018 at 23:36 UTC

    In the betamax vs VHS of computer languages, Perl is VHS; but it lost anyway.

    Why? That is a deep, painful and complicated question, but it sums up to: complacency. And the archives of this place are the evidence.

    An integral part of that is also the plague of OSS in general: cliquism. Which to lesser Eng.lit. mortals means: the propensity to subdivide rather than reach a compromise; but with a twist.

    Whilst there are a billion(*) varieties of Linux, because no one group can decide what it should be; there is only one Perl, because no-one outside the ruling clique -- which includes the author, who is also on the outside looking in -- is allowed to suggest, much less make, changes. Stagnation rules.

    That false god of antiquity, namely 'backward compatibility', has been deemed all-powerful and sacrosanct, by the anointed -- for the most part by virtue of being around at the time -- few, with the result that even fixes aren't allowed to break even the most broken and ill-designed of existing 'working code'. The result is inevitable: stagnation.

    Circa 2005, perl's core code needed to be re-written for the modern world. I'm talking internally, not semantics. Less global state; less God objects; less magic; less 'only Perl can parse Perl'; in a way, less TIMTOWTDI, but to a very small extent. Perl, circa 5.10.1 was nearly the perfect base from which to take over the world; but it was too hard. In testament to the vision and skill and genius of the original author, perl's internals proved impenetrable to refactoring at anything more than the most superficial of levels, so those in the-clique settled for ongoing mediocrity. And here we are today.

    Some will condemn me -- and this -- as the ravings of an anti-Perl outsider; nothing could be further from the truth. Having once condemned Perl to being a "read-only language", I learnt to first hate; then accept the need for; learn to work with; then admire; and finally, love Perl. And I still do.

    It's only perl I am critical of. And only if/when you come to understand that dichotomy will you begin to understand the pain it causes me to write this. To write what I consider to be the truth about Perl.

    (*) 292 more or less different distributions, the last time I counted.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
      Circa 2005, perl's core code needed to be re-written for the modern world
      That was realised far before then. Chip's Topaz project to write a new, modern C++ perl core, kicked off in 1999, IIRC. It failed. Then there was another major attempt in 2000. That became perl6, and took 18 years. In the meantime, some of us have been keeping the existing perl5 core ticking over. We haven't been stopping anyone from forking the core and modernising it in Their Own Image - for example Kurila and cperl.

      We are however regularly accused of being Evil and breaking backwards compatibility on every new release. You only need to install a Marc Lehmann module from CPAN to get a virtual earful on how capricious the Porters supposedly are.

      Then we get it in the neck from a second group of people on how evil we are for not rewriting perl in a shiny new modern way and be damned with old scripts still running.

      I think we have attempted to steer a reasonable middle course. We don't get it right all the time of course.

      Dave.

        You are the exception that proves the rule Dave. You engage. And I (amongst many others I'm sure) have learnt a raft of stuff from that.

        And I know there are a bunch of others that have done, and continue to do sterling work in the background.

        But there are also a bunch of high-tower sitters in the clique that sit on anything that is NIH.

        And as for forking -- that way lies 200 incompatible versions each with one main contributor.


        Dave, with all due respect.

        It doesn't make Perl's management look efficient if it takes about 20 (OK, let's say 15) years to implement experimental subroutine signatures, and then only with positional and no named parameters.

        Cheers Rolf

      That false god of antiquity, namely 'backward compatibility', has been deemed all-powerful and sacrosanct, by the anointed -- for the most part by virtue of being around at the time -- few, with the result that even fixes aren't allowed to break even the most broken and ill-designed of existing 'working code'. The result is inevitable: stagnation.
      I fully agree with this. IMHO, backward compatibility is usually a very good idea, but it becomes an obstacle to progress when it is maintained for too many years. We're speaking here about more than three decades. Nowadays, we really should no longer insist on keeping compatibility with Perl 4 programs written 25++ years ago. Strictures, warnings, lexical scope, lexical filehandles, autodie (or, at the very least, autodie qw(open close)), and other such 'modern' features, should be the standard Perl behavior today, while keeping the no pragma_xxx possibility for programs written in the old style.

      Perl 6 is a bold attempt to do that. I know that many people here tend to frown upon it because they think it breaks backward compatibility too much. "Gosh, it took me so long to learn the sigil change when accessing an individual item in an array or a hash." I can readily understand this point of view; very often, when I started to use it, I asked myself: "Damn, why did they change that (specific feature)? It worked well without that change." Well, maybe Perl 6 changed too many things. Or maybe not. It is very hard to say what the optimal amount of change might have been. So, in brief, I can understand this reaction very well, but I think it is wrong: after having used Perl 6 for quite a few years by now, I know that these changes do make sense and really make the language much more consistent. And, frankly, Perl 6 is really easy to learn for an old Perl 5 user like me (who is still using Perl 5 almost every day at work and loves to do so).

      "... Less global state; less God objects; less magic; less 'only Perl can parse Perl; in a way, less TIMTOWTD..."

      He, our beloved supreme leader is an apostate?

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: Curious about Perl's strengths in 2018
by eyepopslikeamosquito (Chancellor) on Apr 12, 2018 at 20:04 UTC
Re: Curious about Perl's strengths in 2018
by Laurent_R (Canon) on Apr 14, 2018 at 21:48 UTC
    but where does Perl really shine these days? That goes equally for the more conventional Perl 5 as well as the newer Perl 6.
    It seems that nobody has really answered this question with respect to Perl 6. Let me try to give a few very brief answers on that part of the question.

    Perl 6 has these to offer:

    • It is a clean, modern, multi-paradigm language; it offers procedural, object-oriented and functional methodologies;
    • Runtime optimization of hot code path;
    • Malleable language, with the possibility to define functions, operators, traits and data types (adding a new operator is as simple as writing a subroutine);
    • Improved, cleaner (and composable) regexes and built-in grammars (Perl 6 programs are compiled using a Perl 6 grammar);
    • Lazy evaluation and infinite lists;
    • Very powerful and high-level concepts for concurrency, parallelism and asynchronism for optimal use of multicore or multi-CPU architectures;
    • Gradual typing, function signatures (including positional and named parameters), optional argument type-checking and subroutine multiple dispatch (based on signatures);
    • a very powerful object model, with classes, roles, inheritance, subtyping, code reuse, introspection capabilities and meta-object programming;
    • Powerful metaoperators and hyperoperators for applying code to lists of items;
    • Unmatched Unicode support;
    • excellent interoperability with other languages such as Perl 5 (making it possible to use Perl 5 CPAN modules), Python and C (and others);

      I've seen what can be done with the inline modules even in Perl 5. It gives me something to think about.

        I've seen what can be done with the inline modules even in Perl 5

        XS and Inline::C (which is essentially the same) are things that I find very attractive about perl 5, as accessing C routines by writing perl programs is far more appealing than accessing C routines by writing C programs.
        Of course, other languages also provide interfaces to C but I don't know how they compare with perl's C interface as I've not yet felt the need to investigate the alternatives.

        Cheers,
        Rob

      Perl 6 looks like a language that has almost everything Ruby and Python have: functional/OOP, meta programming, etc.

      I really hope Perl 6 takes off.

      • We apparently use a different definition of the word clean.
      • While still being way slower than anything else.
      • In combination with its "clean" syntax, deceptively at the first glance kinda similar to another, older and well known language, this is bound to lead to impossible to decipher code.
      • While being an overkill most of the time and another ad hoc change from things people know, but with \d still matching insanely way too much.
      • Yeah, that's nice. If done right.
      • Aaaand ... the design is finally stable and ... erm ... implemented?
      • mkay
      • And there's maybe five people that understand it all.
      • With lovely line noisy syntax

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

Re: Curious about Perl's strengths in 2018
by raiph (Chaplain) on Apr 17, 2018 at 01:52 UTC
    Hi Crosis,

    I'm going to write provocatively about a series of topics, one comment per topic. My thesis for each will be that Perl is strong in some particular area in which Python is weak.

    This one's about text processing that involves text segmentation (i.e. character or substring processing) of Unicode text.

    In a nutshell Perl is a world leader in getting this right. The Perl 5 community has trailblazed supporting devs in dealing with all the fiddly details in as practical a manner as it could manage given its existing runtime and standard library functions. Perl 6 has trailblazed developing a new runtime and standard library that makes it easy for mere mortals to get the right results without having to have a degree in Emoji data science.

    In the meantime, the Python language, string type, standard library, and doc all entirely ignore the pieces necessary for getting text segmentation right per Unicode annex #29 (linked above) so it is all but impossible for any ordinary dev to correctly segment arbitrary Unicode text in Python 3.7.

    Feel free to ask what the heck I'm talking about if it's not obvious from what I've written and the link I provided.

    If you follow up on this comment I'll post another topic so we can keep things rolling. And if you comment on that, I'll post on another topic. I think I've got maybe 10 if you've got the stamina...

    Hi monks, hope you're all doing well.

      Alright, let's hear it. (And what about third-party add-ons for this functionality in Python?)

        To keep my commentary as short as can reasonably do the topic justice, I've sharply narrowed discussion to: characters in a Unicode string; Perl 6; and Python 3. See my notes at the end for discussion of this narrowing.

        What's a character?

        For a while last century, "character", in the context of computing, came close to being synonymous with an ASCII byte.

        But that was always a string implementation detail, one that involves an assumption that's broken in the general case. A character is not an ASCII byte unless you stick to a very limited view of text that ignores most of the world's text including even English text if it includes arbitrary Unicode characters, eg tweets which may look English but are allowed to contain arbitrary characters.

        For a while this century, "character", in the context of contemporary mainstream programming languages and developer awareness, has come close to being synonymous with a Unicode codepoint.

        Unfortunately, assuming a codepoint is a character in the ordinary sense is again a broken assumption in the general case. Even if you're dealing with Unicode text, a character does not correspond to a Unicode codepoint, unless you continue to stick to a very sharply limited view of text and characters that again excludes arbitrary Unicode text.

        "What a user thinks of as a character"

        So, just what is a "character" given some Unicode string?

        If we're talking about Unicode, it's helpful to consider Unicode's precisely chosen vocabulary for describing text, and in particular, characters.

        Unicode's definition of "what a user thinks of as a character" translates, in digital terms, to it being a sequence of codepoints selected according to rules (algorithms) and data primarily defined by Unicode.

        So a character might be just one codepoint -- or it might be many.

        Text processing can't properly distinguish what characters there are in a text string unless it iterates through a given text string, calculating the start and end of individual characters according to the relevant general Unicode rules and data (and locale and/or application specific overrides).

        This latter reality -- an individual character can consist of multiple codepoints -- is why a character=codepoint assumption is a 21st century mistake that's analogous to the 20th century one of assuming character=byte.

        The codepoint=character assumption allows for fast indexing -- but it's increasingly often wrong, leading to broken code and corrupt data.
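        A minimal sketch of the different "lengths" discussed above, using Python 3 purely for illustration (the example string is my own choice, not one taken from the talk or the links): the same short text has one length in bytes, another in codepoints, and yet another once a precomposed form is substituted.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived
# character written as two codepoints.
decomposed = "e\u0301"

byte_count = len(decomposed.encode("utf-8"))   # length in UTF-8 bytes
codepoint_count = len(decomposed)              # Python 3 str length counts codepoints
composed = unicodedata.normalize("NFC", decomposed)

print(byte_count, codepoint_count, len(composed))  # prints: 3 2 1
```

        Normalization only brings the codepoint count down to one here because Unicode happens to define a precomposed form for this letter; for sequences with no precomposed form, the codepoint count stays above one.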

        What Perl (5 + 6) and Python (2 + 3) think of as a character

        Armed with the knowledge that Unicode uses the word "grapheme" to denote "what a user thinks of as a character" rather than bytes and codepoints, one can begin to get some sense of the level of support for this level of character handling in any given programming language by searching within its resources for "grapheme".

        Google searches for "grapheme+<prog-lang-web-home>" and "grapheme+<prog-lang-goes-here>", with commentary about the state of things when I did these searches in 2018:

        • grapheme+perl.org and grapheme+perl yield interesting reading that highlights Perl's world leading Unicode support (including support for processing characters aka graphemes).

        • grapheme+perl6.org and grapheme+perl6 yield a different set of matches, with some overlap with the plain perl search, but this time highlighting Perl 6's leadership within the Perl world and outside it when it comes to processing graphemes.

        • grapheme+python.org yields nothing but "Missing: grapheme" matches. In other words, there are zero matches. No matches in the PEPs that drove Python 3's design, including PEP 393 -- Flexible String Representation ("There are two classes of complaints about the current implementation of the unicode type"). The Python 3 Unicode HOWTO? No matches. The entire python.org? No matches.

        • grapheme+python shows the reality for python. The opening "Did you mean: graphene python" is perhaps just amusing. Likewise the Grapheme Toolkit that allows "visualization of complex molecular interactions ... that is the most natural to chemists". The relative lack of upvotes (and no comments) on the /r/python post introducing the grapheme library is a bit more telling. Likewise the paucity of useful links in general. The bug report that ends in February 2018 with "We missed 3.7 train. ... I have many shine features I want in 3.7 and I have no time to review all. Especially, I need to understand tr29. It was hard job to me." suggests a striking lack of focus on this issue in the Python community at large.

        An example that's No F💩💩king Good

        Given the discussion thus far, it should come as no surprise that the built-in string type, functions, and standard libraries of both Python 2 and Python 3 yield the wrong result for string length, character indexing, and substrings (and for any functionality that relies on those results) if A) what you're interested in is character=grapheme processing as contrasted with character=codepoint processing and B) a string contains a grapheme that isn't a single codepoint.

        One fun way to see this in action is to view Patrick Michaud's lightning talk about text processing that's No F💩💩king Good. If you don't have 5 minutes, the following link takes you right to the point where Patrick spends 30 seconds trying Python 3. Of the three simple tests used in his talk it gets two "wrong".

        Part of the fun has emerged since I first wrote this perlmonks comment. It turns out that this example may itself be No F**king Good in a manner not at all intended by Jonathan Worthington who wrote the presentation or me when I originally included it here. Prompted by a reader who challenged several aspects of this post, including this one, my brief investigation thus far suggests that the specific example of a D with double dots is actually a "degenerate case" -- one that "never occurs in practice", or at least one that will generally only occur in artificial/accidental scenarios such as the test in the video.

        (It looks like it may have been naively taken from the "Basic Examples" table in Unicode annex #15 on the mistaken assumption that it's not degenerate, when instead (perhaps) it's in the table as an example in which normalization to a single character is not appropriate because that character doesn't appear in practice. If you, dear reader, can confirm or deny its degenerate nature, please comment.)

        A better example?

        Does the above No F💩💩king Good example mean the thrust of this post -- about character=grapheme vs character=codepoint -- is essentially invalid? No. While D with double dots may be an especially poorly chosen example, the problem does occur for a large number of non-degenerate characters.

        Consider the reported length of the string "षि". This string contains text written in Devanagari, one of the world's most used scripts. When you try selecting the text using your web browser, how many characters appear to be inside the quotes? For me it's one.

        The code `print(len(unicodedata.normalize('NFC',u'षि')))`, when run in Python 2 and Python 3 (after `import unicodedata`), prints 2. The code `say 'षि'.chars`, when run in Perl 6, prints 1. The Perl 6 code is simpler and, much more importantly, correct.
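        To make the contrast concrete, here is a hedged Python 3 sketch. `approx_grapheme_count` is a hypothetical helper, not part of any library, and it deliberately simplifies: it only attaches codepoints in the mark categories (Mn, Mc, Me) to the preceding codepoint, which falls far short of the full annex #29 rules.

```python
import unicodedata

def approx_grapheme_count(text):
    """Rough count of user-perceived characters.

    Simplification (an assumption of this sketch): any codepoint whose
    general category is a mark (Mn, Mc or Me) belongs to the preceding
    codepoint. This is NOT the annex #29 algorithm; it ignores Hangul,
    emoji sequences, regional indicators and tailoring.
    """
    count = 0
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        # A mark at the very start of the text still counts as one character.
        if count == 0 or not is_mark:
            count += 1
    return count

# The Devanagari string from the text: U+0937 (letter) + U+093F (vowel sign).
text = unicodedata.normalize("NFC", "\u0937\u093f")

print(len(text))                    # codepoints: 2
print(approx_grapheme_count(text))  # approximate graphemes: 1
```

        For real work a library that implements annex #29 is the safer choice; the point here is only that counting codepoints and counting characters are different operations.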

        Or is it correct? To further complicate matters, the number of graphemes in a string sometimes depends on the particular text, the human looking at it (this is not a joke), and the application context! For further insight into this read a reddit exchange.

        Perl 6 has not yet addressed tailored grapheme clusters in its core. So, while it's much easier to use than Perl 5 and Python 3 for many cases, it's still got work to do.

        Does it matter that Perl 6's character and substring accessing time is O(1)?

        If you watch the whole of Patrick's talk you'll see he covers the point that Perl 6 has "O(1) substring, index, etc.".

        But for most things, other langs are faster than Perl 6 -- a lot faster. So does O(1) indexing matter?

        Imo it does. It took years to get the architecture of P6 and the Rakudo compiler right but the initial decade of design work is now in the past. NFG, along with all the other innovative and/or difficult main elements in P6 and Rakudo, are in place and getting steadily better and faster.

        If character processing in general matters, then presumably O(1) character indexing, substring processing, and regexing matters. And if so, the Perl 6 and nqp languages, and Rakudo / NQP / MoarVM compiler stack, are all in a great place given that they're the first (and I believe only) programming languages and compiler stack in the world with O(1) performance for grapheme-based character indexing and substring access.

        (As far as I know the indexing, substring and regexing performance of Swift and Elixir -- the only other languages I'm aware of that have adopted "what a user thinks of as a character" as their standard string type's character abstraction -- is still O(n) or worse.)
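        One way to picture why O(1) indexing is possible at all: pay for segmentation once, up front, and store the string as a sequence of whole characters. The sketch below is a toy in Python 3 under that assumption. `GraphemeString` and `split_clusters` are hypothetical names, the segmentation is a rough approximation (marks attach to the preceding codepoint) rather than annex #29, and it is not a description of how Rakudo or MoarVM actually implement strings.

```python
import unicodedata

def split_clusters(text):
    """Split text into approximate grapheme clusters (sketch only)."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # a mark joins the previous cluster
        else:
            clusters.append(ch)     # anything else starts a new cluster
    return clusters

class GraphemeString:
    """Hypothetical wrapper: one O(n) pass at construction, O(1) indexing after."""

    def __init__(self, text):
        self._clusters = split_clusters(text)

    def __len__(self):
        return len(self._clusters)

    def __getitem__(self, index):
        return self._clusters[index]

s = GraphemeString("a\u0937\u093fb")   # "a", the Devanagari character, "b"
print(len(s))                          # 3
print(s[1] == "\u0937\u093f")          # True
print(s[2])                            # b
```

        The trade-off is the usual one: construction costs a full pass and extra memory, in exchange for constant-time access to the n-th character afterwards.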

        What about third-party add-ons for this functionality in Python?

        The primary source of guidance, reference implementations, and locale specific data related to Unicode, including annex #29, is ICU (code in C/C++ and Java) and CLDR (locale specific data related to text segmentation, including of characters). Many languages rely on bindings/wrappers of these resources for much of their Unicode support.

        In the Python case the PyICU project is a binding/wrapper with a long history that credibly (to me, just an onlooker) claims production status.

        I'm unsure about the status of other projects. The pure Python uniseg has a PR, and a reply to that PR, from this year, but it hasn't been updated since 2015; since then Unicode has substantially updated annex #29 in ways that require conforming implementations to change. Another simpler but newer library is grapheme, as introduced in this blog post. In some ways this is the most promising library I found. That said, it's currently marked as Alpha status.

        Note that neither PyICU nor uniseg nor grapheme provides anything remotely like the ergonomic simplicity and deep integration that the Perl 6 language provides for character=grapheme indexing, substring handling, regexing, etc.

        Furthermore, ICU, and thus any modules that build directly on its code -- which I believe is true of PyICU, uniseg and grapheme -- does not provide O(1) grapheme-based indexing, substring and regexing performance. (cf the grapheme library's comment that "Execution times may improve in later releases, but calculating graphemes is and will continue to be notably slower than just counting unicode code points".)

        Conclusion

        Perhaps my overall point has gotten lost as I've tried to provide substantive detail.

        The bottom line is that Perl has long been a leader in text processing capabilities and in that regard, as in many others, it's in great shape, including and perhaps especially in how it compares with Python.

        Notes

        Sorry it took me so long to spot your reply and write this comment. (And because of that I'm not going to simultaneously start another sub-thread about another topic as I originally said I would if you replied. Let's see if you spot this reply and then maybe we can wrap this sub-thread first and only start another if we're both interested in doing so.)

        To keep my commentary as short as I can while still doing the topic justice, I sharply narrowed the discussion above to characters in a Unicode string, Perl 6, and Python 3:

        • I only discuss one very narrow topic, namely indexing characters in a Unicode string per "what a user thinks of as a character" as discussed in Unicode annex #29.

        • Of the Perls, I only discuss Perl 6 even though Perl 5 is much more mature, with broad and deep Unicode support in terms of userland modules, and is generally much faster than Perl 6. Note that the two Perls can be used together to produce best-of-both-Perls solutions.

        • Perl 6 has world leading character handling features with outstanding ergonomics and O(1) performance. This includes sped up and simplified character indexing, substring processing, and regexing. (Perl 6 has other great Unicode features too but I don't discuss these, or indeed the substring or regex features. It all builds on the fundamental character abstraction used by Perl 6 and that's all I discuss.)

        • I contrast Perl 6's support for Unicode characters with Python 3's. Python 3 is considered by many to have adequate Unicode support, on par with most mainstream languages, and significantly better support than Python 2's, especially/ironically with regard to characters. So if you like Perl 6's advantage over Python 3 then you should like Perl's advantages over most mainstream languages, including both Pythons.
Re: Curious about Perl's strengths in 2018
by bliako (Deacon) on Apr 14, 2018 at 12:20 UTC

    Programming languages can be compared in many different ways, of course. But there is a metric which is the most important. Economy: capturing, harnessing and training programmers (much like mustangs) or just mass-producing them at university breeding stables (much like work-horses). Then it is the ease of breaking down programming tasks and their distribution to teams all over the world (in order to exploit local wage differences etc.). Also, risk of bugs, cost of maintenance and refactoring. In this metric Perl is somehow inferior to others. But strictly-typed, OOP languages will always win here anyway.

    The second most important metric, again based on my very limited experience and less on knowledge, is how much a language allows you to build on the work of others: 3rd party libraries. CPAN is full of high quality modules in a more or less standard format (wrt doc/test/install). Last night I needed a tree implementation. LOTS of trees in CPAN! Add to this excellent package managers, for example cpanm, and excellent multi-environment managers like Perlbrew. And excellent forums like PerlMonks. In this metric Perl is on par with every other contestant if not winning single-handedly.

    Third metric, built-in data structures. The Hashtable was first given to the masses via Perl (ok, awk had it much earlier but we are not exactly talking about the masses here).

    And the masses loved it and swear by it ever since. What an impact by Perl! But who remembers it and who cares?

    Python picked it up too but later (correct me if wrong). Then Java (and JS/ES). But C++ never dared to do that then, when it did matter. Today they introduce any willy-nilly idea they read in Cosmopolitan every couple of weeks to the point of ridiculousness (and the breaking of my systems). Who needs knee-jerk reactions?

    So, in this metric Perl won but victory has no meaning. Now all contestants offer this and some of them overdo it (e.g. Java). So all equal here I believe.

    Lastly, there is the question of Optimisation of practices, resource allocation, distribution galaxy-wise if possible. Bear with me for a moment...

    There is a top-down approach to it where a path is followed to optimality, much like climbing a mountain. And there is an "anarchic" way: unleash a billion ants to climb to the honey at the top, each following a random path. Very useful when we are not sure where the highest peak is. And have lots of ants. Darwin expressed the idea of the survival of the fittest. Something that is often overlooked is that it requires a massive population breeding and eating each other.

    Genetic Algorithms, as in AI, model exactly that but in Computer Systems. And Genetic Programming takes it even further, to creating optimal (computer) programs as a result of breeding and eating each other along with random mutations here and there. (Do not hurry to say that GP did not fly. Neural Networks died and resurrected many a time, each time stronger.)

    It is my opinion that Optimisation should be the first concern in the mind of any person who is a "scholar" in the sense that he/she cares to see a bit further than how the daily bread is brought to the family table. For any manager and "chief executive" (gosh!), for any government official (gosh!), it should be the first thing in their mind. Alas, it is not, because the (prevalent) economic system dictates that it is short-term profit maximisation which matters. De facto: bonuses are distributed every year, CEOs and politicians are elected every five (~~). What incentive do they have to think long-term? Like 30 years from now? 50 years ago the "short-term" was 30 years; today it is maybe 18 months. (Interesting side-note on the takeover of GKN, which is a supplier for Airbus: in "Airbus warns over Melrose's £8bn GKN takeover", Airbus said the industry was not suited to "short-term financial investment".)

    I digress. Anyway, Optimisation should be first in our mind when we judge anything that has to do with our future. And because nobody is sure about which path to take to the highest peak (which we can not see anyway) I favour the second model, the "anarchic" bottom-up massive unleashing of the ants. Maybe with educated top-down guesses and heuristics and of course with a set of rules which are HUMANE. No need for the deadly competition of today, nor the marginalisation of people and ideas.

    And here is where the Programming Languages come in: what is their role in this Optimisation? Now this is a Metric!

    This is a Digital era and there is no going back (unfortunately, I say). The spade and the steam-engine of our times are Computer Systems, and Computer Programs are what move them. And so Programming Languages are an important part of Optimisation. But we need to ensure that the competition of ideas and toolsets is as wide as possible in order to have a better chance of arriving at good solutions - to evenly visit the landscape, so to speak.

    One of Perl's underlying (and I guess uncompromising) principles is "There's more than one way to do it" (edit: a point many fellow Monks have already mentioned here, and which I also mentioned in another comment that is cited here). It fits exactly into the model of Optimisation I have just described. It is very important. And in my opinion it is Perl's most important strength for 2018, through 2058 and 2258. Please note: an underlying principle and not just a random side effect. You are what you program with. Maybe Perl will die out one day but the spirit survives, also within the walls of this Monastery.

    Onwards Brothers and Sisters! The spirit of Optimisation is the spirit of Perl.

    Contrast and compare that to others notably Java, C++ and Python. Btw the latter I distaste so much - both language, creators and users - I thought it unfair to make any further comment about it.

    bliako

      "In contrast [to Tim Toady] part of the Zen of Python is, "There should be one — and preferably only one — obvious way to do it." (from the article on "There's more than one way to do it", link above).

      Worlds colliding...

      bliako
      Contrast and compare that to others notably Java, C++ and Python. Btw the latter I distaste so much - both language, creators and users - I thought it unfair to make any further comment about it.

      That may be the case but as far as the evolutionary computing you talked about is concerned, unfortunately there doesn't appear to be anything in CPAN comparable to DEAP.

      (Side note: evolutionary computing cannot generally be considered to be an optimizing approach, despite contrary language (including something I'm about to mention) and there is at least one approach in the field where the evolution is in memes rather than genes and so where no one breeds and no one dies, namely particle swarm optimization.)

        At the end all these languages are Turing complete and will eventually arrive at the same tape square. What interests me is the spirit and the inspiration. Pretty metaphysical I agree... A comment to your side-note: I am not sure what is more "scary": genes dying out or ideas? Ideas I should think.

        I have whipped up some code to get me started with Perl's evolutionary computing toolkit.

        #!/usr/bin/env perl

        # Brief and dirty attempt to solve a system of simultaneous equations (2)
        # using Genetic Algorithms, in particular the CPAN module
        #   Algorithm::Evolutionary
        #   Perl module for performing paradigm-free evolutionary algorithms
        #   by J. J. Merelo, jmerelo (at) geneura.ugr.es
        # (parts of my program were copied from the manpage)
        #
        # The toy problem here is to find INTEGER solutions to the system of
        # equations y-x=2 and y+2x=11 wrt x and y.
        # Our 2 genes are 'y' and 'x'. We encode these as 8-bit integers
        # (7 bits + 1 sign bit). The algorithm will mutate/crossover etc. the bit
        # string of each member of the population. Then it will evaluate how well
        # the genes of each member of the population solve the problem at hand.
        # This is called the fitness. The fittest genes survive and the rest are
        # discarded, with some probability.
        # Author: bliako
        # Date: 16/04/2018

        use strict;
        use warnings;

        use Algorithm::Evolutionary::Experiment;
        use Algorithm::Evolutionary::Op::Easy;
        use Algorithm::Evolutionary::Op::Bitflip;
        use Algorithm::Evolutionary::Op::Crossover;

        my $num_genes = 2;

        my $fitness = sub {
            my $individual = shift;
            my $genes = chromosome2genes($individual->Chrom());
            return calculate_discrepancy($genes);
        };

        my $m = Algorithm::Evolutionary::Op::Bitflip->new(2);   # flip this number of bits randomly
        my $c = Algorithm::Evolutionary::Op::Crossover->new(2); # crossover with 2 points

        # every iteration applies the above operations to the population along
        # with a fitness function and selection rate (prob of good genes to
        # survive, lower means more "bad" genes enter the next generation)
        my $ez = Algorithm::Evolutionary::Op::Easy->new($fitness, 0.4, [$m, $c]);

        my $popSize  = 500;            # population size, each individual in this pop has a chromosome which consists of 2 genes
        my $indiType = 'BitString';    # the chromosome is a sequence of bits as a string
        my $indiSize = 8 * $num_genes; # 8 bits per gene

        my $e = Algorithm::Evolutionary::Experiment->new($popSize, $indiType, $indiSize, $ez);

        my $populationRef;
        my $previous_fitness = 0;
        my ($current_fitness, $best);
        while (1) {
            $populationRef = $e->go();
            $best = $populationRef->[0];
            print "Best so far: ", $best->asString(), " (", individual2string($best), ")\n";
            $current_fitness = $best->Fitness();
            if ($current_fitness == 0) { print "bingo!\n"; last }
            #if( ($previous_fitness - $current_fitness) == 0 ){ last }
            $previous_fitness = $current_fitness;
        }
        print "\nI tried to solve the system of equations: y-x=2 and y+2x=11. The solution should be x=3, y=5\n";
        print "Final solution found: " . individual2string($best) . "\n";
        exit(0);

        sub individual2string {
            my $individual = $_[0];
            my $genes = chromosome2genes($individual->Chrom());
            my $fit = calculate_discrepancy($genes);
            return genes2string($genes) . " -> discrepancy=" . $fit;
        }

        # interpret an array of genes wrt our problem, i.e. an x and a y
        # (genes[0] is y and genes[1] is x, matching calculate_discrepancy)
        sub genes2string {
            my $genes = $_[0];
            return "x=" . $genes->[1] . ", y=" . $genes->[0];
        }

        # convert a huge bit string into an array of genes
        sub chromosome2genes {
            my $achromosome = $_[0]; # chromosome bit string containing all genes as 10101
            my @retgenes = (0) x $num_genes;
            # convert a chromosome which consists of genes which consist of bits (alleles)
            # into a set of numbers to be applied to our problem.
            # each chromosome below consists of 2 genes which consist of 8 bits (1 sign + 7)
            # these 8 bits are interpreted as integers in +-127 range (which is enough for
            # our problem; however if the solution involved bigger numbers we would need
            # to increase range/bits)
            my $i = 0;
            while ($achromosome =~ /([01])([01]{7})/g) {
                my $sig = $1 eq '1' ? -1 : 1;
                my $g2 = $2;
                # Here is how a sequence of 8 bits is converted to integers. 1st bit is sign,
                # the remaining 7 bits are read least significant bit first.
                # I am sure there is a better way using pack.
                my $g = 0;
                my $j = 1;
                for my $bit (split //, $g2) {
                    $g += $bit * $j;
                    $j *= 2;
                }
                $g *= $sig;
                $retgenes[$i++] = $g;
                #print "$g2->num=$g\n";
            }
            return \@retgenes;
        }

        sub calculate_discrepancy {
            my $genes = $_[0];
            # Our problem is to solve the simultaneous equations: y-x=2 and y+2x=11
            # where genes[0] -> y, genes[1] -> x
            my $e1 = $genes->[0] - $genes->[1] - 2;
            my $e2 = $genes->[0] + 2 * $genes->[1] - 11;
            # we calculate discrepancy but we need to return fitness:
            return -($e1 * $e1 + $e2 * $e2);
        }
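        For readers coming from the Python side of this thread, the decoding and scoring steps of the listing above can be sketched in a few lines of Python 3. This mirrors only the gene decoding (sign bit first, then 7 magnitude bits, least significant first) and the fitness calculation as the Perl code computes them. The function names are hypothetical, and nothing here uses Algorithm::Evolutionary or DEAP.

```python
def decode_gene(bits):
    """Decode 8 '0'/'1' characters: sign bit, then 7 magnitude bits (LSB first)."""
    sign = -1 if bits[0] == "1" else 1
    magnitude = int(bits[1:][::-1], 2)   # reverse so int() sees MSB first
    return sign * magnitude

def chromosome_to_genes(chromosome, gene_bits=8):
    """Cut the bit string into consecutive genes and decode each one."""
    return [
        decode_gene(chromosome[i:i + gene_bits])
        for i in range(0, len(chromosome), gene_bits)
    ]

def fitness(genes):
    """Negative squared discrepancy for y - x = 2 and y + 2x = 11; 0 is a perfect fit."""
    y, x = genes
    e1 = y - x - 2
    e2 = y + 2 * x - 11
    return -(e1 * e1 + e2 * e2)

# y = 5 and x = 3, each encoded as sign bit + 7 bits, least significant first.
genes = chromosome_to_genes("0101000001100000")
print(genes, fitness(genes))   # [5, 3] 0
```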
Moose (and friends)
by Ea (Hermit) on May 11, 2018 at 11:26 UTC
    Late to the party - my 2 ducats.

    I was at London Perl Workshop showing my code off to a Python-using data scientist at lunch. What caught her eye wasn't my obvious genius, but Moose - with only half a day of Perl under her belt, she could already clearly understand what my objects did. She said "I can understand that", jaw slightly dropped. Objects in Perl are easy and quick, meaning powerful.

    Scientific programming - oooh you stirred up a hornets nest there :)
    The great thing is that there are over 500 modules on CPAN dealing with Math, Statistics and Science. The problem is that there are over 500 modules on CPAN dealing with ... and there is not a lot of guidance on what to use. If you're looking for advice, I'd start with Perl 4 Science and post questions on The Quantified Onion.

    One advantage that hasn't been mentioned, have you looked at the cost of conferences? Perl conferences are less than half the price of PyCons, in my experience.

    Best of luck

    Sometimes I can think of 6 impossible LDAP attributes before breakfast.

    YAPC::Europe::2018 — Hmmm, need to talk to work about sending me.
