
Perl in data science: could a grant from Perl foundation be useful?

by zubenel0 (Sexton)
on Feb 18, 2020 at 19:15 UTC ( #11113122=perlmeditation )


Recently I was thinking about whether it is possible to make Perl a more attractive option for data science. I know that some great initiatives exist, like RFC: 101 Perl PDL Exercises for Data Analysis or RFC: 100 PDL Exercises (ported from numpy). For my part, I will try to write a blog post about a particular machine learning task I have chosen. Nevertheless, as Ovid wrote, falling short in the data science field is a significant drawback of Perl. How to fix this?

One way to proceed that I thought of could be a grant from the Perl Foundation. It would work only if it were possible to find someone interested in a project related to Perl and data science and capable of doing it. IMO one solution that could help would be to write a book on how to use Perl in data science. Again, this idea is not mine, as it was mentioned on perlblogs as a desire to have a new PDL book. Maybe with help from the Perl Foundation such a project could encompass even more than PDL and include several other modules suited for data science.

Another interesting idea that I have encountered was to create a Perl/XS graphics backend, as there is a need for a graphics library that can create 2D/3D charts easily - see the comments on perlblogs. Unfortunately, I know very little about this, but I guess that it might be a very hard task... So these are just a couple of examples, but the main issue is whether it is feasible in general to have a grant for data science using Perl. What do you think? Do you know someone who could be interested in it? Or do you think that this approach is flawed and have some other suggestions?


Replies are listed 'Best First'.
Re: Perl in data science: could a grant from Perl foundation be useful?
by thechartist (Monk) on Feb 19, 2020 at 00:37 UTC

    There is nothing wrong with using Perl for data analysis if you know what you are trying to do. There are a number of options for conducting statistical analyses from a classical POV. Bayesian methods are sadly lacking right now, but you can always call out to R for that.

    If you just want to unleash algorithms on vast quantities of data (of unverifiable quality), Perl has some "machine learning" options, but they are limited, as are tutorials.

    A better use would be to see how machine learning algorithms could improve Perl on systems that do not get a lot of testing. That is what I am focusing on right now. I expect I'll need to write bindings to various C++ libraries, which is not the most appealing of options.

      What you are trying to do seems pretty interesting; could you elaborate a little more? How do you think machine learning could help improve Perl? From personal experience, I sometimes run into trouble using Perl on Windows, while on Linux it works just fine...

        I am trying to formalize a model/process that can ID distributions where a patch on one is highly likely to also work on another. Retrospectively, we can patch multiple models and reduce time spent bug hunting. Prospectively, we can configure build scripts to eliminate certain problems.

Re: Perl in data science: could a grant from Perl foundation be useful?
by Anonymous Monk on Feb 21, 2020 at 23:17 UTC
    No, I do not see anything here that would benefit from "throwing money at it." To accomplish their science objectives, data scientists have many options, of which "Perl is only one of now-many." The scientist's obvious (and only) goal is to get to their finish-line of an accepted publication, not per se "to use Perl" to get there. Anyone who needs to know about PDL already does, and if they elect to use it they will do so without (financial) provocation. The Foundation should keep its money in its pocket.
      That's a fair point, and I too have doubts about whether financial support would have the desired effect. However, I think that in most cases the scientist is not forced to use one particular programming language such as Python, R, SAS, MATLAB, Julia, Perl or something else. The problem is not whether some particular tool is necessary. The problem is that PDL and other Perl modules are not known well enough as options and are not attractive enough for people who could use them. For example, I only recently discovered that such a thing as PDL exists and did not know about this option before.

      To summarize, I think that the need for usage of a particular programming language in data science is much more limited than its actual usage. The goal might be to make Perl a more attractive option not only for those who need it and cannot avoid it but also for those who do not need any particular option and are considering which of these to choose in order to achieve their goals.

        Agree that researchers are not forced to use one particular language. The choice of programming language and tools is almost free when working alone. However, when working in groups that share code, it is more difficult to choose freely. Nowadays, it can be almost impossible to be the only one using Perl while all the others code in Python or R, unless the particular workflow and local culture allow it, for example when the group only exchanges data chunks in some standard format instead of code. I believe that the major barrier to widespread use of Perl is that it has largely been displaced by Python and R in the general culture of scientific circles. Technical details or idiosyncrasies are weak arguments often used in discussions, but I believe the real reason is just cultural.

Node Type: perlmeditation [id://11113122]
Front-paged by Discipulus