
If you've discovered something amazing about Perl that you just need to share with everyone, this is the right place.

This section is also used for non-question discussions about Perl, and for any discussions that are not specifically programming related. For example, if you want to share or discuss opinions on hacker culture, the job market, or Perl 6 development, this is the place. (Note, however, that discussions about the PerlMonks web site belong in PerlMonks Discussion.)

Meditations is sometimes used as a sounding-board — a place to post initial drafts of perl tutorials, code modules, book reviews, articles, quizzes, etc. — so that the author can benefit from the collective insight of the monks before publishing the finished item to its proper place (be it Tutorials, Cool Uses for Perl, Reviews, or whatever). If you do this, it is generally considered appropriate to prefix your node title with "RFC:" (for "request for comments").

User Meditations
use Memoize;
2 direct replies — Read more / Contribute
by Anonymous Monk
on Jul 16, 2018 at 15:18
    I was porting a script to a module and noticed it kept getting slower. The script could initialize its expensive data structure once at the top and be done with it, but in order to encapsulate, the module was calling the function several times. I remembered the core module Memoize and added one line to the top of the program and now it runs fast again, 4x faster than without Memoize!
    use Memoize; memoize('some_sub');
    Only 1.5 seconds to start a program that was taking 6 seconds!
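A minimal sketch of the same idea, for anyone who has not used Memoize before. The sub name expensive_setup and the counter $calls are made up for illustration; the post's real sub is only named some_sub. It assumes the sub is a pure function of its arguments, which is the case Memoize is meant for.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# Hypothetical stand-in for a sub that builds an expensive data structure.
sub expensive_setup {
    my ($name) = @_;
    $calls++;                      # count how often the real work runs
    return { name => $name };
}

# Replace the sub with a caching wrapper keyed on its arguments.
memoize('expensive_setup');

expensive_setup('config') for 1 .. 3;
print "underlying sub ran $calls time(s)\n";
```

One thing to keep in mind: a memoized sub that returns a reference hands back the same reference on every cached call, so callers that modify the structure will see each other's changes.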
Find the shortest word in the English Language with: a b c d e f
9 direct replies — Read more / Contribute
by usemodperl
on Jul 03, 2018 at 20:41
    Edit: Not going to edit this node because of replies but Eily noticed there is an extra single quote at the end of my port :-/

    This recent post in r/programming poses an interesting question for a bit of golf: "What is the shortest word in the English Language which contains: a b c d e f?" Some Junk™ code was posted to figure this out and my Perl version is a bit shorter. I know you perl better than me, if you do, and can see how else to do this:

    The Junk Code:
    sorted([w.strip() for w in open('/usr/share/dict/words', 'r').readlines() if set(list('abcdef')).issubset(set(list(w.strip())))], key=lambda x: len(x))
    My Perl Port:
    open$:,"</usr/share/dict/words";while(<$:>){next unless/(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)(?=.*f)/i;push@_,$_}@_=sort{length$a<=>length$b}@_'

    For your convenience:

    Print the list:
    open$:,"</usr/share/dict/words";while(<$:>){next unless/(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)(?=.*f)/i;push@_,$_}print for sort{length$a<=>length$b}@_
    Print the word:
    open$:,"</usr/share/dict/words";while(<$:>){next unless/(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)(?=.*f)/i;push@_,$_}for(sort{length$a<=>length$b}@_){print;last}
    Compare length of Perl to Junk:
    #!/usr/bin/perl -lw
    use strict;
    my $PERL = <<'PERL';
    open$:,"</usr/share/dict/words";while(<$:>){next unless/(?=.*a)(?=.*b)(?=.*c)(?=
    .*d)(?=.*e)(?=.*f)/i;push@_,$_}@_=sort{length$a<=>length$b}@_'
    PERL
    {
      no strict;
      $JUNK = <<JUNK;
    sorted([w.strip() for w in open('/usr/share/dict/words', 'r').readlines() if set(list('abcdef')).issubset(set(list(w.strip())))], key=lambda x: len(x))
    JUNK
      print length $PERL, ' PERL: ', $PERL;
      print length $JUNK, ' JUNK: ', $JUNK;
    }
    # I think junk could lose 4 spaces making it 148.
    __END__
    PERL: 144
    JUNK: 152
    Of course different versions of dict give different results...
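For readers who would rather not decode the golf, here is an ungolfed sketch of the same search. It is not the port above: the sub name words_with_letters is made up, it tests each letter with index() instead of lookaheads, and it breaks length ties alphabetically, which the golfed version does not do. The dictionary path is the one used in the post and may not exist on every system.

```perl
use strict;
use warnings;

# Return the words that contain every letter in $letters, shortest first
# (ties broken alphabetically). Matching is case-insensitive.
sub words_with_letters {
    my ($letters, @words) = @_;
    my @need = split //, lc $letters;
    my @hits = grep {
        my $word = lc $_;
        !grep { index($word, $_) < 0 } @need;   # no required letter missing
    } @words;
    return sort { length $a <=> length $b or $a cmp $b } @hits;
}

# Small in-memory demonstration.
my @demo = words_with_letters('abcdef', qw(feedback boldface fabricated cat));
print "demo: @demo\n";

# Same search against the system word list, if there is one.
if (open my $fh, '<', '/usr/share/dict/words') {
    chomp(my @dict = <$fh>);
    close $fh;
    my ($shortest) = words_with_letters('abcdef', @dict);
    print "dict: $shortest\n" if defined $shortest;
}
```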

Avoiding perl's Atof when assigning floating point values
5 direct replies — Read more / Contribute
by syphilis
on Jun 28, 2018 at 21:35

    Most perls assign floating point values using perl's internal Atof function - and that includes perls that define "Perl_strtod".
    But Perl's Atof function is notoriously incorrect, and a far better alternative IMO is to have floats assigned using Perl_strtod, which is just a wrapper around C's strtod() or strtold() or strtoflt128() - whichever is appropriate for the particular perl's nvtype.

    First up, I should point out that -Dusequadmath builds (ie builds for which $Config{nvtype} reports "__float128") already use Perl_strtod(), with the result that the __float128 values are assigned correctly, in my experience on Ubuntu-16.04. (By "correctly", I mean rounded to nearest, ties to even.)

    But when perl's nvtype is "double" or "long double", then values are being assigned using perl's Atof function and there's a fair chance that values are being assigned incorrectly.
    The magnitude of Atof's inaccuracies is not particularly large - mostly it's only 1 unit of least precision (ULP). But it can be as large as 7 ULP when nvtype is "double" and as large as 54 ULP when nvtype is the extended precision "long double".
    (The figures of "7" and "54" are the largest I've found, having tested millions of random values - and those 2 numbers turn up often enough.)
    The actual likelihood of striking inaccuracies with Atof depends upon the exponent range that you're working in. If the exponent is in the range (say) -10 to 10 the likelihood of an incorrect assignment is about 10%.
    But when I randomly select values across the full exponent range, I'm finding that the chances of an incorrect assignment rise to around 97% for "doubles" and 82% for "long doubles".
    When I hack the perl source to use Perl_strtod, the chances of an incorrect assignment become 0. (Ok ... I haven't checked every value ... but I've not yet found a value that has been incorrectly assigned by Perl_strtod on Ubuntu.)
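To make "off by N ULP" concrete, here is a small sketch of one way to measure the distance between two values once they are stored as C doubles. It is not the test script from this post (that one compares against Math::MPFR and looks at the last 8 hex digits); the helper names double_hex and ulp_distance are made up, and it assumes a perl with 64-bit integer support (for the Q pack format) and two finite values of the same sign.

```perl
use strict;
use warnings;

# Big-endian hex image of a value stored as a C double.
sub double_hex {
    my ($value) = @_;
    return unpack 'H*', pack 'd>', $value;
}

# Distance in ULPs between two finite doubles of the same sign:
# reinterpret the 8 bytes as unsigned integers and subtract.
sub ulp_distance {
    my ($x, $y) = @_;
    my ($ix, $iy) = map { unpack 'Q>', pack 'd>', $_ } $x, $y;
    return $ix > $iy ? $ix - $iy : $iy - $ix;
}

my $one  = 1;
my $next = 1 + 2**-52;    # the double immediately above 1

printf "%s\n%s\n", double_hex($one), double_hex($next);
printf "distance: %d ULP\n", ulp_distance($one, $next);
```

With a trusted reference conversion in hand, the same ulp_distance call can be pointed at the value perl assigned from a string ("$str" + 0) and the reference value.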

    It turns out that using Perl_strtod instead of perl's Atof is very easy to implement. We just need to open up numeric.c in the top level perl source folder, replace (the one occurrence of) "strtoflt128" with "Perl_strtod", replace every occurrence of "USE_QUADMATH" with "Perl_strtod", and rebuild perl.
    The actual patch (for perl-5.28.0 source) can be downloaded from my scratchpad.

    UPDATE: Better to grab this patch because:
    a) it's a portable patch for both mingw-w64 built Windows perl && Linux perl;
    b) at some time I'll probably clear my scratchpad.

    That's about it. If your perl's nvtype is "__float128" or your build of perl doesn't define "Perl_strtod", then applying the patch will not change anything.
    Otherwise, however, if you build perl using the patched numeric.c then perl will assign floating point values using Perl_strtod instead of perl's Atof.

    It's very much the same story on MS Windows wrt mingw-w64 builds of perl whose nvtype is "double", where exactly the same patch makes equally dramatic improvements to the assigning of floating point values.
    Sadly, however, for "long doubles" on Windows, there are issues that complicate matters.
    And there's also an issue wrt strtold's assigning of some subnormal long double values - for which I've yet to submit a bug report.
    (More about Windows at a later date.)

    Here's the script I use to check $ARGV[1] randomly selected values within a specified exponent range (-$ARGV[0] to +$ARGV[0]).
    #
    # Test a range of values for
    # correctness of assignment
    use strict;
    use warnings;
    use Math::MPFR qw(:mpfr);

    die "Upgrade to Math-MPFR-4.03" unless $Math::MPFR::VERSION >= 4.03;
    die "Usage: perl maximum_exponent how_many_values" unless @ARGV == 2;

    $|++;

    my $display = 0;

    while($display !~ /^y/i && $display !~ /^n/i) {
      print("Do you want mismatched values to be displayed ? [y|n]: \n");
      $display = <STDIN>;
    }

    $display = 0 if $display =~ /n/i;

    my($mant, $exp, $perl_unpacked, $mpfr_unpacked, $str_value);
    my($count, $diff, $max_diff, $min_diff) = (0, 0, 0, 0);

    my $max_exp = $ARGV[0];
    $max_exp++;

    # $workspace is the Math::MPFR object to which
    # the value being tested is assigned.
    # Here we set the precision of $workspace to the
    # same number of bits as perl's NV.
    my $workspace = Rmpfr_init2($Math::MPFR::BITS);

    my $failed = 0;
    my($perl_nv, $mpfr_nv);

    for(;;) {
      $count++;
      $mant = int(rand(10)) . '.' . int(rand(10)) . int(rand(10)) . int(rand(10))
              . int(rand(10)) . int(rand(10)) . int(rand(10)) . int(rand(10))
              . int(rand(10)) ;
      $exp = int(rand($max_exp));
      $exp = "-$exp" if $count % 2;
      $str_value = $mant . "e$exp";

      # Assign $str_value to $mpfr_nv using mpfr
      $mpfr_nv = atonv($workspace, $str_value);

      # Assign $str_value to $perl_nv using perl
      $perl_nv = "$str_value" + 0;

      # $mpfr_nv and $perl_nv should be exactly equivalent.
      # Else at least one of mpfr and perl has assigned incorrectly.
      # IME, mpfr does not assign incorrectly.
      unless($perl_nv == $mpfr_nv) {
        $failed++;
        $perl_unpacked = scalar reverse unpack "h*", pack "F<", $perl_nv;
        $mpfr_unpacked = scalar reverse unpack "h*", pack "F<", $mpfr_nv;
        print "$str_value: $mpfr_nv:\n $perl_unpacked vs $mpfr_unpacked\n\n"
          if $display;
        $diff = hex(substr($perl_unpacked, -8, 8)) - hex(substr($mpfr_unpacked, -8, 8));
        if($diff > $max_diff) {
          $max_diff = $diff;
        }
        elsif($diff < $min_diff) {
          $min_diff = $diff;
        }
      }

      last if $count == $ARGV[1];
    }

    print "Count: $count\n";
    print "Failed: $failed\n";
    print "Largest differences were $max_diff ULPs and $min_diff ULPs\n";
    It requires Math-MPFR-4.03. If you want to test values in the subnormal range, you should build Math::MPFR against mpfr-4.0.x as earlier versions of mpfr were buggy in their calculation of subnormals.
    As a starter, run perl 300 100, opting to display mismatches, and see how that fares.
    Whenever I run that command against a patched perl-5.28.0, 0 mismatches are detected, irrespective of perl's nvtype.
    Whenever I run that command against an unpatched perl-5.28.0, about 80 failures are detected unless, of course, nvtype is "__float128" - in which case no failures still occur.

    There's probably not many who would bother, but I certainly intend to continue building perl with this hack in place.

    UPDATE: For the record, gcc version on my Ubuntu box is 5.4.0, and libc version is 2.23

Why did you become a Perl expert (or programmer)?
12 direct replies — Read more / Contribute
by QM
on Jun 25, 2018 at 05:23
    Prompted by this comic at Commit Strip.

    Me? (Not that I'm an expert.) Because Perl was handy, extremely useful, and didn't require a separate compile phase. Because I could solve other people's problems with it. Because it was general purpose, and not specifically geared for stream picking / editing. Because it was free. Because it was more fun than any other language I knew at the time.

    Quantum Mechanics: The dreams stuff is made of

The CPAN Apocalypse: June 25, 2018
4 direct replies — Read more / Contribute
by usemodperl
on Jun 19, 2018 at 16:06
    I wonder what will happen the day goes away, and the day after. One more week and we will know! Does Perlmonks get flooded with identical questions? Does FUD gleefully announcing the death of Perl and CPAN make the rounds?? Do those redirects work???

Review of CGI::Alternatives
11 direct replies — Read more / Contribute
by Anonymous Monk
on Jun 09, 2018 at 11:44
    CGI::Alternatives is a module that "doesn't do anything"1 except vehemently deny and propagandize against the utility of one of the most useful forms of programming EVER conceived: CGI2 programming.

    CGI::Alternatives perpetuates all the common fallacies against CGI that if heeded only disempower independent developers. This includes advocating the replacement of all of Perl's wonderful and extremely simple, stable, mature, powerful CGI modules with vastly more byzantine "frameworks"1 which rigidly enforce all sorts of corporate nonsense like "full separation of concerns"1, total object-oriented lack of any possible function-ality, and the absurd complication of allowing oneself to be used by something as annoyingly totalitarian as templates1 EVEN when they're not appropriate! All of these techniques have their place of course, mostly in big projects, with lots of tiny modules (to confuse management, ensure job stability, in competitive workplaces, stretching hours into months, for the children), but not usually in code written by us individuals for fun, prototyping, and extreme levels of pure: results.3

    With all due respect to the author's efforts to change how we Perl into something his bosses find acceptable, the author of CGI::Alternatives is actually in charge of! How is this even possible? I realize the author is a talented programmer who has contributed significantly to CPAN, but this quote directly reflects his inappropriate state of mind towards the CGI paradigm (which is derided with that weird novelty-obsessed bigotry for being "old", as he removes perfectly sensible functions, only to prove his pointless point):

    "You can't just hand a template to the web-designers and allow them to work their magic. Don't mix the business logic and the presentation layer. Just don't."

    This guy doesn't even know what a CGI programmer does yet he dictates to us? This is CRAZY! We ARE the web-designers, OF the business logic, AND the presentation layer--ALL mixed together--like a SWISS ARMY CHAINSAW: this is our TECHNIQUE! THIS IS Perl! Something YOU (Lee) obviously don't understand. Mixing it all up is exactly how some other language(s) seized the web from Perl (along with plenty of well-funded corporate FUD). Even though we still do it far better.

    We have been here from the beginning and we remain no matter how many of our tools you try to disable or how much FUD you spread about our primordially awesome technique of producing ONE UNIFIED FILE, USING CORE MODULES, QUITE OFTEN VASTLY SUPERIOR TO FRAGMENTED TEMPLATES, WRITTEN BY ONE PERL GENIUS, RATHER THAN A TEAM OF HOPELESSLY ABSTRACTED CORPORATE DRONES: Because Larry wrote and maintains Perl that way; Blessed be.

    How dare you tell us to stop doing what we love and what Perl empowers us to do? How dare you remove the HTML generation functions from Who do you think you are anyway? People who come to Perl and say things have got to change don't appreciate Perl and should be led as far away from Perl as possible (Python), not in charge of (formerly, unfortunately) core modules!

    Can someone who cares please take away from Lee Johnson (LEEJO)? I would feel far more comfortable with someone we can trust, like ikegami or Merlyn3, in charge of maintaining At least we know they would give us what we want and need, and more, rather than inflicting torture by removing legacy functionality FOR EMOTIONAL REASONS thereby violating operational stability.

    News for you Lee: What worked 20 years ago still works today: UNIX, POSIX, BASH, PERL, ME, AND MAYBE EVEN YOU. Mature technology never stops working! I appreciate innovation so don't necessarily stop trying to reinvent the wheel, but please do stop trying to shove your shiny new wheels in sheep's clothing down our throats because PERL ALREADY WON.4

    If we stopped wasting time and spirit listening to ideologically driven flame warrior infiltrators who keep trying to change Perl we would already have a perfectly backward compatible and "fixed" (even though it has never failed me to this very day thank you Lincoln Stein5) on the corelist joined by other bits we desperately need and use EVERY SINGLE DAY like CGI::Carp, Data::Dumper and File::Slurp.

    Some examples:

    This extremely useful one-line CGI debugger is now broken thanks to LEEJO (thanks!):

    print header('text/plain'), Dumper $data; exit;

    This is never ever going away:

    start_html now reduces efficiency by 100%!:

    '</body></html>' end_html # removed from, for ideological anti-reasons

    If you agree PLEASE respond! If you do NOT agree please DO NOT hijack the thread because you guys already kinda won and I hope this thread can be for CGI programmers to chime in and support this seemingly lost cause which is really not even close to lost in the real non-ideological world of actual programmers who GET STUFF DONE.


    1. CGI::Alternatives
How will Artificial Intelligence change the way we code?
5 direct replies — Read more / Contribute
by LanX
on Jun 09, 2018 at 07:55
To <=80 char code line length or not
11 direct replies — Read more / Contribute
by stevieb
on Jun 07, 2018 at 18:15

    In all of my current 44 CPAN distributions, along with my near 80 Github Open Source repositories (Perl, C, C++, C#, Python etc), I (with few exceptions) enforce an 80 char limit on the length of the lines of code.

    I also do this even in my POD, Changes and test files (again, with some exceptions).

    I know that this practice is based on legacy console line-length reasons, but I still like to stick with it, as it keeps things very consistent, as well as allows my IDE to display the project layout, two open files side-by-side, and the overview (structure) of the file I'm currently working on to be viewed clearly and easily.

    Even when I'm using just vi/vim on the CLI outside of my IDE of choice, I can count on my code being consistently wide in all aspects.

    What are your thoughts here? Many coders I speak to go as far as 120 chars and they say that is helpful, and at $work (Python and C++), there's a 79 char limit and many hate it. Seems as though newer generations prefer longer line lengths, but here I am curious as to what the Perl community feels.

What is a Bool?
No replies — Read more | Post response
by tobyink
on Jun 07, 2018 at 05:01

    Already posted on

    Perl allows pretty much any value to be evaluated in a boolean context:

    if ($something) { ... }

    No matter what $something is, it will safely evaluate to either true or false. (With the exceptions of a few edge cases like blessed objects which are overloaded to throw an error when evaluated as booleans.)

    So when a Moose class does something like this, what does it mean?

    has something => (
      is  => 'ro',
      isa => 'Bool',
    );

    If absolutely any value could work when $self->something was accessed in boolean context, then what need is there to check what value is passed to the constructor? Should Bool basically be the same as Any, just spelled differently for documentation purposes?

    So what does Moose do? The documentation says:

    Bool accepts 1 for true, and undef, 0, or the empty string as false.

    However, that's not the full story. Blessed objects which overload stringification are accepted, but only if the stringification returns the strings "0", "1", or the empty string at the time the type constraint is checked. If the object stringifies to something else, but also overloads boolification sensibly, then too bad. Of course when you write if ($self->something) it's the boolification overloading which matters, but Moose only checks the stringification overloading.
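The gap between "true in boolean context" and "passes a Bool check" can be seen without Moose at all. The sketch below uses only core Perl. The helper is_documented_bool is made up and mirrors only the documented rule quoted above (1, undef, 0 and the empty string); it is not Moose's implementation and deliberately ignores the undocumented stringification case. The Flag class is likewise just an illustration.

```perl
use strict;
use warnings;

# Hypothetical check implementing only the documented rule:
# 1 is true; undef, 0 and the empty string are false. References fail.
sub is_documented_bool {
    my ($value) = @_;
    return 1 if !defined $value;
    return 0 if ref $value;
    return ($value eq '1' || $value eq '0' || $value eq '') ? 1 : 0;
}

# Illustrative class whose objects overload boolification only.
package Flag {
    use overload 'bool' => sub { $_[0]{on} ? 1 : 0 }, fallback => 1;
    sub new { my ($class, $on) = @_; return bless { on => $on }, $class }
}

my $flag = Flag->new(1);

print "object is true in boolean context\n"       if $flag;
print "object fails the documented Bool rule\n"   unless is_documented_bool($flag);
print "!!object passes the documented Bool rule\n" if is_documented_bool(!!$flag);
```

The last line is the same trick as a `!!$_` coercion: boolify once, then store the plain value.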

    Moose's support for objects that overload stringification as booleans is not explicitly documented, nor is it covered at all by the Moose test suite.

    What does Mouse do? Well, that's even weirder. It mostly follows Moose's documented behaviour. It accepts "1" for true, and "0", undef, and the empty string for false. But also, it accepts objects overloading boolification for false. Yes, that's right: if you overload boolification to return true, it will fail the type check. Overload it to return false, and you're golden!

    So where does this leave my module Types::Standard? Well, the pure Perl implementation follows what Moose does, and the (optional) XS implementation is forked from Mouse.

    For the latest release of the XS version, I've dropped support for objects which overload boolification to return false, bringing it in line with Moose's documented behaviour. I plan for the pure Perl implementation to also follow suit, dropping support for objects which overload stringification to return a boolean value.

    If you need support for objects overloading boolification, a quick workaround is this:

    has something => (
      is  => 'ro',
      isa => 'Any',  # Bool
    );

    Or use coercions (example uses Types::Standard):

    has something => (
      is     => 'ro',
      isa    => Bool->plus_coercions(Any, q{ !!$_ }),
      coerce => 1,
    );

    In the case of read-only attributes, I happen to believe accepting a blessed object as a boolean value could be harmful. The contents of the object could later change, changing the value from true to false, or vice versa, despite its read-onlyness.
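A core-Perl sketch of that concern, with no Moose involved. Both classes are made up for illustration: Switch is an object with overloaded boolification and a method that mutates it, and Holder stands in for a class with a read-only accessor (a getter, no setter).

```perl
use strict;
use warnings;

# Illustrative object whose truthiness depends on mutable state.
package Switch {
    use overload 'bool' => sub { $_[0]{on} ? 1 : 0 }, fallback => 1;
    sub new  { my ($class, %args) = @_; return bless { on => $args{on} }, $class }
    sub flip { my ($self) = @_; $self->{on} = !$self->{on}; return $self }
}

# Illustrative holder with a read-only accessor.
package Holder {
    sub new       { my ($class, %args) = @_; return bless { something => $args{something} }, $class }
    sub something { return $_[0]{something} }
}

my $switch = Switch->new(on => 1);
my $live   = Holder->new(something => $switch);     # stores the object itself
my $frozen = Holder->new(something => !!$switch);   # stores a plain boolean

my $before = $live->something ? 'true' : 'false';
$switch->flip;                                      # nobody touched $live
my $after  = $live->something ? 'true' : 'false';

print "live:   before $before, after $after\n";
print "frozen: ", ($frozen->something ? 'true' : 'false'), "\n";
```

The holder that stored the object reports a different value after the flip even though it has no setter; the one that stored `!!$switch` does not.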

RFC: LWP::UserAgent hit counter
2 direct replies — Read more / Contribute
by bliako
on Jun 03, 2018 at 10:30

    As I get sucked deeper and deeper into web scrapers -- the Cosmo Cramers of our era -- and constantly doing so with my faithful companion, the LWP::UserAgent, the need arose, primarily out of courtesy to the hosts, for counting the number of requests (hits) I made over a certain time interval and holding the scraper back by sleep()ing some time.

    Eventually, I decided I wanted to be able to know the ratio of active hitting sessions over sleep times and also control and tweak the hit rate and the subsequent burden on the host, for particular traffic situations: late night or noons, with just a few parameters, mainly the sleep() durations between the various phases of scraping and form filling. The latter, one could imagine, are like a complex state machine which can lead you down deterministic -- most of the time -- but highly complex paths.

    And so I have devised two methods/tools to assist me in my endeavours, one is a hit counter for LWP::UserAgent and the other is a counter of sleep() seconds which works across all sleep() calls even in far and foreign modules.

    I will proceed now to lay out a module-based implementation of so-called UserAgent-with-Stats, including a test script.

    The basic idea is to subclass LWP::UserAgent in order to add a handler (via set_handler), when requested by the user, to the "request_send" phase of LWP's request(). The purpose of this handler is to increment our internal hit counter every time a request is sent by LWP (GET/POST/etc.).

    Additionally, there are two time counters to assist us in calculating the time interval between when the counter was turned on and either the last hit or when it was turned off. The aim is to be able to know the number of hits that occurred within a time interval. Thinking about it, maybe an interval from time-of-first-hit to time-of-last-hit makes more sense.

    Now, one may ask why there is a need to subclass rather than create a new class which takes an LWP::UserAgent object, adds a handler to it and keeps the counters. Indeed, that is another possibility.

    In any event, that's the basic idea. I would like to ask for your comments, corrections and recommendations. I will do the same for the sleep-count module in my next post.
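For readers following along without the module source, here is a rough sketch of the subclass-plus-handler idea. It is not the code this RFC refers to: the package name UserAgentWithStats, the method names counter_on, counter_off, hits and seconds_active, and the _stats key are all made up, and it registers the handler with LWP::UserAgent's add_handler once in the constructor and toggles a flag rather than adding and removing handlers. It needs LWP::UserAgent installed.

```perl
package UserAgentWithStats;    # hypothetical name
use strict;
use warnings;
use parent 'LWP::UserAgent';

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    $self->{_stats} = { counting => 0, hits => 0 };

    # The request_send phase runs for each request LWP is about to send.
    # The handler receives the request and the user agent; returning
    # nothing lets the request go out as usual.
    $self->add_handler(request_send => sub {
        my ($request, $ua) = @_;
        my $stats = $ua->{_stats};
        if ($stats->{counting}) {
            $stats->{hits}++;
            $stats->{last_hit} = time;
        }
        return;
    });
    return $self;
}

sub counter_on {
    my ($self) = @_;
    @{ $self->{_stats} }{qw(counting hits started)} = (1, 0, time);
    delete @{ $self->{_stats} }{qw(last_hit stopped)};
    return $self;
}

sub counter_off {
    my ($self) = @_;
    @{ $self->{_stats} }{qw(counting stopped)} = (0, time);
    return $self;
}

sub hits { return $_[0]{_stats}{hits} }

# Seconds from counter_on to counter_off, or to the last hit while running.
sub seconds_active {
    my $stats = $_[0]{_stats};
    return undef unless defined $stats->{started};
    my $end = $stats->{counting} ? ($stats->{last_hit} // $stats->{started})
                                 : $stats->{stopped};
    return $end - $stats->{started};
}

package main;

my $ua = UserAgentWithStats->new(timeout => 10);
$ua->counter_on;
# $ua->get($_) for @urls;    # each request sent would bump the counter
printf "%d hit(s) so far\n", $ua->hits;
```

Passing the user agent into the handler (instead of closing over $self) avoids a reference cycle between the object and its own handler.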

    And here is a test script:

    I will detail the sleep-count in my next post.
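In the meantime, one way such a global sleep() counter could work is by overriding the built-in through CORE::GLOBAL. This is only a guess at the approach, not the module promised above; the variables $SLEPT and $DRY_RUN are made up, and the override has to be installed before the code that calls sleep() is compiled.

```perl
use strict;
use warnings;

our $SLEPT   = 0;    # total seconds requested via sleep()
our $DRY_RUN = 0;    # hypothetical switch: count only, do not pause

BEGIN {
    no warnings 'once';
    # Installed at compile time so that code compiled afterwards,
    # including modules loaded later, calls this instead of the built-in.
    # A bare sleep() with no argument is counted as 0 seconds here.
    *CORE::GLOBAL::sleep = sub (;$) {
        my $seconds = @_ ? $_[0] : 0;
        $SLEPT += $seconds;
        return $DRY_RUN ? $seconds : CORE::sleep($seconds);
    };
}

sleep(1);
print "slept for $SLEPT second(s) so far\n";
```

Dividing the hit counter's active time by this total gives the hit/sleep ratio described earlier.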

    Thanks, bliako

Mocking LDAP in your tests
1 direct reply — Read more / Contribute
by Ea
on May 31, 2018 at 10:48
    Your mother was X.500 and your father smells of RFCs! Now go away or I shall mock you a second time!
      - from the original draft of Monty Python and the Holy Grail.

    I finally buckled down and started to mock the LDAP server in my tests rather than trying to connect to a live server. Other than the documentation, there's not a lot of examples out there for Test::Net::LDAP::Mock or Test::Net::LDAP::Util, so here's the results from banging away at it for an afternoon along with what I think is going on. Please feel free to point out what I've done wrong. I've stopped where it started working for me.

    I have a Mojolicious app that authenticates against LDAP, but the tests would fail when using dummy accounts or when I wasn't connected. Here's the test I wrote

    Steps to mocking

    Set up your test environment as usual and use Test::Net::LDAP::Util qw/ldap_mockify/; The ldap_mockify method intercepts all calls to Net::LDAP->new() and redirects them to your mocked LDAP directory.
    1. Create a new Net::LDAP object
    2. Use the object to populate your mocked server with data using the add method
    3. If you want to mock the authentication process, use the mock_bind method with a call back that returns LDAP_SUCCESS or LDAP_INVALID_CREDENTIALS
    4. Now that your LDAP server is all mocked up, run your tests
    5. Don't forget the }; at the end of the method. It's a funny error message when you forget the semicolon at the end.


    • the $basedn that you mock has to be the same as the base DN that you search in your application. this is easier if you keep the values in a config file and read the same file in your test (not shown here for brevity)
    • testing authentication, you don't set the password for an entry with mock_password(), but instead supply mock_bind() with a callback
    • if you haven't imported Net::LDAP::Constant, you'll need to use the fully qualified name to report success Net::LDAP::Constant::LDAP_SUCCESS or failure Net::LDAP::Constant::LDAP_INVALID_CREDENTIALS
    • most of the methods in Test::Net::LDAP::Util seem to want to return success, regardless of the underlying data, which can be frustrating until you work that out and code accordingly.

    Well, what do you think? Does it get the job done?

    Edit - while cleaning up tabs used for putting this post together, I found a relevant question on StackOverflow from 5 years ago, but it hasn't been answered so far.


    Sometimes I can think of 6 impossible LDAP attributes before breakfast.

    YAPC::Europe::2018 — Hmmm, need to talk to work about sending me there or to Mojoconf.

GDPR ( Global Data Protection Rights )
6 direct replies — Read more / Contribute
by trippledubs
on May 17, 2018 at 01:33


    What do you think of General_Data_Protection_Regulation? I'm interested to know if your companies are behind it or minimally complying, more interested to know if you think individuals ought to have the rights expressed in that law and if there is really a moral obligation on site owners to comply. Or, if it should be scrapped or changed.

    The right of erasure specifically contradicts PM policy which is defended with the same argument that Wikipedia uses, the "Memory hole" argument. If one user decides to revoke the site owners permission to use their nodes, that creates a hole in the link of the chain, and every user is negatively affected. That is a pretty utilitarian view point. It smells slightly self serving to me to hear that argument from sites whose success directly rides on user generated content.

    It really only benefits future users, because if you were there, you don't need a tattoo of the conversation to remember it later. I don't see that a site owner, especially if it's not the hoster ie back in time machines, gets a perpetual license after you leave. Recipe sites -- let's say you participate for years honing the craft and eventually decide to write a cookbook, you don't ever have the right to revoke your recipes down off the boards and make the world pay for your stuff? But your dishes have probably benefited from all that recipe sharing, so it seems you would owe something too.

    I can't help but think of the social contract put forth in Crito. You have a good idea of what you are getting into when you participate online, seems reasonable that the site architects who built your playground would be able to dictate the terms, but I don't see how they have the right to continue to do so once you leave.

    I googled: Social contract, copyright law, landlord tenant, looked up about 10 web sites that were closing down or blocking EU Customer, but I can't make up my mind. There seems to be a lot of data players operating in the shadows without consent that should be addressed, but I can't see how it affects my life at all. I see an ad about something I almost bought on Amazon, big deal.

    Well surely we do not live in a perfect world, but does the GDPR move the decimal point either direction? Or just adding more compliance factories to the world? And who are the people who wrote the bill that made me get all this TOS spam. I tried to find the authors' names and I could not. Maybe this is a stepping stone to better "digital rights"?

RFC: Is the Bible encoded in DNA?
12 direct replies — Read more / Contribute
by wstryder
on May 14, 2018 at 07:31

    I have for a time entertained the idea, that if God is the creator, he would have left his signature in the DNA of human species. If I was the creator, I would have encoded the entire Hebrew Bible in DNA, so to let no one doubt that DNA was created by God and that the Bible is the word of God.

    I finally took up the challenge and wrote a perl script to check if the first five verses of the Bible are encoded in DNA. Naturally there is an infinite number of ways to encode information in DNA, but I assumed that God would have used something quite obvious in order for us to be able to find information encoded in DNA. I'm assuming that if the Bible is encoded in DNA, the encoding used would be the same as for protein synthesis, namely that each triplet of DNA base pairs would encode one character. There are 64 possible codons so there is plenty of redundancy when they are used for encoding the 22 Hebrew letters (plus sofit forms for five characters).

    Like so:

    AAA -> Y    AAC -> XXX  AAG -> B    AAT -> XXX
    ACA -> A    ACC -> M    ACG -> XXX  ACT -> XXX
    AGA -> R    AGC -> R    AGG -> W    AGT -> W
    ATA -> H    ATC -> XXX  ATG -> A    ATT -> H
    CAA -> XXX  CAC -> XXX  CAG -> I    CAT -> XXX
    CCA -> XXX  CCC -> XXX  CCG -> XXX  CCT -> V
    CGA -> XXX  CGC -> XXX  CGG -> O    CGT -> XXX
    CTA -> XXX  CTC -> E    CTG -> V    CTT -> H
    GAA -> H    GAC -> I    GAG -> XXX  GAT -> T
    GCA -> XXX  GCC -> B    GCG -> XXX  GCT -> H
    GGA -> V    GGC -> H    GGG -> Y    GGT -> A
    GTA -> Y    GTC -> V    GTG -> A    GTT -> I
    TAA -> E    TAC -> XXX  TAG -> XXX  TAT -> O
    TCA -> A    TCC -> Y    TCG -> XXX  TCT -> XXX
    TGA -> R    TGC -> H    TGG -> A    TGT -> XXX
    TTA -> XXX  TTC -> L    TTG -> B    TTT -> A

    My dirty little perl script reads a FASTA file one character at a time and, when a triplet has been read, it checks to see if that codon is already defined. If it is not, the first character of the target sequence is added to a hash containing all the codons. The algorithm then moves to the next triplet in DNA and checks to see if that triplet is defined and so on. When a triplet is already defined and the character stored does not equal the target sequence, the script records the maximum length of the sequence found and goes back to the beginning of DNA and moves forward one base pair to continue the search.
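As I read that description, the search could be sketched roughly as below. This is a guess at the logic, not the script itself: the sub names consistent_prefix_length and best_match are made up, the DNA is a plain in-memory string instead of a FASTA file read in chunks, and the codon table is rebuilt from scratch for each starting offset.

```perl
use strict;
use warnings;

# Starting at $start, walk the DNA three bases at a time and pair each
# codon with the next character of $target. An unseen codon is assigned
# that character; a codon seen before must map to the same character.
# Returns how many characters were matched before the first conflict.
sub consistent_prefix_length {
    my ($dna, $target, $start) = @_;
    my %char_for_codon;
    my @chars   = split //, $target;
    my $matched = 0;
    for my $i (0 .. $#chars) {
        my $pos = $start + 3 * $i;
        last if $pos + 3 > length $dna;          # ran out of DNA
        my $codon = substr $dna, $pos, 3;
        if (exists $char_for_codon{$codon}) {
            last if $char_for_codon{$codon} ne $chars[$i];
        }
        else {
            $char_for_codon{$codon} = $chars[$i];
        }
        $matched++;
    }
    return $matched;
}

# Try every starting offset; return (longest match, offset of that match).
sub best_match {
    my ($dna, $target) = @_;
    my ($best, $best_at) = (0, 0);
    for my $start (0 .. length($dna) - 3) {
        my $len = consistent_prefix_length($dna, $target, $start);
        ($best, $best_at) = ($len, $start) if $len > $best;
    }
    return ($best, $best_at);
}

printf "longest run: %d character(s) at offset %d\n",
    best_match('AAACCCAAAGGG', 'ABAC');
```

Note that with a codon table that starts out empty, a match can only be broken by a codon that repeats, which is worth bearing in mind when comparing against control texts.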

    I'm not a computer science expert and I'm sure that my script is dirty and messy, but it does work. It takes 33h to search one target sequence against the 3 billion base pairs of human DNA. The FASTA files are in chunks of roughly 150 million base pairs, so several files need to be checked by hand, but this is not much of a problem. My computer crashes when I try to load more than 10 million base pairs at a time, so the script reads each FASTA file in chunks of 5 million base pairs at a time.

    I could not get Hebrew characters to work properly, so I simply transliterated the first five chapters of Genesis to ASCII characters. This is a dirty way of going about it, but it works.


    For control sequences I used Lorem ipsum, War and Peace and a random string. For the control sequences I checked the first one million base pairs only.

    The results so far:

    Lorem ipsum    42 characters found (250 million searched)
    War and peace  35 characters found (one million searched)
    Random string  35 characters found (one million searched)

    Having checked the Hebrew Bible against 500 million base pairs so far, the maximum sequence found was 45 characters. This is more than the control sequences, but only because many more base pairs were compared. To be sure that the sequence was encoded in DNA by God, I would expect to find a sequence of hundreds of characters, preferably all the first five verses of Genesis. I'm not a mathematician, so I have not calculated what the maximum sequence length would be if left to chance alone. But the control sequences do give some estimate.

    Iím of course assuming that God used the hebrew Bible, because some say hebrew is the holy language, but Iíve also checked the King James English for the first verses of Matthew and John. If God is omnipotent, surely he could have encoded the Bible in DNA in any language. In the future Iíll check if New Testament passeges are encoded in Greek, but thus far Iím working with the assumption that the most awesome thing for God to do would have been to encode the biginning of Genesis. Will post results when I find anything.

    Let me know what you think of my efforts, I know this is nuts.

    My code can be downloaded at

PDL QuickRef
1 direct reply — Read more / Contribute
by mxb
on May 14, 2018 at 05:41

    Edit: Just noticed that PDL 2.019 has been released; this was written against 2.018. There shouldn't be many (if any) changes, but I'll update this comment accordingly once I've checked it over.

    Edit2: longlong range fixed.

    As there was significant interest in the porting of numpy to PDL documentation, I've been continuing to document my explorations with PDL.

    The following document is my own personal PDL 'QuickRef', which I've created both as a reference for myself and as a summary of PDL.

    I've tidied it up and now I'm posting it here for others, should they find it useful. Hopefully, it's useful both to new users of PDL (exploring along with the perldl shell) and as a reference for experienced users.

    Hopefully I've put this in the correct place, but mods feel free to move it if this is the wrong section.

    I will continue to update the 100 PDL Exercises offline and will post an updated version incorporating all feedback soon.

    PDL QuickRef

    Arguably, this is just a rehashing of the existing documentation available via the modules in the PDL::* namespace. However, I found it useful when learning PDL to have everything in a single place.

    PDL Creation

    Creation of Vectors

    The pdl function creates piddles from implicit and explicit scalars and variables. It accepts an optional first argument, $type, which specifies the internal data type of the piddle.

    PDL Datatypes

    All piddles store matrices of data in the same data type. PDL supports the following datatypes:

    Datatype   Internal 'C' type   Valid values
    byte       unsigned char       Integer values from 0 to +255
    short      short               Integer values from -32,768 to +32,767
    ushort     unsigned short      Integer values from 0 to +65,535
    long       int                 Integer values from -2,147,483,648 to +2,147,483,647
    longlong   long                Integer values from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
    float      float               Real values from -1.2E-38 to +3.4E+38 with 6 decimal places of precision
    double     double              Real values from 2.3E-308 to +1.7E+308 with 15 decimal places of precision

    pdl Examples

    Row vector from explicit values: $v = pdl($type, [1,2]);
    Column vector from explicit values: $v = pdl($type, [[1],[2]]); or $v = pdl($type, [1,2])->(*1);
    Row vector from scalar string: $v = pdl($type, "1 2 3 4");
    Row vector from array of numbers: $v = pdl($type, @a);
    Matrix from explicit values: $M = pdl($type, [[1,2],[3,4]]);
    Matrix from a scalar: $M = pdl($type, "[1 2] [3 4]");

    Piddle Helper Creation Functions

    The following functions, where arguments are marked as ..., accept arguments in the following form:

    • $type - an optional data type (see above)
    • $x,$y,$z,... - A list of n dimensions for the resulting piddle, OR
    • $M - Another piddle, from which the dimensions will be re-used
    Sequential integers, starting at zero: $M = sequence(...);
    Sequential Fibonacci values, starting at one: $M = fibonacci(...);
    Of all zeros: $M = zeros(...);
    Of all ones: $M = ones(...);
    Of random values between zero and one: $M = random(...);
    Of Gaussian random values with a mean of zero and a standard deviation of one: $M = grandom(...);
    Where each value is its zero-based index along the first dimension: $M = xvals(...);
    Where each value is its zero-based index along the second dimension: $M = yvals(...);
    Where each value is its zero-based index along the third dimension: $M = zvals(...);
    Where each value is its zero-based index along dimension $d: $M = axisvals(..., $d);
    Where each value is its distance from a specified centre: $M = rvals(..., {Centre=>[x,y,z,...]});

    The following functions create piddles with dimensions taken from another piddle, $M and distribute values between two endpoints ($min and $max) inclusively:

    Linearly distributed values along the first dimension: $N = $M->xlinvals($min, $max);
    Linearly distributed values along the second dimension: $N = $M->ylinvals($min, $max);
    Linearly distributed values along the third dimension: $N = $M->zlinvals($min, $max);
    Logarithmically distributed values along the first dimension: $N = $M->xlogvals($min, $max);
    Logarithmically distributed values along the second dimension: $N = $M->ylogvals($min, $max);
    Logarithmically distributed values along the third dimension: $N = $M->zlogvals($min, $max);
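
    A small sketch of a few of the creation helpers above, assuming PDL is installed. The dimensions (3,2) and the endpoints 0 and 1 are arbitrary choices for illustration, and zeroes is used here only to build the template piddle whose dimensions xlinvals re-uses.

```perl
use strict;
use warnings;
use PDL;

# 3 elements along the first dimension, 2 along the second
my $seq = sequence(3, 2);    # 0..5, first dimension varies fastest
my $x   = xvals(3, 2);       # each value is its index along dimension 0
my $y   = yvals(3, 2);       # each value is its index along dimension 1

# Five values spread evenly from 0 to 1 inclusive
my $lin = zeroes(5)->xlinvals(0, 1);

print "sequence: $seq\n";
print "xvals:    $x\n";
print "yvals:    $y\n";
print "xlinvals: $lin\n";
```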

    Co-ordinate Piddles

    Finally the ndcoords utility function creates a piddle of co-ordinates for the supplied arguments. It may be called in two ways:

    • $coords = ndcoords($M); - Take dimensions from another piddle
    • $coords = ndcoords(@dims); - Take dimensions from a Perl list

    Piddle Conversion

    A piddle can be converted into a different type using the datatype names as a method upon the piddle. This returns the converted piddle as a new piddle. The inplace method does not work with these conversion methods.

    Operation Operator
    Convert to byte datatype: $M->byte; or byte $M;
    Convert to short datatype: $M->short; or short $M;
    Convert to ushort datatype: $M->ushort; or ushort $M;
    Convert to long datatype: $M->long; or long $M;
    Convert to longlong datatype: $M->longlong; or longlong $M;
    Convert to float datatype: $M->float; or float $M;
    Convert to double datatype: $M->double; or double $M;

    Obtaining Piddle Information

    PDL provides a number of functions to obtain information about piddles:

    Description Code
    Return the number of elements: $M->nelem;
    Return the number of dimensions: $M->ndims;
    Return the length of dimension $d: $M->dim($d);
    Return the length of all dimensions as a Perl list: $M->dims;
    Return the length of all dimensions as a piddle: $M->shape;
    Return the datatype of a piddle: $M->type;
    Return general information about a piddle (datatype, dimensions): $M->info;
    Return the memory used by a piddle: $M->info("%M");

    Indexing, Slicing and Views

    Points To Note

    PDL internally stores matrices in column major format. This affects the indexing of piddle elements.

    For example, take the following matrix $M:

    [ [0 1 2] [3 4 5] [6 7 8] ]

    In standard mathematical notation, the element at M_{i,j} will be i elements down and j elements across, with the elements 0 and 3 at M_{1,1} and M_{2,1} respectively.

    With PDL indexing, indexes start at zero, and the first two dimensions are 'swapped'. Therefore, the elements 0 and 3 are at PDL indices (0,0) and (0,1) respectively.
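
    To make the index order concrete, here is a minimal sketch that builds the matrix above with sequence and reads single elements with the at method. It assumes only that PDL is installed; the variable names are illustrative.

```perl
use strict;
use warnings;
use PDL;

# The 3x3 matrix from the text, holding 0..8
my $M = sequence(3, 3);

# at() takes indices in PDL order: position across first, position down second
my $top_left = $M->at(0, 0);    # the element 0
my $below    = $M->at(0, 1);    # the element 3, one row down
my $right    = $M->at(1, 0);    # the element 1, one column across

print "(0,0)=$top_left (0,1)=$below (1,0)=$right\n";
```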

    Views are References

    PDL attempts to do as little work as possible, in that it will try to avoid copying piddle values in memory when it can. The most common operations where this is the case are taking piddle slices or views across a piddle matrix. The piddles returned by these functions are views upon the original data, rather than copies, so modifications to them will affect the original matrix.
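
    The following sketch shows the difference between a view and a copy. It uses the plain slice method with a string argument rather than the PDL::NiceSlice syntax described below, so that it runs without the source filter; the values -1 and 99 are arbitrary.

```perl
use strict;
use warnings;
use PDL;

my $M = sequence(3, 3);

# A slice is a view: assigning through it with .= changes $M as well
my $col = $M->slice('0,:');    # index 0 along the first dimension, all of the second
$col .= -1;

# copy() detaches the data, so changes to the copy leave $M alone
my $copy = $M->slice('1,:')->copy;
$copy .= 99;

print $M;
```

    After this runs, the elements selected by the first slice hold -1 in $M, while the elements selected by the second slice still hold their original values.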


    A common operation is to view only a subset of a piddle. This is called slicing.

    As slicing is such a common operation, there is a module to implement a shorter syntax for the slice method. This module is PDL::NiceSlice. This document only uses this syntax.

    A rectangular slice of a piddle is returned by using the default method on a piddle. This takes up to n arguments, where n is the number of dimensions in the piddle.

    Each argument must be one of the following forms:

    "" An empty value returns the entire dimension.
    n Return the value at index n into the dimension, keeping the dimension of size one.
    (n) Return the value at index n into the dimension, eliminating the entire dimension.
    n:m Return the range of values from index n to index m inclusive in the dimension. Negative indexes are indexed from the end of the dimension, where -1 is the last element.
    n:m:s Return the range of values from index n to index m with step s inclusive in the dimension. Negative indexes are indexed from the end of the dimension, where -1 is the last element.
    *n Insert a dummy dimension of size n.

    The following examples operate on the matrix $M:

    [ [0 1 2] [3 4 5] [6 7 8] ]
    Description Command Result
    Return the first column as a 1x3 matrix: $M->(0,); [ [0][3][6] ]
    Return the first row as a 3x1 matrix: $M->(,0); [ [0 1 2] ]
    Return the first row as a 3 element vector: $M->(,(0)); [0 1 2]
    Return the first and second column as a 2x3 matrix: $M->(0:1); [ [0 1] [3 4] [6 7] ]
    Return the first and third row as a 3x2 matrix: $M->(,0:-1:2); [ [0 1 2] [6 7 8] ]
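
    As a cross-check of the shapes in the table, the sketch below repeats the same selections with the underlying slice method and prints the resulting dimensions. The string arguments are intended to be the equivalents of the NiceSlice forms above, with ':' standing for an entire dimension; the variable names are illustrative.

```perl
use strict;
use warnings;
use PDL;

my $M = sequence(3, 3);

my $first_col     = $M->slice('0,:');         # dims 1x3
my $first_row     = $M->slice(':,0');         # dims 3x1
my $first_row_vec = $M->slice(':,(0)');       # dims 3, second dimension removed
my $cols01        = $M->slice('0:1,:');       # dims 2x3
my $rows02        = $M->slice(':,0:-1:2');    # dims 3x2, rows 0 and 2

for my $pair (
    [ 'first column'        => $first_col ],
    [ 'first row'           => $first_row ],
    [ 'first row as vector' => $first_row_vec ],
    [ 'columns 0 and 1'     => $cols01 ],
    [ 'rows 0 and 2'        => $rows02 ],
) {
    my ($name, $p) = @$pair;
    print "$name: dims ", join('x', $p->dims), " values $p\n";
}
```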


    Occasionally it is required to extract non-contiguous regions along a dimension. This is called dicing. The dice method accepts an array of indices for each dimension, which do not have to be contiguous.

    The following examples operate on the matrix $M:

    [ [0 1 2] [3 4 5] [6 7 8] ]
    Description Command Result
    Return the first and third column as a 2x3 matrix: $M->dice([0,2]); [ [0 2] [3 5] [6 8] ]
    Return the first and third column and the first and third row as a 2x2 matrix: $M->dice([0,2],[0,2]); [ [0 2] [6 8] ]

    Which and Where Clauses

    The other common operation to perform over a piddle is to apply a boolean operation over the entire piddle elementwise. This is achieved in PDL with the where method.

    The where method accepts a single argument of a boolean operation. The element is referred to within this argument with the same variable name as the piddle. The values in the returned piddle are references to the values in the initial piddle.

    In a similar manner to the where clauses outlined above, there is the which method. The difference between these two methods is that where returns the values, while which returns the indices.

    This is best explained with examples over a matrix $M:

    Description Return values Return indices
    Obtain all positive values: $M->where($M > 0); which($M > 0);
    Obtain all values equal to three: $M->where($M == 3); which($M == 3);
    Obtain all values which are not zero: $M->where($M != 0); which($M != 0);

    Note that there is also the which_both function. This function returns an array of two piddles. The first is a list of indices for which the boolean operation was true, the second for which the result was false.
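
    A minimal sketch of where, which and which_both on a small matrix, assuming PDL is installed. The data values are arbitrary; note that the indices returned by which and which_both are offsets into the flattened piddle.

```perl
use strict;
use warnings;
use PDL;

my $M = pdl([ [5, 0, 7], [0, 9, 0] ]);

# where returns the matching values
my $values = $M->where($M > 0);

# which returns the flat indices of the matching values
my $indices = which($M > 0);

# which_both returns the indices where the test is true, then where it is false
my ($true_idx, $false_idx) = which_both($M > 0);

print "values:  $values\n";
print "indices: $indices\n";
print "true:    $true_idx\n";
print "false:   $false_idx\n";
```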

    Again, as where clauses are so common, PDL::NiceSlice has syntactic support for them through the default method. This is achieved through an argument modifier, which is appended to the single argument.

    The modifiers are separated from the original argument via a ; character, and the following modifiers are supported:

    Modifier Description
    ? The argument is no longer a slice, but rather a where clause
    _ flatten the piddle to one dimension prior to the operation
    - squeeze the piddle by flattening any dimensions of length one.
    | sever the returned piddle into a copy, rather than a reference

    Using this syntax, the following where commands are identical:

    $M->where($M > 3);
    $M->($M > 3;?);

    View Modification

    PDL contains many functions to modify the view of a piddle. These are outlined below:

    Description Code
    Transpose a matrix/vector: $M->transpose;
    Return the multidimensional diagonal over the supplied dimensions: $M->diagonal(@dims);
    Remove any dimensions of length one: $M->squeeze;
    Flatten to one dimension: $M->flat;
    Merge the first $n dimensions into one: $M->clump($n);
    Merge a list of dimensions into one: $M->clump(@dims);
    Exchange the position of zero-indexed dimensions $i and $j: $M->xchg($i, $j);
    Move the position of zero-indexed dimension $d to index $i: $M->mv($d, $i);
    Reorder the index of all dimensions: $M->reorder(@dims);
    Concatenate piddles of the same dimensions into a single piddle of rank n+1: cat($M, $N, ...);
    Split a single piddle into an array of piddles across the last dimension: ($M, $N, ...) = dog($P);
    Rotate elements with wrap across the first dimension: $M->rotate($n);
    Given a vector $v, return a matrix where each column is of length $len, with step $step over the entire vector: $v->lags($dim, $step, $len);
    Normalise a vector to unit length: $M->norm;
    Destructively reshape a matrix to n dimensions, where n is the number of arguments and each argument is the length of each dimension. Any additional values are discarded and any missing values are set to zero: $M->resize(@dims);
    Append piddle $N to piddle $M across the first dimension: $M->append($N);
    Append piddle $N to piddle $M across the dimension with index $dim: $M->glue($dim, $N);

    Matrix Multiplication

    PDL supports four main matrix multiplication methods between two piddles of compatible dimensions. These are:

    Operation Code
    Dot product: $M x $N;
    Inner product: $M->inner($N);
    Outer product: $M->outer($N);
    Cross product: $M->crossp($N);

    As the x operator is overloaded to be the dot product, it can also be used to multiply vectors, matrices and scalars.

    Operation Code
    Row x matrix = row $r x $M;
    Matrix x column = column $M x $c;
    Matrix x scalar = matrix $M x 3;
    Row x column = scalar $r x $c;
    Column x row = matrix $c x $r;

    Arithmetic Operations

    PDL supports a number of arithmetic operations, both elementwise, over an entire matrix and along the first dimension. Double precision variants are prefixed with d.

    Operation Elementwise Over entire PDL Over 1st Dimension
    Addition: $M + $N; $M->sum; or $M->dsum; $M->sumover; or $M->dsumover;
    Subtraction: $M - $N;
    Product: $M * $N; $M->prod; or $M->dprod; $M->prodover; or $M->dprodover;
    Division: $M / $N;
    Modulo: $M % $N;
    Raise to the power: $M ** $N;
    Cumulative Addition: $M->cumusumover; or $M->dcumusumover;
    Cumulative Product: $M->cumuprodover; or $M->dcumuprodover;
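
    To show how the three columns differ, here is a short sketch that applies addition over the whole piddle, over the first dimension, and cumulatively over the first dimension. It assumes only that PDL is installed and uses the same 3x3 matrix of 0..8 as the slicing examples.

```perl
use strict;
use warnings;
use PDL;

my $M = sequence(3, 3);    # rows are [0 1 2], [3 4 5], [6 7 8]

my $total    = $M->sum;            # one value for the whole piddle
my $row_sums = $M->sumover;        # one value per row (first dimension summed away)
my $running  = $M->cumusumover;    # running totals along each row, same shape as $M

print "sum:         $total\n";
print "sumover:     $row_sums\n";
print "cumusumover: $running\n";
```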

    Comparison Operations

    PDL supports a number of different elementwise comparison functions between matrices of the same shape.

    Operation Elementwise
    Equal to: $M == $N;
    Not equal to: $M != $N;
    Greater than: $M > $N;
    Greater than or equal to: $M >= $N;
    Less than: $M < $N;
    Less than or equal to: $M <= $N;
    Compare (spaceship): $M <=> $N;

    Binary Operations

    PDL also allows binary operations to occur over piddles. PDL will convert any real number datatype piddles (float, double) to an integer before performing the operation.

    Operation Elementwise Over entire PDL Over 1st Dimension
    Binary and: $M & $N; $M->band; $M->bandover;
    Binary or: $M | $N; $M->bor; $M->borover;
    Binary xor: $M ^ $N;
    Binary not: ~ $M; or $M->bitnot;
    Bit shift left: $M << $N;
    Bit shift right: $M >> $N;
    Logical and: $M->and; $M->andover;
    Logical or: $M->or; $M->orover;
    Logical not: ! $M; or $M->not;

    Trigonometric Functions

    These PDL functions operate in units of radians elementwise over a piddle.

    Operation Elementwise
    Sine: $M->sin;
    Cosine: $M->cos;
    Tangent: $M->tan;
    Arcsine: $M->asin;
    Arccosine: $M->acos;
    Arctangent: $M->atan;
    Hyperbolic sine: $M->sinh;
    Hyperbolic cosine: $M->cosh;
    Hyperbolic tangent: $M->tanh;
    Hyperbolic arcsine: $M->asinh;
    Hyperbolic arccosine: $M->acosh;
    Hyperbolic arctangent: $M->atanh;

    Statistical Functions

    PDL contains many methods to obtain statistics from piddles. Double precision variants are prefixed with d.

    Operation Over entire PDL Over 1st Dimension
    Minimum value: $M->min; $M->minover;
    Maximum value: $M->max; $M->maxover;
    Minimum and maximum value: $M->minmax; $M->minmaxover;
    Minimum value (as indices): $M->minover_ind; or $M->minover_n_ind;
    Maximum value (as indices): $M->maxover_ind; or $M->maxover_n_ind;
    Mean: $M->avg; or $M->davg; $M->avgover; or $M->davgover;
    Median: $M->median; or $M->oddmedian; $M->medover; or $M->oddmedover;
    Mode: $M->mode; $M->modeover;
    Percentile: $M->pct; or $M->oddpct; $M->pctover; or $M->oddpctover;
    Elementwise error function: $M->erf;
    Elementwise complement of the error function: $M->erfc;
    Elementwise inverse of the error function: $M->erfi;
    Calculate histogram of $data, with specified $min bin value, bin $step size and $count bins: histogram($data, $step, $min, $count);
    Calculate weighted histogram of $data with weights $weights, specified $min bin value, bin $step size and $count bins: whistogram($data, $weights, $step, $min, $count);
    Various statistics: $M->stats; $M->statsover;

    The 'various statistics' described above are returned as a Perl array of the following items:

    • mean
    • population RMS deviation from the mean
    • median
    • minimum
    • maximum
    • average absolute deviation
    • RMS deviation from the mean
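
    A minimal sketch of unpacking the list returned by stats, assuming PDL is installed. The variable names are illustrative and simply follow the order of the list above; the input vector is arbitrary.

```perl
use strict;
use warnings;
use PDL;

my $v = pdl(0, 1, 2, 3, 4);

# In list context, stats returns its seven results in the order listed above
my ($mean, $prms, $median, $min, $max, $adev, $rms) = $v->stats;

print "mean=$mean median=$median min=$min max=$max\n";
print "prms=$prms adev=$adev rms=$rms\n";
```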

    Zero Detection, Sorting, Unique Element Extraction

    Operation Over entire PDL Over 1st Dimension
    Any zero values: $M->zcheck; $M->zcover;
    Any non-zero values: $M->any;
    All non-zero values: $M->all;
    Sort (returning values): $M->qsort; $M->qsortvec;
    Sort (returning indices): $M->qsorti; $M->qsortveci;
    Unique elements: $M->uniq; $M->uniqvec;
    Unique elements (returning indices): $M->uniqind;

    Rounding and Clipping of Values

    PDL contains multiple methods to round and clip values. These all operate elementwise over a piddle.

    Operation Elementwise
    Round down to the nearest integer: $M->floor;
    Round up to the nearest integer: $M->ceil;
    'Round half to even' to the nearest integer: $M->rint;
    Clamp values to a maximum of $max: $M->hclip($max);
    Clamp values to a minimum of $min: $M->lclip($min);
    Clamp values between a minimum and maximum: $M->clip($min, $max);
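
    The sketch below applies the rounding and clipping methods to one small vector so the differences are easy to compare. It assumes PDL is installed; the input values and the clip range of -1 to 1 are arbitrary.

```perl
use strict;
use warnings;
use PDL;

my $v = pdl(-2.5, -0.5, 0.5, 1.5, 2.5);

my $down    = $v->floor;          # towards negative infinity
my $up      = $v->ceil;           # towards positive infinity
my $nearest = $v->rint;           # halves go to the nearest even integer
my $clamped = $v->clip(-1, 1);    # values outside -1..1 are pulled to the limits

print "floor: $down\n";
print "ceil:  $up\n";
print "rint:  $nearest\n";
print "clip:  $clamped\n";
```

    Each call returns a new piddle, so $v itself is left unchanged.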

    Set Operations

    PDL contains methods to treat piddles as sets of values. Mathematically, a set cannot contain the same value twice, but if this happens to be the case with the piddles, PDL takes care of this for you.

    Operation Code
    Obtain a mask piddle marking which values of $M are contained within $N: $M->in($N);
    Obtain the values of the intersection of the sets $M and $N: setops($M, 'AND', $N); or intersect($M, $N);
    Obtain the values of the union of the sets $M and $N: setops($M, 'OR', $N);
    Obtain the values which are in sets $M or $N, but not both (union - intersection): setops($M, 'XOR', $N);
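
    A short sketch of the set operations on two small vectors, assuming PDL is installed. The names $A and $B and their values are arbitrary stand-ins for $M and $N.

```perl
use strict;
use warnings;
use PDL;

my $A = pdl(1, 2, 3, 4);
my $B = pdl(3, 4, 5);

my $mask  = $A->in($B);                 # 1 where an element of $A also occurs in $B
my $both  = setops($A, 'AND', $B);      # intersection
my $any   = setops($A, 'OR',  $B);      # union
my $only1 = setops($A, 'XOR', $B);      # in one set but not both

print "in:  $mask\n";
print "AND: $both\n";
print "OR:  $any\n";
print "XOR: $only1\n";
```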

    Kernel Convolution

    PDL supports kernel convolution across multiple dimensions:

    Description Code
    1-dimensional convolution of matrix $M with kernel $K across first dimension (edges wrap around): $M->conv1d($K);
    1-dimensional convolution of matrix $M with kernel $K across first dimension (edges reflect): $M->conv1d($K, {Boundary => 'reflect'});
    2-dimensional convolution of matrix $M with kernel $K (edges wrap around): $M->conv2d($K);
    2-dimensional convolution of matrix $M with kernel $K (edges reflect): $M->conv2d($K, {Boundary => 'reflect'});
    2-dimensional convolution of matrix $M with kernel $K (edges truncate): $M->conv2d($K, {Boundary => 'truncate'});
    2-dimensional convolution of matrix $M with kernel $K (edges repeat): $M->conv2d($K, {Boundary => 'replicate'});

    Miscellaneous Mathematical Methods

    Here is all the other stuff which doesn't fit anywhere else:

    Description Code
    Elementwise square root: $M->sqrt;
    Elementwise absolute value: $M->abs;
    Elementwise natural exponential: $M->exp;
    Elementwise natural logarithm: $M->log;
    Elementwise base 10 logarithm: $M->log10;
    Elementwise raise to the power $i: ipow($M, $i);
Is it still worth learning Perl as a first language?
5 direct replies — Read more / Contribute
by tm2383
on May 03, 2018 at 20:45

    I'm curious to know if Perl Monks believe that Perl is still worth learning as a primary programming language. I've used it in the past for some bioinformatics programming and really like the language. I'm interested in a change of career and wonder if it is worth the time investment to really learn Perl in depth, with the aim of becoming a Perl developer some time in the future. I know that there are a lot of 'trendier' programming languages out there like Python, PHP and Ruby. My logic behind learning Perl is that there are fewer people learning it compared to other languages. I assume that the market is awash with programmers using these other languages and that there might be a niche for Perl programmers. Does anyone here work as a professional Perl programmer, either as an employee or a freelancer? Is there a future in Perl programming, or is a lot of the work migrating Perl to another platform? Are any startups still using Perl frameworks like Catalyst?
