If you've discovered something amazing about Perl that you just need to share with everyone, this is the right place.
This section is also used for non-question discussions about Perl, and for any discussions that are not specifically programming related. For example, if you want to share or discuss opinions on hacker culture, the job market, or Perl 6 development, this is the place. (Note, however, that discussions about the PerlMonks web site belong in PerlMonks Discussion.)
Meditations is sometimes used as a sounding-board — a place to post initial drafts of perl tutorials, code modules, book reviews, articles, quizzes, etc. — so that the author can benefit from the collective insight of the monks before publishing the finished item to its proper place (be it Tutorials, Cool Uses for Perl, Reviews, or whatever). If you do this, it is generally considered appropriate to prefix your node title with "RFC:" (for "request for comments").
CPAN Testers tests a distribution with different perl versions (and different compile options) on different platforms (and different distros/versions).
It's great for detecting:
1. incompatibility of code with certain perl versions
2. integration test failures (when tests involve system calls that behave differently on different platforms)
3. incompatibility with different versions of core modules
However, it seems useless for testing the accuracy of CPAN dependency versions (prereqs/build_requires in META).
For example, say we test module XYZ, which requires module ABC with "version=0".
CPAN Testers will install the _latest_ version of ABC if ABC is not installed, or use the existing version if ABC is installed as a core module, is a dependency of CPAN Testers itself, or was installed on the box by chance.
I wrote a script for myself which tests my code with every version of each dependency I use (one dependency at a time, not all possible combinations).
The result looks useful: I found a couple of old module versions that are not compatible with my code, and a couple of bugs in third-party modules' prereqs specifications. I also found several third-party module versions that don't build due to a bug which was later fixed (that part is useless for me).
I was also checking BACKPAN versions. If a module is on BACKPAN, it can still be installed on user machines (installed in the past), or it can be distributed by an OS package manager (without CPAN).
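A minimal sketch of the "one dependency at a time" idea, written as a dry run that only prints the commands it would execute. It assumes cpanm's `Module@version` syntax for installing a specific release and `prove -l t` for running the test suite; the module names (ABC, DEF) and version lists are hypothetical, and whether a given old or BACKPAN release can actually be fetched depends on your mirror setup.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical input: each dependency and the versions to try,
# e.g. collected by hand from CPAN and BACKPAN listings.
my %versions_of = (
    'ABC' => [qw(0.01 0.05 1.00)],
    'DEF' => [qw(2.10 2.11)],
);

# One run per (module, version) pair: vary a single dependency at a
# time, leaving the others at whatever version is already installed.
sub plan_runs {
    my ($versions_of) = @_;
    my @runs;
    for my $module (sort keys %$versions_of) {
        for my $version (@{ $versions_of->{$module} }) {
            push @runs, "cpanm --notest $module\@$version && prove -l t";
        }
    }
    return @runs;
}

# For comparison: how many runs trying every combination would take.
sub count_combinations {
    my ($versions_of) = @_;
    my $n = 1;
    $n *= scalar @{ $versions_of->{$_} } for keys %$versions_of;
    return $n;
}

my @runs = plan_runs(\%versions_of);
print "$_\n" for @runs;    # dry run: print instead of executing
printf "%d runs in turn vs %d combinations\n",
    scalar @runs, count_combinations(\%versions_of);
```

Trying dependencies in turn grows with the sum of the version counts rather than their product, which is what keeps this approach practical for a real dependency list.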
Now I am wondering: is there a similar feature in CPAN Testers?
Or any other existing service?
Should such functionality be added to CPAN Testers, or is it outside the CPAN Testers project's goals?
Some of you may have recognized the scent of Perl 6 in the title, but it's actually something I first thought of a while back, when I was still a complete Perl novice. It is, however, about regexes. To avoid misunderstandings, I distinguish between regular expressions, the 'simple' pattern-matching format, and regexes, the regular expression superset provided by Perl.
This meditation is about parentheses (round brackets) capturing by default, and (?:insert non-captured group here) being quite the mouthful. This is mostly addressed by Perl 6's Apocalypse 5, where non-capturing grouping is revealed to be done with square brackets, but I still think it should be the other way around.
It's quite simple actually: parentheses are the obvious way to do grouping, because that's what they do in pretty much every programming language, and even in math (or maybe it's the other way around, now that I think about it). Parentheses change which operations are read together and tokenize expressions, and that's pretty much it. Of course, you wouldn't have to search very far to find other uses for parentheses; in fact, I'm already talking about Perl.
Since parentheses are the obvious way to do it, someone may, as I did when I first started working with Perl, use them without checking that part of the documentation, and thus not know that capturing groups have been created. This probably isn't a performance issue, because if you can't be bothered to read the documentation well enough to know about it, there are probably other things you fail to optimize. It's an issue when it can break code; the example I came across is split.
"If the PATTERN contains parentheses, additional array elements are created from each matching substring in the delimiter."
So if you don't know your Perl well and have something like
And there you end up with 'na's in your @batmen, what a pity!
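The following is an illustrative example (not the original post's code, and the string is made up) of the split behaviour quoted from perlfunc above: a capturing group in the delimiter pattern adds what it captured to the result list, while a (?:...) group does not.

```perl
use strict;
use warnings;

my $song = 'Batman nanana Robin nanana Alfred';

# Capturing group: split also returns what (na)+ captured last.
my @batmen = split /\s+(na)+\s+/, $song;

# Non-capturing group: only the fields between the delimiters.
my @heroes = split /\s+(?:na)+\s+/, $song;

print join('|', @batmen), "\n";   # Batman|na|Robin|na|Alfred
print join('|', @heroes), "\n";   # Batman|Robin|Alfred
```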
On the other hand, if you don't know much Perl and read something like (Perl 5) /(?:bat|spider)man/ or (Perl 6) /super[tramp|man| time]/, you may think something strange is happening when you are just grouping.
This is an issue for people who know regular expressions but not Perl: they would write patterns using only that knowledge, use parentheses because that's how it's supposed to be done, and might get something unexpected from a split, or in some way I haven't thought of. My previous example still stands: those people wouldn't understand [ch?|b]ar or I wish I was the m(?:oo)+n, even though the unfamiliar syntax doesn't mean that a Perl feature missing from regular expressions has been used. This paragraph should actually have been my main point.
So I was wondering: is there any reason for parentheses to capture in regexes, other than the fact that this is how it has always been done?
And I'm afraid the ugly truth behind all this is that I'm French, with an AZERTY keyboard, where [ and ] are harder to type than ( and ), and I don't want the extra effort for not using a feature, because I'm lazy :P. (Edit: this is supposed to be taken as a joke. I do realize now that it's only obvious if you've used an AZERTY keyboard and know that typing [ isn't any trouble.)
Edit: I just found part of the answer on my own. Some other regular expression extensions also use capturing parentheses inside the pattern, so that you can use backreferences such as \1. I just forgot to get my head out of basic regular expressions. I hope I didn't bore those who read all that too much ^^".
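A short illustration of why capturing matters inside the pattern itself: the backreference \1 must match exactly the text the first group captured, which a (?:...) group could not provide. The word list is just an example.

```perl
use strict;
use warnings;

my @words = qw(moon batman spoon robin);

# (\w) captures one character; \1 must then match that same
# character again, so this keeps only words with a doubled letter.
my @doubled = grep { /(\w)\1/ } @words;

print "@doubled\n";   # moon spoon
```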
Recently, I've been thinking about a really, really minor perl issue: what's the best way to format your script's main routine? I'd also always wondered how you were supposed to unit-test the main routine in your script. I recently came up with an idea (inspired by a brian_d_foy article) that answers both questions for me.
When I first started coding, I just put everything in my main routine at top level, in global scope. My scripts looked something like this (translated from perl 4):
#!/usr/bin/env perl
use strict;
use warnings;
# Script variables
our $Foo = 'bar';
# Main routine
my ($name, $greeted) = @ARGV;
$name //= 'Horace!';
$greeted //= 'world';
say_hello($greeted);
# Subroutines
sub say_hello {
my ($name) = @_;
print "Hello $name\n";
}
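One well-known way to get a main routine that is both tidy and unit-testable is brian d foy's "modulino" pattern: a script that runs main() only when executed directly, so a test file can load it and call its subs. The sketch below shows that general pattern, not necessarily the specific idea this post goes on to describe; the package name My::Greeter is hypothetical.

```perl
#!/usr/bin/env perl
package My::Greeter;    # hypothetical package name for the script
use strict;
use warnings;

# Run main() only when the file is executed directly. When a test
# file loads it with require, caller() is true and nothing runs.
__PACKAGE__->main(@ARGV) unless caller;

sub main {
    my ($class, @args) = @_;
    my ($greeted) = @args;
    $greeted //= 'world';
    print say_hello($greeted);
    return 0;
}

# Return the string instead of printing it, so it is easy to test.
sub say_hello {
    my ($name) = @_;
    return "Hello $name\n";
}

1;
```

A test file could then `require` the script and check `My::Greeter::say_hello('Horace')` with Test::More's `is`, without the main routine firing.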
PDL is a very general datatype. To one person, a 2D piddle may represent a collection of samples from a song while to another person a 2D piddle may represent an image. However, when you say use PDL::Image2D, various methods get installed into the PDL package, for all piddles to use. This naturally leads to the careful selection of long-winded names in a defensive approach to not getting your toes stepped on (or not stepping on somebody else's toes).
It occurred to me a few days ago that we could manage this with lexically scoped methods. I don't mean lexically scoped subroutines (i.e. Lexical::Sub) because I really like PDL's method-chaining and want to keep that. Nor do I mean lexically scoped methods that get called on ALL objects, regardless of type (i.e. Method::Lexical) because I only want to (ultimately) modify the PDL method resolution.
Today a proposal came up on the Perl 5 Porters mailing list to have a new pragma that would replace the dereference and method call operator (->), with the dot (.). Since this is now used as the concatenation operator, for the purpose of concatenation the tilde (~) would be used. (It is now used as the one’s complement operator, and would still mean that in unary contexts, but between two terms would mean to concatenate.)
There are two pretty good reasons for this. First, the dot is easier to type than the ->, and since this is one of the most used operators in all of Perl, it would save a lot of typing (and saved characters, the necessity for wrapped lines, etc.). Second, many other common computer languages use the dot as the object method call, so this would ease learning of Perl by those familiar with other languages (and, I suppose, learning of other languages by people familiar with Perl).
I’m not sure this is the right thing to do, for a couple of reasons. One of the reasons is purely semantic. In Western written natural languages, the dot is normally used to end a sentence. That’s the opposite meaning of the dot here: it indicates that the part after the dot is a qualification or function dependent on the part before the dot. If any single character in ASCII indicates that, it would be the colon (:), a character which, on its own, is terribly underutilized in Perl, introducing the third part of the conditional operator (?:).
But the other reason is the joint use of tilde as binary concatenation operator and unary one’s-complement operator. These are two uses of the same character that have absolutely nothing to do with each other. The perl interpreter will, no doubt, have no trouble determining whether, in a particular spot, the use of the tilde is unary or binary. But will that be as easy for people to figure out?
I hesitate to go further, because I’m afraid it will mark me as un-perly and I’ll be exiled, forced to live in a world that uses significant whitespace. But I think the tendency of Perl to use lots of different ASCII punctuation in lots of different ways can be confusing, especially when white space isn’t always required between terms. Is that “&” the bitwise and operator or the sigil for a code reference? Does that dot mean concatenation, or is it the decimal point? Is the three-dot combination ... the flip-flop operator or the “yada yada yada” operator indicating that more is to be written later? Is that octothorpe (#, confusingly enough sometimes called the hash) introducing a comment, or is it part of the $# last-index symbol (as in $#array)?
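To make the point concrete, here is how current Perl 5 reads a few of these characters depending on position and whitespace. This only illustrates existing behaviour; it does not implement the proposed pragma.

```perl
use strict;
use warnings;

my @list = (10, 20, 30);
sub answer { 42 }

my $decimal = 1.5;        # '.' inside a number: decimal point
my $joined  = 1 . 5;      # '.' between terms: concatenation, "15"
my $last    = $#list;     # '$#' prefix: last index of @list (2)
my $count   = @list;      # array in scalar context: element count (3)
my $bits    = 6 & 3;      # binary '&': bitwise and (2)
my $code    = \&answer;   # '&' before a name: code sigil
my $low8    = ~0 & 0xFF;  # unary '~': bitwise complement, masked to 255

print "$decimal $joined $last $count $bits ", $code->(), " $low8\n";
```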
And, of course, punctuation variables add more ambiguity. I know that perl generally can figure this out, although the several entries in perldiag beginning “Ambiguous use of…” makes me wonder. But people often have more difficulty. (In spoken natural language, one can always inquire of the speaker to clarify ambiguity; not so here. Perhaps natural languages are different than computer languages? Can I say that?)
It’s too late now to go back and give Perl a set of clear, unambiguous operators.
(In my dreams I imagine a Perl where punctuation variables are replaced by variables beginning with control characters. Where terms made of punctuation would always be separated from other punctuation terms by whitespace [so +=- would be a trigraph, not an addition-assignment followed by a unary minus]. Where . is always the decimal point; where & and % are either sigils or operators but not both. Then I start thinking about whether @fred could be the same as @$fred, and &fred the same as &$fred, and if so whether we really need @fred and &fred at all, and then whether we need sigils at all, and then I wake up in a cold sweat.)
But, although perhaps my opinion is colored by a deep unperliness, I don’t think adding more ambiguity is the right direction.
Just to be clear, I don't actually program in much of anything except Perl (and, shudder, AppleScript)... so I don't think I can be accused of being corrupted by the outside. Naïve about other languages, certainly.
I'm looking for testers for a quick script I threw together and no longer know how to improve: a basic downloading tool I've created in the last two days. It's nothing fancy, but it's useful if you need a quick-and-dirty way to automate large numbers of downloads without babysitting them. I built it out of necessity for my job, since I either needed to build a script that automated the download of over 2000 files, find one that works well for the task, or do it myself... I couldn't find one that worked for me, so I learned more about LWP and WWW::Mechanize, asked you great Monks a couple of questions, and got it working! It's currently being used to download those 2000+ files onto one of our computers, so I thought I'd ask for criticism on how to improve the meager tool, and then release it into the wilderness of the internet.
It's available for viewing and download on my Dropbox; feel free to download it and tell me what you think. Are there glaring issues with it? Are there features you'd want added if you were going to use this tool? Be nice please, but feel free to critique.
Edit: Here's the code, upon advisement that I should post the code directly here for people to view.
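For readers who want the general shape of such a tool, here is a minimal sketch of a batch downloader. It is not the author's script: it uses HTTP::Tiny (core since Perl 5.14) rather than the LWP and WWW::Mechanize modules mentioned above, and the file-naming rule and 30-second timeout are assumptions. HTTPS URLs additionally need IO::Socket::SSL installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;

# Derive a local file name from a URL: drop the scheme and host and
# any query string or fragment, keep the last path segment, and fall
# back to index.html when the path ends in '/'.
sub local_name_for {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*//s;
    $path =~ s{^[a-z][a-z0-9+.-]*://[^/]*}{}i;
    my ($name) = $path =~ m{/([^/]+)\z};
    return defined $name ? $name : 'index.html';
}

# Download each URL into $dir and return the URLs that failed.
# mirror() sends If-Modified-Since for files that already exist,
# so re-running the whole batch after a failure is cheap.
sub download_all {
    my ($http, $dir, @urls) = @_;
    my @failed;
    for my $url (@urls) {
        my $file = "$dir/" . local_name_for($url);
        my $res  = $http->mirror($url, $file);
        if ($res->{success}) {    # true for 2XX and 304 Not Modified
            print "ok   $url -> $file\n";
        }
        else {
            warn "FAIL $url: $res->{status} $res->{reason}\n";
            push @failed, $url;
        }
    }
    return @failed;
}

# Usage: perl fetch.pl URL [URL ...]   (saves into the current directory)
if (!caller && @ARGV) {
    my $http   = HTTP::Tiny->new(timeout => 30);
    my @failed = download_all($http, '.', @ARGV);
    exit(@failed ? 1 : 0);
}
```

Two URLs ending in the same file name will overwrite each other here; a real tool would need a collision rule of its own.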
If you are interested in the history and people involved in the development of early stored-program computers, "Turing's Cathedral" by George Dyson is a good read.
Although I am a generation behind these early developments, I saw my share of paper tape, punched cards, and hand coded and toggled in machine language. Did you ever drop the center out of a large roll of paper tape? The "good ole days".
James
There's never enough time to do it right, but always enough time to do it over...
Before I start this, this is not intended to be a flamewar or for me to be immodest.
Recently I did a technical test in order to apply for a job. The technical test is run by an organisation which produces such tests, and so conducts a large number of tests. I scored better than 95% of all candidates who took the test (ever, not just for this role).
My problem is that this is for a *cough*PHP*cough* role.
I am aware that whilst I consider myself to be reasonably adept at Perl, I am not at that level. Should I permanently forsake Perl for where my abilities seem to lie?
Yours philosophically, space_monk
If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)
Perl programmers conventionally use a leading underscore to mark private/protected methods in OO code. However, this has some problems:
It conflates private and protected methods. Private methods being those methods which nobody outside the class has any business calling; protected methods being those which are OK for subclasses to call and potentially override.
When you're writing a subclass, you need to know about all the superclasses' private methods to ensure you don't accidentally override any of them by coincidentally writing a sub with the same name.
Perl 5.18's experimental lexical subs feature can be exploited to create truly private subs. We need to introduce a little syntactic sugar (or perhaps vinegar, depending on how you look at it), to circumvent the fact that lexical subs are not really designed to be called as methods - we use this syntax: $self->${\\&method_name}(@args). (Think of it as a mega-sigil.)
#!/usr/bin/env perl
use v5.18;
use feature qw(lexical_subs);
no warnings qw(experimental);
package MyClass
{
#
# Private methods
#
my sub get_number
{
my $self = shift;
return $self->{number};
}
#
# Public methods
#
sub new
{
my $class = shift;
return bless {@_}, $class;
}
sub say_number
{
my $self = shift;
# Slightly funky syntax for calling a
# lexical sub as a private method...
say $self->${ \\&get_number };
}
}
package ChildClass
{
use base "MyClass";
#
# A public method that has the same name as a
# private method in the superclass. Note that
# the private method will *NOT* get overridden!
# Yay!
#
sub get_number
{
warn "You're not allowed to get the number!";
}
}
my $o = ChildClass->new(number => 42);
$o->say_number; # works...
my $p = MyClass->new(number => 666);
$p->get_number; # asplode; it's a private method
Update: I hasten to add that I'm not advocating doing this in production code any time in the near future. But it's an interesting option to think about in years to come.
I'll also point out that a similar thing can be achieved using coderefs in earlier versions of Perl:
my $private_method = sub {
...;
};
$self->$private_method(@args);
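A complete, runnable version of the coderef approach, for perls without lexical subs. The class name Counter and its methods are hypothetical; the point is that the lexical `$get_number` never enters the symbol table, so a subclass sub with the same name cannot override it and outside code cannot call it by name.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class name

# A lexical code ref is not in the symbol table: subclasses cannot
# override it by defining a sub of the same name, and outside code
# cannot call it as Counter->get_number.
my $get_number = sub {
    my ($self) = @_;
    return $self->{number};
};

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub describe {
    my ($self) = @_;
    return 'number is ' . $self->$get_number();
}

package main;

my $counter = Counter->new(number => 42);
print $counter->describe, "\n";    # number is 42
```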
Frequently I find myself with a one-off data file that I need to analyze with a one-off script. It's nice to keep the script with the data, so I generally put the script in the data file, with a __DATA__ marker separating the two parts.
At this point, I have another problem. One-off scripts benefit tremendously from Perl's command-line flags that write code for you, the -p and -n flags in particular. Those two flags wrap your code in a while (<>) ... loop, which unfortunately reads all the files listed on the command line, or STDIN if there aren't any. My data, needless to say, is in the DATA filehandle. I mentioned this in the chatterbox, and choroba had the answer:
BEGIN { *ARGV = *DATA unless @ARGV }
I especially like the unless clause, since it lets me override the data source. I can see using this for test cases, where I have a bunch of default test data but can easily test against other data files as well.
Why It Works
We're overwriting one typeglob (*ARGV) with another (*DATA). A typeglob contains Perl's internal representation of everything known about the given name, which includes any scalars, arrays, hashes, or filehandles. In this case, the ARGV set of variables has several "magical" properties, which are listed in perlvar:
$ARGV
Contains the name of the current file when reading from <>.
@ARGV
The array @ARGV contains the command-line arguments intended for the script. $#ARGV is generally the number of arguments minus one, because $ARGV[0] is the first argument, not the program's command name itself. See $0 for the command name.
ARGV
The special filehandle that iterates over command-line filenames in @ARGV . Usually written as the null filehandle in the angle operator <> . Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <> . In particular, passing \*ARGV as a parameter to a function that expects a filehandle may not cause your function to automatically read the contents of all the files in @ARGV.
The assignment *ARGV = *DATA will replace all of these with the only-slightly-less magical DATA values, which are cleverly not mentioned in perlvar, only in perldata. In this case, only the filehandle has any special properties. This means that the assignment also overwrites the $ARGV and @ARGV values with the undefined values of $DATA and @DATA, but I can't see many cases where you'd need those values once ARGV is gone. If I'm wrong, however, ambrus has pointed out that you could replace only the IO slot, with *ARGV = *DATA{IO}.
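A small self-contained one-off script using choroba's line, assuming a made-up "name number" data format. Run with no arguments it sums the embedded data; run with file names it reads those files instead. The explicit while (<>) loop here is the same loop that -n and -p would generate.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read from the embedded data unless files are named on the command line.
BEGIN { *ARGV = *DATA unless @ARGV }

my $total = 0;
while (my $line = <>) {
    # Skip anything that is not "name number" (blank lines, notes).
    next unless $line =~ /^(\w+)\s+(\d+)\s*$/;
    $total += $2;
}
print "total: $total\n";   # total: 12 with the data below

__DATA__
apples 3
pears 4
plums 5
```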
As a beginner or journeyman Perl programmer, I've been able to do useful work in the language without ever using anything more advanced than an occasional reference. What are (I guess) the most elementary parts of the language have enabled me to do many things that are too slow, too clumsy, or just outright impossible in any Unix shell.
I am finding that the book:
"Learning Perl Objects, References, & Modules"
by Randal L. Schwartz with Tom Phoenix
is immensely helpful to me in learning about the more advanced features of Perl. It introduces concepts in a natural (probably not the only possible) order, it almost completely avoids the maddening practice of using a term before defining it, it provides good examples and exercises that really help me absorb the lessons (especially if I actually do the exercises and type up, run, & play with the examples in the text), and it introduces some core modules and illustrates why you might want to use them.
No, I'm not trying to sell the book, there may be other ones just as good or better, and maybe there are better choices of 'first modules' to introduce to the journeyman Perl coder. But I have to say that so far (I'm about half through it), this is IMHO a very good '2.0' or even '3.0' level tutorial for someone trying to learn beyond the very basics of the language.
I recently had a discussion with a fellow Perl monger about a 3-hour video¹ on meta-programming in Python using so-called decorators (something like advice in Lisp).
So my reply was
Great, but what exactly are the benefits over attributes and type-glob manipulation in Perl?
His long reply can be summarized with something like
Attributes are very difficult to handle
"Nobody" understands them
no good tutorials
In Python decorators are first class objects with very short elegant code
use strict;
use warnings;
use Data::Dumper qw'Dumper';
use Attribute::Handlers;
use feature 'say';
sub wrap {
my ($glob,$c_wrapper) = @_;
no warnings 'redefine';
*$glob = $c_wrapper;
}
sub BOLD :ATTR {
my ($pkg,$glob,$ref) = @_;
wrap $glob => sub { "<b>" . $ref->() . "</b>" };
}
sub ITALIC :ATTR {
my ($pkg,$glob,$ref) = @_;
wrap $glob => sub { "<i>" . $ref->() . "</i>"};
}
sub hello :ITALIC :BOLD {
return "hello world";
}
say hello();
__END__
<b><i>hello world</i></b>
IMHO the Perl code is already more readable, and it could be improved further with more syntactic sugar.
Not every decorator really needs to wrap code, so abstracting the wrapping into a function wrap imported from a module seems reasonable (Python folks always return the identical function ref to handle this).
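As an example of a non-wrapping decorator, here is a sketch of an attribute that only registers the sub in a table and leaves it untouched, much like a Python decorator that returns the function it was given. The attribute name Route and the %ROUTES registry are hypothetical.

```perl
use strict;
use warnings;
use Attribute::Handlers;

our %ROUTES;    # hypothetical registry: sub name => code ref

# Attribute handler that records the sub without wrapping it.
sub Route :ATTR(CODE) {
    my ($pkg, $glob, $ref) = @_;
    $ROUTES{ *{$glob}{NAME} } = $ref;
}

sub index_page :Route { return 'index' }
sub about_page :Route { return 'about' }

for my $name (sort keys %ROUTES) {
    print "$name => ", $ROUTES{$name}->(), "\n";
}
```

Because the handler returns nothing and never assigns to the glob, `\&index_page` is still the original sub; only a side effect (the registration) has been added.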
My friend was impressed, but maybe he wasn't critical enough.
Why don't we adapt (i.e. steal) good use cases from Python or Lisp?
Actually, I wanted to meditate more on the issue and refine the code before posting, but I'm quite busy at the moment and the risk of this ending up forgotten in a drawer is quite high.
So, following the "release often" paradigm, I just posted my raw results...
At least I hope you now have good arguments if someone claims Python is superior because of decorators!
I'm posting this for all monks to follow up on. I think our community input should be added to the overall chorus. The entire posting can be found on the Perl Weekly by Gabor Szabo at http://perlweekly.com/archive/96.html
The argument against CGI is that it is no longer best practice. In my opinion that may be very true, as the code is old. I myself do not use half of the functions in it (I mostly use just $q->param()), and I never use it to generate my HTML (I use Template Toolkit). I agree with the post by chromatic (see above link) that CGI should be rewritten to conform to the new Perl Best Practices.
I want to meditate on some of the most useful Perl-isms that, while easy, are often misunderstood by beginners. I say that, possibly projecting, because when I was a beginner I had not grokked them and had misused them. I also want to give kudos to the comments below, as they have greatly helped refine this posting.
map, grep and sort are not the same thing, but are often used together. Flowing from right to left, they act like shell pipelining in reverse. They build something new, like a tiny little factory. While the original data structure is untouched, it should feel like a list is being transformed every step of the way.
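A small pipeline to read from right to left: grep filters, map transforms, sort orders, and the source array is left as it was. The word list is just an example.

```perl
use strict;
use warnings;

my @words = qw(pear Apple fig banana cherry);

# Read bottom to top (right to left): keep words longer than three
# letters, lowercase them, then sort alphabetically.
my @result = sort { $a cmp $b }
             map  { lc($_) }
             grep { length($_) > 3 }
             @words;

print "@result\n";   # apple banana cherry pear
print "@words\n";    # pear Apple fig banana cherry (unchanged)
```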
Did you ever want to memorize the s/// option list? The full list (from perlop) is msixpodualgcer.
Here are some anagrams I found:
* discourage p/m/xl (petite, medium, extra-large)
* glamorised x cpu
* Proclaimed, "g sux!" (oft heard?)
* xl CPU ideograms (extra-large CPU)
* goru x misplaced ("guru is misplaced")
* ex-cpu marigolds
* dog pux miracles (poo or pukes)
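For anyone hunting for better ones, a quick checker: reduce each phrase to its sorted lowercase letters and compare with the modifier list. The candidate list below is simply the anagrams above.

```perl
use strict;
use warnings;

# Reduce a phrase to its letters, lowercased and sorted, so two
# phrases are anagrams exactly when their keys are equal.
sub letter_key {
    my ($phrase) = @_;
    (my $letters = lc $phrase) =~ s/[^a-z]//g;
    return join '', sort split //, $letters;
}

my $flags = 'msixpodualgcer';    # the s/// modifier list quoted above
my @candidates = (
    'discourage p/m/xl',
    'glamorised x cpu',
    'Proclaimed, "g sux!"',
    'xl CPU ideograms',
    'goru x misplaced',
    'ex-cpu marigolds',
    'dog pux miracles',
);

for my $phrase (@candidates) {
    my $ok = letter_key($phrase) eq letter_key($flags);
    printf "%-22s %s\n", $phrase, $ok ? 'anagram' : 'NOT an anagram';
}
```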
Can anyone come up with something better?
-QM
--
Quantum Mechanics: The dreams stuff is made of