If you've discovered something amazing about Perl that you just need to share with everyone, this is the right place.

This section is also used for non-question discussions about Perl, and for any discussions that are not specifically programming related. For example, if you want to share or discuss opinions on hacker culture, the job market, or Perl 6 development, this is the place. (Note, however, that discussions about the PerlMonks web site belong in PerlMonks Discussion.)

Meditations is sometimes used as a sounding-board — a place to post initial drafts of perl tutorials, code modules, book reviews, articles, quizzes, etc. — so that the author can benefit from the collective insight of the monks before publishing the finished item to its proper place (be it Tutorials, Cool Uses for Perl, Reviews, or whatever). If you do this, it is generally considered appropriate to prefix your node title with "RFC:" (for "request for comments").

User Meditations
The problem of "the" default shell
4 direct replies — Read more / Contribute
by afoken
on Dec 09, 2017 at 08:17

    I've grown a little tired of searching for my "avoid the default shell" postings over and over again, so I wrote this meditation to sum them up.

    What is wrong with the default shell?

    In an ideal world, nothing. The default shell /bin/sh would have a consistent, well-defined behaviour across all platforms, including quoting and escaping rules. It would be quite easy and unproblematic to use.

    But this is the real world. Different platforms have different default shells, and they change the default shell over time. Also, shell behaviour changed over time. Remember that the Unix family of operating systems has evolved since the 1970s, and of course, this includes the shells. Have a look at "Various system shells" to get a first impression. Don't even assume that operating systems keep using the same shell as default shell.

    And yes, there is more than just the huge Unix family. MS-DOS copied concepts from CP/M and also a very little bit of Unix. OS/2 and the Windows NT family (including 2000, XP, Vista, 7, 10) copied from MS-DOS. Windows 1-3, 9x, ME still ran on top of DOS. From this tree of operating systems, we got command.com and cmd.exe.

    By the way: Modern MacOS variants (since MacOS X) are part of the Unix family, and so is Android (after all, it's just a heavily customized Linux).

    Some ugly details:

    And when it comes to Windows (and DOS, OS/2), legacy becomes really ugly.

    So, to sum it up, there is no such thing as "the" default shell. There are a lot of default shells, all with more or less different behaviour. You can't even hope that the default shell resembles a well-known family of shells, like Bourne. So there is much potential for nasty surprises.

    Why and how does that affect Perl?

    Perl has several ways to execute external commands, some more obvious, some less. In the most basic form, you pass a string to perl that roughly resembles what you would type into your favorite shell:

    • system('echo hello');
    • exec('echo hello');
    • open my $pipe,'echo hello |' or die "Can't open pipe: $!"; my $hello=do { local $/; <$pipe> }; close $pipe;
    • my $hello=qx(echo hello);
    • my $hello=`echo hello`;

    Looks pretty innocent, doesn't it? And it is, until you want to start doing real-world things, like passing arguments containing quotes, dollar signs, or backslashes to an external program. You need to know the quoting rules of whatever shell happens to be the default shell.

    For those cases, perl is expected to pass the string to /bin/sh for execution. Except that in this innocent case, and several other cases, perl does not invoke the default shell at all. Buried deep in the perl sources, there are some heuristics at work. If perl thinks that it can start the executable on its own, because the command does not contain what is documented as "shell metacharacters", perl splits the command on its own and can avoid invoking the default shell.

    Why? Because perl can easily figure out what the shell would do, and do it by itself instead. This avoids a lot of overhead and so is faster and does not use as much memory as invoking the shell would.

    Unfortunately, the documentation is a little bit short on details. See "Perl guessing" in Re^2: Improve pipe open? (redirect hook): From the code of Perl_do_exec3() in doio.c (perl 5.24.1), it seems that the word "exec" inside the command string triggers a different handling, and some of the logic also depends on how perl was compiled (preprocessor symbol CSH).

    If you don't need support from the default shell, you can help perl by passing system, exec, and open a list of arguments instead of a string. This "multi-argument" or "list form" of the commands always avoids the shell, and it completely avoids any need to quote.

    (Well, at least on Unix. Windows is a completely different beast. See Re^3: Perl Rename and Re^3: Having to manually escape quote character in args to "system"?. It should be safe to pretend that you are on Unix even if you are on Windows. Perl should do the right thing with the "list form".)

    So our examples now look like this:

    • system('echo','hello','here','is','a','dollar:','$');
    • exec('echo','hello','here','is','a','dollar:','$');
    • open my $pipe,'-|','echo','hello','here','is','a','dollar:','$' or die "Can't open pipe: $!"; my $hello=do { local $/; <$pipe> }; close $pipe;

    Did you notice that qx() and its shorter alias `` don't support a list form? That sucks, but we can work around that by using open instead. Writing a small function that wraps open is quite easy. See "Safe pipe opens" in perlipc.
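
    A minimal sketch of such a wrapper, assuming a Unix-like system and a command list of at least two elements. The name capture_list is made up for this example, and $^X (the running perl) is used only because it is a command that is sure to exist:

```perl
use strict;
use warnings;

# Hypothetical helper: a list-form stand-in for qx().
# With two or more elements in @command, no shell is involved.
sub capture_list {
    my @command = @_;
    open my $pipe, '-|', @command
        or die "Can't open pipe from '$command[0]': $!";
    my $output = do { local $/; <$pipe> };    # slurp everything
    close $pipe
        or die "'$command[0]' failed: " . ($! ? $! : "exit status $?");
    return $output;
}

# The quotes and the dollar sign reach the child process untouched.
my $text = capture_list($^X, '-e', 'print @ARGV', 'a "quoted" $dollar');
print "$text\n";
```

    Nothing in the argument list needs quoting or escaping, because each list element becomes exactly one argument of the child process.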

    Edge cases

    OK, let's assume I've convinced you to use the list forms of system, exec, and open. You want to start a program named "foo bar", and it needs an argument "baz". Yes, the program has a space in its name. This is unusual but legal in the Unix family, and quite common on Windows.

    • system('foo bar','baz');
    • exec('foo bar','baz');
    • open my $pipe,'-|','foo bar','baz' or die ...

    or even:

    my @command=('foo bar','baz'); and one of:

    • system @command;
    • exec @command;
    • open my $pipe,'-|',@command or die ...

    All is well. Perl does what you expect, no default shell is ever involved.

    Now, "foo bar" gets an update, and you no longer have to pass the "baz" argument. In fact, you must not pass the "baz" argument at all. Should be easy, right?

    • system 'foo bar';
    • exec 'foo bar';
    • open my $pipe,'-|','foo bar' or die ...

    or:

    my @command=('foo bar'); and one of:

    • system @command;
    • exec @command;
    • open my $pipe,'-|',@command or die ...

    Wrong! system, exec, and even open in the three-argument form now see a single scalar value as the command, and start once again guessing what you want. And they will wrongly guess that you want to start "foo" with an argument of "bar".

    The solution for system and exec is hidden in the documentation of exec: Pass the executable name using indirect object syntax to system or exec, and perl will treat the single-argument list as a list, and not as a single command string.

    • system { 'foo bar' } 'foo bar';
    • exec { 'foo bar' } 'foo bar';

    or:

    my @command=('foo bar'); and one of:

    • system { $command[0] } @command;
    • exec { $command[0] } @command;

    If the command list is not guaranteed to contain at least two arguments (e.g. because arguments come from the user or the network), you should always use the indirect object notation to avoid this trap.

    Did you notice that we lost another way of invoking external commands here? There is (currently) no way in perl to use pipe open with a single-element command list without triggering the default shell heuristics. That's why I wrote Improve pipe open?. Yes, you can work around this by using the code shown in "Safe pipe opens" in perlipc and using exec with indirect object notation in the child process. But that takes 10 to 20 lines of code just because perl tries to be smart instead of being secure.
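
    For illustration, the workaround could look roughly like this. It is a sketch in the spirit of "Safe pipe opens", not a copy of the perlipc code; it assumes a Unix-like system where the forking form of open is available, and capture_noshell is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: capture the output of a command list of any length
# (even a single element) without ever consulting the default shell.
sub capture_noshell {
    my @command = @_;

    # Forking open: returns the child's PID in the parent, 0 in the child.
    my $pid = open my $pipe, '-|';
    die "Can't fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: STDOUT is connected to the parent's $pipe.
        # Indirect object syntax forces list semantics for exec.
        exec { $command[0] } @command
            or die "Can't exec '$command[0]': $!";
    }

    # Parent: slurp the child's output, then reap the child.
    my $output = do { local $/; <$pipe> };
    close $pipe;
    return $output;
}

my $output = capture_noshell($^X, '-e', 'print q{no shell: $HOME | *}');
print "$output\n";
```

    Because exec is called with the indirect object, a one-element @command such as ('foo bar') is run as a program named "foo bar" and is never split or handed to a shell.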

    Avoiding external programs

    Why do you want to run external programs? Perl can easily replace most of the basic Unix utilities, by using internal functions or existing modules. And as an additional extra, you don't depend on the external programs. This makes your code more portable. For example, Windows does not have ls, grep, awk, sed, test, cat, head, or tail out of the box, and find is not find, but a poor excuse for grep. If you use perl functions and modules, that does not matter at all. Likewise, not all members of the Unix family have the GNU variant of those utilities. Again, if you use perl functions and modules, it does not matter.

    Tool                Perl replacement
    echo                print, say
    rm                  unlink
    rm -r               File::Path
    mkdir               mkdir
    mkdir -p            File::Path
    rmdir               rmdir
    grep                grep (note: you need to open and read files manually)
    awk                 a2p
    sed                 s2p
    ls, find            File::Find, glob, stat, lstat, opendir, readdir, closedir
    test, [, [[         stat, lstat, -X, File::stat
    cat, head, tail     open, readline, print, say, close, seek, tell
    ln                  link, symlink
    chmod               chmod
    chown               chown
    touch               utime
    curl, wget, ftp     LWP::UserAgent and friends
    ftp                 Net::FTP
    ssh                 Net::SSH2, Net::OpenSSH

    Note: The table above is far from being complete.
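
    As a small illustration of a few rows of the table, here is a sketch that uses only core modules. The directory layout and file name are invented for the demo, and everything happens inside a temporary directory:

```perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use File::Spec;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);    # scratch directory for the demo

# mkdir -p
my $dir = File::Spec->catdir($base, 'a', 'b', 'c');
make_path($dir);

# echo ... > file
my $file = File::Spec->catfile($dir, 'numbers.txt');
open my $out, '>', $file or die "Can't write '$file': $!";
print {$out} "line $_\n" for 1 .. 20;
close $out or die "Can't close '$file': $!";

# head -n 3 and grep 7, in one pass over the file
my (@head, @matches);
open my $in, '<', $file or die "Can't read '$file': $!";
while (my $line = <$in>) {
    push @head,    $line if $. <= 3;
    push @matches, $line if $line =~ /7/;
}
close $in;

# test -f and ls
my $is_file = -f $file;
opendir my $dh, $dir or die "Can't open directory '$dir': $!";
my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

# rm -r
remove_tree(File::Spec->catdir($base, 'a'));

print "head: @head", "matches: @matches", "entries: @entries\n";
```

    No external program is started anywhere in this script, so no shell, default or otherwise, can get in the way.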

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Better names for SCRIPT_NAME/PATH_INFO in a web framework?
2 direct replies — Read more / Contribute
by Dallaylaen
on Dec 02, 2017 at 13:42

    Hello dear esteemed monks,

    My toy framework called MVC::Neaf has crawled to the 0.20 milestone, and it's got some misleading method names in it which I would like to correct.

    One thing I'm struggling to grasp is what to call the fragments of the requested path: the part that matched the current controller and whatever follows that part. The current convention is as follows:

    • script_name is the part of the path that matched current route, not the name of the script or the raw stuff from PSGI request.
    • path_info is anything that follows that matching part, for instance a wiki article name.
    • path_info_split is a newer version that takes regular expression capture groups into account.

    The overall syntax, though still evolving, now looks as follows:

    use strict;
    use warnings;
    use MVC::Neaf qw(:sugar);

    get + post '/some/path' => sub {
        my $req = shift;

        my $foo = $req->param( 'foo' );
        my $bar = $req->param( bar => '\w+' ) # no params w/o validation
            or die 404; # render a "not found" page
        my ($from, $to) = $req->path_info_split;
        $req->script_name; # '/some/path'

        # frobnicate
        my $data = frobnicate( $bar, $from .. $to );

        return {
            result => $data,
            foo    => $foo,
        };
    }, default => {
        -view     => 'TT', # JS is the default which generates JSON
        -template => 'some_file.tt',
        title     => 'My mega new application',
        version   => '0.42',
    }, param_regex => { foo => '\d+' }, path_info_regex => '(\d+)/(\d+)';

    # Hitting /some/path would trigger roughly something like follows
    # (assuming the controller doesn't die/redirect/stop otherwise)
    # $my_template->process( 'some_file.tt', {
    #     title   => ...,
    #     version => ...,
    #     result  => ...,
    #     # whatever else controller returned
    # } );

    neaf->run;

    Some working examples exist in the distro.

    I know everyone else is using routes of form '/foo/:bar' these days but I don't really like it (although support may be added in the end). The reason is there are a number of formats (:foo, *bar, #baz) and they still don't cover all needed cases ([0-9]+ being the most obvious one).

    Now I feel like these SCRIPT_NAME and PATH_INFO names, dating back to the past century, are awkward. My proposal is to rename this stuff completely:

    • $req->route() stands for the route that matched (Dancer already does it that way).
    • $req->suffix() returns the URI capture groups if any were defined.
    • suffix => qr/(...)and(...)/ is the corresponding route parameter.
    • get '/article/(\d+)/(\d\d)/(\d\d)/(.*)' => ... - I'm also thinking about adding capture groups right into the path (w/o suffix parameter) but that'd be hard to undo so I'm holding it back.
    • param => \%spec is the predefined regular expression hash for $req->param( "name" ); (Neaf explicitly forbids unvalidated params and cookies, much like perl -T does).
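
    To make the proposed route/suffix split concrete, here is a plain-Perl sketch of the idea. It is not MVC::Neaf code; split_path and its arguments are hypothetical names used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split a request path into the matched route and
# the capture groups of a suffix regex applied to whatever follows it.
sub split_path {
    my ($path, $route, $suffix_re) = @_;

    # The route must be followed by "/" or by the end of the path.
    return unless $path =~ m{\A\Q$route\E(?:/(.*))?\z}s;
    my $rest = defined $1 ? $1 : '';

    # The suffix regex has to match the whole remainder.
    my @captures = $rest =~ /\A(?:$suffix_re)\z/;
    return unless @captures;

    return ($route, @captures);
}

my ($route, @suffix) = split_path('/some/path/10/20', '/some/path', qr{(\d+)/(\d+)});
print "route=$route suffix=@suffix\n";
```

    In this sketch, $route plays the role of the proposed $req->route() and @suffix that of $req->suffix().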

    This scheme looks consistent and clear to me, but maybe I'm missing something. Does that look like a syntax you'd like to try out? What would you like to be added/removed? What is causing surprise and awkwardness here?

    Thank you,

contest problem about shortest Regular expression
2 direct replies — Read more / Contribute
by rsFalse
on Nov 28, 2017 at 16:11
    Hello.

    A week ago I came across an interesting problem about regular expressions. It was from the OpenCup contest held on November 19, 2017. Here I will post the statement, and I would like to discuss how to solve this problem and which strategies and tools to use. I recommend trying to solve the problem yourselves; if you like, you can get the PDF statements from this link, problem number 7. (You can download the statement while the link works, but you can't upload your solution to the testing system unless you are participating with your team from your high school.) I used perlbrew switch v5.14.4 to match the version of the testing system; before that I was failing with v5.18 because of some newer features that I used. At the bottom I will write my approaches and code.

    Statement: Here I will write ideas about solving.
    Firstly, I thought about two approaches: 1) generate a regex from the input data, e.g. make a regex from the first input line and then modify (expand) it while analysing the following lines; 2) generate all possible regexes, sort them by length, then try to match all input lines against the shortest regex and, if that fails, against longer and longer ones.
    The first approach seems more sophisticated, and I haven't found any idea how to solve the problem that way. Can you suggest something? The second approach seems easier, and I used it successfully.
    Secondly, I realized that there is a regular expression that matches any possible line composed of only 4 distinct letters. So I only need to generate the regular expressions that are shorter than this all-matching regex. Thirdly, I thought about how to generate the possible regexes. At first I had one idea; later I got another. The first idea was to generate all possible permutations of letters and then insert all possible permutations of the other symbols into them. This approach took me a lot of time, and it wasn't successful. Of course, I tried to find some logic to avoid generating regular expressions that are longer than their shorter equivalents. The second idea was to generate regular expressions by joining the smallest ones, single letters (say "atoms"), into bigger and bigger ones. The expansion was performed by binary joining or by enveloping a regex with a Kleene star.
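
    The second idea can be sketched roughly as follows. This is not the code from the post, and since the statement is not reproduced here, the grammar is an assumption: letters, concatenation, grouping "(r)", alternation "(r|s)" and a Kleene star on a letter or a group, with every input line required to match in full. shortest_regex is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: brute-force search, shortest candidates first.
sub shortest_regex {
    my ($lines, $alphabet, $max_length) = @_;
    my @by_length = ([]);    # $by_length[$n]: candidate sources of length $n
    my %starrable;           # letters and groups may take a "*"

    for my $length (1 .. $max_length) {
        my %candidates;

        # Atoms: single letters.
        if ($length == 1) {
            $candidates{$_} = $starrable{$_} = 1 for @$alphabet;
        }

        # Kleene star on a letter or group that is one character shorter.
        for my $unit (grep { $starrable{$_} } @{ $by_length[$length - 1] }) {
            $candidates{"$unit*"} = 1;
        }

        # Grouping "(r)" so that a longer expression can be starred later.
        if ($length > 2) {
            for my $inner (grep { !$starrable{$_} } @{ $by_length[$length - 2] }) {
                $candidates{"($inner)"} = $starrable{"($inner)"} = 1;
            }
        }

        # Binary joining by concatenation ...
        for my $left (1 .. $length - 1) {
            for my $l (@{ $by_length[$left] }) {
                $candidates{"$l$_"} = 1 for @{ $by_length[$length - $left] };
            }
        }

        # ... and by alternation: "(l|r)" costs three extra characters.
        for my $left (1 .. $length - 4) {
            for my $l (@{ $by_length[$left] }) {
                for my $r (@{ $by_length[$length - 3 - $left] }) {
                    $candidates{"($l|$r)"} = $starrable{"($l|$r)"} = 1;
                }
            }
        }

        $by_length[$length] = [ sort keys %candidates ];

        # All shorter candidates were rejected, so the first hit is minimal.
        for my $source (@{ $by_length[$length] }) {
            my $regex = qr/\A(?:$source)\z/;
            return $source unless grep { $_ !~ $regex } @$lines;
        }
    }
    return;    # nothing found within $max_length
}

my $found = shortest_regex([qw(ab aab b)], [qw(a b)], 4);
print defined $found ? "shortest: $found\n" : "none found\n";
```

    The number of candidates grows very quickly with the length, so a sketch like this is only usable for short regexes and small alphabets; pruning equivalent candidates would be the obvious next step.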
    Further are more ideas about solution, and the code.

    ... I thought that a type of "abstract" regex with backreferences (<named> for comfort) could be useful, but I am not sure; I tried it without success.
    Maybe this problem can be solved with more advanced techniques: recursive regexes, eval groups, and others?

    Related topic: Perl in programming contests and problem solving
    #regexes #qr #golf #problem #efficiency
    CODE: INPUT OUTPUT: OUTPUT verbose ($debug = 1;):
Rosetta Dispatch Table
7 direct replies — Read more / Contribute
by eyepopslikeamosquito
on Nov 21, 2017 at 16:14

    Ha ha, nysus just reminded me of an old interview question I used to ask. Implement a simple dispatch table.

    Let's start with a specification:

    • The key of the dispatch table is a string \w+
    • The name of the callback function is the key name with _callback appended
    • Each callback function takes a single string parameter and returns a positive number

    You must write the invoker function, which takes two arguments (the name and the string argument to be passed to the callback):

    • If the name is invalid (e.g. "fred" below), invoker must return a negative number
    • Otherwise, invoker must pass its second argument to the callback function and return what the callback function returns

    To clarify, here is a sample implementation.

    use strict;
    use warnings;

    # Callback functions ---------------------------------------
    sub first_callback {
        my $z = shift;
        print "in first_callback, z=$z\n";
        return 1;
    }
    sub last_callback {
        my $z = shift;
        print "in last_callback, z=$z\n";
        return 2;
    }

    # Implementation of dispatch table -------------------------
    # (You need to write this code)
    my %op_table = (
        first => \&first_callback,
        last  => \&last_callback,
    );
    sub invoker {
        my ($name, $z) = @_;
        exists($op_table{$name}) or return -1;
        $op_table{$name}->($z);
    }

    # Main program for testing ---------------------------------
    for my $name ( "first", "last", "fred" ) {
        my $rc = invoker( $name, $name . '-arg' );
        print "$name: rc=$rc\n";
    }

    Running the above test program produces:

    in first_callback, z=first-arg
    first: rc=1
    in last_callback, z=last-arg
    last: rc=2
    fred: rc=-1

    Points to consider:

    • Is a hash the recommended way to implement a dispatch table in Perl?
    • How many other ways can you think of to implement it in Perl? (working demonstration code would be good)
    For more fun, feel free to implement the above specification in another language of your choice.
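
    One other way, offered as a sketch rather than a recommendation: skip the hash and let the symbol table act as the dispatch table, looking the callback up by name with can(). The callbacks here are stripped-down stand-ins for the ones in the specification:

```perl
use strict;
use warnings;

sub first_callback { my $z = shift; return 1 }
sub last_callback  { my $z = shift; return 2 }

# Resolve "<name>_callback" in the current package at call time.
sub invoker {
    my ($name, $z) = @_;
    return -1 unless defined $name && $name =~ /\A\w+\z/;
    my $callback = __PACKAGE__->can("${name}_callback")
        or return -1;
    return $callback->($z);
}

for my $name ( "first", "last", "fred" ) {
    my $rc = invoker( $name, $name . '-arg' );
    print "$name: rc=$rc\n";
}
```

    The trade-off is that any sub whose name ends in _callback becomes callable through invoker, whereas the hash lists the allowed names explicitly.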

Antiquitates - liber I - In memoriam Robert M. Pirsig
6 direct replies — Read more / Contribute
by Discipulus
on Nov 15, 2017 at 04:16

    Antiquitates - liber I - In memoriam Robert M. Pirsig

    Introduction

    This meditation is meant as the first of a short series about antiquitates: good ancient things, knowledge. The central point is to focus on the figures and ideas, or better the pictures and schemas, that we have in our heads. This Pantheon is something we never speak about, but it is central to how we approach problems.

    In fact, while in our past (I'm speaking of Western culture) this pantheon was very homogeneous (geographically, but also across distinct social classes), in the current global and postmodern world it is very variegated and fragmented, and almost everyone has a pantheon of their own.

    Another point of this series will be the importance of ancient wisdom. I totally disagree with the concept of human progress. I'm not speaking, obviously, of material conditions, but I believe the depth reachable by human thought has not improved over the centuries. Only the elements we play with have changed. Our fathers already discussed many questions that are still relevant, and useful for us as programmers too. I start here with an example from the near past, just to be kind to you, but other meditations will go much further back in time.

    In an era in which the organisation of production is ever more constrained onto fixed tracks, in which new methodologies are put in the field to force us to act in a precomputed manner, and in which technologies too reorganize themselves to be impersonal, it becomes even more important to focus on which immaterial bricks are worth collecting to build the unique construction of our creativity as programmers.

    Pirsig's Chautauquas

    Pirsig's recent death pushed me to ponder again the importance of his discourse for my life and for my Perl programming. Pirsig is the author of the book Zen and the Art of Motorcycle Maintenance, which I read many lives ago but which never disappeared from the background of my mind. Two central concepts from his book still remain, like two rocks after a thousand years of erosion.

    Quality

    The first one is quality. Quality, if memory serves, was what caused the protagonist's mental short circuit as a philosophy professor. The search for quality, and implicitly for its definition, is something central in our lives. OK, but how can this be related to Perl and to programming? Is maintainability not just an aspect of quality? Is readability over a clever jumble of hacks not just another face of this concept? And why do we prefer, well, love Perl if not for a matter of overall quality? Quality of the programmer's activity while coding, not forced by the interpreter's laws to countersign in blood the language's way to code. Quality of the produced code in all its phases: imagining, designing, implementing, improving, testing and maintaining.

    Quality is a little less every day, like the programmer who was hired after telling the interviewer that he was able to cut ten lines of code each day. Quality is polishing the diamond.

    But quality cannot be taught. We, well, you, can show some incarnation of it. You can cast some light from your own quality and draw a beautiful picture on the white wall of ignorance. You cannot show the source of these rays directly, just the projection. Why? Because quality is not a place but a path, a never-ending one. It is a driving tension.

    Underlying form

    The second concept is underlying form. This is a crucial concept for the programmer. We solve problems. Solutions must be aware of underlying forms. Problems are occurrences in reality of the Platonic world of ideas and forms, and even if such an abstract world does not exist, it contributes heavily to how we understand and approach problems.

    Without the perception of something in the background, our solution can be just a mere workaround or a cosmetic patch. A valid solution to a given complex problem can be born only from the understanding of underlying forms.

    Here the discourse becomes even more topical. We are living at the end of the first Internet generation. Most of us were born in a totally analog world and will die, as late as possible, in a totally different one. What about the next generation? Who knows? An anecdote, inspired by real life, can show different approaches to the same problem.

    Generational anecdote

    Tizio and Caio are both of the first Internet generation. They share a computer at home. Tizio, for fun, puts an entry in the HOSTS file that points Caio's favourite website to 127.0.0.1.

    Some hours later, Caio points his browser to his favourite website and notices a connection error. He then pings the website name and sees that it strangely resolves to localhost. He wonders for a moment whether the provider has blocked the website, so he runs nslookup on the website's domain name and is happy to see that it returns a valid public IP address. He points the browser to this address and sees his favourite website again. So he realizes that the problem must be at the level of local name resolution. He opens the HOSTS file and comments out the offending entry, adding some bad words addressed to Tizio.

    Tizio plays the same prank on a computer shared with Sempronio, a millennial of the second Internet generation. Sempronio notices the error, then tries some other websites, and they are OK. Sempronio starts thinking that something is broken in the browser, so he installs another browser but, to his great disappointment, the problem persists.
    So he opens the Bag Of All Answers website, searches for "preferred_website blocked" and discovers a plethora of reasons why a website can be blocked. He skims a bunch of articles without investigating why governments block websites. After three minutes he changes the search to "preferred_website blocked solution" and happily discovers that some software can circumvent the problem.
    So he installs a program named after the slightly modified name of an ancient god, let's say Marz. Sempronio does not know it, but this geeky program uses its own nameservers and redistributes web requests over its own network in a peer-to-peer way. Sempronio just complains that it is a bit slow, but finally his favourite website shows up correctly in the browser again.
    So for him it is a happy ending (not to mention that an international agency intercepts all of Marz's traffic and programmatically breaks all computers using it, but that is a whole other story...).

    With the above I don't mean that all young people are stupid and that all older ones are wise and spot the right solution every time. I don't mean this at all. I just want to highlight that if you know the underlying form of a web request, you will probably arrive at the correct conclusions when you experience some weird browsing behaviour.

    Conclusions

    Quoting from Pirsig's book: Although motorcycle riding is romantic, motorcycle maintenance is purely classic. The author's work is full of examples of what he called classical and romantic types. I think this dichotomy is stressed too much. I'm more in favor of the Humanistic Being, even as a programmer. I don't want to be just another tooth in the gear, not even knowing whether I'm part of a clock or of a motorcycle. Knowing the big picture and perceiving underlying forms can make us better programmers.
    An old book by a philosophy professor is worth reading, and is probably even better to have on the bookshelf than an aseptic manual of programming methodology.

    Roma 2770 AB URBE CONDITA / 8644 September 1993

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
A small Deity AI class system
4 direct replies — Read more / Contribute
by holyghost
on Nov 05, 2017 at 09:49
    ### Copyright (C) The Holy Ghost 2017
    ###This program is released under the GPL 3.0 and artistic license 2.0.
    package HollyGameAI::AIInterface;

    our @ISA = "HollyGameAI::Interface";

    sub AIInterface {
        my $class = shift;
        my $self = $class->SUPER::Interface(@_); ### e.g. qw(swim fly)
        bless $self, $class;
    }

    ### Copyright (C) The Holy Ghost 2017
    ###This program is released under the GPL 3.0 and artistic license 2.0.
    package HollyGameAI::Factory;

    our @ISA = "HollyGameAI::Interface";

    sub Factory {
        my $class = shift;
        my $self = $class->SUPER::Interface(@_); ### include abstract method names
        return bless $self, $class;
    }

    ### Copyright (C) The Holy Ghost 2017
    ###This program is released under the GPL 3.0 and artistic license 2.0.
    ### with thanks to gregorovius from perlmonks
    package HollyGameAI::Interface;

    use Carp;

    sub Interface {
        my $class = shift;
        my $self = {
            inheritors       => [],
            abstract_methods => [@_], ### e.g. qw(swim fly)
        };
        bless $self, $class;
    }

    sub import {
        my $method = caller;
        push (@{ $self->{inheritors} }, $method);
    }

    sub INIT {
        my $bad = 0;
        for my $class (@{ $self->{inheritors} }) {
            for my $meth (@{ $self->{abstract_methods} }) {
                no strict 'refs';
                unless (defined &{"${class}::$meth"}) {
                    $bad = 1;
                    warn "HolyGameAI : Class $class should implement HolyGameAI Interface but does not define $meth.\n";
                }
            }
        }
        croak "HollyGameAI : Source compilation aborted at interface binding time\n" if $bad;
    }

    ### Copyright (C) The Holy Ghost 2017
    ###This program is released under the GPL 3.0 and artistic license 2.0.
    package HollyGameAI::MutualExclusiveAI;

    use lib "../HollyGameAI";
    use Factory;

    sub MutualExclusiveAI {
        my $class = shift;
        my $self = { aiclass => HollyGameAI::Factory->Factory(@_) };
        bless $self, $class;
    }

    ### Copyright (C) The Holy Ghost 2017
    ###This program is released under the GPL 3.0 and artistic license 2.0.
    package HollyGameAI::MutualExclusiveDeityAI;

    our @ISA = "HollyGameAI::MutualExclusiveAI";

    sub MutualExclusiveDeityAI {
        my $class = shift;
        my $self = $class->SUPER::MutualExclusiveAI(qw(cast donate swim fly empower));
        bless $self, $class;
    }

    ### Copyright (C) The Holy Ghost 2017
    ###This program is released under the GPL 3.0 and artistic license 2.0.
    package HollyGameAI::RNG;

    use feature 'switch'; # needed for given/when

    ### Random Number God, dice class
    sub RNG {
        my $class = shift;
        my $self = { dx => 0 };
        return bless $self, $class;
    }

    sub set {
        my ($self, $dxx) = @_;
        $self->{dx} = $dxx;
    }

    sub rollDX  { my $self = shift; return rand($self->{dx}); }
    sub rollD1  { my $self = shift; return rand(1); }
    sub rollD3  { my $self = shift; return rand(3); }
    sub rollD6  { my $self = shift; return rand(6); }
    sub rollD10 { my $self = shift; return rand(10); }
    sub rollD20 { my $self = shift; return rand(20); }

    sub rollPreviousDX { my $self = shift; return rand($self->{dx}); }

    sub roll {
        my ($self, $dxx) = @_;
        $self->set($dxx);
        given ($self->{dx}) {
            when ($_ == 0)  { return 0; }
            when ($_ == 1)  { return $self->rollD1; }
            when ($_ == 3)  { return $self->rollD3; }
            when ($_ == 6)  { return $self->rollD6; }
            when ($_ == 10) { return $self->rollD10; }
            when ($_ == 20) { return $self->rollD20; }
        }
        return 0;
    }
The Perl Paradox
3 direct replies — Read more / Contribute
by reisinge
on Oct 29, 2017 at 07:20

    An interesting meditation by Tom Radcliffe of ActiveState.


    In general, they do what you want, unless you want consistency. -- perlfunc
[Perl 6]: Small discoveries VII, Flattening
2 direct replies — Read more / Contribute
by holli
on Oct 28, 2017 at 19:16
    Perl 6 tries to flatten lists. This:
    my $a = [[["hello"]]]; #not a 3d array!, same as: ["hello"]
    is not what you might think it is, as single element lists get flattened. To get what you mean you must write
    my $a = [[["hello"],],]; #now it is!
    Note the trailing comma. See also 2015 The Year of The Great List Refactor.


    holli

    You can lead your users to water, but alas, you cannot drown them.
Code Structure Changes
8 direct replies — Read more / Contribute
by Anonymous Monk
on Oct 27, 2017 at 15:17
    You are usually hired to change code to achieve a change in functionality. You begin to think that if the code were written in a different way, this type of problem could be solved easily. Being short of time, you always just change the code and commit it, but the idea that you have the power to change the structure of the code, and that you should do it, keeps bothering you. What should you do?
Variable-Width Lookbehind (hacked via recursion)
1 direct reply — Read more / Contribute
by haukex
on Oct 24, 2017 at 13:43

    Warning: Since this uses recursion it is horribly inefficient and may easily blow up on longer strings. If you think you need this for variable-width lookbehind, then first think about how you might solve this with other techniques like lookahead, which is variable-width out of the box, or simply with multiple regular expressions.

    The following is presented as a curiosity as the result of the discussion here - thank you LanX and QM for providing the inspiration :-)

    Zero-width Lookaround Assertions are incredibly useful, but unfortunately the lookbehind assertions (?<=pattern) and (?<!pattern) are restricted to fixed length lookbehinds, and sometimes you just really want to be able to say something like e.g. (?<=ab+.*)c. With the following technique, you can emulate these kinds of variable-width lookbehind assertions.

[Perl 6]: Small discoveries VI, die
2 direct replies — Read more / Contribute
by holli
on Oct 19, 2017 at 18:41
    In Perl 6, die still prints an error to STDERR and exits (unless caught). However, unlike in Perl 5, adding a newline to the end of the error message does not suppress the location information: you still get a stack trace.

    The idiom for printing an error message and stopping the program in Perl 6 is:
    note "Error Message" and exit 42; # or some other number (except 0)


    holli

    You can lead your users to water, but alas, you cannot drown them.
[Perl 6]: Small discoveries V, True / False / FileNotFound
1 direct reply — Read more / Contribute
by holli
on Oct 19, 2017 at 13:54
    Omg, I love this. Did you ever have a clear, slick little function that needs to return a boolean, and you also want to communicate an error condition? You basically have the choice of returning two values, reversing the consuming condition (meaning an empty return value is considered true), or passing a string reference as an argument to the function.

    Witness Perl 6:
    sub slick() {
        if do-stuff {
            return "SomeValue";
        }
        else {
            return "Some error message" but False;
        }
    }

    if my $result = slick {
        process( $result );
    }
    else {
        log-error( $result );
    }


    holli

    You can lead your users to water, but alas, you cannot drown them.
Be prepared for CSV injections in spreadsheet
3 direct replies — Read more / Contribute
by Tux
on Oct 18, 2017 at 07:34

    Read this article to get an idea of how dangerous it can be to blindly accept macros in spreadsheets. Be it MS Excel or Google spreadsheets, they all suffer.

    You cannot blame CSV for it. CSV is just passive data.

    Once you load or open a CSV file in something as dangerous as a spreadsheet program that allows formulas to be executed on open, all bets are off. Or are they?

    The upcoming Text::CSV_XS adds a new feature to optionally take action when a field contains a leading =, which to most spreadsheet programs indicates a formula.

    On both parsing and generating CSV, you will be able to specify what you want to do (where "formula" does not go beyond the fact that the field starts with a =):

    • Do nothing special (default behavior) and leave the text as-is
    • Die whenever a formula is seen
    • Croak when a formula is seen
    • Give a warning where a formula is seen
    • Replace all formulas with an empty string
    • Remove all formulas (replace with undef)
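
    The options above can be pictured with a small, self-contained sketch. This is not Text::CSV_XS code: apply_formula_policy and the policy labels are hypothetical names made up for illustration, and, as in the description above, "formula" here just means a field that starts with =.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper (not part of Text::CSV_XS): decide what to do with
# a single field according to a policy label.
sub apply_formula_policy {
    my ($policy, $field) = @_;

    # Anything that does not start with "=" is passed through untouched.
    return $field unless defined $field && $field =~ /^=/;

    return $field if $policy eq 'none';              # leave the text as-is
    die   "Formula '$field' found\n" if $policy eq 'die';
    croak "Formula '$field' found"   if $policy eq 'croak';
    if ($policy eq 'diag') {                         # warn, keep the field
        warn "Field contains formula '$field'\n";
        return $field;
    }
    return ''    if $policy eq 'empty';              # replace with ''
    return undef if $policy eq 'undef';              # replace with undef
    croak "Unknown policy '$policy'";
}

my @row     = ('1', '=2+3', '4');
my @cleaned = map { apply_formula_policy('empty', $_) } @row;
print join(',', @cleaned), "\n";   # the formula field is now empty
```

    Keeping the check to "starts with =" mirrors the deliberately narrow definition given above; anything smarter would need to know how a particular spreadsheet program interprets cell contents.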

    Code speaks louder than words ...

    I'm pretty pleased with the diagnostics

    $ cat formula.csv
    a,b,c
    1,=2+3,4
    6,,7,=8+9,
    $ perl -MCSV -e'$_ = dcsv (in => "formula.csv", bom => 1, formula => "diag")'
    Field 2 (column: 'b') in record 1 contains formula '=2+3'
    Field 4 in record 2 contains formula '=8+9'

    Expect this to be available by next week.


    Enjoy, Have FUN! H.Merijn
Perl6 discoveries - floating-point
2 direct replies — Read more / Contribute
by Grimy
on Oct 18, 2017 at 06:59
    Anonymous Monk brought up a really interesting discovery here. Unfortunately, that thread got derailed, so I'm making a separate one, as suggested by Your Mother. One of the first things I found while testing is this really interesting tidbit:
    $ perl6 -e 'say 0.99999999999999999000001'
    1.000000000000000073886090
    $ perl6 -e 'say 0.99999999999999999000001 > 1'
    True
    But then I realized I was using an outdated Rakudo (2017.04). So I updated to 2017.09, and now those print 1 and False, respectively. There's still some interesting behavior in 2017.09, though:
    $ perl6 -e 'say 0.7777777777777777777770'
    0.77777777777777785672697
    $ perl6 -e 'say 0.7777777777777777777771'
    0.777777777777777767909129
    Note that the second number printed is strictly smaller than the first one, even though the second source number is strictly larger than the first one, spelled in the same fashion and to the same number of significant digits! However, comparison and subtraction still return exact results:
    $ perl6 -e 'say 0.7777777777777777777771 > 0.7777777777777777777770'
    True
    $ perl6 -e 'say 0.7777777777777777777771 - 0.7777777777777777777770'
    1e-22
    Okay, that's probably because one is a Num and the other is a Rat, so let's convert everything to Num explicitly:
    $ perl6 -e 'say Num(0.7777777777777777777770)'
    0.777777777777778
    $ perl6 -e 'say Num(0.7777777777777777777771)'
    0.777777777777778
    $ perl6 -e 'say Num(0.7777777777777777777770) > Num(0.7777777777777777777771)'
    True
    $ perl6 -e 'say Num(0.7777777777777777777770) - Num(0.7777777777777777777771)'
    1.11022302462516e-16
    Huh. Now they print the same, but they're still different numbers when compared. Note that the sign of the difference got switched:
    $ perl6 -e 'my $a = 0.7777777777777777777770; my $b = 0.7777777777777777777771; say $a <=> $b; say Num($a) <=> Num($b)'
    Less
    More
    Also interesting is that many Nums don't survive a round-trip to Str:
    $ perl6 -e 'my $a = Num(1/9); say $a == Num(Str($a))'
    False
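
    The same round-trip effect can be reproduced in Perl 5, which may help narrow down the cause. The sketch below assumes a perl built with ordinary 64-bit doubles (the usual default), and it is only an analogy: that Rakudo 2017.09 formats a Num with 15 significant digits is an inference from the output above, not something taken from the Perl 6 documentation. 15 significant digits cannot distinguish every double, whereas 17 can.

```perl
use strict;
use warnings;

my $x = 1 / 9;

# Default stringification keeps 15 significant digits (%.15g), so the
# string does not always convert back to the same double.
my $short = "$x";
# 17 significant digits are enough to identify any 64-bit double.
my $long  = sprintf '%.17g', $x;

print "$short round-trips: ", ($x == $short ? "yes" : "no"), "\n";
print "$long round-trips: ",  ($x == $long  ? "yes" : "no"), "\n";
```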
    Can anyone point me to the Perl6 specs/docs/whatever that explain those behaviors?
Parsing HTML/XML with Regular Expressions
8 direct replies — Read more / Contribute
by haukex
on Oct 16, 2017 at 07:48

    Your employer/interviewer/professor/teacher has given you a task with the following specification:

    Given an XHTML file, find all the <div> tags with the class attribute "data" [1] and extract their id attribute as well as their text content, or an empty string if they have no content. The text content is to be stripped of all non-word characters (\W) and tags; text from nested tags is to be included in the output. There may be other divs, other tags, and other attributes present anywhere, but divs with the class data are guaranteed to have an id attribute and not be nested inside each other. The output of your script is to be a single comma-separated list of the form id=text, id=text, .... You are to write your code first, and then you will be given a test file, guaranteed to be valid and standards-conforming, for which the expected output of your program is "Zero=, One=Monday, Two=Tuesday, Three=Wednesday, Four=Thursday, Five=Friday, Six=Saturday, Seven=Sunday" [2].

    Updates - Clarifications:
    [1] The class attribute should be exactly the string data (that is, ignoring the special treatment given to CSS classes). Examples below updated accordingly.
    [2] Your solution should be generic enough to support any arbitrary strings for the id and text content, and be easily modifiable to change the expected class attribute.

    Ok, you think, I know Perl is a powerful text processing language and regexes are great! And you write your code and it works well for the test cases you came up with. ... But did you think of everything? Here's the test file you end up getting:

    I encourage everyone to try and write a parser using your favorite module, be it:

    Honorable mentions: Grimy for a regex solution and RonW for a regex-based parser :-)

    I'll kick things off with Mojo::DOM (compacted somewhat, with potential for a lot more golfing or verboseness):

    Update 2017-10-18: Thank you very much to everyone who has replied and posted their solutions so far, keep em coming! :-)

