PerlMonks  

Is there a more functional regex syntax?

by smls (Friar)
on Sep 18, 2012 at 15:29 UTC ([id://994298])

smls has asked for the wisdom of the Perl Monks concerning the following question:

Consider a list of input numbers, some of which are not known exactly but are only known to lie within a given range (encoded in a string value such as '28-31').

The following code snippet adds up the given numbers, returning an upper and lower bound for the result:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @ranges = ('15', '28-31', '3-4', '40', '17-19');
    my ($total_min, $total_max);

    foreach my $range (@ranges) {
        my ($min) = $range =~ /^(\d+)/;
        my ($max) = $range =~ /(\d+)$/;
        $total_min += $min;
        $total_max += $max;
    }

    print "total is between $total_min and $total_max\n";

It works fine, but regarding the regex part, the need to

  • separate it into multiple statements
  • introduce temporary variables (here $min and $max)
always seriously bothers me in cases like this.

I would much prefer to be able to write the whole body of the loop in the above example in the form:

    $total_min += SELF_CONTAINED_FUNCTIONAL_STATEMENT;
    $total_max += SELF_CONTAINED_FUNCTIONAL_STATEMENT;

Is there an alternative syntax for string extraction using regexes that would allow this?

It's not about performance or anything like that. It's about my brain receiving a nice dose of dopamine whenever I write a line of concise, functional, self-contained code, and the opposite when I can't.

----
PS: Even better would be the following, but unfortunately it seems that Perl's += operator does not work that way:

($total_min, $total_max) += STATEMENT_RETURNING_A_LIST_OF_TWO_NUMS;
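Perl's += is a scalar operator and does not add element-wise over a list, so the PS form is not available. A minimal sketch of one workaround, assuming a small helper is acceptable: add_into is a hypothetical name (not a built-in or library function) that relies on the elements of @_ being aliases to the caller's variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# add_into(TARGETS..., VALUES...): hypothetical helper, not a Perl built-in.
# The first half of @_ holds the variables to update, the second half the
# amounts to add. Elements of @_ alias the caller's arguments, so
# "$_[$i] += ..." updates $total_min and $total_max in place.
sub add_into {
    die "add_into needs an even number of arguments\n" if @_ % 2;
    my $n = @_ / 2;
    $_[$_] += $_[$_ + $n] for 0 .. $n - 1;
    return;
}

my @ranges = ('15', '28-31', '3-4', '40', '17-19');
my ($total_min, $total_max) = (0, 0);

for my $range (@ranges) {
    # ($range =~ /(\d+)/g)[0, -1] yields (first number, last number);
    # for '15' both indices pick the same element.
    add_into($total_min, $total_max, ($range =~ /(\d+)/g)[0, -1]);
}

print "total is between $total_min and $total_max\n";
```

This keeps the loop body to a single statement, at the cost of a helper whose calling convention (targets first, then values) is not obvious at the call site.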

Re: Is there a more functional regex syntax?
by BrowserUk (Patriarch) on Sep 18, 2012 at 16:26 UTC

    How's about:

        use List::Util qw[ sum ];

        my @ranges = ('15', '28-31', '3-4', '40', '17-19');
        my $tMin = sum map { /^(\d+)/ } @ranges;
        my $tMax = sum map { /(\d+)$/ } @ranges;
        print "$tMin : $tMax";

        # output: 103 : 109
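One detail worth knowing about this approach: List::Util's sum returns undef for an empty list, which would trigger an "uninitialized value" warning when printed. Newer versions of List::Util also export sum0, which returns 0 instead. The sketch below assumes a List::Util recent enough to provide sum0.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum sum0);

my @ranges = ('15', '28-31', '3-4', '40', '17-19');

# map evaluates each match in list context, so every element contributes
# its captured number, or nothing at all if the pattern does not match.
my $tMin = sum0 map { /^(\d+)/ } @ranges;
my $tMax = sum0 map { /(\d+)$/ } @ranges;
print "$tMin : $tMax\n";

# With no input at all, sum returns undef while sum0 returns 0.
my @none;
my $plain = sum  map { /^(\d+)/ } @none;
my $zero  = sum0 map { /^(\d+)/ } @none;
print defined $plain ? "sum: $plain\n" : "sum: undef\n";
print "sum0: $zero\n";
```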

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      I think this is the most readable solution for this specific problem posted so far, thanks for posting it.

      I have always shied away from List::Util and friends so far, but I'm starting to see its beauty...

        I have always shied away from List::Util ...

        Personally, I think List::Util's functions should be built into the core language; I use them in my programs that often.


Re: Is there a more functional regex syntax?
by tobyink (Canon) on Sep 18, 2012 at 15:54 UTC

    I think this is the prettiest I could make it...

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @ranges = ('15', '28-31', '3-4', '40', '17-19');
        my @totals = (0, 0);

        foreach my $range (@ranges) {
            $totals[$_] += (split /-/, $range)[$_] for 0, -1;
        }

        print "total is between $totals[0] and $totals[-1]\n";

    ... it's arguably less readable though.
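Why the 0, -1 trick works: a list slice with index -1 picks the last element, and when split returns a single-element list (as for '15') the first and last elements are the same. The same index also works on the two-element @totals array, where $totals[-1] is $totals[1]. A small demonstration (the %bounds hash is only for illustration):

```perl
use strict;
use warnings;

my %bounds;
for my $range ('15', '28-31', '3-4') {
    # split gives ('15') or ('28', '31'); indices 0 and -1 select the
    # first and last element, which coincide for a one-element list.
    my ($first, $last) = (split /-/, $range)[0, -1];
    $bounds{$range} = [$first, $last];
    print "$range -> min $first, max $last\n";
}
```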

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

      Nice trick using the -1 index like that, I'll have to remember that...

      Regarding the original question though, I was hoping for a solution that keeps the regexes, because even if replacing them with split is possible in this case, that won't always be feasible if dealing with more complex regexes.

        Some regexes will work inside those parentheses.

        $totals[$_] += ($range =~ /(\d+)/g)[$_] for 0, -1;
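Which regexes "work inside those parentheses" depends on the number of capture groups. In list context, m//g returns the captures of every match, flattened into one list, so [0] and [-1] mean "first number" and "last number" only when the pattern has a single capture group. The 'x3-y4' input below is a made-up example chosen to show the flattening.

```perl
use strict;
use warnings;

# One capture group: one list element per match.
my @one = ('3-4' =~ /(\d+)/g);

# Two capture groups: each match contributes two elements, so the list
# is flattened and [0]/[-1] no longer select "first/last number".
my @two = ('x3-y4' =~ /([a-z])(\d+)/g);

print "one: @one\n";
print "two: @two\n";
print "first/last of two: @two[0, -1]\n";
```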
Re: Is there a more functional regex syntax?
by kennethk (Abbot) on Sep 18, 2012 at 15:54 UTC
    What about something more like:
        $range =~ /^(\d+)(?:-(\d+))?$/;
        $total_min += $1;
        $total_max += $2 // $1;
    I personally think holding onto $min and $max reads easier, but I can understand the desire for conciseness.
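One caveat with reading $1 and $2 after a bare match: a failed match does not clear them, so they may still hold leftovers from an earlier successful match instead of describing the current input. A minimal sketch that guards the match; the 'oops' entry is a made-up bad input added for the demonstration, and // (defined-or) requires Perl 5.10 or later.

```perl
use strict;
use warnings;

my @ranges = ('15', '28-31', 'oops', '3-4');
my ($total_min, $total_max) = (0, 0);

for my $range (@ranges) {
    # Skip input that does not match, so stale captures are never used.
    $range =~ /^(\d+)(?:-(\d+))?$/ or next;
    $total_min += $1;
    $total_max += $2 // $1;    # single number: max equals min
}

print "total is between $total_min and $total_max\n";
```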

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Or, as an improvement in the regex, /^(?=(\d+))(?:\d+-)?(\d+)$/, so that both $1 and $2 are always defined. So, as 1 line,
      $totals[$_] += ($range =~ /^(?=(\d+))(?:\d+-)?(\d+)$/)[$_] for 0,1;
      Or more simply, if you want to keep the variables separate,
      $total_min += ($range =~ /^(\d+)/)[0]; $total_max += ($range =~ /(\d+)$/)[0];
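How the lookahead version keeps both groups defined: (?=(\d+)) captures the first number without consuming any input, then (?:\d+-)? consumes an optional "N-" prefix and (\d+) captures the last number. For a single number both groups capture the same digits. A quick check (the %pairs hash is only for illustration):

```perl
use strict;
use warnings;

my $re = qr/^(?=(\d+))(?:\d+-)?(\d+)$/;

my %pairs;
for my $range ('15', '28-31') {
    # In list context the match returns ($1, $2).
    my @pair = $range =~ $re;
    $pairs{$range} = "@pair";
    print "$range -> (@pair)\n";
}
```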


        Thanks, this is what I've been looking for!

        It's not extremely pretty, but it should scale nicely for the general case of needing the result of a regex string extraction as a right-hand-side value.

        PS: The regex suggested by tobyink above can make the single-line case much more readable: /(\d+)/g

Re: Is there a more functional regex syntax?
by Arunbear (Prior) on Sep 18, 2012 at 16:19 UTC
    A functional approach (more for fun though):
        #!/usr/bin/perl
        use strict;
        use warnings;
        use List::Util qw(reduce);
        use Data::Dump 'pp';

        my @ranges = ('15', '28-31', '3-4', '40', '17-19');

        my ($total_min, $total_max) = @{
            reduce { [$a->[0] + $b->[0], $a->[1] + $b->[1]] }
            map    { @$_ == 1 and [($_->[0]) x 2] or $_ }
            map    { [ split /-/ ] }
            @ranges
        };
        pp($total_min, $total_max);

        __DATA__
        ('15', '28-31', '3-4', '40', '17-19')
          |
          v
        ([15], [28, 31], [3, 4], [40], [17, 19])
          |
          v
        ([15, 15], [28, 31], [3, 4], [40, 40], [17, 19])
          |
          v
        [103, 109]
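A variant of the same reduce pipeline that stays closer to the original question by building each [min, max] pair with a regex instead of split, and that uses only core modules (Data::Dump is a CPAN module, so plain print stands in for pp). This is a sketch, not Arunbear's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(reduce);

my @ranges = ('15', '28-31', '3-4', '40', '17-19');

# map turns every range into a [min, max] pair: the regex collects all
# numbers, and the slice [0, -1] repeats a lone number as both bounds.
# reduce then adds the pairs component-wise.
my $totals = reduce { [ $a->[0] + $b->[0], $a->[1] + $b->[1] ] }
             map    { my @n = /(\d+)/g; [ @n[0, -1] ] }
             @ranges;

print "total is between $totals->[0] and $totals->[1]\n";
```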
Re: Is there a more functional regex syntax?
by AnomalousMonk (Archbishop) on Sep 18, 2012 at 16:54 UTC

    I, too, would be inclined toward something like the initial form of the code given in the OP for reasons of readability and maintainability. However, here's another way to glue everything together:

        >perl -wMstrict -le "my @ranges = ('15', '28-31', '3-4', '40', '17-19'); my ($total_min, $total_max); ;; m{ \A (\d+) (?{ $total_min += $^N }) (?: - (\d+))? (?{ $total_max += $^N }) \z }xmsg for @ranges; ;; print qq{total between $total_min and $total_max}; "
        total between 103 and 109

    Update:

    I would much prefer to be able to write the whole body of the loop in the above example in the form:
        $total_min += SELF_CONTAINED_FUNCTIONAL_STATEMENT;
        $total_max += SELF_CONTAINED_FUNCTIONAL_STATEMENT;

    This seems to come a bit closer to what smls asked for (but I like BrowserUk's solution better!) and has a bit of input validation:

        perl -wMstrict -le "my @ranges = qw(15 28-31 3-4 40 17-19 99- -99 -99- x x-x); my ($total_min, $total_max); ;; my $extract_ranges = qr{ \A (\d+) (?: - (\d+))? \z }xms; for (@ranges) { $total_min += /$extract_ranges/ && $1; $total_max += /$extract_ranges/ && $^N; } ;; print qq{total between $total_min and $total_max}; "
        total between 103 and 109

    (Update: Now that I look back on this thread, my second approach looks rather like kennethk's first idea.)
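Both of these rely on $^N, which holds the text of the most recently closed capture group of the last successful match. For '28-31' that is group 2; for '15' group 2 never takes part in the match, so $^N is group 1, which gives exactly the "max equals min" behaviour. A small check (%last is only for illustration):

```perl
use strict;
use warnings;

my $re = qr{ \A (\d+) (?: - (\d+) )? \z }x;

my %last;
for my $range ('15', '28-31') {
    next unless $range =~ $re;
    # Read $^N right after the match, before any other regex runs.
    $last{$range} = $^N;
    printf "%s: \$1 = %s, \$^N = %s\n", $range, $1, $^N;
}
```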

Re: Is there a more functional regex syntax?
by rjt (Curate) on Sep 18, 2012 at 15:54 UTC

    I'm not exactly sure what the expected output is for the cases where there is only one number. I guessed that min = max in these cases, so if there is no second number, I use the first one again.

        s|^(\d+)(\-(\d+))?$|$total_min += $1; $total_max += $3//$1|e for (@ranges);
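A side effect to be aware of with the s///e approach: the for loop aliases $_ to each array element, and s///e overwrites it with the value of the replacement expression (here the running $total_max). So after the loop the input array no longer holds the ranges. A minimal sketch that runs the substitution on a copy instead (@work is an illustrative name):

```perl
use strict;
use warnings;

my @ranges = ('15', '28-31', '3-4', '40', '17-19');
my ($total_min, $total_max) = (0, 0);

# Substitute on a copy; every element of @work is replaced by the
# running maximum total, while @ranges keeps the original strings.
my @work = @ranges;
s{^(\d+)(?:-(\d+))?$}{$total_min += $1; $total_max += $2 // $1}e for @work;

print "total is between $total_min and $total_max\n";
print "original: @ranges\n";
print "copy now: @work\n";
```

The same caveat applies to any s///e loop over the original array.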
Re: Is there a more functional regex syntax?
by kcott (Archbishop) on Sep 19, 2012 at 07:34 UTC

    G'day smls,

    In the code below, I've eliminated both temporary variables and reduced the foreach block of code to a single line:

        $ perl -Mstrict -Mwarnings -e '
        my @ranges = qw{15 28-31 3-4 40 17-19};
        my ($total_min, $total_max);
        s{^(\d+)-?(\d*)$}{$total_min += $1; $total_max += $2 ? $2 : $1}e for @ranges;
        print "total is between $total_min and $total_max\n";
        '
        total is between 103 and 109

    -- Ken

Node Type: perlquestion [id://994298]
Front-paged by Arunbear