PerlMonks  

Matching numbers by regex.

by Wasted (Initiate)
on Apr 19, 2006 at 09:48 UTC (id://544308)

Wasted has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl
#
my $data = "Exlief 4 page : 1 /10";
if ($data =~ /pag\w+\s*:\s*(\d+).*(\d*\d+)/) {
    print "Pages : $1 / $2\n";
}
my $data = "Exlief 4 page : 1 / 5";
if ($data =~ /pag\w+\s*:\s*(\d+).*(\d*\d+)/) {
    print "Pages : $1 / $2\n";
}
Output is:
Pages : 1 / 0
Pages : 1 / 5

But should be:
Pages : 1 / 10
Pages : 1 / 5

Re: Matching numbers by regex.
by GrandFather (Saint) on Apr 19, 2006 at 10:01 UTC

    * is greedy: it matches as many characters as it can, although it may also match none at all. In (\d+).*(\d*\d+) the \d* is redundant (the following \d+ matches at least one digit and as many more as it can), and the .* before it matches as many characters as it can, including every digit of the second number except the last one (the \d+ needs only one digit to succeed). One way to fix the problem is:

    use strict;
    use warnings;
    my $data = "Exlief 4 page : 1 /10";
    my $match = qr/pag\w+\s*:\s*(\d+)[^\d]*(\d+)/;
    print "Pages : $1 / $2\n" if $data =~ $match;
    $data = "Exlief 4 page : 1 / 5";
    print "Pages : $1 / $2\n" if $data =~ $match;

    Prints:

    Pages : 1 / 10
    Pages : 1 / 5

    Note that a precompiled regex (qr//) is used to avoid retyping the regex (and perhaps getting it different the second time), that 'match any character' has been replaced by 'match any character except a digit', and that the redundant digit match has been removed.
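A minimal sketch of the greediness described above: it runs the question's pattern and the corrected pattern against the same string and prints both pairs of captures. The helper `captures` and the variable names are illustrative, not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative helper (not from the thread): returns the two captures
# of $re against $str, or an empty list if the match fails.
sub captures {
    my ($str, $re) = @_;
    return $str =~ $re ? ($1, $2) : ();
}

my $data = "Exlief 4 page : 1 /10";

# The question's pattern: greedy .* swallows " /1", so the final
# (\d*\d+) group is left with only "0".
my $greedy = qr/pag\w+\s*:\s*(\d+).*(\d*\d+)/;

# The fix: [^\d]* cannot consume digits, so (\d+) receives "10".
my $fixed = qr/pag\w+\s*:\s*(\d+)[^\d]*(\d+)/;

printf "greedy: %s / %s\n", captures($data, $greedy);   # greedy: 1 / 0
printf "fixed:  %s / %s\n", captures($data, $fixed);    # fixed:  1 / 10
```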


    DWIM is Perl's answer to Gödel

      In the above code (and in the other replies in the thread),

      [^\d]*

      may be represented with

      \D*

      and will be more efficient as well, since it avoids calls to utf8::IsDigit internally.
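A small check that the two spellings of "a run of non-digits" give the same captures on the thread's sample strings. It only compares results; it does not measure the performance difference mentioned above. Variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative names: the same pattern written with a negated
# bracketed class and with the \D escape.
my $with_class  = qr/pag\w+\s*:\s*(\d+)[^\d]*(\d+)/;
my $with_escape = qr/pag\w+\s*:\s*(\d+)\D*(\d+)/;

for my $data ("Exlief 4 page : 1 /10", "Exlief 4 page : 1 / 5") {
    # A list-context match returns the captured groups.
    my @via_class  = $data =~ $with_class;
    my @via_escape = $data =~ $with_escape;
    print "[^\\d]*: @via_class   \\D*: @via_escape\n";
}
```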

      another intruder with the mooring in the heart of the Perl

        Heh, good point! I do tend to forget the uppercase (negated) versions of the character class escapes, such as \D, \W and \S. Thanks for the reminder.


Re: Matching numbers by regex.
by prasadbabu (Prior) on Apr 19, 2006 at 09:56 UTC

    Here is one way to do it. Your code uses unnecessary greediness; take a look at perlre.

    my $data = "Exlief 4 page : 1 /10";
    if ($data =~ /pag[^:]*:\s*(\d+)[^\d]*(\d*)/) {
        print "Pages : $1 / $2\n";
    }
    my $data = "Exlief 4 page : 1 / 5";
    if ($data =~ /pag[^:]*:\s*(\d+)[^\d]*(\d*)/) {
        print "Pages : $1 / $2\n";
    }

    output:
    Pages : 1 / 10
    Pages : 1 / 5
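One side effect of ending this pattern with (\d*) rather than (\d+): the second group is allowed to match an empty string, so a line with no total page count still matches. A sketch using a hypothetical input string (not from the thread) and illustrative variable names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern above: the trailing (\d*) may match nothing.
my $star = qr/pag[^:]*:\s*(\d+)[^\d]*(\d*)/;
# Same pattern with (\d+): a second number is required.
my $plus = qr/pag[^:]*:\s*(\d+)[^\d]*(\d+)/;

# Hypothetical input with no total page count.
my $data = "Exlief 4 page : 7";

if ($data =~ $star) {
    print "star matched: '$1' / '$2'\n";   # $2 is the empty string here
}
print "plus matched\n" if $data =~ $plus;  # not printed: no second number
```

Whether an empty second capture is acceptable depends on how the value is used afterwards; with (\d+) the match simply fails instead.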

    Prasad

Re: Matching numbers by regex.
by Samy_rio (Vicar) on Apr 19, 2006 at 09:56 UTC

    Hi Wasted, just try this:

    #!/usr/bin/perl
    my $data = "Exlief 4 page : 1 /10";
    if ($data =~ /pag\w+\s*:\s*(\d+)[^\d]*(\d+)/) {
        print "Pages : $1 / $2\n";
    }
    my $data = "Exlief 4 page : 1 / 5";
    if ($data =~ /pag\w+\s*:\s*(\d+)[^\d]*(\d+)/) {
        print "Pages : $1 / $2\n";
    }
    __END__
    Pages : 1 / 10
    Pages : 1 / 5

    Regards,
    Velusamy R.


    eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';

Re: Matching numbers by regex.
by jonadab (Parson) on Apr 19, 2006 at 12:51 UTC

    As others have noted, the greediness of .* is your problem. However, they all seem to want to fix it by making it not match any digits, which seems odd, since the problem isn't that it can match digits, but rather that it is greedy. I would just change .* to .*? to make it non-greedy. The results will be just about the same in this particular instance, however.
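A sketch of the non-greedy version, assuming the redundant \d* is also dropped; the variable name is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative name. .*? consumes as few characters as possible, so
# the final (\d+) starts at the first digit after the first number.
my $lazy = qr/pag\w+\s*:\s*(\d+).*?(\d+)/;

for my $data ("Exlief 4 page : 1 /10", "Exlief 4 page : 1 / 5") {
    print "Pages : $1 / $2\n" if $data =~ $lazy;
}
```

With either sample string, .*? stops just before the first digit that follows the first number, which is why the result here matches the [^\d]* fixes.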


    Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
Re: Matching numbers by regex.
by japhy (Canon) on Apr 19, 2006 at 13:48 UTC
    What was the purpose of \d*\d+ in the regex?

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      I thought it behaved the opposite of greedy: with just \d+ there, it only matched the last digit.
      So I thought I could force it to accept the first digit of the last number by adding \d*.
      That didn't work, so I came here for help. I'm new to this whole regexp business.
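To illustrate the answer to japhy's question: when nothing in front competes for the digits, (\d*\d+) and (\d+) capture exactly the same thing, and the digit only goes missing once a greedy .* sits in front of the group. The string " /10" is simply the tail of the original input, and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative input: the tail of the question's first string.
my $tail = " /10";

my ($both)  = $tail =~ /(\d*\d+)/;    # "10": nothing in front competes for digits
my ($plain) = $tail =~ /(\d+)/;       # "10": identical result
my ($after) = $tail =~ /.*(\d*\d+)/;  # "0": greedy .* consumes " /1" first

print "(\\d*\\d+):   $both\n";
print "(\\d+):      $plain\n";
print ".*(\\d*\\d+): $after\n";
```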
Re: Matching numbers by regex.
by CountZero (Bishop) on Apr 19, 2006 at 21:15 UTC
    Just trying it a little bit differently: $data =~ m{(\d+)\s*/\s*(\d+)}

    This will give you the figures to the left and right of the slash. Whitespace may optionally separate the figures from the slash.
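A minimal sketch of the slash-anchored pattern applied to the two sample strings from the question; the variable name is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative name. Anchor on the slash instead of the word "page";
# \s* on each side allows optional whitespace around the slash.
# Using {} as the delimiter means the / needs no escaping.
my $around_slash = qr{(\d+)\s*/\s*(\d+)};

for my $data ("Exlief 4 page : 1 /10", "Exlief 4 page : 1 / 5") {
    if ($data =~ $around_slash) {
        print "Pages : $1 / $2\n";
    }
}
```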

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Node Type: perlquestion [id://544308]
Approved by GrandFather