PerlMonks  

Parenthesis grouping into regexes.

by aramisf (Beadle)
on Apr 25, 2012 at 20:58 UTC ( [id://967167]=perlquestion )

aramisf has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone!

I was writing a script to filter IP addresses and used two (simple and incomplete) regexes:

my $ip4 = "([0-9]{1,3}\.){3}[0-9]{1,3}";
my $ip6 = "([a-f0-9]{4}\:)+?";
my $prefix = "/\d\d";

They aren't (and don't need to be) fully correct. I understand that these regexes match invalid IPs like 999.888.777.666, but the IPs on my list are fine.

My question resides here:

my $pattern = "^($ip4|$ip6)($prefix)?";

It is intended to catch an IP address, optionally followed by a prefix. But the resulting match does not catch the prefix:

open (FILE, "<$some_file") or die "Error: $!\n";
while (<FILE>) {
    next if $_ !~ m[$pattern];
    print "Hey, I found \$1: $1 \$2: $2 \$3: $3\n";
}
close FILE;

I don't understand how to use groups that are nested two or three levels deep inside parentheses.

You see, the $pattern variable looks this ugly:

^(([0-9]{1,3}\.){3}[0-9]{1,3}|([a-f0-9]{4}\:)+?)(/\d\d)?

Given this, how are the matches assigned to the $1, $2, ... variables?

Is there a way to get the IP address into $1 and the $prefix into $2?

If possible, I'd like to use $ip4 instead of:

"[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}"


### Update ###

There was a small problem with my English, sorry about that. Where I typed 'mask' I meant 'prefix' (the '/\d\d' part of an IP address). Maybe this is what Kenosis asked about (if I understand the question).

Replies are listed 'Best First'.
Re: Parenthesis grouping into regexes.
by JavaFan (Canon) on Apr 25, 2012 at 22:03 UTC
    my $prefix = "/\d\d";
    Here's your problem. You're using \d in double quoted context. $prefix is just /dd. use warnings would have told you so. Printing $prefix would have told you as well.
      Yes, you're right. I modified that, but the problem persisted.
      I'll comment below...
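JavaFan's point can be checked with a minimal sketch like the one below. In a double-quoted string Perl processes backslash escapes before the regex engine ever sees them, so `\d` collapses to a plain `d`; single quotes or `qr` keep the backslash. The variable names `$double`, `$single` and `$compiled` are illustrative and not from the original script.

```perl
use strict;
use warnings;

# Double quotes: "\d" is not a string escape, so Perl drops the backslash
# (and warns "Unrecognized escape \d passed through" when warnings are on).
my $double;
{
    no warnings 'misc';    # silence that warning for this demonstration only
    $double = "/\d\d";
}

# Single quotes and qr keep the backslash for the regex engine.
my $single   = '/\d\d';
my $compiled = qr{/\d\d};

print "double-quoted string holds: $double\n";
print "single-quoted string holds: $single\n";

# Try each pattern against a real prefix and against the literal text "/dd".
for my $text ('/21', '/dd') {
    printf "%-4s double:%-3s single:%-3s qr:%-3s\n", $text,
        ($text =~ /\A$double\z/   ? 'yes' : 'no'),
        ($text =~ /\A$single\z/   ? 'yes' : 'no'),
        ($text =~ /\A$compiled\z/ ? 'yes' : 'no');
}
```

Building the pieces with `qr` also compiles each fragment once, and the fragments still interpolate cleanly into a larger pattern.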
Re: Parenthesis grouping into regexes.
by petdance (Parson) on Apr 26, 2012 at 01:06 UTC
    This is a well-solved problem. Suggest you check out Regexp::Common.

    xoxo,
    Andy

      Regexp::Common doesn't support IPv6 addresses though. Regexp::IPv6 does, however it creates quite a monster just to match the third form from RFC 4291 section 2.2 (mixed hex/decimal) that I've never seen used anywhere. This is (supposedly, I haven't tested it) a much shorter version:

      qr/^(((?=(?>.*?::)(?!.*::)))(::)?(([0-9A-F]{1,4})::?){0,5}|((?5):){6})(\2((?5)(::?|$)){0,2}|((25[0-5]|(2[0-4]|1[0-9]|[1-9])?[0-9])(\.|$)){4}|(?5):(?5))(?<![^:]:|\.)\z/i
      Pretty nice library. Didn't know it.
      Very useful indeed.

      Thanks for the tip.
Re: Parenthesis grouping into regexes.
by BillKSmith (Monsignor) on Apr 25, 2012 at 22:10 UTC

    You can solve your immediate problem by using non-capturing parentheses in the definitions of $ip4 and $ip6. Refer to the section (?:pattern) in perlre.

      Here is an example with data made up to match your regular expressions. Are you sure about the fourth colon in $ip6?

      use strict;
      use warnings;
      my $ip4     = qr !(?: [0-9]{1,3}\.){3} [0-9]{1,3} !x;
      my $ip6     = qr !(?: [a-f0-9]{4}:)+ !x;
      my $prefix  = qr ! / \d \d !x;
      my $pattern = qr ! ^($ip4|$ip6) ($prefix)? !x;
      while (<DATA>) {
          next if $_ !~ $pattern;
          print "Hey, I found \$1: $1 \$2: $2\n";
      }
      __DATA__
      no match here
      999.888.777.666/66
      aaaa:aaaa:aaaa:aaaa:/77
        Labelling groups (with the (?<NAME>pattern) regex syntax) showed me what was happening. It is all clear now.

        Many thanks to everyone, especially BillKSmith!

        I was having trouble using $1, $2... because I didn't know their exact behaviour.

        The problem was the parentheses inside the $ip4 regex: $2 held the IP address,
        $3 held the match for the parentheses inside $ip4, and the prefix I expected was put into $4:
        # Example of my debug output:
        line = '*> 177.101.16.0/21 200.19.74.230 0 200'
        matches: $1:'*> ' $2:'177.101.16.0' $3:'16.' $4:'/21'

        Now I understand how $1, $2, $3... are set.
        Thank you all, Monks! =D
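The numbering rule aramisf arrived at can be sketched in a few lines: capture groups are numbered by the position of their opening parenthesis, left to right, however deeply they are nested, and `(?:...)` groups are skipped in the count. The sample line is the one from the debug output above with the leading `*> ` removed; the variable names and the named-capture variant are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;

my $line = '177.101.16.0/21 200.19.74.230 0 200';

# Capturing parentheses everywhere: numbering follows the opening parens.
#   $1 = whole address, $2 = last repetition of the inner group, $3 = prefix
my @nested = $line =~ m{^(([0-9]{1,3}\.){3}[0-9]{1,3})(/\d\d)?};
print "nested: ", join(' | ', map { $_ // 'undef' } @nested), "\n";

# Non-capturing inner group: only the address and the prefix are captured,
# so they land in $1 and $2.
my @flat = $line =~ m{^((?:[0-9]{1,3}\.){3}[0-9]{1,3})(/\d\d)?};
print "flat:   ", join(' | ', map { $_ // 'undef' } @flat), "\n";

# Named captures (Perl 5.10 and later) avoid counting parentheses at all.
my %named;
if ($line =~ m{^(?<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})(?<prefix>/\d\d)?}) {
    %named = %+;    # copy the named groups before another match resets them
    print "named:  ip=$named{ip} prefix=$named{prefix}\n";
}
```

A repeated capturing group such as `(...){3}` keeps only its final repetition, which is why the inner group holds `16.` rather than all three octets.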

Re: Parenthesis grouping into regexes.
by Kenosis (Priest) on Apr 25, 2012 at 23:21 UTC

    Just curious... What are you filtering IP addresses from (and I don't mean the file)?

      I'm not sure if I understand the question correctly.

      I have a file with lots of IP addresses. I filter IP patterns from each line.

      Is that what you asked?

        Hi, aramisf.

        Was just curious about how those IP beasties were embedded in your text, as that might help with extracting them.

        Hope you've found a viable solution within these suggestions.
