Using variable to hold regex expression

by salatconed (Initiate)
on Mar 11, 2013 at 23:07 UTC
salatconed has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to use variables to hold the regex expression to make it easier to code, and I'm running into an issue parsing firewall logs.

When I use the same regex variable more than once in a match, the first group returns the correct result, but the second one shows only part of the first IP address.

-- Sample data Mar 10 07:42:38 DR-FW-1 : %ASA-6-305011: Built dynamic UDP translation from inside: to outside(internet-traffic):

output:
re1 ->
re2 -> 17.
------------------------------------------
my $Raw_Log = "";
my $re_ipv4 = qr/(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9])[.]){3}(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9]))/;

# Open file to read lines
my $logfile = $ARGV[0];
my $linenum = 0;
open(LOGFILEHD, $logfile);
while( <LOGFILEHD>){
    $Raw_Log = $_;
    print "$Raw_Log\n";
    $Raw_Log =~ /($re_ipv4).*($re_ipv4)/;
    print "re1 -> $1\n";
    print "re2 -> $2\n";
    $linenum += 1;
}
close(LOGFILEHD);

Re: Using variable to hold regex expression
on Mar 11, 2013 at 23:12 UTC
    $1 corresponds to the first opening capturing parenthesis, $2 to the second one, and so on. You probably want to use $8 instead of $2. Let us count:
    ((([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9])[.]){3}(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9]))).*(
    123   4                                              56   7                                           8
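
To make the numbering concrete, here is a minimal sketch that runs the pattern from the question against a made-up line. The sample text and its addresses are placeholders, not data from the original log, and the helper name groups_1_2_8 is hypothetical. The sketch also swaps `.*` for the non-greedy `.*?`, which is an assumption on my part: with a greedy `.*` this particular pattern can let the second address start in the middle of an octet.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged: it contains six capturing groups.
my $re_ipv4 = qr/(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9])[.]){3}(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9]))/;

# Hypothetical log fragment; both addresses are placeholders.
my $line = 'translation from inside:192.0.2.10 to outside:198.51.100.17';

# Return the contents of groups 1, 2 and 8, or an empty list if nothing matches.
sub groups_1_2_8 {
    my ($text) = @_;
    return $text =~ /($re_ipv4).*?($re_ipv4)/ ? ($1, $2, $8) : ();
}

my ($g1, $g2, $g8) = groups_1_2_8($line);
print "group 1 -> $g1\n";   # the whole first address
print "group 2 -> $g2\n";   # the last "octet." repetition inside the first address
print "group 8 -> $g8\n";   # the whole second address
```

Group 2 holds only the final repetition of the quantified "octet plus dot" group, which is why the question's re2 looked like a fragment of the first address.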
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Using variable to hold regex expression
on Mar 11, 2013 at 23:36 UTC

    Have you considered using Regexp::Common::net to capture those IPs?

    use strict;
    use warnings;
    use Regexp::Common qw/net/;

    while (<DATA>) {
        if ( my ( $firstIP, $secondIP ) = /($RE{net}{IPv4})/g ) {
            print "FirstIP: $firstIP\nSecondIP: $secondIP\n\n";
        }
    }

    __DATA__
    Sample data Mar 10 07:42:38 DR-FW-1 : %ASA-6-305011: Built dynamic UDP translation from inside: to outside(internet-traffic):
    Sample data Mar 10 07:42:38 DR-FW-1 : %ASA-6-305011: Built dynamic UDP translation from inside: to outside(internet-traffic):


    FirstIP:
    SecondIP:

    FirstIP:
    SecondIP:
      Perhaps using IP addresses was not a good example. I'm trying to figure out how to parse a string which has repetitive data, so that I can write the regex once and get multiple matches back if they exist, the same way your code got both IP addresses in one call.
        ... how to parse a string which has repetitive data ...

        As choroba pointed out, every (pattern) pair of parentheses in a regex captures something (possibly even undef) to its corresponding capture variable. One way to parse a string using nested regexes is to avoid using a gazillion capturing groups. Use the non-capturing (?:pattern) for grouping instead. See perlre, perlrequick, perlretut. In the IP example (but this should generalize to any repetitive data you wish to extract):

        >perl -wMstrict -le
        "my $decimal_octet = qr{ 2 (?: [0-4] \d | 5 [0-5]) | [01]? \d? \d }xms;
         my $ip = qr{ (?<! \d) $decimal_octet (?: \. $decimal_octet){3} (?! \d) }xms;
         print $ip;
         ;;
         my $s = ' xx yyy zz';
         my @ips = $s =~ m{ $ip }xmsg;
         printf qq{'$_' } for @ips;
        "
        (?^msx: (?<! \d) (?^msx: 2 (?: [0-4] \d | 5 [0-5]) | [01]? \d? \d ) (?: \. (?^msx: 2 (?: [0-4] \d | 5 [0-5]) | [01]? \d? \d )){3} (?! \d) )
        '' ''

        Note that neither (?:pattern) nor the (?<!pattern) and (?!pattern) look-around assertions capture. Indeed, nothing captures (to a capture variable), since the data is extracted in list context directly to an array.
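
If you would rather keep the two-address match from the original question, the same non-capturing idea makes $1 and $2 line up with the two addresses. This is only a sketch under assumptions: the octet pattern is my own simplified rewrite rather than the one from the thread, the helper name two_addresses is hypothetical, and the sample line and its addresses are placeholders.

```perl
use strict;
use warnings;

# One octet (0-255), grouped without capturing.
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])/;

# A dotted quad that is not glued to neighbouring digits; still no captures.
my $re_ipv4 = qr/(?<![0-9])$octet(?:[.]$octet){3}(?![0-9])/;

# Hypothetical log fragment; both addresses are placeholders.
my $line = 'translation from inside:192.0.2.10 to outside:198.51.100.17';

# The only capturing parentheses are the two written here,
# so $1 is the first address and $2 is the second.
sub two_addresses {
    my ($text) = @_;
    return $text =~ /($re_ipv4).*($re_ipv4)/ ? ($1, $2) : ();
}

my ($first, $second) = two_addresses($line);
print "re1 -> $first\n";
print "re2 -> $second\n";
```

The look-behind keeps the greedy `.*` from letting the second address begin partway through an octet.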

        If I'm understanding you correctly, the my ( $firstIP, $secondIP ) = /($RE{net}{IPv4})/g in the above code does what you've described.
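
The same list-context /g idiom carries over to repetitive data other than IP addresses. Below is a small sketch on a made-up record; the field names and values are hypothetical and only there for illustration. With one capturing group the match returns one element per hit, and with two groups it returns two elements per hit, which drops neatly into a hash.

```perl
use strict;
use warnings;

# Hypothetical record with repeated key=value fields.
my $record = 'src=10 dst=20 proto=17';

# One capturing group: one list element per match.
my @keys = $record =~ /(\w+)=/g;

# Two capturing groups: two list elements per match, i.e. key/value pairs.
my %field = $record =~ /(\w+)=(\w+)/g;

print join(',', @keys), "\n";
print "$_ => $field{$_}\n" for sort keys %field;
```

The pattern is written once, and the number of results depends only on how often it matches in the string.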

