
Using variable to hold regex expression

by salatconed (Initiate)
on Mar 11, 2013 at 23:07 UTC ( #1022892=perlquestion )
salatconed has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to use variables to hold regex patterns to make my code easier to write, and I'm running into one issue parsing firewall logs.

When I use the same regex variable multiple times in one match, the first group returns the correct result, but the second one shows part of the first IP address.

-- Sample data Mar 10 07:42:38 DR-FW-1 : %ASA-6-305011: Built dynamic UDP translation from inside: to outside(internet-traffic):

output:
re1 ->
re2 -> 17.
------------------------------------------
my $Raw_Log = "";
my $re_ipv4 = qr/(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9])[.]){3}(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9]))/;
# Open file to read lines
my $logfile = $ARGV[0];
my $linenum = 0;
open(LOGFILEHD, $logfile);
while( <LOGFILEHD>){
    $Raw_Log = $_;
    print "$Raw_Log\n";
    $Raw_Log =~ /($re_ipv4).*($re_ipv4)/;
    print "re1 -> $1\n";
    print "re2 -> $2\n";
    $linenum += 1;
}
close(LOGFILEHD);

Replies are listed 'Best First'.
Re: Using variable to hold regex expression
by choroba (Chancellor) on Mar 11, 2013 at 23:12 UTC
    $1 corresponds to the first opening capturing parenthesis, $2 corresponds to the second one. You probably want to use $8 instead of $2 - let us count:
    ((([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9])[.]){3}(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9]))).*(
    123   4                                              56   7                                           8
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
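    To make the counting concrete, here is a minimal sketch that reuses the pattern from the question. The log line and its addresses are made up (documentation ranges), and the lazy `.*?` is my substitution: with the greedy `.*` from the question, the second match can lose leading digits of its first octet.

```perl
use strict;
use warnings;

# Same pattern as in the question: it contains six capturing groups.
my $re_ipv4 = qr/(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9])[.]){3}(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9]))/;

# Hypothetical log fragment; the addresses are assumptions for illustration.
my $line = 'translation from inside:192.0.2.17/5353 to outside:198.51.100.4/1024';

my ( $first, $inner, $second );
if ( $line =~ /($re_ipv4).*?($re_ipv4)/ ) {
    # $1 wraps the first address, $2..$7 are the groups inside $re_ipv4,
    # so the wrapper around the second address is group 8.
    ( $first, $inner, $second ) = ( $1, $2, $8 );
}

print "re1    -> $first\n";     # 192.0.2.17
print "inner  -> $inner\n";     # 2.  (last repetition of the "octet plus dot" group)
print "re2    -> $second\n";    # 198.51.100.4
```

    Here $2 holds only the last "octet plus dot" repetition from the first address, which is why the original output looked like a fragment of the first IP.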
Re: Using variable to hold regex expression
by Kenosis (Priest) on Mar 11, 2013 at 23:36 UTC

    Have you considered using Regexp::Common::net to capture those IPs?

    use strict;
    use warnings;
    use Regexp::Common qw/net/;

    while (<DATA>) {
        if ( my ( $firstIP, $secondIP ) = /($RE{net}{IPv4})/g ) {
            print "FirstIP: $firstIP\nSecondIP: $secondIP\n\n";
        }
    }

    __DATA__
    Sample data Mar 10 07:42:38 DR-FW-1 : %ASA-6-305011: Built dynamic UDP translation from inside: to outside(internet-traffic):
    Sample data Mar 10 07:42:38 DR-FW-1 : %ASA-6-305011: Built dynamic UDP translation from inside: to outside(internet-traffic):


    FirstIP:
    SecondIP:

    FirstIP:
    SecondIP:
      Perhaps IP addresses were not a good example. I'm trying to figure out how to parse a string that contains repetitive data, so that I can write the regex once and get multiple results back if they exist, the same way your code got both IP addresses in one call.
        ... how to parse a string which has repetitive data ...

        As choroba pointed out, every (pattern) pair of parentheses in a regex captures something (possibly even undef) to its corresponding capture variable. One way to parse a string using nested regexes is to avoid using a gazillion capturing groups. Use the non-capturing (?:pattern) for grouping instead. See perlre, perlrequick, perlretut. In the IP example (but this should generalize to any repetitive data you wish to extract):

        >perl -wMstrict -le
        "my $decimal_octet = qr{ 2 (?: [0-4] \d | 5 [0-5]) | [01]? \d? \d }xms;
         my $ip = qr{ (?<! \d) $decimal_octet (?: \. $decimal_octet){3} (?! \d) }xms;
         print $ip;
         ;;
         my $s = ' xx yyy zz';
         my @ips = $s =~ m{ $ip }xmsg;
         printf qq{'$_' } for @ips;
        "
        (?^msx: (?<! \d) (?^msx: 2 (?: [0-4] \d | 5 [0-5]) | [01]? \d? \d ) (?: \. (?^msx: 2 (?: [0-4] \d | 5 [0-5]) | [01]? \d? \d )){3} (?! \d) )
        '' ''

        Note that neither (?:pattern) nor the (?<!pattern) and (?!pattern) look-around assertions capture. Indeed, nothing captures (to a capture variable), since the data is extracted in list context directly to an array.
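        Since the question is about repetitive data in general, here is a small sketch of the same approach applied to something other than IP addresses. The timestamp format, the sample string, and the variable names are all assumptions for illustration; the point is only that building blocks made with (?:...) add no capture variables, so m//g in list context returns each whole match.

```perl
use strict;
use warnings;

# Building blocks: grouping only, no capturing parentheses anywhere.
my $hour  = qr{ [01]\d | 2[0-3] }xms;
my $sixty = qr{ [0-5]\d }xms;
my $time  = qr{ (?<! \d) $hour (?: : $sixty ){2} (?! \d) }xms;

# Hypothetical input with the same kind of item repeated several times.
my $s = 'start 07:42:38, retry 07:42:51, done 23:59:59';

# With no capture groups, list-context m//g returns every full match.
my @times = $s =~ m{ $time }xmsg;

print "'$_'\n" for @times;
```

        Each interpolated qr// object is wrapped in its own (?^msx:...) group, so the alternation inside $hour stays contained when it is used inside $time.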

        If I'm understanding you correctly, the my ( $firstIP, $secondIP ) = /($RE{net}{IPv4})/g in the above code does what you've described.
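        For the general "write the regex once, get every occurrence" case, a rough sketch without any modules might look like the following. The key=value string and field names are invented for the example. It also shows how the result of list-context m//g depends on how many capturing groups the pattern has.

```perl
use strict;
use warnings;

# Hypothetical repetitive data: three key=value pairs in one string.
my $s = 'src=10 dst=20 len=64';

# No capturing groups: one list element per match (the whole match).
my @pairs = $s =~ /\w+=\d+/g;

# Two capturing groups: two list elements per match, flattened into
# one list, which can be assigned straight to a hash.
my %field = $s =~ /(\w+)=(\d+)/g;

print "pair: $_\n" for @pairs;
print "$_ => $field{$_}\n" for sort keys %field;
```

        If the string holds fewer items than the variables on the left-hand side of a list assignment, the leftover variables are simply undef, so it is worth checking them before use.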
