http://www.perlmonks.org?node_id=983808

Rahul Gupta has asked for the wisdom of the Perl Monks concerning the following question:

a

Re: regex optional word match
by moritz (Cardinal) on Jul 26, 2012 at 09:43 UTC
      I have tried this regular expression:

      $_ =~ m/^REMOTE\s+\[(.*?)\]\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+\[(.*?)\]\s+(.*?)sec\s+(.*)MBytes\s+(.*)Mbits\/sec

      It extracts data from both strings. But String1:

      "REMOTE Mon Jul 16 21:49:33 2012 @@ ueh1 TNT 20490 1916 0.0- 1.0 sec 0.33 MBytes 2.74 Mbits/sec 6.056 ms 0/ 233 (0%)"

      has two extra values at the end, 6.056 ms and 0/ 233 (0%), and I need to make them optional. Please help me with this problem. Thanks.
Re: regex optional word match
by kcott (Archbishop) on Jul 26, 2012 at 12:53 UTC

    I wasn't entirely sure which of the square brackets you wanted to capture. Just swap instances of

    ( \[ [^]]+ \] ) \s+

    with

    \[ ( [^]]+ ) \] \s+

    (and vice versa) to suit.
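
To make the difference between the two forms concrete, here is a minimal sketch. The sample field is a hypothetical fragment modelled on the bracketed input in this thread; the only thing that changes between the two patterns is whether the capture group sits outside or inside the escaped brackets.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample field, modelled on the bracketed input in this thread.
my $field = '[ueh1] ';

# Capture group outside the escaped brackets: the brackets are captured too.
my ($with_brackets) = $field =~ m{ ( \[ [^]]+ \] ) \s+ }x;

# Capture group inside the escaped brackets: only the contents are captured.
my ($without_brackets) = $field =~ m{ \[ ( [^]]+ ) \] \s+ }x;

print "with brackets:    $with_brackets\n";
print "without brackets: $without_brackets\n";
```

The first pattern should yield `[ueh1]` and the second `ueh1`.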

    #!/usr/bin/env perl

    use 5.010;
    use strict;
    use warnings;

    my $s1 = "REMOTE [Mon Jul 16 21:49:33 2012] @@ [ueh1] [TNT] [20490] [1916] 0.0- 1.0 sec 0.33 MBytes 2.74 Mbits/sec 6.056 ms 0/ 233 (0%)";
    my $s2 = "REMOTE [Mon Jul 16 21:49:34 2012] @@ [pdn1] [SSH] [20499] [3] 1.0- 2.0 sec 0.34 MBytes 2.86 Mbits/sec";

    my @strings = ($s1, $s2);

    my $re = qr{
        \A
        REMOTE \s+
        \[ ( [^]]+ ) \] \s+
        ( @@ ) \s+
        ( \[ [^]]+ \] ) \s+
        ( \[ [^]]+ \] ) \s+
        ( \[ [^]]+ \] ) \s+
        \[ ( [^]]+ ) \] \s+
        ( .+? ) \s+ sec \s+
        ( .+? ) \s+ MBytes \s+
        ( .+? ) \s+ Mbits/sec
        (?> \s+ ( .+? ) \s+ ms \s+ ( .* ) | )
        \z
    }x;

    for (@strings) {
        say for ('-' x 60, $_, '-' x 60);
        m{$re};
        say for grep { defined } ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
    }

    Output:

    $ pm_str_remote_re.pl
    ------------------------------------------------------------
    REMOTE [Mon Jul 16 21:49:33 2012] @@ [ueh1] [TNT] [20490] [1916] 0.0- 1.0 sec 0.33 MBytes 2.74 Mbits/sec 6.056 ms 0/ 233 (0%)
    ------------------------------------------------------------
    Mon Jul 16 21:49:33 2012
    @@
    [ueh1]
    [TNT]
    [20490]
    1916
    0.0- 1.0
    0.33
    2.74
    6.056
    0/ 233 (0%)
    ------------------------------------------------------------
    REMOTE [Mon Jul 16 21:49:34 2012] @@ [pdn1] [SSH] [20499] [3] 1.0- 2.0 sec 0.34 MBytes 2.86 Mbits/sec
    ------------------------------------------------------------
    Mon Jul 16 21:49:34 2012
    @@
    [pdn1]
    [SSH]
    [20499]
    3
    1.0- 2.0
    0.34
    2.86
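
The regex above makes the trailing fields optional with an atomic group that has an empty alternative. A common alternative, sketched here rather than taken from the thread, is a non-capturing group followed by `?`, combined with named captures so the optional values can be tested by name. The capture names `rate`, `ms` and `rest` and the helper `parse_tail` are hypothetical labels; only the tail of each line is matched, to keep the sketch short.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# Matches only the end of the line: the Mbits/sec value, then an
# optional "<value> ms <rest of line>" trailer.
my $tail_re = qr{
    (?<rate> \S+ ) \s+ Mbits/sec
    (?:                              # optional trailer starts here
        \s+ (?<ms>   \S+ ) \s+ ms
        \s+ (?<rest> .+  )
    )?                               # the whole trailer may be absent
    \z
}x;

# Hypothetical helper: returns a hash reference, or nothing on no match.
sub parse_tail {
    my ($line) = @_;
    return unless $line =~ $tail_re;

    my %rec = (rate => $+{rate});
    $rec{ms}   = $+{ms}   if defined $+{ms};
    $rec{rest} = $+{rest} if defined $+{rest};
    return \%rec;
}

my @lines = (
    'REMOTE [Mon Jul 16 21:49:33 2012] @@ [ueh1] [TNT] [20490] [1916] 0.0- 1.0 sec 0.33 MBytes 2.74 Mbits/sec 6.056 ms 0/ 233 (0%)',
    'REMOTE [Mon Jul 16 21:49:34 2012] @@ [pdn1] [SSH] [20499] [3] 1.0- 2.0 sec 0.34 MBytes 2.86 Mbits/sec',
);

for my $line (@lines) {
    my $rec = parse_tail($line) or next;
    my $ms   = $rec->{ms}   // 'n/a';
    my $rest = $rec->{rest} // 'n/a';
    say "rate=$rec->{rate} ms=$ms rest=$rest";
}
```

Because the pattern ends in `\z`, lines read from a file would need `chomp` first. Named captures and `//` both require Perl 5.10 or later.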

    -- Ken

      Thanx, it worked for me :) :)

Re: regex optional word match
by brx (Pilgrim) on Jul 26, 2012 at 11:38 UTC

    update: input data changed by OP (brackets added) - consider kcott's answer: Re: regex optional word match

    -*-*-*-*-

    If you really-really want to use a big regex, see:

    $line =~ m/^REMOTE\s+(.*?)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*?)\s+(.*?)sec\s+(.*)MBytes\s+(.*)Mbits\/sec(.*)/s;
    print "{$10}"; #$10 should contain at least "\n" except if you 'chomp $line'

    This regex becomes very complex if 'MBytes' could be 'Bytes', etc.
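
One way to keep that complexity in check is to capture the unit along with the value instead of hard-coding it. The sketch below assumes the units are an optional `K`, `M` or `G` prefix on `Bytes` and on `bits/sec`; the thread does not say which variants actually occur, so treat the prefix list as a guess. The helper name `parse_units` and the second sample line are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: units are Bytes and bits/sec with an optional K, M or G prefix.
my $units_re = qr{
    ( \S+ ) \s+ ( [KMG]? Bytes )    \s+
    ( \S+ ) \s+ ( [KMG]? bits/sec )
}x;

# Hypothetical helper: returns (size, size_unit, rate, rate_unit),
# or an empty list when the line does not match.
sub parse_units {
    my ($line) = @_;
    return $line =~ $units_re;
}

my @lines = (
    # From the thread:
    'REMOTE Mon Jul 16 21:49:34 2012 @@ pdn1 SSH 20499 3 1.0- 2.0 sec 0.34 MBytes 2.86 Mbits/sec',
    # Hypothetical line with different units, for illustration only:
    'REMOTE Mon Jul 16 21:49:35 2012 @@ pdn1 SSH 20499 3 2.0- 3.0 sec 512 Bytes 4.10 Kbits/sec',
);

for my $line (@lines) {
    my ($size, $size_unit, $rate, $rate_unit) = parse_units($line) or next;
    print "size=$size $size_unit rate=$rate $rate_unit\n";
}
```

Capturing the unit leaves any conversion to a common unit for a later step, outside the regex.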

    But here is another way with split (because the lines seem to be well formatted, with whitespace as the separator).

    #!perl
    use strict;
    use warnings;

    while (my $line = <DATA>) {
        chomp $line;
        my @parts = split /\s+/,$line;
        #REMOTE Mon Jul 16 21:49:33 2012 @@ ueh1 TNT 20490 1916 0.0- 1.0 sec 0.33 MBytes 2.74 Mbits/sec 6.056 ms 0/ 233 (0%)
        #0      1   2   3  4        5    6  7    8   9     10   11   12  13  14   15     16   17        18    19 20 21  22
        next if $parts[0] ne 'REMOTE';
        print "IN: $line\n";
        print "\tOUT: ",join " ",@parts[1..5,8,12,13];
        print " ", join " ",@parts[18,19] if $#parts>=19;
        print "\n";
    }
    __DATA__
    REMOTE Mon Jul 16 21:49:33 2012 @@ ueh1 TNT 20490 1916 0.0- 1.0 sec 0.33 MBytes 2.74 Mbits/sec 6.056 ms 0/ 233 (0%)
    REMOTE Mon Jul 16 21:49:34 2012 @@ pdn1 SSH 20499 3 1.0- 2.0 sec 0.34 MBytes 2.86 Mbits/sec
    LOCAL Mon Jul 16 21:49:34 2012 @@ nada

    Output:

    IN: REMOTE Mon Jul 16 21:49:33 2012 @@ ueh1 TNT 20490 1916 0.0- 1.0 sec 0.33 MBytes 2.74 Mbits/sec 6.056 ms 0/ 233 (0%)
            OUT: Mon Jul 16 21:49:33 2012 TNT 1.0 sec 6.056 ms
    IN: REMOTE Mon Jul 16 21:49:34 2012 @@ pdn1 SSH 20499 3 1.0- 2.0 sec 0.34 MBytes 2.86 Mbits/sec
            OUT: Mon Jul 16 21:49:34 2012 SSH 2.0 sec
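
The same split-based idea can be wrapped in a small function that returns the fields by name, which makes the optional trailing values easier to test for. This is only a sketch: the helper `parse_remote` and the key names (`timestamp`, `tag`, `interval_end`, `interval_unit`, `extra_value`, `extra_unit`) are hypothetical labels for the same index positions used above, and the minimum field count of 18 is taken from the shorter sample line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper; key names are illustrative labels only.
sub parse_remote {
    my ($line) = @_;

    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    my @parts = split ' ', $line;
    return unless @parts >= 18 && $parts[0] eq 'REMOTE';

    my %rec = (
        timestamp     => join(' ', @parts[1 .. 5]),
        tag           => $parts[8],
        interval_end  => $parts[12],
        interval_unit => $parts[13],
    );

    # The trailing "<value> ms" pair is only present on some lines.
    @rec{qw(extra_value extra_unit)} = @parts[18, 19] if @parts >= 20;

    return \%rec;
}

while (my $line = <DATA>) {
    chomp $line;
    my $rec = parse_remote($line) or next;
    my $extra = exists $rec->{extra_value}
        ? " $rec->{extra_value} $rec->{extra_unit}"
        : '';
    print "$rec->{timestamp} $rec->{tag} $rec->{interval_end} $rec->{interval_unit}$extra\n";
}

__DATA__
REMOTE Mon Jul 16 21:49:33 2012 @@ ueh1 TNT 20490 1916 0.0- 1.0 sec 0.33 MBytes 2.74 Mbits/sec 6.056 ms 0/ 233 (0%)
REMOTE Mon Jul 16 21:49:34 2012 @@ pdn1 SSH 20499 3 1.0- 2.0 sec 0.34 MBytes 2.86 Mbits/sec
LOCAL Mon Jul 16 21:49:34 2012 @@ nada
```

Checking `exists $rec->{extra_value}` replaces the `$#parts>=19` test at the call site, so callers do not need to know the index layout.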

    update: /s modifier in regex

    English is not my mother tongue.
    Les tongues de ma mère sont "made in France".