http://www.perlmonks.org?node_id=1015081

brad_nov has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a script like the one below:
#!/usr/local/bin/perl
use strict;
use warnings;

while (<DATA>) {
    ( my ($s_id) = /^\d+\|(\d+?)\|/ ) ;
    if ( $s_id == 1 ){
        s/^(.*\|)*.*ABC\.pi=([\d.]+|[\w.]+)*.*ABC\.id=(\d+|[\w.]+).*$/$1$2|$3/s;
        print "$1$2|$3\n";
    }
}
__DATA__
123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.66~ABC.id=789137136770
123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.67~ABC.id=789134713670
123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.68~ABC.id=789137213670
123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.69~ABC.id=78913713670
123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670
123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.70~ABC.id=78913713670
123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.70~ABC.id=78913713670
123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.70~ABC.id=789137135670
123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.70~ABC.id=789137153670
123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670
123|1|456464|645646|4546|654~abc~dhghga~121322~456466874~8796896
123|2|456464|645646|4546|654~abc~dhghga~121322~456466874~6788708
123|2|456464|645646|4546|654~abc~dhghga~121322~456466874~6806
When I run it, I get the following output:
123|1|456464|645646|4546|112.33.44.55.66|789137136770
123|1|456464|645646|4546|112.33.44.55.67|789134713670
123|1|456464|645646|4546|112.33.44.55.68|789137213670
123|1|456464|645646|4546|112.33.44.55.69|78913713670
Use of uninitialized value $2 in concatenation (.) or string at split_test.pl line 14, <DATA> line 5.
Use of uninitialized value $3 in concatenation (.) or string at split_test.pl line 14, <DATA> line 5.
1|
123|1|456464|645646|4546|112.33.44.55.70|78913713670
123|1|456464|645646|4546|112.33.44.55.70|78913713670
123|1|456464|645646|4546|112.33.44.55.70|789137135670
123|1|456464|645646|4546|112.33.44.55.70|789137153670
I want to get rid of the error. How can I do that? I also want to write the exceptions to a new file.

Replies are listed 'Best First'.
Re: Print only if pattern matches
by Kenosis (Priest) on Jan 24, 2013 at 06:37 UTC

    Place your matching regex in an if statement. If the match succeeds, you can print your captures without error. The else branch can handle the exceptions:

    use strict;
    use warnings;

    while (<DATA>) {
        next unless /^\d+\|(\d+?)\|/ and $1 == 1;

        if (/^(.*\|)*.*ABC\.pi=([\d.]+|[\w.]+)*.*ABC\.id=(\d+|[\w.]+).*$/) {
            print "$1$2|$3\n";
        }
        else {
            print "Exception: $_";
        }
    }

    Output on your data:

    123|1|456464|645646|4546|112.33.44.55.66|789137136770
    123|1|456464|645646|4546|112.33.44.55.67|789134713670
    123|1|456464|645646|4546|112.33.44.55.68|789137213670
    123|1|456464|645646|4546|112.33.44.55.69|78913713670
    Exception: 123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670
    123|1|456464|645646|4546|112.33.44.55.70|78913713670
    123|1|456464|645646|4546|112.33.44.55.70|78913713670
    123|1|456464|645646|4546|112.33.44.55.70|789137135670
    123|1|456464|645646|4546|112.33.44.55.70|789137153670
    Exception: 123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670
    Exception: 123|1|456464|645646|4546|654~abc~dhghga~121322~456466874~8796896
Re: Print only if pattern matches
by vinoth.ree (Monsignor) on Jan 24, 2013 at 06:42 UTC

    Because the lines

    123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670
    123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670
    123|1|456464|645646|4546|654~abc~dhghga~121322~456466874~8796896
    123|2|456464|645646|4546|654~abc~dhghga~121322~456466874~6788708
    123|2|456464|645646|4546|654~abc~dhghga~121322~456466874~6806

    do not match your regular expression, the capture variables $2 and $3 have no values.
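To see why the question's output contains the stray "1|" line, it may help to check what the capture variables hold after a failed substitution. The following is a minimal sketch using one of the non-matching lines from the question; the variable names ($changed, $stale_1, $has_2) are illustrative only. It relies on the documented behaviour that a failed match leaves $1, $2, ... as the last successful match set them.

```perl
use strict;
use warnings;

# One of the lines from the question that lacks "ABC.pi=" and "ABC.id=".
my $line = "123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670\n";

# This match succeeds and sets $1. The pattern has only one group,
# so $2 and $3 are undefined afterwards.
$line =~ /^\d+\|(\d+?)\|/;

# The substitution fails on this line, so the capture variables are
# NOT reset: they still describe the previous successful match.
my $changed = ($line =~ s/^(.*\|)*.*ABC\.pi=([\d.]+|[\w.]+)*.*ABC\.id=(\d+|[\w.]+).*$/$1$2|$3/s);

my $stale_1 = $1;                   # left over from the first match
my $has_2   = defined $2 ? 1 : 0;   # no second group was ever set

print "substitution ", ($changed ? "succeeded" : "failed"), "\n";
print "\$1 is still '$stale_1'; \$2 is ", ($has_2 ? "defined" : "undefined"), "\n";
```

Printing "$1$2|$3\n" at this point therefore yields "1|" plus the two warnings, which is what the question shows.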

Re: Print only if pattern matches
by 2teez (Vicar) on Jan 24, 2013 at 06:52 UTC

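    Your long regex

    s/^(.*\|)*.*ABC\.pi=([\d.]+|[\w.]+)*.*ABC\.id=(\d+|[\w.]+).*$/$1$2|$3/s;

    can be reduced further. Note that the shorter pattern below keeps the 654 field in $1 and prints the captures without | separators, so it only approximates your required output.
    Using the solution provided by Kenosis, like so: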
    use strict;
    use warnings;

    while (<DATA>) {
        next unless /^\d+\|(\d+?)\|/ and $1 == 1;

        if (/(.+?)~.+?=(.+?)~.+=(.+?)$/) {    # note here
            print $1, $2, $3, $/;
        }
        else {
            print "Exception: ", $_, $/;
        }
    }
    __DATA__
    123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.66~ABC.id=789137136770
    123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.67~ABC.id=789134713670
    123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.68~ABC.id=789137213670
    123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.69~ABC.id=78913713670
    123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670
    123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.70~ABC.id=78913713670
    123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.70~ABC.id=78913713670
    123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.70~ABC.id=789137135670
    123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.70~ABC.id=789137153670
    123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670
    123|1|456464|645646|4546|654~abc~dhghga~121322~456466874~8796896
    123|2|456464|645646|4546|654~abc~dhghga~121322~456466874~6788708
    123|2|456464|645646|4546|654~abc~dhghga~121322~456466874~6806
    Output:
    123|1|456464|645646|4546|654112.33.44.55.66789137136770
    123|1|456464|645646|4546|654112.33.44.55.67789134713670
    123|1|456464|645646|4546|654112.33.44.55.68789137213670
    123|1|456464|645646|4546|654112.33.44.55.6978913713670
    Exception: 123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670
    123|1|456464|645646|4546|654112.33.44.55.7078913713670
    123|1|456464|645646|4546|654112.33.44.55.7078913713670
    123|1|456464|645646|4546|654112.33.44.55.70789137135670
    123|1|456464|645646|4546|654112.33.44.55.70789137153670
    Exception: 123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670
    Exception: 123|1|456464|645646|4546|654~abc~dhghga~121322~456466874~8796896

    If you tell me, I'll forget.
    If you show me, I'll remember.
    If you involve me, I'll understand.
    --- Author unknown to me
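If the goal is a shorter pattern that still produces the pipe-separated format from the question, one possibility is sketched below. The helper name reformat_line and the exact pattern are assumptions made for illustration, not code from the thread. The pattern keeps everything up to the last | as a prefix, then picks out the values after ABC.pi= and ABC.id=.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the reformatted record, or nothing
# (undef in scalar context) when the line does not match.
sub reformat_line {
    my ($line) = @_;

    # In list context a match returns its captures, or an empty list on failure.
    my ($prefix, $pi, $id) =
        $line =~ /^(.*\|).*ABC\.pi=([\w.]+).*ABC\.id=([\w.]+)/;

    return unless defined $prefix;
    return "$prefix$pi|$id";
}

my $good = '123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.66~ABC.id=789137136770';
my $bad  = '123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670';

for my $line ($good, $bad) {
    my $record = reformat_line($line);
    print defined $record ? "$record\n" : "Exception: $line\n";
}
```

Assigning the captures in list context avoids reading $1, $2 and $3 after a match that may have failed.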
Re: Print only if pattern matches
by Athanasius (Archbishop) on Jan 24, 2013 at 07:00 UTC

    The script can be rewritten like this:

    #! perl
    use strict;
    use warnings;

    while (my $line = <DATA>)
    {
        if ($line =~ / ^ \d+ \| (\d+?) \| /x &&
            $1 == 1                          &&
            $line =~ s{
                         ^
                         (.*\|)*             # $1
                         .*ABC\.pi=
                         ([\d.]+|[\w.]+)*    # $2
                         .*ABC\.id=
                         (\d+|[\w.]+)        # $3
                         .*
                         $
                      }
                      {$1$2|$3}sx)
        {
            print "$1$2|$3\n";
        }
    }

    __DATA__
    ...

    While this “works”, it is dubious: the * quantifier in a regex means “match zero or more of the preceding item”. In the substitution, do you really want to match zero occurrences of (.*\|) or ([\d.]+|[\w.]+)? If not, use the + quantifier, which means “one or more”.

    Hope that helps,

    Athanasius <°(((>< contra mundum
    Iustus alius egestas vitae, eros Piratica,
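The difference between * and + on a capture group can be checked directly. The sketch below uses a made-up input line without any | characters and a hypothetical helper named probe. With *, the group may match zero times, so the overall match succeeds while $1 remains undefined. With +, the same line fails to match.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether the pattern matched and what $1 held.
sub probe {
    my ($re, $line) = @_;
    if ($line =~ $re) {
        return +{ matched => 1, prefix => $1 };
    }
    return +{ matched => 0, prefix => undef };
}

# Made-up sample lines: the first has no "|" at all, the second has two.
my $no_pipes   = '654~abc~ABC.pi=1.2.3~ABC.id=42';
my $with_pipes = 'a|b|654~abc~ABC.pi=1.2.3~ABC.id=42';

my $star       = probe(qr/^(.*\|)*.*ABC\.id=\d+/, $no_pipes);
my $plus       = probe(qr/^(.*\|)+.*ABC\.id=\d+/, $no_pipes);
my $plus_pipes = probe(qr/^(.*\|)+.*ABC\.id=\d+/, $with_pipes);

for my $case ([star => $star], [plus => $plus], [plus_pipes => $plus_pipes]) {
    my ($name, $result) = @$case;
    printf "%-10s matched=%d prefix=%s\n",
        $name, $result->{matched}, $result->{prefix} // 'undef';
}
```

The first case is the one that produces "uninitialized value" warnings when the capture is interpolated after a successful match.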

Re: Print only if pattern matches
by LanX (Saint) on Jan 24, 2013 at 07:04 UTC
    >  Use of uninitialized value $2 in concatenation (.) or string at split_test.pl li

    > I want to get rid of the error. ... I also want to write the exceptions to a new file.

    So avoid using an uninitialized $2:

    if (defined $2) {
        print "$1$2|$3\n";
    }
    else {
        print $exception_fh "$_\n";
    }
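Putting the pieces of the thread together, a complete version might look like the sketch below. It is only one possible arrangement: the helper name process_lines and the simplified pattern are assumptions, and in-memory filehandles stand in for real files so that the example is self-contained. It tests the result of the match itself instead of checking defined $2 afterwards, because a failed match leaves the capture variables from an earlier successful match in place.

```perl
use strict;
use warnings;

# Hypothetical helper: reformatted records go to $out_fh,
# non-matching records go to $exception_fh.
sub process_lines {
    my ($in_fh, $out_fh, $exception_fh) = @_;

    while (my $line = <$in_fh>) {
        # Skip records whose second field is not 1.
        next unless $line =~ /^\d+\|(\d+)\|/ and $1 == 1;

        # Branch on the match result rather than on defined($2).
        if ($line =~ /^(.*\|).*ABC\.pi=([\w.]+).*ABC\.id=([\w.]+)/) {
            print {$out_fh} "$1$2|$3\n";
        }
        else {
            print {$exception_fh} $line;    # $line still ends with its newline
        }
    }
    return;
}

# Three sample records taken from the question's data.
my $input = join '', map { "$_\n" } (
    '123|1|456464|645646|4546|654~abc~dhghga~ABC.pi=112.33.44.55.66~ABC.id=789137136770',
    '123|1|456464|645646|4546|654~abc~dhghga~12.33.44.55.70~3713670',
    '123|2|456464|645646|4546|654~abc~dhghga~121322~456466874~6806',
);

# In-memory filehandles keep the sketch self-contained.
open my $in_fh,        '<', \$input  or die "Cannot read input: $!";
open my $out_fh,       '>', \my $good or die "Cannot open output buffer: $!";
open my $exception_fh, '>', \my $bad  or die "Cannot open exception buffer: $!";

process_lines($in_fh, $out_fh, $exception_fh);
close $_ for $in_fh, $out_fh, $exception_fh;

print "Reformatted:\n$good";
print "Exceptions:\n$bad";
```

To write the exceptions to a real file, the third open could be replaced with something like open my $exception_fh, '>', 'exceptions.txt' or die ...; where the file name exceptions.txt is an assumption.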