PerlMonks  

Perl code help

by usertest (Initiate)
on May 28, 2016 at 18:34 UTC ([id://1164404])

usertest has asked for the wisdom of the Perl Monks concerning the following question:

I need to delete every block of lines between two patterns (START and END) that contains a given string (def). For example:

Source Data:

    START
    abc
    def
    ghi
    END
    START
    xyz
    abc
    END

Output should be:

    START
    xyz
    abc
    END

Re: Perl code help
by haukex (Archbishop) on May 28, 2016 at 19:12 UTC
Re: Perl code help
by Athanasius (Archbishop) on May 29, 2016 at 04:30 UTC

    Hello usertest,

    As I understand it, you want to retain all lines that do not come between START and END, and also those that do whenever def doesn’t appear in the same block; but when def does occur between START and END, you want to exclude that entire block of data.

    If this is correct, then you need to store all the lines in each START ... END block (e.g., in an array), and print them if and only if it turns out that the block doesn’t contain def. For keeping track of whether you’re currently in a block, the range operator in scalar context is the most appropriate tool:

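    #! perl
    use strict;
    use warnings;

    my ($found, @lines);

    while (<DATA>)
    {
        # true from a START line up to and including the next END line
        if (my $flag = /START/ .. /END/)
        {
            $found = /def/ unless $found;   # remember whether this block contains "def"
            push @lines, $_;                # buffer the block

            if ($flag =~ /E0$/)             # last line of the range (the END line)
            {
                if ($found)
                {
                    $found = 0;             # discard the block and reset
                }
                else
                {
                    print for @lines;       # block is clean: print it
                }

                @lines = ();
            }
        }
        else
        {
            print;                          # outside any block: print as-is
        }
    }

    __DATA__
    Source Data
    START
    abc
    def
    ghi
    END
    START
    xyz
    abc
    END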

    Output:

    14:22 >perl 1646_SoPW.pl
    Source Data
    START
    xyz
    abc
    END

    14:22 >

    For the range operator, see perlop#Range-Operators.
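    To see what the scalar range operator actually hands back, here is a small sketch that records its value for each line of the question's data. The helper name flip_flop_values and the hard-coded sample lines are made up for illustration. Per perlop, the value is false (an empty string) outside a range, counts up from 1 inside it, and has "E0" appended on the last line, which is what the /E0$/ test relies on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines modelled on the data in the question (illustrative only).
my @input = ('Source Data', 'START', 'abc', 'def', 'ghi', 'END',
             'START', 'xyz', 'abc', 'END');

# Hypothetical helper: record what the scalar range (flip-flop) operator
# returns for each line it is given.
sub flip_flop_values {
    my @values;
    for (@_) {
        my $flag = /START/ .. /END/;    # scalar context: the flip-flop
        push @values, "$_ => '$flag'";
    }
    return @values;
}

print "$_\n" for flip_flop_values(@input);
```

    This prints Source Data => '', then START => '1' through END => '5E0' for the first block, and START => '1' through END => '4E0' for the second. Note that the flip-flop keeps its on/off state between calls, so input that stops in the middle of a block would carry over into the next call.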

    Update: Fixed logic error in code by moving the @lines = (); statement to below the inner if ... else block; also improved wording slightly.

    Hope that helps,

    Athanasius <°(((>< contra mundum
    Iustus alius egestas vitae, eros Piratica,

      Thanks Athanasius, this helped a lot. I need help with one more thing: what if the START and END tags are not that simple? The lines may start with many spaces, and instead of START and END they will be START STEP and END STEP, for example:

          START STEP
          abc
          absbsdefdbdb
          ghi
          END STEP
          START STEP
          xyz
          abc
          END STEP

      I need the same result to be generated.

        Hi usertest,

        What if the START and END tags are not as simple

        Did you try out the code that Athanasius posted above? Was the output as you expected it or not?

        The regular expressions that Athanasius used match the words "START" and "END" anywhere in a line, so, at least with the sample data you posted, the code works the same.
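        If you want to be stricter, for example so that an ordinary data line that happens to contain "END" (say "APPEND") is not taken for a marker, one option is to anchor the patterns. This is only a sketch, assuming each marker sits on a line of its own, optionally indented; the names drop_def_blocks, $start_re and $end_re are hypothetical. The logic is the same buffering approach as Athanasius's code, wrapped in a sub that takes and returns a list of lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: marker lines look like "   START STEP" / "   END STEP",
# possibly with leading whitespace, each on a line of its own.
my $start_re = qr/^\s*START STEP\b/;
my $end_re   = qr/^\s*END STEP\b/;

# Hypothetical helper: return the input lines minus every
# START STEP ... END STEP block that contains "def" anywhere.
sub drop_def_blocks {
    my (@out, @block, $found);
    for (@_) {
        if (my $flag = /$start_re/ .. /$end_re/) {
            $found ||= /def/;           # remember a "def" anywhere in the block
            push @block, $_;            # buffer the block
            if ($flag =~ /E0$/) {       # END STEP line: decide, then reset
                push @out, @block unless $found;
                @block = ();
                $found = 0;
            }
        }
        else {
            push @out, $_;              # outside any block: keep it
        }
    }
    return @out;
}

my @input = ('  START STEP', 'abc', 'absbsdefdbdb', 'ghi', '  END STEP',
             '  START STEP', 'xyz', 'abc', '  END STEP');
print "$_\n" for drop_def_blocks(@input);
```

        With this input only the second block (START STEP, xyz, abc, END STEP) is printed. As in the original code, the lines of a block that is never closed by END STEP are silently dropped.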

        why this statement $flag =~ /E0$/ and how it works

        This regular expression tests for a special value returned by the range operator in scalar context (a.k.a. the flip-flop operator): on the last line of a range, the sequence number it returns has the string "E0" appended (for example "5E0"), which marks the end of the range (see the Range Operators section of perlop for all the details). In this case, that means the condition is true when an "END" line is encountered; the code then checks whether it $found a "def" string and acts accordingly, either printing the lines it captured or discarding them.

        Hope this helps,
        -- Hauke D

Re: Perl code help
by Marshall (Canon) on May 31, 2016 at 05:26 UTC
    The classic post on this flip-flop subject (now in the Tutorials section) is "Flipin good, or a total flop?" by GrandFather. Also see the post from ysth in that thread about how to include or exclude the end points (the START and END lines). I reread it every time I write flip-flop code, to refresh my memory.
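    Roughly, the end-point trick uses the value the flip-flop returns: 1 on the first line of a range and a number with "E0" appended on the last. The sketch below is my own illustration of that idea, not ysth's code, and inner_lines is a hypothetical name. It keeps only the lines strictly between START and END.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return only the lines strictly between START and END,
# i.e. the range with both end points excluded.
sub inner_lines {
    my @out;
    for (@_) {
        my $n = /START/ .. /END/;        # '' outside, 1, 2, ... inside, "nE0" at END
        next unless $n;                  # outside any range
        next if $n == 1 || $n =~ /E0$/;  # the START or END line itself
        push @out, $_;
    }
    return @out;
}

print "$_\n" for inner_lines('junk', 'START', 'xyz', 'abc', 'END', 'more junk');
```

    This prints xyz and abc. To keep the START line, drop the `$n == 1` test; to keep the END line, drop the `/E0$/` test. Comparing "5E0" numerically is safe because Perl reads it as the number 5.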

    Update: I personally like Athanasius's code, but here is another way to do it without the flip-flop operator. Here the "state" of being inside a record or not is handled by being inside the process_record subroutine or not. At some point you may find this parsing technique useful. I added some junk to the data and dropped the "Source Data" line so that START comes first. If you want the junk to be preserved (maybe these are comments?), modify the first while loop; here I simply ignore it.

    #!/usr/bin/perl
    use strict;
    use warnings;

    while (my $line = <DATA>)
    {
        process_record($line) if $line =~ /START/;
    }

    sub process_record
    {
        my ($line) = @_;
        my @lines;
        my $found;

        push @lines, $line;    # the "START" line

        # read up to the "END" line; also stop at end of file, so that a
        # missing END cannot cause an endless loop on undef
        while (defined($line = <DATA>) and $line !~ /END/)
        {
            $found = 1 if $line =~ /def/;
            push @lines, $line;
        }
        push @lines, $line if defined $line;    # the "END" line

        print @lines unless $found;
        return;
    }

    =prints
    START
    xyz
    abc
    END
    =cut

    __DATA__
    START
    abc
    def
    ghi
    END
    junk
    more junk
    START
    xyz
    abc
    END
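    For comparison, the same idea can be written with an explicit state variable instead of relying on which subroutine is currently running. This is a sketch only: filter_records is a hypothetical name, it works on a list of lines rather than reading DATA, and it takes one possible answer to the "junk" question by keeping every line outside START ... END.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: an explicit state machine instead of the flip-flop.
# Lines outside START ... END are always kept (so junk/comment lines survive);
# a block is kept only if none of its lines after START matches /def/.
sub filter_records {
    my @lines = @_;
    my (@out, @block);
    my $in_block = 0;    # are we between START and END?
    my $found    = 0;    # has the current block matched /def/?

    for my $line (@lines) {
        if (!$in_block) {
            if ($line =~ /START/) {
                $in_block = 1;
                $found    = 0;
                @block    = ($line);    # the "START" line
            }
            else {
                push @out, $line;       # outside any block: keep it
            }
            next;
        }

        push @block, $line;
        $found = 1 if $line =~ /def/;

        if ($line =~ /END/) {           # the "END" line closes the block
            push @out, @block unless $found;
            $in_block = 0;
            @block    = ();
        }
    }

    # A block never closed by END is judged by the same /def/ rule.
    push @out, @block if $in_block && !$found;
    return @out;
}

my @data = ('START', 'abc', 'def', 'ghi', 'END',
            'junk', 'more junk',
            'START', 'xyz', 'abc', 'END');
print "$_\n" for filter_records(@data);
```

    With this data it prints junk, more junk, START, xyz, abc, END. Keeping the state in two plain variables makes it easy to change the policy later, for example to drop the junk lines or to discard an unterminated block outright.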

Node Type: perlquestion [id://1164404]
Approved by LanX