PerlMonks  

Re: Check multiple lines exist in a record

by sundialsvc4 (Abbot)
on Mar 26, 2018 at 20:37 UTC (#1211784)


in reply to Check multiple lines exist in a record

In many real-world cases ... and yours is no exception ... an input file consists of various “lines” which have a recognizable and useful pattern.   Generally, some line marks the beginning of a record, another line marks the end, and the rest (which might have a less recognizable pattern) can simply be handled by an “everything else” case.   The awk tool was based on this simple notion, and Perl more-or-less evolved from it.   So, here is a general sketch of a good approach, extemporaneous and untested:

    my @lines = ();

    #
    # THE MAIN PROGRAM READS THE FILE LINE-BY-LINE, CLASSIFYING THEM
    # USING REGULAR EXPRESSIONS
    #
    while (<>) {
        if (/^<SUBBEGIN/) {        # STARTING LINE
            @lines = ();
        }
        elsif (/^<SUBEND/) {       # ENDING LINE
            look_for_patterns();
            @lines = ();           # SO THE END-OF-FILE CALL CANNOT REPORT THIS GROUP TWICE
        }
        # INSERT "elsif" CASES HERE TO FILTER OUT BLANK LINES
        # OR OTHER UNWANTED "JUNK," IF APPLICABLE ...
        #
        else {                     # "EVERYTHING ELSE"
            push @lines, $_;
        }
    }
    look_for_patterns();   # IF APPROPRIATE; MAY NOT BE, FOR THIS CASE

    # THIS SUBROUTINE EXAMINES EACH ACCUMULATED GROUP OF LINES
    # LOOKING TO SEE IF BOTH DESIRED PATTERNS ARE INSIDE.
    sub look_for_patterns {
        my $CFU_seen = 0;
        my $CFB_seen = 0;
        foreach (@lines) {
            if (/CFU-TS10-ACT/) {
                $CFU_seen = 1;
            }
            elsif (/CFB-TS10-ACT/) {
                $CFB_seen = 1;
            }
        }
        if ($CFU_seen && $CFB_seen) {
            print "Yay!\n";
        }
    }

The first block in this completely extemporaneous code example is a simple, awk-like loop which classifies lines.   The subroutine then loops through the most-recent set of lines that have been accumulated.   (This subroutine is called each time an ending-record is found, and maybe also(!) at end-of-file.)
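
For anyone who wants something to paste and run, here is one way the same classify-then-check idea might look as a self-contained script. It is only a sketch and it departs from the code above in a few ways: it keeps per-record "seen" flags instead of accumulating lines, it treats the wanted strings as literal substrings, and it ignores lines outside a record. The <SUBBEGIN and <SUBEND markers come from the original question; the sub name records_with_all, its arguments, and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Return the 1-based numbers of the records that contain every wanted string.
# Hypothetical helper: the name and arguments are not from the original post.
sub records_with_all {
    my ($fh, @wanted) = @_;
    my (@matches, %seen);
    my ($in_record, $record_no) = (0, 0);

    while (my $line = <$fh>) {
        if ($line =~ /^<SUBBEGIN/) {          # starting line: reset state
            $in_record = 1;
            $record_no++;
            %seen = ();
        }
        elsif ($line =~ /^<SUBEND/) {         # ending line: evaluate the record
            push @matches, $record_no
                if $in_record && @wanted == grep { $seen{$_} } @wanted;
            $in_record = 0;
        }
        elsif ($in_record) {                  # "everything else" inside a record
            for my $pat (@wanted) {
                $seen{$pat} = 1 if index($line, $pat) >= 0;
            }
        }
    }
    return @matches;
}

# Made-up sample input: record 1 has both strings, record 2 has only one.
my $sample = <<'END';
<SUBBEGIN
CFU-TS10-ACT
CFB-TS10-ACT
<SUBEND
<SUBBEGIN
CFU-TS10-ACT
<SUBEND
END

# Read the sample through an in-memory filehandle instead of a real file.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @hits = records_with_all($fh, 'CFU-TS10-ACT', 'CFB-TS10-ACT');
close $fh;

print "Records containing both: @hits\n";
```

Passing a filehandle rather than reading <> directly keeps the sub easy to try against a string, as done here; for a real file you would open it and pass that handle instead.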

For what it’s worth, I no longer use awk to solve such problems, although a comparison between this and an awk solution may be instructive.   (There is, in fact, an a2p tool which directly translates “awk 2 perl.”)   But this is a time-tested and classic approach for dealing with a use-case which is so very common that it inspired the creation of at least two legendary and venerable software tools.

Replies are listed 'Best First'.
Re^2: Check multiple lines exist in a record
by Anonymous Monk on Mar 26, 2018 at 20:52 UTC
    So much garbage, so little testing:
    Unterminated <> operator at crap.pl line 10.