AppleFritter has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, esteemed monks! Allow this humble pony to drink the sweet nectar of knowledge from the font of your collective wisdom. (Or alternatively, how 'bout some hard cider?)

I need to read a number of files. In each file, each line holds a piece of data, or a marker indicating the beginning or end of a section; I'm interested only in data in a specific section. Normally, I'd do something like this:

foreach my $HANDLE (@HANDLES) {
    while(<$HANDLE>) {
        chomp;
        next unless /^PP_START$/ .. /^PP_END$/;
        # process line
    }
}

However, it turns out that in these log files, the section end marker may be omitted if there is no following section. In that case, the end of the file itself marks the end of the section.

This wreaks havoc with the above logic. The flip-flop operator, never having seen the end marker, is still true when the outer loop moves on to the next file, so the lines before the start marker in that file are wrongly processed.

Of course it would be trivial to add a flag indicating whether I'm in the right section, and reset that for each file. But doing that would essentially manually emulate the flip-flop operator, which strikes me as less than elegant. So I'm wondering -- is there a way to "reset" the flip-flop operator, as it were, so that it starts returning false again at the beginning of each new file?
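For comparison, the flag-based workaround described above might look like the following minimal sketch. The two in-memory "files" (opened on scalar references, a core Perl feature) and the names @files, @kept and $in_section are made up for illustration. Setting the flag before the push and clearing it after mirrors how `..` includes both marker lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: two in-memory "files"; the first never closes
# its PP section.
my @files = (
    "uninteresting #1\nPP_START\ninteresting #1\n",
    "uninteresting #2\nPP_START\ninteresting #2\nPP_END\nuninteresting #3\n",
);

my @kept;
for my $text (@files) {
    open my $fh, '<', \$text or die "Could not open in-memory file: $!\n";
    my $in_section = 0;    # reset for every file, which is the whole point
    while (my $line = <$fh>) {
        chomp $line;
        $in_section = 1 if $line =~ /^PP_START$/;
        push @kept, $line if $in_section;    # keeps both marker lines, like ..
        $in_section = 0 if $line =~ /^PP_END$/;
    }
    close $fh;
}
print "$_\n" for @kept;
```

It works, but as noted, it just re-implements by hand the state that `..` already keeps.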

Since working sample code and data are appreciated, here's a sample script:

#!/usr/bin/perl

use Modern::Perl '2014';

my @HANDLES = map {
    open my $HANDLE, "<", $_ or die "Could not open $_: $!\n";
    $HANDLE;
} @ARGV;

foreach my $HANDLE (@HANDLES) {
    while(<$HANDLE>) {
        chomp;
        next unless /^PP_START$/ .. /^PP_END$/;
        say;
    }
}

And two sample files (say log1.txt and log2.txt):

uninteresting #1
uninteresting #2
uninteresting #3
TX_START
uninteresting #4
uninteresting #5
TX_END
PP_START
interesting #1
interesting #2

And:

uninteresting #1
uninteresting #2
uninteresting #3
TX_START
uninteresting #4
TX_END
PP_START
interesting #1
interesting #2
interesting #3
PP_END
uninteresting #5
uninteresting #6
uninteresting #7

If you pass them in this order (log1.txt first), you'll get:

PP_START
interesting #1
interesting #2
uninteresting #1
uninteresting #2
uninteresting #3
TX_START
uninteresting #4
TX_END
PP_START
interesting #1
interesting #2
interesting #3
PP_END

And as you can see, the uninteresting lines from before the PP section in the second file get included in the output.

Re: Resetting a flip-flop operator
by choroba (Archbishop) on Aug 06, 2015 at 11:08 UTC
    Flip-flop is magical, but not that much. You can use boolean expressions in it, so just change the right-hand one:
    next unless /^PP_START$/ .. (/^PP_END$/ || eof $HANDLE);
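Put together, this fix drops into the sample script from the question as below. This is a sketch: the two sample logs are inlined as strings and read through in-memory filehandles purely so it runs without files on disk, and strict/warnings stand in for Modern::Perl so it has no CPAN dependency.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's log1.txt and log2.txt, inlined for a self-contained run.
my $log1 = join '', map "$_\n",
    'uninteresting #1', 'uninteresting #2', 'uninteresting #3',
    'TX_START', 'uninteresting #4', 'uninteresting #5', 'TX_END',
    'PP_START', 'interesting #1', 'interesting #2';    # no PP_END
my $log2 = join '', map "$_\n",
    'uninteresting #1', 'uninteresting #2', 'uninteresting #3',
    'TX_START', 'uninteresting #4', 'TX_END',
    'PP_START', 'interesting #1', 'interesting #2', 'interesting #3',
    'PP_END', 'uninteresting #5', 'uninteresting #6', 'uninteresting #7';

my @output;
for my $text ($log1, $log2) {
    open my $HANDLE, '<', \$text or die "Could not open in-memory file: $!\n";
    while (<$HANDLE>) {
        chomp;
        # The right-hand operand is also true on the last line of the file,
        # so the flip-flop switches off before the next file is read.
        next unless /^PP_START$/ .. (/^PP_END$/ || eof $HANDLE);
        push @output, $_;
    }
    close $HANDLE;
}
print "$_\n" for @output;
```

Because the two-dot form tests the right-hand operand on the same line that switched it on, a PP_START on the very last line of a file is printed and the section is closed immediately.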

      Ah! Yes, that works. Thanks!

      For the record and for the benefit of anyone else wondering the same thing, I've also tried experimenting with scoping, e.g. by moving the inner loop into a sub of its own. Turns out flip-flops maintain their state even across subroutine calls (which is likely what you'd expect anyway).

        Turns out flip-flops maintain their state even across subroutine calls
        Not if closures are involved:
        sub mk_flipflop {
            my ($start, $end) = @_;
            return sub { /$start/../$end/ }
        }

        for my $fh (@fhs) {
            my $ff = mk_flipflop(qr/^PP_START$/, qr/^PP_END$/);
            while (<$fh>) {
                $ff->() or next;
                # process line
            }
        }
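To see both behaviours side by side, the sketch below feeds the same made-up lines through a flip-flop in a named sub (the hypothetical in_section) and through a fresh closure from mk_flipflop per "file". The named sub keeps one state for all calls, so the unclosed section in the first list leaks into the second; each new closure starts switched off. This assumes both subs are called in scalar (boolean) context, as in the loop above; in list context `..` is the range operator instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Named sub: one flip-flop, whose state persists across every call.
sub in_section { /^PP_START$/ .. /^PP_END$/ }

# Closure factory: the anon sub captures $start and $end, so each call
# builds a new closure with its own, initially false, flip-flop state.
sub mk_flipflop {
    my ($start, $end) = @_;
    return sub { /$start/ .. /$end/ };
}

# Hypothetical "files": the first never closes its PP section.
my @files = (
    [ 'PP_START', 'interesting #1' ],
    [ 'uninteresting #1', 'PP_START', 'interesting #2', 'PP_END' ],
);

my (@named, @closure);
for my $lines (@files) {
    my $ff = mk_flipflop(qr/^PP_START$/, qr/^PP_END$/);
    for (@$lines) {
        push @named,   $_ if in_section();    # scalar context: flip-flop
        push @closure, $_ if $ff->();
    }
}
print "named sub: @named\n";
print "closure:   @closure\n";
```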