http://www.perlmonks.org?node_id=221616


in reply to Finding first block of contiguous elements in an array

To propose a really fitting solution, we need to see some code.

How do you extract the title? Do you read it in a separate run through the file? Do you have a loop that does one of several things depending on what the current line starts with? What else is your code doing? The solution will differ depending on your existing implementation.

I am guessing that: the information is all located in a single file, you're only doing one iteration over it, and all pieces of information follow the format you already showed (ie if broken across multiple lines, the following lines start with the same tag followed by a line number).

In that case, the way I'd handle this is to read the lines batchwise, reconstruct them into a single line, then hand it off to the appropriate handler.

my %handler = ( HEADER => sub { ... }, TITLE => sub { ... }, COMPND => sub { ... }, ); my ($tag, $text) = ("")x2; while(<>) { chomp; my ($curr_tag, $curr_text) = split /\s+/, $_, 2; if($curr_tag ne $prev_tag) { $handler{$tag}->($tag, $text) if exists $handler{$tag}; # complain_about_unknown() if not exists $handler{$tag}; ? ($tag, $text) = ($curr_tag, ""); } else { my $curr_linenr; ($curr_linenr, $curr_text) = split /\s+/, $curr_text, 2; # perform validation on line nr here? } $text .= " " . $curr_text; }
So now we have a parser that lets us write handlers for the tags that don't individually need to worry about multiple line text. And then the distinction is painless:
my %record; my %handler = ( # ... TITLE => sub { $record{TITLE} = $_[1] unless exists $record{TITLE} + }, # ... );
Or if there are multiple records per file:
my $curr_rec = 0; my @record; my %handler = ( # ... HEADER => sub { ++$curr_rec }, TITLE => sub { $record[$curr_rec]->{TITLE} = $_[1] unless exists $record[$curr_rec]->{TITLE} }, # ... );
You get the idea.

Makeshifts last the longest.