PerlMonks  

Manually incrementing @ array during for

by cniggeler (Sexton)
on Mar 16, 2020 at 15:43 UTC ( [id://11114346]=perlquestion )

cniggeler has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a text file that I'm parsing. The file contents are in an @array, and I use a loop "for (@array) {" to get each line, using $_ as that line's contents.

The start of a line may contain a keyword and I take different steps based on those keywords. However, sometimes the text extends over multiple lines, and in those cases I think it would be useful to get the next line without leaving the outermost for loop. Maybe an example of the data would help:

    keyword1 data1 data2 data3
    keyword2 data1 data2 data3
     data4 data5
     data6
    keyword1 data1 data2 data3 data4
    keyword3 data1

In the above example, I'd like to parse keyword2 as if it were all on a single line. I do NOT want to do a "next" on the topmost for loop since I would lose the fact I'm parsing a keyword2 section.

"shift @array" gets me the next line in @array, BUT it throws away the first line in @array, which I don't want to do.

Suggestions are appreciated! Also, I can't think of the correct / succinct terminology other than "manually increment @array inside a for loop" so apologies if there's an answer already in here somewhere ;-)

Replies are listed 'Best First'.
Re: Manually incrementing @ array during for
by hippo (Bishop) on Mar 16, 2020 at 16:05 UTC

    Although I'm not 100% clear on what you want maybe the answer for you is to use the array index:

        #!/usr/bin/env perl
        use strict;
        use warnings;

        my @array = (<DATA>);
        my $i = 0;
        while ($i <= $#array) {
            print "Line $i is $array[$i++]";
            while ($i <= $#array && $array[$i] =~ /^ /) {
                print "Line $i is a continuation: $array[$i++]";
            }
        }
        __DATA__
        keyword1 data1 data2 data3
        keyword2 data1 data2 data3
         data4 data5
         data6
        keyword1 data1 data2 data3 data4
        keyword3 data1

      Thank you for the prompt reply. I was kinda hoping perl had a way to advance through the array without a shift. But I can change to a while loop (or maybe a classic C-style for loop) using an index, and your reply contains a nice template!

        Generally, languages that provide implicit iteration over aggregates do not provide a way to control the implicit iterator. You will need to either preprocess the @array or use an explicit index variable.

Re: Manually incrementing @ array during for
by tybalt89 (Monsignor) on Mar 16, 2020 at 17:39 UTC

        #!/usr/bin/perl

        use strict; # https://perlmonks.org/?node_id=11114346
        use warnings;

        my @array = <DATA>;

        use Data::Dump 'dd'; dd \@array;

        my @combinedarray = (join "", @array, '') =~ /^.*\n(?: .*\n)*/gm;

        dd \@combinedarray;

        # do the 'for' over @combinedarray

        __DATA__
        keyword1 data1 data2 data3
        keyword2 data1 data2 data3
         data4 data5
         data6
        keyword1 data1 data2 data3 data4
        keyword3 data1

    Outputs:

        [
          "keyword1 data1 data2 data3\n",
          "keyword2 data1 data2 data3\n",
          " data4 data5\n",
          " data6\n",
          "keyword1 data1 data2 data3 data4\n",
          "keyword3 data1\n",
        ]
        [
          "keyword1 data1 data2 data3\n",
          "keyword2 data1 data2 data3\n data4 data5\n data6\n",
          "keyword1 data1 data2 data3 data4\n",
          "keyword3 data1\n",
        ]

Re: Manually incrementing @ array during for
by Fletch (Bishop) on Mar 16, 2020 at 16:09 UTC

    You need to get fancier in your parsing. You need to examine each line as it comes in, determine if it's a continuation (presuming leading whitespace indicates this, going from your example data) and (if so) append it to the "current line". Once you're sure you have a full line, then process it and clear out the current line. Handwavy, vague outline:

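        my $current_line = q{};

        while( defined( my $line = <> ) ) {
          chomp( $line );
          if( $line =~ m{^ \s+ \w+ }x ) {
            ## continuation: append to the record in progress
            $current_line .= $line;
            next;
          } else {
            ## new record: process the previous one (if any), then start over
            _process_line( $current_line ) if length $current_line;
            $current_line = $line;
          }
        }

        ## flush the final record
        if( length $current_line ) {
          _process_line( $current_line );
        }

        sub _process_line {
          my( $line ) = @_;
          ## do whatever . . .
        }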

    Update: Fuller example with sample data and fixing a bugglet first time through loop.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      Thank you for your prompt reply. I agree the steps necessary require more sophisticated parsing. I like the approach where a split data line is joined, then just process the entire line.

      However, this is part of a much bigger body of code, and I receive the already created @array, so I don't think I can shift since the @array may be used elsewhere.

      That being the case, do you think I should use an array index instead? Sort of a combination of yours and the previous reply. I could peek ahead to the next line and if it starts with blanks (or is not a keyword), I could then combine the next line to the current line as per your approach... Unless you say otherwise, I think I will go this route. Thanks again!

        Several other monks have been suggesting ways to handle this that end up modifying @array. If your part of the program receives a reference to @array, you can use the dclone function from the core module Storable to copy the array and then mangle the copy however you want.

        Adapted from the Storable POD:

            use Storable qw(dclone);
            # ...
            my $arrayref = dclone($provided_arrayref);

        As long as @array is small enough to copy, this should be very efficient; Storable is an XS module.
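
The copy-then-mangle idea might look like the sketch below. It assumes the caller passes a reference to a flat array of strings; the sample data is made up for illustration. For a flat array a plain list copy is already enough, while dclone also covers elements that are themselves references.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical input: the caller hands us a reference to its array.
my $provided_arrayref = [
    "keyword1 data1 data2 data3\n",
    "keyword2 data1 data2 data3\n",
    " data4 data5\n",
];

# Deep copy via Storable; safe even if elements were nested structures.
my $deep = dclone($provided_arrayref);

# Shallow copy; sufficient here because every element is a plain string.
my @shallow = @$provided_arrayref;

# Mangle the copies freely; the caller's array is untouched.
shift @$deep;
splice @shallow, 1, 2;

printf "original: %d lines, deep copy: %d, shallow copy: %d\n",
    scalar @$provided_arrayref, scalar @$deep, scalar @shallow;
```

Either copy can then be shifted or spliced without affecting other code that still uses the original array.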

        Very similar approach in that case: instead of reading lines from the file, you walk the array indices. The concatenation and processing of entries is otherwise the same (my sample changes only 4 lines).
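
A rough sketch of that index-walking variant follows. It is not the fuller example mentioned above: the helper name coalesce_lines and the sample data are invented here, and it assumes a continuation line starts with whitespace. The input array is only read, never modified.

```perl
use strict;
use warnings;

# Hypothetical helper: returns joined records, leaving the input array untouched.
# Assumes a continuation line starts with whitespace, as in the sample data.
sub coalesce_lines {
    my ($lines) = @_;
    my @records;
    my $current;
    for my $i ( 0 .. $#{$lines} ) {
        my $line = $lines->[$i];    # copy, so chomp does not alter the caller's data
        chomp $line;
        if ( defined $current && $line =~ m{^\s+\S} ) {
            $current .= $line;      # continuation: append to the record in progress
        }
        else {
            push @records, $current if defined $current;
            $current = $line;       # a new record starts here
        }
    }
    push @records, $current if defined $current;    # flush the final record
    return @records;
}

my @array = (
    "keyword1 data1 data2 data3\n",
    "keyword2 data1 data2 data3\n",
    " data4 data5\n",
    " data6\n",
    "keyword1 data1 data2 data3 data4\n",
    "keyword3 data1\n",
);

print "$_\n" for coalesce_lines( \@array );
```

Because each element is copied into $line before chomp, @array still holds its original six lines afterwards.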

        The cake is a lie.
        The cake is a lie.
        The cake is a lie.

Re: Manually incrementing @ array during for
by johngg (Canon) on Mar 16, 2020 at 16:48 UTC

    It might be simpler to do a first pass concatenating continuation lines via splice before your main processing.

        johngg@shiraz:~/perl/Monks$ perl -Mstrict -Mwarnings -E '
        open my $inFH, q{<}, \ <<__EOD__ or die $!;
        keyword1 data1 data2 data3
        keyword2 data1 data2 data3
         data4 data5
         data6
        keyword1 data1 data2 data3 data4
        keyword3 data1
        __EOD__

        my @dataLines = <$inFH>;
        chomp @dataLines;
        close $inFH or die $!;

        for my $idx ( reverse 0 .. $#dataLines )
        {
            next if $dataLines[ $idx ] =~ m{^keyword};
            $dataLines[ $idx - 1 ] .= splice @dataLines, $idx, 1;
        }

        say for @dataLines;'
        keyword1 data1 data2 data3
        keyword2 data1 data2 data3 data4 data5 data6
        keyword1 data1 data2 data3 data4
        keyword3 data1

    I hope this is of interest.

    Cheers,

    JohnGG

Re: Manually incrementing @ array during for
by Marshall (Canon) on Mar 17, 2020 at 01:22 UTC
    I demo a common parsing pattern below. You figure out what is special about the start of a "new record". If you see that "special thing" and you are already working on a record, then you process the previous record and start a new one. Otherwise you are continuing the current record. Note that since the start of a new record triggers the output of the previous record, there is a need to output the final record once the data ends.

        use strict;
        use warnings;
        $|=1;

        my $data_lines = ('keyword1 data1 data2 data3
        keyword2 data1 data2 data3
         data4 data5
         data6
        keyword1 data1 data2 data3 data4
        keyword3 data1
        ');

        my @lines = split (/\n/,$data_lines);

        print "To show array of text lines as per spec:\n";
        foreach (@lines)
        {
            print " $_\n";
        }
        print "\n";
        print "Showing data array's per combined input lines:\n\n";

        my @array = ();
        foreach my $line (@lines)
        {
            if ($line =~ /^\S/ and @array>0)   # Finish previous record
            {
                process_array (@array);
                @array = ();                   #start new record
                push (@array,$_) foreach (split ' ',$line);
            }
            else   # new or continuing record
            {
                push (@array, $_) foreach (split ' ',$line);
            }
        }
        process_array (@array);   # the last record

        sub process_array
        {
            my @array = @_;
            print "process array in some sub = @array\n";
        }
        __END__
        To show array of text lines as per spec:
         keyword1 data1 data2 data3
         keyword2 data1 data2 data3
          data4 data5
          data6
         keyword1 data1 data2 data3 data4
         keyword3 data1

        Showing data array's per combined input lines:

        process array in some sub = keyword1 data1 data2 data3
        process array in some sub = keyword2 data1 data2 data3 data4 data5 data6
        process array in some sub = keyword1 data1 data2 data3 data4
        process array in some sub = keyword3 data1

Re: Manually incrementing @ array during for
by kcott (Archbishop) on Mar 17, 2020 at 08:25 UTC

    G'day cniggeler,

    From your description, you're reading lines from a file and adding them to an array, then reading all the same lines from the array and processing them. You're doing the same work twice and you've provided no explanation why you need to do this. Is there a reason you're not just processing the lines as you read them from the file?

    If your keyword lines all start the same -- e.g. "ID1", "ID2", etc. -- you can do something like this:

        #!/usr/bin/env perl

        use strict;
        use warnings;

        {
            local $/ = 'keyword';
            while (<DATA>) {
                next if $. == 1;
                chomp;
                y/\n//d;
                print "$/$_\n";
            }
        }

        __DATA__
        keyword1 data1 data2 data3
        keyword2 data1 data2 data3
         data4 data5
         data6
        keyword1 data1 data2 data3 data4
        keyword3 data1

    You may need to refer to local and, for $. and $/, "perlvar: Variables related to filehandles".

    If the only way to differentiate keyword lines from continuation lines is by whitespace, you can do something like this:

        #!/usr/bin/env perl

        use strict;
        use warnings;

        my $multiline = '';

        while (<DATA>) {
            chomp;
            if (0 == index $_, ' ') {
                $multiline .= $_;
            }
            else {
                print "$multiline\n" if length $multiline;
                $multiline = $_;
            }
        }

        print "$multiline\n";

        __DATA__
        keyword1 data1 data2 data3
        keyword2 data1 data2 data3
         data4 data5
         data6
        keyword1 data1 data2 data3 data4
        keyword3 data1

    Both of those scripts produce identical output:

        keyword1 data1 data2 data3
        keyword2 data1 data2 data3 data4 data5 data6
        keyword1 data1 data2 data3 data4
        keyword3 data1

    If there's something else going on here, you'll need to tell us. For instance, keyword may have some associated pattern, in which case a regex solution might be more appropriate.

    Please include some code with any follow-up questions; along with output, even if that's only error messages.

    — Ken

      Is there a reason you're not just processing the lines as you read them from the file?

      According to this, the code for which cniggeler is responsible "... is part of a much bigger body of code, and I receive the already created @array, ... the @array may be used elsewhere." Then IIUC, it's not possible for cniggeler to parse the data at the point of access (which I agree would likely be simpler and more efficient).


      Give a man a fish:  <%-{-{-{-<

        Fair enough. I obviously missed the later post where the goal posts were moved. :-)

        The second of my two solutions would work equally well for an array. The chomp may not be necessary: hard to tell as the example input data presented in the OP looks more file data than array data.

        — Ken

Re: Manually incrementing @ array during for
by cniggeler (Sexton) on Mar 18, 2020 at 02:20 UTC
    Thanks for the many great replies! I ended up using a while loop and index to step through the @array, and coalescing multiple lines into one line, which was then parsed. The index had to be incremented for each "extra" line so the outer while loop didn't re-process the coalesced lines.
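
For readers landing here later, the approach described above might look roughly like this. It is a sketch, not the actual code from the larger program: the sample data and the @parsed collector are stand-ins, and continuation lines are assumed to start with whitespace.

```perl
use strict;
use warnings;

# Sample data standing in for the @array received from elsewhere.
my @array = (
    "keyword1 data1 data2 data3\n",
    "keyword2 data1 data2 data3\n",
    " data4 data5\n",
    " data6\n",
    "keyword1 data1 data2 data3 data4\n",
    "keyword3 data1\n",
);

my @parsed;    # collected here only so the result can be inspected
my $i = 0;
while ( $i <= $#array ) {
    my $line = $array[ $i++ ];    # copy; @array itself is never modified
    chomp $line;

    # Peek ahead: absorb each following line that starts with whitespace,
    # bumping the index so the outer loop does not see it again.
    while ( $i <= $#array && $array[$i] =~ /^\s/ ) {
        my $extra = $array[ $i++ ];
        chomp $extra;
        $line .= $extra;
    }

    push @parsed, $line;          # stand-in for the real keyword parsing
}

print "$_\n" for @parsed;
```

The inner loop is what keeps the outer loop from re-processing the coalesced lines: every absorbed line advances $i.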

      Although you already have a working solution, I took this as a puzzle where the array

      • has been filled before
      • must not be shifted or modified otherwise
      • shall be processed in a for-loop without a next statement
      Here is my (not so serious) solution:


          #!/usr/bin/perl

          use strict;
          use warnings;

          my @data = <DATA>;

          for ((my $i, local $_, my $next) = (0, @data[0, 1]);
               $i < @data;
               ($_, $next) = ($next, $data[++$i + 1])) {
              $next and $next =~ /^\s/ and
                  ($_, $next) = ($_ . $next, $data[++$i + 1]) and redo;
              # processing goes here
              print "#$i: $_";
          }

          __DATA__
          keyword1 data1 data2 data3
          keyword2 data1 data2 data3
           data4 data5
           data6
          keyword1 data1 data2 data3 data4
          keyword3 data1

      which gives:

          #0: keyword1 data1 data2 data3
          #3: keyword2 data1 data2 data3
           data4 data5
           data6
          #4: keyword1 data1 data2 data3 data4
          #5: keyword3 data1

      Greetings,
      -jo

      $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
