
foreach skipping elements

by tiggyboo (Initiate)
on Jul 19, 2013 at 15:27 UTC (#1045375)
tiggyboo has asked for the wisdom of the Perl Monks concerning the following question:

I'm parsing a pipe-delimited file and working my way through each element with foreach after splitting each line. What I'm finding is that a run of empty fields at the end of the line is somehow being skipped, i.e.:

    # input example: 1|2|3|||6|7||||
    @fields = split('\|', $_);
    foreach (@fields) {
        &process($_);
    }
    sub process {
        print $_ . "\n";
    }

In this case the empty elements after "7" will not be printed. However, if I were to fill the last empty element with a value, all the empty elements in between will be printed. Any thoughts? Thanks in advance.

Replies are listed 'Best First'.
Re: foreach skipping elements
by Tux (Abbot) on Jul 19, 2013 at 15:30 UTC

    split takes a regular expression as its first argument, not a fixed string (there are exceptions). You need -1 as the last argument to get trailing empty fields too.

    @fields = split /\|/, $_, -1

    FWIW, you could also use Text::CSV_XS with sep_char => "|".

    Enjoy, Have FUN! H.Merijn
Re: foreach skipping elements
by Corion (Pope) on Jul 19, 2013 at 15:30 UTC

    That's how split is documented to work. You will likely want to set a negative LIMIT parameter.
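To make the effect of LIMIT concrete, here is a minimal sketch using the input line from the question. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = '1|2|3|||6|7||||';

# Default LIMIT: trailing empty fields are dropped.
my @default = split /\|/, $line;

# Negative LIMIT: every field is kept, including trailing empty ones.
my @all = split /\|/, $line, -1;

printf "default:  %d fields\n", scalar @default;
printf "LIMIT -1: %d fields\n", scalar @all;
print join(',', map { "[$_]" } @all), "\n";
```

The line has 10 separators, so the second split returns 11 fields, while the first stops at the last non-empty one and returns 7.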

Re: foreach skipping elements
by mtmcc (Hermit) on Jul 19, 2013 at 16:07 UTC
    You could stick spaces into those 'empty' bars by swapping with something like this: while($line =~ s/\|\|/\| \|/){}

    E.g., where $fileName is a text file containing your string:

    #! /usr/bin/perl -w
    use strict;

    my $fileName = $ARGV[0];
    my @old;
    my @new;

    open (my $file, "<", $fileName);
    while(<$file>)
    {
        my $line = $_;
        while($line =~ s/\|\|/\| \|/){}
        #print STDERR "OLD: $_\tNEW: $line";
        @old = split(/\|/, $_);
        @new = split(/\|/, $line);
    }

    for(@old)
    {
        print STDERR "OLD: $_\n";
    }
    print STDERR "\n";
    for(@new)
    {
        print STDERR "NEW: $_\n";
    }

      "You could stick spaces into those 'empty' bars by swapping with something like this: while($line =~ s/\|\|/\| \|/){}"

      Perhaps I'm missing your intent, but that seems like an unnecessarily complicated way to get a repeat substitution when the 'g' modifier is provided for that task:

      $ perl -Mstrict -Mwarnings -E '
          my $x = q{1|2|3|||6|7||||};
          {
              my $line = $x;
              while($line =~ s/\|\|/\| \|/){}
              say $line;
          }
          {
              my $line = $x;
              $line =~ s/\|(?=\|)/| /g;
              say $line;
          }
      '
      1|2|3| | |6|7| | | |
      1|2|3| | |6|7| | | |

      Using the while loop instead of the 'g' modifier is also slower:

      $ perl -Mstrict -Mwarnings -E '
          use Benchmark qw{cmpthese};
          my $x = q{1|2|3|||6|7||||};
          cmpthese(1e6 => {
              while_no_repeat => sub {
                  my $line = $x;
                  while($line =~ s/\|\|/\| \|/){}
              },
              just_repeat => sub {
                  my $line = $x;
                  $line =~ s/\|(?=\|)/| /g;
              }
          });
      '
                          Rate while_no_repeat     just_repeat
      while_no_repeat 158479/s              --            -21%
      just_repeat     201613/s             27%              --

      [Aside: the pipe (|) character is not special in the "replacement" part of s/pattern/replacement/ so there's no need to escape it.]
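A small sketch of why the 'g' version uses a lookahead rather than matching both pipes: with /g, matches cannot overlap, so a pattern that consumes two pipes skips every other empty field. The variable names here are only illustrative.

```perl
use strict;
use warnings;

my $input = '7||||';

# Consuming both pipes: /g resumes after the second pipe, so the
# pair in the middle is never examined and stays unfilled.
(my $consumed = $input) =~ s/\|\|/| |/g;

# Lookahead: only the first pipe is consumed, so the next match can
# start on the pipe that was merely peeked at.
(my $peeked = $input) =~ s/\|(?=\|)/| /g;

print "$consumed\n";
print "$peeked\n";
```

The first substitution leaves one `||` behind, which is presumably why the while loop was needed in the original; the lookahead form fills every gap in a single pass.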

      -- Ken

        Fair point, thanks for comparing them!


      But your program will miss the last empty field. In 1|2||3|45|||6|, for example, your program stops too early:

      .
      .
      NEW: 45
      NEW:  
      NEW:  
      NEW: 6

      (One empty NEW: line should follow, since separators separate fields, rather than terminate them.)

      If that were the only issue, you could just append a single '|' to the input string before the rest of your logic.

      The second problem, though, is that your regex converts empty fields to a single space ' '. True, you could easily replace them, but if the field was ' ' in the first place, the value will be destroyed. That may not be a problem depending on the data, but to fix this, and the previous issue, you can make a couple of changes:

      $line =~ s!\|!\| !g;
      .
      .
      @new = map { substr $_, 1 } split /\|/, $line;
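For anyone who wants to try the pad-and-strip idea end to end, here is a self-contained sketch. The helper name split_padded is hypothetical, and padding the first field explicitly is an assumption about what the elided lines do: the first field has no separator in front of it, so the substitution alone would not pad it.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: pad every field with one
# leading space so no field is empty, split, then strip the pad.
sub split_padded {
    my ($line) = @_;
    my $padded = " $line";     # pad the first field (assumed step)
    $padded =~ s/\|/| /g;      # pad every field that follows a separator
    return map { substr $_, 1 } split /\|/, $padded;
}

# Second field is a genuine single space; third, fifth and sixth are empty.
my @fields = split_padded('1| ||6||');
print join(',', map { "[$_]" } @fields), "\n";
```

Because every padded field is at least one character long, split has no trailing empty fields to drop, and a field that really was a single space survives unchanged.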

      Or, as the OP suggested he was trying to avoid, he could just add a non-blank field to the end. In that case the entire code reduces to:

      my @new = split /\|/, $line . '|sentinel';
      $#new--; # Remove sentinel

      But of course, split /\|/, $line, -1; still gets my vote. :-)
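As a quick sanity check, the following sketch compares the sentinel approach with a negative LIMIT on a few sample lines (the first two are taken from this thread, the third is made up) and confirms that each yields one more field than there are separators.

```perl
use strict;
use warnings;

my $all_agree = 1;

for my $line ('1|2|3|||6|7||||', '1|2||3|45|||6|', 'a|b') {
    # Negative LIMIT keeps trailing empty fields.
    my @limit = split /\|/, $line, -1;

    # Sentinel approach: append a non-empty field, then drop it.
    my @sentinel = split /\|/, $line . '|sentinel';
    pop @sentinel;

    # Count the separators in the line.
    my $pipes = () = $line =~ /\|/g;

    my $same = @limit == @sentinel
            && join("\0", @limit) eq join("\0", @sentinel)
            && @limit == $pipes + 1;
    $all_agree &&= $same;

    printf "%-16s %2d fields, %2d separators, %s\n",
        $line, scalar @limit, $pipes, $same ? 'same' : 'DIFFERENT';
}
```

The sample lines are all non-empty on purpose; the sketch makes no claim about how either approach treats an empty input line.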

        Two good points, thanks for pointing those out!


      Thanks for the speedy suggestions, everyone. I'm off and running, able to do some *real* damage now :-) Al
