Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re^3: foreach argument modification inside loop, followed by return from loop

by rjt (Deacon)
on Jul 10, 2013 at 13:04 UTC ( #1043462=note: print w/ replies, xml ) Need Help??


in reply to Re^2: foreach argument modification inside loop, followed by return from loop
in thread foreach argument modification inside loop, followed by return from loop

that was just proof-of-concept code

Understood, and thanks for posting your real code farther down the thread. Based on that, the relevant logic you have is this:

my $idx = 0; for my $j (@a) { if ($j == $target) { splice(@a, $idx, 1); last; } ++$idx; }

Which helps clarify what you're trying to do: remove an element from an array by value. I'm still far from convinced that there is any benefit to a splice/last approach, even based solely on the grounds that it's slower and more complex than something like this:

my $idx = firstidx { $_ == $target } @a; splice(@a, $idx, 1) if $idx >= 0;

Timing for 1e6 entries:

Rate for firstidx for 24.3/s -- -22% firstidx 31.1/s 28% --

Certainly in this case, the for approach seems clearly suboptimal (List::MoreUtils' XS code does essentially the same thing as the for loop, much faster and more cleanly). Results were similar with any array sizes where efficiency is likely to matter.

#!/usr/bin/env perl use 5.012; use warnings; use Benchmark qw/cmpthese/; use List::Util qw/shuffle/; use List::MoreUtils qw/firstidx/; my @for = 1..1e6; my @firstidx = 1..1e6; my @remove_for = shuffle @for; my @remove_firstidx = @remove_for; # Same search order # Use iteration count, not time, so we can verify results. cmpthese(100, { for => sub { my $target = pop @remove_for; my $idx = 0; for my $j (@for) { if ($j == $target) { splice(@for, $idx, 1); last; } ++$idx; } }, firstidx => sub { my $target = pop @remove_firstidx; my $idx = firstidx { $_ == $target } @firstidx; splice(@firstidx, $idx, 1) if $idx >= 0; }, }); # Sanity check: arrays are the same die "Array length mismatch" if @for != @firstidx; die "Content different" if grep { $for[$_] != $firstidx[$_] } 0..$#for +;


Comment on Re^3: foreach argument modification inside loop, followed by return from loop
Select or Download Code
Re^4: foreach argument modification inside loop, followed by return from loop
by vsespb (Hermit) on Jul 10, 2013 at 13:29 UTC
    I'm still far from convinced that there is any benefit to a splice/last approach, even based solely on the grounds that it's slower and more complex than something like this:
    I agree. This can be optimized-out.
    Problem that I mentioned in my posting Re^2: foreach argument modification inside loop, followed by return from loop that "for" loop is in get_task(). i.e. there is another outer "for" loop.

      I see that, however from what I can see that whole file could be easily refactored to not only avoid deleting within the (outer) for loop, but be significantly more concise as well. Here's a grossly oversimplified version of your code logic (at least the parts pertinent to this discussion):

      OUTER: for (1) { for my $job (@jobs) { my ($status, $task) = $job->{job}->get_task(); if ($status eq 'wait') { if ($self->{one_by_one}) { return 'wait'; } else { return 'wait' unless --$maxcnt; } } elsif ($status eq 'done') { ($s, $t) = do_finish($job); # do_finish deletes from @jobs redo OUTER if $s eq 'ok'; return ($s,$t); # (else) } else { my $newtask = Package->new(...); return ($status, $newtask); } } } return('wait')

      You might already get away with the do_finish call, since you either redo outside the for, or exit the sub, but it might not stay that way if your logic changes in the future (it does create a bit of a trap for the unwary).

      As far as I can tell, you only ever look at the first element of the jobs array; all cases eventually lead to a return or redo OUTER, right? If that's the case, you don't need to loop through the jobs at all. And, one more optimization: by extension of the last point, you only ever call do_finish on that first job (in the get_task loop context), right? So you don't need splice at all, just shift.

      NEXT: my $job = shift @jobs; my ($status, $task) = $job->{job}->get_task; return 'wait' if $status eq 'wait'; # All 'wait' roads went there. if ($status eq 'done') { my ($done_s, $done_t) = $self->do_finish($job); goto NEXT if $done_s eq 'ok'; return ($done_s, $done_t); } my $newtask = Package->new(...); return ($status, $newtask);

      I haven't had much time with your code at all, so it's quite possible I missed something (or there are planned additions that would invalidate my approach). Just remember my main goal was to get you thinking about one way you might start to rework the outer code for easier human comprehension, conciseness, not to mention getting rid of any need to flirt with array modification inside a loop.

      And this is just one of many ways. Let me/us know which way you end up going, or if you'd like more input.

        As far as I can tell, you only ever look at the first element of the jobs array;
        No, first job can return 'wait' and loop will jump next job after this line
        return 'wait' unless --$maxcnt;
        not to mention getting rid of any need to flirt with array modification inside a loop
        but it might not stay that way if your logic changes in the future (it does create a bit of a trap for the unwary)

        Well, it's unlikelly going to change the way that loop will continue after do_finish(). My do_finish() sub is something that called before finish.

        Also, let's imagine that I copied my @{$self->{jobs_a}} to temporary variable @tmp_jobs. Will it solve the problem? No. If I change the code so loop continues after do_finish(), it will still introduce the bug (however a bit more visible).

        I think unit tests with 100% branch coverage would be a solution here. (+ effort to keep it 100% after any modifications)

        Let me/us know which way you end up going
        I don't know yet. Certainly, there are ways to make it bit better. But I actually think it's irrelevant to this topic :)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1043462]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (8)
As of 2014-12-21 10:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (104 votes), past polls