msh210 has asked for the wisdom of the Perl Monks concerning the following question:

O Monks,

I have a script that looks essentially like this:

while(@array1) {
  ONE: while(@array1) {
    my $item = shift @array1;
    # Do various things with $item, some of which will _possibly_
    # push @array1, ...
    # and/or
    # push @array2, ...
  }
  TWO: while(@array2) {
    my $item = shift @array2;
    # Do various things with $item, some of which will _possibly_
    # push @array2, ...
    # and/or
    # push @array1, ...
  }
}

The idea here is that I'm acting on items, and acting on them includes collecting more items to act on. The problem is that ONE runs for a long, long time (a few days, so far), and I don't want to spend so much time on @array1 and neglect @array2: I want to get some of the "various things" done to @array2 too. So I'm looking for ideas on how to get my code to switch over to TWO even before ONE ends.

One thing I thought of is to use a counter, something like this:

while(@array1 or @array2) {
  my $i = 0;
  ONE: while(@array1 and $i < $some_large_number/10) {
    $i++;
    my $item = shift @array1;
    # Do various things (as above)
  }
  $i = 0;
  TWO: while(@array2 and $i < $some_large_number) {
    $i++;
    my $item = shift @array2;
    # Do various things (as above)
  }
}

Another idea I had was to use threading: handle ONE and TWO in separate threads, and have each wait for the other until both arrays are empty. The problem with threading is twofold: I don't know how to use threads (though I can learn, of course), and the threads documentation comes with a big, nasty warning (that interpreter-based threads are officially discouraged).

A third idea I had was to combine the two arrays, something like this:

while(@array) {
  my $item = shift @array;
  if (used_to_be_in_array1($item)) {
    # Do various array1ish things with $item, some of which will _possibly_:
    push @array, ...
  }
  elsif (used_to_be_in_array2($item)) {
    # Do various array2ish things with $item, some of which will _possibly_:
    push @array, ...
  }
  else {
    warn $item;
  }
}
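One way to make the third idea concrete, sketched under assumptions: instead of calling a test like used_to_be_in_array1() on each item, every queue entry can carry its origin as a tag, e.g. [ 'one', $item ], and a dispatch table picks the handler. The tag names, the %handler table and the toy push rules below are illustrative only, not part of the original script.

```perl
use strict;
use warnings;

# Every entry records which "array" it belongs to: [ $kind, $payload ].
my @queue = map { [ 'one', $_ ] } 1 .. 3;
my @order;

# Hypothetical handlers standing in for the "array1ish" and
# "array2ish" work; each may push new tagged entries onto @queue.
my %handler = (
    one => sub {
        my ($n) = @_;
        push @queue, [ 'two', $n * 10 ];
    },
    two => sub {
        my ($n) = @_;
        push @queue, [ 'one', $n + 1 ] if $n < 25;
    },
);

# One FIFO queue: both kinds of work interleave in arrival order.
while (my $entry = shift @queue) {
    my ($kind, $payload) = @$entry;
    push @order, "$kind:$payload";
    if (my $h = $handler{$kind}) { $h->($payload) }
    else                          { warn "unknown kind '$kind'\n" }
}
print "@order\n";
```

Because new work goes to the back of a single queue, neither kind of item can be neglected for long; the price is that there is no way to favour one kind over the other.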

But I may be missing some pros and cons of each of those approaches, and of course there is probably yet another, better way that I haven't thought of. So I seek your advice: what do you recommend (and why) for handling both of my arrays, rather than spending a long time on one of them?

$_="msh210";$"=$\;@_=@{[split//,uc]}[2,0];$_="@_$\1";$\=$/;++$_[0]for$...1;print lc substr crypt($_,"@_"),1,6

Re: Switching back and forth between parts of my script
by Laurent_R (Canon) on Mar 04, 2016 at 18:06 UTC
    Maybe you could just alternate between the two arrays:
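    while (@array1 or @array2) {   # stop once both arrays are empty
      if (@array1) {
        my $item = shift @array1;
        #....
      }
      if (@array2) {
        my $item = shift @array2;
        #....
      }
    }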
Re: Switching back and forth between parts of my script
by Tanktalus (Canon) on Mar 04, 2016 at 19:48 UTC

    Threading seems to be the right idea here. It appears you have one input list; some of its items get pushed onto the second array, some of those in turn get pushed back onto the first array, and some items in each array get pushed back onto their own array.

    What this looks like to me is a pair of queues. Put the list into the first queue, which is serviced by the first thread; that thread handles each entry, possibly enqueuing new items on either of the two queues. The second thread waits on the second queue and pulls items off as they arrive, likewise enqueuing anything required. Thread::Queue should take care of most of this for you, as long as the threads don't otherwise share variables.
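    A minimal sketch of that two-queue design, assuming a perl built with ithreads and a Thread::Queue recent enough to provide end(). The part that is easy to get wrong is knowing when both threads may stop: here a shared $pending counter (a name invented for this example) tracks items that have been enqueued but not yet finished, and when it reaches zero both queues are ended. The worker's push rule is a toy stand-in for the real "various things".

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# One queue per worker thread (ONE and TWO in the question).
my $q1 = Thread::Queue->new;
my $q2 = Thread::Queue->new;

# Items enqueued but not yet fully processed. When this drops to
# zero, no thread can produce more work, so both queues are ended.
my $pending :shared = 0;

sub submit {
    my ($q, $item) = @_;
    { lock($pending); $pending++; }   # count it before it becomes visible
    $q->enqueue($item);
}

sub item_done {
    lock($pending);
    if (--$pending == 0) {
        $_->end for $q1, $q2;         # blocked dequeue() calls now return undef
    }
}

# Hypothetical work: an item n > 0 hands n - 1 to the *other* queue.
sub worker {
    my ($in, $other) = @_;
    my $count = 0;
    while (defined(my $n = $in->dequeue)) {
        $count++;
        submit($other, $n - 1) if $n > 0;
        item_done();
    }
    return $count;
}

submit($q1, $_) for 5, 4;             # seed queue ONE (assumed non-empty)
my $t1 = threads->create(\&worker, $q1, $q2);
my $t2 = threads->create(\&worker, $q2, $q1);
my ($done1, $done2) = ($t1->join, $t2->join);
print "ONE handled $done1 items, TWO handled $done2\n";
```

    Each new item is counted before it is enqueued, and its parent is only counted as finished afterwards, so $pending cannot reach zero while either thread could still produce work.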

    You might get something a bit simpler by using Coro and Coro::Channel in similar ways (nothing here looks like it would gain from that option, but you've also omitted most of the code, so I can't be sure). However, this only really benefits you if your two threads spend a significant amount of time waiting on external events, such as downloads via HTTP. If you're already fully CPU-bound but have more than one CPU available, threads are more likely to help. For those who are afraid of threads, Coro with AnyEvent::Fork::RPC may work: you'd essentially fork off the two workers as child processes and pass data in and out of them, allowing each child process to use a full CPU.

    Either one of these options, threads or Coro, may also further benefit from multiple worker threads - depending on how CPU-intensive everything is.

    The best/simplest option, though, might be to stick to a single array and figure out, as you pull each item off the list, what needs to be done with it: your "third" idea. Having only a single queue just makes things conceptually easier. The problem arises if you need queue priorities: then you'll need a priority queue instead of a straight list, so that you can deal with higher-priority items sooner. I'm not sure you do, but your concern about neglecting array2 suggests that array2 may be higher priority. If array2 is strictly higher priority, then deal with it that way:

    while (@array or @array2) {
      while (@array2) {
        my $item = shift @array2;
        my $result = do_stuff2($item);
        if    ($result == 1) { push @array,  $item }
        elsif ($result == 2) { push @array2, $item }
        # else done with item, discard.
      }
      last unless @array;   # @array2 was just emptied, so nothing is left
      my $item = shift @array;
      my $result = do_stuff($item);
      if    ($result == 1) { push @array,  $item }
      elsif ($result == 2) { push @array2, $item }
      # else done with item, discard.
    }
    This will go back and clear @array2 after handling each item in @array. It's kind of like a priority queue, but simplified for the case of two strict priority levels.
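    The two-level loop above generalizes to any number of strict priority levels. A rough sketch, assuming items are defined scalars; @levels and next_item() are hypothetical names, with $levels[0] as the highest priority and the "work" reduced to a toy rule:

```perl
use strict;
use warnings;

# $levels[0] is the highest priority; add more array refs for more levels.
my @levels = ([], [], []);

# Take the next item from the highest-priority non-empty level.
sub next_item {
    for my $q (@levels) {
        return shift @$q if @$q;
    }
    return undef;                 # every level is empty
}

push @{ $levels[2] }, qw(low1 low2);
push @{ $levels[1] }, 'mid1';

my @order;
while (defined(my $item = next_item())) {
    push @order, $item;
    # Hypothetical work: handling low1 creates an urgent item,
    # which is served before low2 even though it arrived later.
    push @{ $levels[0] }, 'urgent1' if $item eq 'low1';
}
print "@order\n";                 # mid1 low1 urgent1 low2
```

    As with the two-level version, a level that keeps refilling itself can starve every level below it indefinitely.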

Re: Switching back and forth between parts of my script
by jdporter (Canon) on Mar 04, 2016 at 21:58 UTC

    Have a look at this thread from a while back: Handling Multi-Priority Requests

    I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.

      Thank you!

Re: Switching back and forth between parts of my script
by perlfan (Vicar) on Mar 04, 2016 at 18:39 UTC
    Weight it by chance: say, a 75% chance that it'll take the next element from @array1 and a 25% chance that it'll take it from @array2. Adjust the percentages to suit your needs.
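    A sketch of that idea, with the 75/25 split as a parameter. pick_array() is a hypothetical helper; when the array it would choose is empty it falls back to the other one, so the loop still drains both. The toy loop below only counts where each item came from.

```perl
use strict;
use warnings;

# Return a reference to the array to take the next item from:
# $first with probability $p_first, otherwise $second.
# An empty array is never chosen while the other still has items.
sub pick_array {
    my ($first, $second, $p_first) = @_;
    return $second unless @$first;
    return $first  unless @$second;
    return rand() < $p_first ? $first : $second;
}

my @array1 = (1 .. 1000);
my @array2 = (1 .. 1000);
my ($from1, $from2, $early1, $steps) = (0, 0, 0, 0);

while (@array1 or @array2) {
    my $q    = pick_array(\@array1, \@array2, 0.75);
    my $item = shift @$q;
    # ... the "various things" with $item go here, possibly pushing
    #     new items onto @array1 or @array2 ...
    if ($q == \@array1) { $from1++ } else { $from2++ }
    $steps++;
    $early1++ if $q == \@array1 && $steps <= 400;
}
print "first 400 picks: $early1 from \@array1\n";
```

    While both arrays are non-empty, roughly three picks in four come from @array1, yet @array2 is never starved.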
Re: Switching back and forth between parts of my script
by Laurent_R (Canon) on Mar 05, 2016 at 08:56 UTC
    You don't say enough about your process. In particular, I think you should explain why ONE runs for a long, long time (a few days, so far). What is it doing that takes so long?

    You may benefit from threading if one of the processes spends significant time waiting for external events (disk I/O, network input, etc.), or if such a process is doing really CPU-intensive computation, such as heavy number crunching (say, prime factor decomposition of very large integers), and you have more than one CPU core to spread it over. You did not indicate that this is part of what you are doing, so at this point there is little evidence that threading will improve your process.

      Good point, thanks.

Re: Switching back and forth between parts of my script
by msh210 (Monk) on Mar 04, 2016 at 21:08 UTC

    Thank you all (Laurent_R, perlfan, Tanktalus) for your ideas! Tanktalus, I think you misunderstood my arrays' schema, but no matter: I think I will end up using something based directly on one of your ideas:

    while(@array1 or @array2) {
      while(@array2) {
        my $item = shift @array2;
        # Do various things
      }
      if(@array1) {
        my $item = shift @array1;
        # Do various things
      }
    }

    This avoids the need for me to learn threads or Coro. It's also especially good for me at this juncture: my script has (as I mentioned) been running for days, building up @array2 without really doing anything with it, but I stored the array elements in a file, so now I can read that file and start acting on them immediately. And even after my current list of @array2 items is exhausted, I'm really more interested in @array2 items than in @array1 items, so this is a good solution for me. Thanks again.

      Cool. The only thing my %-chance suggestion was addressing is the concept of process "fairness"; clearly it doesn't address priority.