Iterating through an array using multiple loops and removing array elements

by BiochemPhD (Novice)
on Apr 24, 2014 at 03:45 UTC (#1083508=perlquestion)
BiochemPhD has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks; first-time poster, long-time lurker (well, for about the 4-5 months that I've been scripting in Perl).

I have a file with several thousand entries that are pre-sorted. I read the file into an array, pushing a new element for each entry in the file (the zeroth element being the top-most entry). I want to take that top-most entry and compare it to the rest of the entries in the array; if an entry meets a certain criterion, I want to remove it from the array. Then I repeat with the next remaining entry until all entries have been consumed.

My previous attempts have involved an if condition (to remove entries that meet my criterion) nested within a foreach loop (to make the comparisons), nested within a while loop (to iterate over the entries that remain). It's a holy mess, and I can't figure out a way to do it that makes sense and doesn't skip over entries.

I've removed the print functions from the code to make it easier to read.

    while (@entries) {
        my $top_entry = shift @entries;
        my $counter   = 0;
        foreach my $entry_to_compare (@entries) {
            my $comparison = compare_sub($top_entry, $entry_to_compare);
            if ($comparison <= $user_defined_value) {
                splice(@entries, $counter, 1);
            }
            else {
                $counter++;
            }
        }
    }

Re: Iterating through an array using multiple loops and removing array elements
by frozenwithjoy (Priest) on Apr 24, 2014 at 04:20 UTC

    I think the splice approach won't work unless you go backwards through the array (or something similar) or use first_index from List::MoreUtils to choose what to splice, because each removal shifts the later elements down by one index, so the counter no longer points at the element you think it does. Why not keep it simple and just push onto an array of 'values to keep' rather than removing the values you don't want?

    Totally untested example:

    my @entries = ...;
    my $top_entry = shift @entries;
    my @keepers;
    for (@entries) {
        my $comparison = compare_sub( $top_entry, $_ );
        if ( $comparison > $user_defined_value ) {
            push @keepers, $_;
        }
    }
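A minimal sketch of the "going backwards through the array" option mentioned above. It assumes hypothetical numeric entries and a stand-in compare_sub (absolute difference) with a threshold of 3; none of these come from the original post. Walking the indices from high to low means a splice only shifts elements that have already been checked, so nothing is skipped:

```perl
use strict;
use warnings;

# Hypothetical stand-ins, not from the original post: entries are numbers
# and "similar" means an absolute difference of at most 3.
my $user_defined_value = 3;
sub compare_sub { my ($x, $y) = @_; return abs($x - $y) }

my @entries   = (10, 9, 8, 7, 6, 5, 3);
my $top_entry = shift @entries;

# Visit indices from last to first; splice only shifts elements that
# have already been checked, so no entry is skipped.
for my $i (reverse 0 .. $#entries) {
    if (compare_sub($top_entry, $entries[$i]) <= $user_defined_value) {
        splice @entries, $i, 1;
    }
}

print "@entries\n";    # prints "6 5 3"
```

The index list is built once when the loop starts, so removing elements from @entries inside the loop does not disturb it.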
      I've thought about that, but the problem is that after I take the zeroth element and compare it to everything else in the array (removing both the zeroth element and any that meet the comparison criteria), I want to move on to the next remaining element and repeat the process until the entire list has been consumed.

        What about making a subroutine for the iterative comparison and have it recursively call itself on the kept values from the array unless some condition is true?

        iterate(@entries);

        sub iterate {
            my ( $top_entry, @entries ) = @_;
            my @keepers;
            for (@entries) {
                my $comparison = compare_sub( $top_entry, $_ );
                if ( $comparison > $user_defined_value ) {
                    push @keepers, $_;
                }
            }
            iterate(@keepers) unless ...;
        }

        EDIT: A variation where you either continue iterating or assign the final result, depending on some condition. It's hard to know the right approach from here without more info.

        my @final_result;
        iterate(@entries);

        sub iterate {
            my ( $top_entry, @entries ) = @_;
            my @keepers;
            for (@entries) {
                my $comparison = compare_sub( $top_entry, $_ );
                if ( $comparison > $user_defined_value ) {
                    push @keepers, $_;
                }
            }
            if (...) {
                iterate(@keepers);
            }
            else {
                @final_result = ...;
            }
        }
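One way the elided condition might be filled in, as a hedged sketch: assume the recursion should stop once no entries are left, and have each call return its group (the top entry plus everything similar to it) followed by the groups from the recursive call on the keepers. compare_sub, the threshold of 3 and the sample numbers are hypothetical stand-ins:

```perl
use strict;
use warnings;

# Hypothetical stand-ins, not from the original post.
my $user_defined_value = 3;
sub compare_sub { my ($x, $y) = @_; return abs($x - $y) }

# Returns a list of groups (array refs), most abundant first.
# Assumed stop condition: recursion ends when no entries remain.
sub iterate {
    return () unless @_;    # base case: nothing left to group
    my ($top_entry, @entries) = @_;
    my (@group, @keepers);
    for (@entries) {
        if (compare_sub($top_entry, $_) > $user_defined_value) {
            push @keepers, $_;    # not similar: keep for later calls
        }
        else {
            push @group, $_;      # similar: belongs with $top_entry
        }
    }
    return ([$top_entry, @group], iterate(@keepers));
}

my @final_result = iterate(42, 17, 13, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
print "Group: @$_\n" for @final_result;
```

With thousands of entries forming many groups, note that under `use warnings` Perl emits a "Deep recursion" warning once a subroutine has called itself 100 more times than it has returned; a plain while loop avoids that.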
Re: Iterating through an array using multiple loops and removing array elements
by Tanktalus (Canon) on Apr 24, 2014 at 04:44 UTC

    I have to admit, I don't get the code. As in, I don't really get what you're trying to accomplish.

    But some warning flags do erupt.

    The first one is that when you splice an entry out, you keep checking if you get more $comparisons that are less than or equal to whatever value you're checking against. Are you intending on removing all the entries that are lower than the current one? If so, you probably intend to use grep:

    @entries = grep {
        my $comparison = compare_sub($top_entry, $_);
        $comparison > $user_defined_value;    # inverse - we want to keep ones that match
    } @entries;
    The next flag is that it doesn't look like you do anything. A compare_sub wouldn't, at least in my mind, do anything. It just compares. And your loop doesn't do anything else. I'm not sure if there's supposed to be a do_something($???) in there somewhere. But if you're just consuming everything without doing anything, a simple @entries = (); might be faster.

    The other thing that comparison makes me think of is that you're sorting things somehow. In which case I'd recommend against your merge sort (you'd have to do a binary search to keep it fast), and skip straight to using sort using your compare_sub to order things. The fact you mention that the file is pre-sorted also indicates to me that this is important in some way, so I have to wonder if you're trying to keep it - but if so, there are huge pieces missing from your sample code with regards to sorting anything. Such as pushing the entries on to an output stack. Maybe that's the "print functions" that you removed? But, if so, you'd have to indicate where the print statement goes, and which element you're printing.

    So, really, everything here is just a guess. More details might be required unless someone else can glean more from this.

      My apologies. I realize there's a lot of context missing here. My intention was to simplify the code for readability rather than encumber everyone with aspects of the code that aren't relevant. The script is designed to read entries from an input file that are sorted by decreasing abundance, take the top-most (most abundant) entry, and group with it everything else in the file that is similar (but less abundant). It should then move on to the next most abundant entry that hasn't been grouped, search the remaining ungrouped entries for similarity, and group them together. This repeats until the array of entries is exhausted.

      I want to KEEP entries that don't meet the criteria and I want to REMOVE the ones that do, and then repeat until there's nothing left.

      When I call compare_sub, it returns a value; if the value is less than or equal to a user-defined value, the entry gets printed to output and spliced out of the array. (I removed the print function from the code.)
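Given that description, here is a hedged sketch of the whole process without splice: take the top entry, partition the remaining entries into "similar" (output as one group) and "not similar" (kept), then replace the array with the kept entries and repeat. The inner for loop only reads @entries; the array is reassigned after that loop has finished. compare_sub, the threshold and the numbers are hypothetical stand-ins:

```perl
use strict;
use warnings;

# Hypothetical stand-ins: numeric entries sorted by decreasing
# "abundance", and similarity measured as absolute difference.
my $user_defined_value = 3;
sub compare_sub { my ($x, $y) = @_; return abs($x - $y) }

my @entries = (42, 17, 13, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
my @groups;

while (@entries) {
    my $top_entry = shift @entries;    # most abundant ungrouped entry
    my (@group, @remaining);
    for my $entry (@entries) {         # reads @entries, never modifies it
        if (compare_sub($top_entry, $entry) <= $user_defined_value) {
            push @group, $entry;       # similar: goes to the output group
        }
        else {
            push @remaining, $entry;   # not similar: keep for the next round
        }
    }
    push @groups, [$top_entry, @group];
    print "Group: @{ $groups[-1] }\n";
    @entries = @remaining;             # shrink the array after the loop
}
```

Each pass removes at least the top entry, so the while loop is guaranteed to terminate.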

      I'm still quite new at this! Thanks for the help and I hope that clarified things a bit.

Re: Iterating through an array using multiple loops and removing array elements
by kcott (Chancellor) on Apr 24, 2014 at 09:39 UTC

    G'day BiochemPhD,

    Welcome to the monastery.

    Firstly, modifying an array while you're looping through it (with for[each]) will cause problems. The documentation is quite clear on this. From "perlsyn - Foreach Loops":

    "If any part of LIST is an array, foreach will get very confused if you add or remove elements within the loop body, for example with splice. So don't do that."

    From your various posts in this thread, I think this is fairly close to what you want:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    use List::Util qw{first};

    my $min = 3;
    my @records = reverse(0 .. 10, 13, 17, 42);
    print "All records: @records";

    my %deleted;

    for my $i (0 .. $#records) {
        my $top_index = first { ! $deleted{$_} } $i .. $#records;
        last unless defined $top_index;
        my $top = $records[$top_index];
        my @group = ($top);

        for ($top_index + 1 .. $#records) {
            next if $deleted{$_};
            if (compare_sub($top, $records[$_]) <= $min) {
                push @group, $records[$_];
                ++$deleted{$_};
            }
            else {
                last;
            }
        }

        ++$deleted{$top_index};
        print "Group: @group";
    }

    sub compare_sub {
        my ($x, $y) = @_;
        return abs($x - $y);
    }


    All records: 42 17 13 10 9 8 7 6 5 4 3 2 1 0
    Group: 42
    Group: 17
    Group: 13 10
    Group: 9 8 7 6
    Group: 5 4 3 2
    Group: 1 0

    Obviously, I've had to dummy up input data and the compare_sub() routine; however, this does seem to match your (rather vague) description of "abundance".

    As this solution doesn't actually modify @records at all, you may find some benefit in using the core module Tie::File: it doesn't load the whole file into memory (which may be useful depending on the record size of your thousands of records), and it's less coding than what I can only guess you're currently doing.
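A minimal sketch of that Tie::File idea, assuming one entry per line in the input file. The file here is a hypothetical one created with File::Temp just for the demo, and compare_sub and $min are the same kind of stand-ins as in the code above. The grouped indices are tracked in a hash so the tied array is only ever read. Unlike the code above, this variant keeps scanning instead of stopping at the first non-match (`last`), which may matter if real similarity scores don't line up with the abundance order:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical stand-in similarity test and threshold.
my $min = 3;
sub compare_sub { my ($x, $y) = @_; return abs($x - $y) }

# Demo only: write a small sorted input file, one entry per line.
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "$_\n" for 42, 17, 13, 10, 9, 8;
close $out or die "Cannot close $filename: $!";

# Each element of @records is one line of the file (newline removed);
# Tie::File fetches records from disk as needed rather than slurping the file.
tie my @records, 'Tie::File', $filename
    or die "Cannot tie $filename: $!";

my %grouped;    # indices already placed in a group; @records is never modified
my @groups;
for my $top (0 .. $#records) {
    next if $grouped{$top};
    my @group = ($records[$top]);
    for my $i ($top + 1 .. $#records) {
        next if $grouped{$i};
        if (compare_sub($records[$top], $records[$i]) <= $min) {
            push @group, $records[$i];
            $grouped{$i} = 1;
        }
    }
    push @groups, \@group;
    print "Group: @group\n";
}

untie @records;
```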

    I appreciate this is your first post (and, to be honest, it's a lot better than many first posts). Please just note the various difficulties monks had and keep those in mind whenever you post next.

    -- Ken

      Modifying an array while looping with foreach = not good.

      What about modifying it while looping through it with while? Similarly, what about modifying a hash (using delete) while looping with for/foreach or while? From a quick skim through the doc it doesn't appear to be an issue.

      I'll definitely keep in mind the shortcomings of this post in the future! Context is especially important when TIMTOWTDI!

      Thanks for the help!

        "What about modifying it while looping through it with while?"

        for (@array) iterates over the list of values in @array. Changing that list in mid-iteration often has problems. We actually get quite a few questions like "Why doesn't my for loop work?" that are due to such a problem. So, as the doco says, "don't do that".

        while (@array) involves no iteration; it's a simple condition which basically says "Enter the loop if @array has any elements". Unless there's some other method for exiting that loop, you would expect elements to be removed from @array so that the loop can eventually terminate.

        "Similarly, what about modifying a hash (using delete) while looping with for/foreach or while? From a quick skim through the doc it doesn't appear to be an issue."

        If you're writing for (%hash) or while (%hash), that's probably a mistake; perhaps you meant something else. You'll need to provide some code to show the scenario(s) you're considering here.
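As a hedged illustration of the two hash cases that usually come up: looping over `keys %hash` is safe because keys returns the complete list of keys before the loop body runs. With each, perlfunc documents that deleting the key most recently returned by each is always safe, while other additions or deletions during the loop leave the iterator's behaviour unspecified. The %abundance data is made up:

```perl
use strict;
use warnings;

# Made-up data: entry name => abundance.
my %abundance = (a => 50, b => 2, c => 30, d => 1);

# keys() builds the full list of keys before the loop starts,
# so deleting from %abundance inside the loop does not disturb it.
for my $key (keys %abundance) {
    delete $abundance{$key} if $abundance{$key} < 5;
}

# With each(), deleting the key it has just returned is documented as safe;
# deleting or adding any other key during the loop is not.
while (my ($key, $value) = each %abundance) {
    delete $abundance{$key} if $value > 40;
}

print join(',', sort keys %abundance), "\n";    # prints "c"
```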

        By the way, in case you didn't know, for and foreach are synonymous. Save yourself four keystrokes by writing for instead of foreach: the code will run the same whichever you choose.

        -- Ken

Re: Iterating through an array using multiple loops and removing array elements
by Laurent_R (Canon) on Apr 24, 2014 at 06:42 UTC
    If I understood you correctly, you don't need nested loops to do what you want, and you also don't need to store your file in an array in the first place. Just read your input file and store the data from the first line in an array or a hash. Then read the next line: if it meets the criteria relative to what you have already stored, add it to the existing hash entry; otherwise, create a new hash entry, and so on. In other words, you need only one pass through your file to get everything you need into the hash. At the end, print the hash contents or do whatever you need with them.
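A minimal sketch of that one-pass idea, with two assumptions that are mine rather than the reply's: groups are kept in an array instead of a hash so they stay in abundance order, and each new line is compared with each group's first (most abundant) member. Under those assumptions, a line joins the earliest group it matches, which should give the same groups as the repeated-scan approach. An in-memory filehandle stands in for the real file, and compare_sub and the threshold are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real similarity test and threshold.
my $user_defined_value = 3;
sub compare_sub { my ($x, $y) = @_; return abs($x - $y) }

# In-memory filehandle standing in for the sorted input file.
my $data = join '', map { "$_\n" } 42, 17, 13, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Each group is [ $representative, @members ], in the order groups were created.
my @groups;
LINE: while (my $line = <$fh>) {
    chomp $line;
    for my $group (@groups) {
        if (compare_sub($group->[0], $line) <= $user_defined_value) {
            push @$group, $line;    # joins the first matching group
            next LINE;
        }
    }
    push @groups, [$line];          # no match: starts a new group
}
close $fh;

print "Group: @$_\n" for @groups;
```

Only the group representatives are compared against each line, so the work per line grows with the number of groups rather than with the number of remaining entries.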
Re: Iterating through an array using multiple loops and removing array elements
by hdb (Monsignor) on Apr 24, 2014 at 07:36 UTC

    I am not sure what you want to achieve, as @entries will be empty after the while loop whatever you do inside... However, the following should do what you want (untested, as I could not think of sample data).

    while( @entries ) {
        my $top = shift @entries;
        @entries = map { compare_sub( $top, $_ ) > $user_defined_value ? $_ : () } @entries;
    }

    UPDATE: Reading the whole thread more carefully, I realize that my proposal is essentially the same as (but more convoluted than) Tanktalus' grep above. Please ignore...

Node Type: perlquestion [id://1083508]
Approved by frozenwithjoy