PerlMonks  

Deleting records from an array

by viffer (Beadle)
on Dec 22, 2014 at 05:34 UTC

viffer has asked for the wisdom of the Perl Monks concerning the following question:

Morning all

I know this isn't something that is normally done, but I seriously can't think of any other way to process the file I'm looking at.

I have a hash containing a number of keys, and an array that contains a number of records.

For each hash key I'm processing all the records in the array, and printing off the records in the array that have a field that matches the hash key.
The problem I have is that for each hash key I process, I'm reading the entire array EVERY time. Once I've printed the matching records in the array I want to delete them so that they don't get processed every single time.

For 40,000 records the process is taking in excess of 3 hours.

I think I can delete the record from the array if I know its index, but I'm not sure how to work that out.

    for my $key (keys %coll_key_hash) {
        foreach my $rec (@recs_read) {
            my @fields = split(/\|/, $rec);
            if ($fields[0] eq '010') {
                if ($fields[3] =~ $key) {
                    $print_record = 1;
                    if ($key ne $prev_key) {
                        $prev_key = $key;
                        print_record ("CUST-BEG|$seq\n",' ');
                    }
                    print_record ("$rec\n",'c');
                }
                else {
                    $print_record = 0;
                }
            }
            else {
                if ($print_record) {
                    print_record ("$rec\n",'c');
                }
            }
        }
    }
What I'm trying to do is to find a way to delete the record from the array when I've processed it, so that I'm not processing a record I've already processed once before.

Any suggestions gratefully received. Thanks - and Merry Xmas to all

Replies are listed 'Best First'.
Re: Deleting records from an array
by Anonymous Monk on Dec 22, 2014 at 08:34 UTC
    You can delete elements from an array using splice, but it's not a good idea to do that in a loop (very error-prone). Deleting elements from the middle of a big array is not a fast operation anyway. I'd rather replace processed elements with undef, like this:
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash = (
        a => 1,
        n => 1,
        z => 1,
    );
    my @ary = qw( a b n m y z );

    for my $key ( keys %hash ) {
        for my $elem (@ary) {
            next if not defined $elem;
            if ( $elem =~ $key ) {
                $elem = undef;
            }
        }
    }
    print Dumper \@ary;
    output:
    $VAR1 = [ undef, 'b', undef, 'm', 'y', undef ];
    It works because $elem in this kind of loop is magical: it's an 'alias' (effectively a pointer) to the actual element of the array.
      ...if that's still not fast enough, you should think of a better approach, such as grouping your records as you read them (from a file?) in another hash (group by fourth field?).
        ...another optimization would be to use index instead of regex, if your keys are simple literals and not actually regular expressions...
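The index() suggestion can be sketched quickly; the sample field and key below are made up for illustration. For plain literal keys, index() does a substring search without involving the regex engine, while =~ treats $key as a pattern:

```perl
use strict;
use warnings;

# Hypothetical sample values, just to show the two styles of match:
my $field = 'ABC-CUST42-XYZ';
my $key   = 'CUST42';

# Regex match; \Q...\E quotes the key so it is matched as a literal:
my $re_hit  = ( $field =~ /\Q$key\E/ )     ? 1 : 0;

# index() returns the position of the substring, or -1 if absent:
my $idx_hit = ( index($field, $key) >= 0 ) ? 1 : 0;

print "regex: $re_hit, index: $idx_hit\n";
```

Both find the key here; the difference is purely in the cost per comparison when this runs tens of thousands of times.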
Re: Deleting records from an array
by igoryonya (Pilgrim) on Dec 22, 2014 at 12:29 UTC

    Use splice, although some might not recommend it. I use it extensively when working with arrays, since it greatly simplifies and shortens your code. Also, it's very efficient.

    Your code:
    for my $key (keys %coll_key_hash) {
        foreach my $rec (@recs_read) {
            my @fields = split(/\|/, $rec);
            if ($fields[0] eq '010') {
                if ($fields[3] =~ $key) {
                    $print_record = 1;
                    if ($key ne $prev_key) {
                        $prev_key = $key;
                        print_record ("CUST-BEG|$seq\n",' ');
                    }
                    print_record ("$rec\n",'c');
                }
                else {
                    $print_record = 0;
                }
            }
            else {
                if ($print_record) {
                    print_record ("$rec\n",'c');
                }
            }
        }
    }
    First, I would like to simplify it, to make it easier to work with:
    for my $key (keys %coll_key_hash){
        for my $rec (@recs_read){
            my @fields = split(/\|/, $rec);
            if($fields[0] eq '010'){
                $print_record = 0;
                if($fields[3] =~ $key){ # shouldn't this be /$key/ ?
                    $print_record = 1;
                    unless($key eq $prev_key){
                        $prev_key = $key;
                        print_record ("CUST-BEG|$seq\n",' ');
                    }
                }
            }
            if($print_record){
                print_record ("$rec\n",'c');
            }
        }
    }
    The splice algorithm to delete an array record, with a check to keep the loop iteration consistent:
    for my $key (keys %coll_key_hash){
        for(my $recIdx = 0; $recIdx < scalar @recs_read; $recIdx++){
            # check if redo overflowed the end of the array:
            next if(($recIdx + 1) > scalar @recs_read);
            my $rec = $recs_read[$recIdx];
            my @fields = split(/\|/, $rec);
            if($fields[0] eq '010'){
                $print_record = 0;
                if($fields[3] =~ $key){ # shouldn't this be /$key/ ?
                    $print_record = 1;
                    unless($key eq $prev_key){
                        $prev_key = $key;
                        print_record ("CUST-BEG|$seq\n",' ');
                    }
                }
            }
            if($print_record){
                print_record ("$rec\n",'c');
                # Delete one record at the specified index:
                splice(@recs_read, $recIdx, 1);
                # Since the record at the current index is deleted, the next
                # record shifted to the same index, so we test it again:
                redo;
            }
        }
    }
      Also, it's very efficient.
      I was curious about that and decided to test it. Splice is more efficient than I thought, but it's still not as fast as undef with big arrays. With arrays of 1000 elements they're equal:
      $ perl cmp.pl -t 1000 -s 1000
               Rate splice  undef
      splice  287/s     --    -3%
      undef   296/s     3%     --
      10000 elements and undef becomes a bit faster:
      $ perl cmp.pl -t 1000 -s 10000
                Rate splice  undef
      splice  19.4/s     --   -21%
      undef   24.7/s    27%     --
      40000 elements and undef is significantly faster:
      $ perl cmp.pl -t 100 -s 40000
                Rate splice  undef
      splice  2.41/s     --   -43%
      undef   4.24/s    76%     --
      Even with the most favorable conditions for splice they're about equal:
      $ perl cmp.pl -t 1000000 -s 10
                 Rate splice  undef
      splice  18678/s     --    -7%
      undef   20190/s     8%     --
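The cmp.pl script itself wasn't posted; a minimal reconstruction along these lines reproduces the comparison with the core Benchmark module (the option names, data generation, and key selection are guesses):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Getopt::Long;

# Hypothetical reconstruction of cmp.pl: -t is the iteration count
# passed to cmpthese, -s is the array size.
GetOptions( 't=i' => \( my $iters = 100 ), 's=i' => \( my $size = 1000 ) );

my @template = map { "rec$_" } 1 .. $size;
# Mark the odd-numbered records for deletion:
my %keys = map { ( "rec$_" => 1 ) } grep { $_ % 2 } 1 .. $size;

cmpthese( $iters, {
    splice => sub {
        my @ary = @template;
        my $i   = 0;
        while ( $i < @ary ) {
            # Removing shifts later elements down, so only advance on a keep:
            $keys{ $ary[$i] } ? splice( @ary, $i, 1 ) : $i++;
        }
    },
    undef => sub {
        my @ary = @template;
        for my $elem (@ary) {    # $elem aliases the array element
            $elem = undef if defined $elem and $keys{$elem};
        }
    },
} );
```

Both subs delete the same set of records; the splice variant pays for moving the tail of the array on every removal, which is why it falls behind as the array grows.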
Re: Deleting records from an array
by Lotus1 (Vicar) on Dec 22, 2014 at 14:20 UTC

    You could avoid the loop within a loop issue if you would loop through the array one time and lookup the hash elements as you go. Instead of doing

    if ($fields[3] =~ $key) {
    you can do
    if( defined( $coll_key_hash{$fields[3]} ) ){
    to check if the array element is in the hash.

    You didn't provide an example of your data but from your code it looks like it would work this way. Even with the approach of deleting array elements as they are used you would be rehashing (har har) a lot of data.
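Putting that suggestion together with the question's loop, a single pass might look like the sketch below. The sample keys and records are made up, and the CUST-BEG header logic from the real code is omitted to keep the lookup visible:

```perl
use strict;
use warnings;

my %coll_key_hash = ( CUST1 => 1, CUST2 => 1 );   # sample keys
my @recs_read = (                                 # sample records
    '010|a|b|CUST1',
    '020|detail for CUST1',
    '010|a|b|CUST9',
    '020|detail that should be skipped',
    '010|a|b|CUST2',
);

my @out;
my $print_record = 0;
for my $rec (@recs_read) {
    my @fields = split /\|/, $rec;
    if ( $fields[0] eq '010' ) {
        # One O(1) hash lookup replaces the whole loop over keys:
        $print_record = defined $coll_key_hash{ $fields[3] };
    }
    push @out, $rec if $print_record;
}
print "$_\n" for @out;
```

One pass over 40,000 records with a hash lookup per '010' line should be near-instant compared with re-scanning the array once per key.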

Re: Deleting records from an array
by Anonymous Monk on Dec 22, 2014 at 21:29 UTC

    You've already got some answers, but this caught my eye:

    I seriously can't think of any other way to process the file I'm looking at ... I have ... an array that contains a number of records

    Does the array consist of the lines of the file? If so, perhaps there is a solution to this in which you could process the file line-by-line without having to read the whole thing into memory...

    Also, this sounds like the kind of thing that a database might be able to solve natively?

    Maybe you could show us some sample data - a few lines of the input file and/or a few of your records.
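For completeness, the line-by-line idea might be sketched as below. It writes a tiny demo input first so the sketch runs as-is; with real data you would simply open the existing file instead of slurping it into @recs_read. The file name, keys, and record layout are assumptions based on the thread:

```perl
use strict;
use warnings;

my %coll_key_hash = ( CUST1 => 1 );   # sample key

# Create a small demo input file (purely so this sketch is runnable):
my $file = 'records_demo.txt';
open my $wfh, '>', $file or die "Can't write $file: $!";
print $wfh "010|a|b|CUST1\n020|detail\n010|a|b|CUST9\n020|skipped\n";
close $wfh;

my @out;
my $print_record = 0;
open my $fh, '<', $file or die "Can't open $file: $!";
while ( my $rec = <$fh> ) {           # stream one record at a time
    chomp $rec;
    my @fields = split /\|/, $rec;
    if ( $fields[0] eq '010' ) {
        $print_record = exists $coll_key_hash{ $fields[3] // '' };
    }
    push @out, $rec if $print_record;
}
close $fh;
unlink $file;

print "$_\n" for @out;
```

Memory use stays constant regardless of file size, and each record is examined exactly once.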

Node Type: perlquestion [id://1110995]
Approved by Athanasius