PerlMonks  

Deleting internal array elements

by Itatsumaki (Friar)
on Mar 04, 2004 at 06:12 UTC (#333797=perlquestion)

Itatsumaki has asked for the wisdom of the Perl Monks concerning the following question:

This may be a stupid question, but I didn't find an answer by skimming perldocs and Google. My apologies if I missed an obvious answer somewhere.

Is there a built-in function to delete an internal element in an array and have the array automatically close up around it "in-place"? I'm looking for something like:

    my @array = (1, 2, 3);
    remove($array[1]);
    print join(',', @array);    # prints: 1,3

The reason I ask is that I am parsing a very large (3 GB) CSV file, and only keeping rows whose first element is found in an array. Each element in the array has only one matching row, so I thought it would be nice to just "remove" the elements as I go so that when the array is empty I could stop reading the file immediately.

I know I could implement this with a hash, but I got curious if there was a way to do this directly with arrays?

Re: Deleting internal array elements
by chromatic (Archbishop) on Mar 04, 2004 at 06:18 UTC

    Perhaps you are looking for perldoc -f splice.
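    A minimal sketch of what perldoc -f splice describes, applied to the example from the question: splice(ARRAY, OFFSET, LENGTH) removes LENGTH elements starting at OFFSET, closes the gap, and returns the removed elements. Only built-ins are used; the array contents are the question's own example.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

# Remove 1 element starting at index 1; later elements move down one index.
my @removed = splice(@array, 1, 1);

print join(',', @array), "\n";    # prints: 1,3
print "removed: @removed\n";      # prints: removed: 2
```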

Re: Deleting internal array elements
by matija (Priest) on Mar 04, 2004 at 06:44 UTC
    As chromatic said, splice is what you need if you're doing this with arrays.

    However, I should warn you that if that array is likely to be large, you are going to spend a lot of time looking through the array for the first element of each row in your file.

    You could also use a hash for both purposes: put the elements you are looking for into the hash as keys, with some arbitrary true value (say, 1) as the values. Keep a count of the keys you put in with a simple scalar.

    Then instead of looking through an array for your row element, you simply test $hash{$row[0]}. If you want to avoid finding duplicates, you can delete $hash{$row[0]}, and it will no longer match. Decrement the count of keys each time you make a match, and when it reaches 0, you will know you no longer need to read the file.
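    A sketch of the hash-plus-counter approach described above, reading one line at a time and stopping as soon as every key has been seen. The wanted keys and the CSV rows are hypothetical sample data read from an in-memory filehandle; for the real file you would open a path instead. The split on commas is a simplification that assumes no quoted fields containing commas (a CSV parser such as Text::CSV would handle those).

```perl
use strict;
use warnings;

# Hypothetical keys and sample rows standing in for the real 3 GB file.
my @wanted = qw(b d);
my $data   = "a,1\nb,2\nc,3\nd,4\ne,5\n";

my %want  = map { $_ => 1 } @wanted;   # key => 1 (a true value) per wanted key
my $count = keys %want;                 # number of keys still outstanding

my @kept;
my $lines_read = 0;
open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";
while (my $line = <$fh>) {
    $lines_read++;
    chomp $line;
    my @row = split /,/, $line;         # naive split: assumes no quoted commas
    next unless $want{ $row[0] };
    push @kept, $line;
    delete $want{ $row[0] };            # each key matches once; never match it again
    last if --$count == 0;              # all keys found: stop reading the file
}
close $fh;

print join(';', @kept), "\n";           # prints: b,2;d,4
print "lines read: $lines_read\n";      # prints: lines read: 4
```

    The loop never reads the fifth row, because the counter reaches zero on the fourth.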

Re: Deleting internal array elements
by NetWallah (Canon) on Mar 04, 2004 at 06:42 UTC
    Performance-wise, you may be better off keeping the element in the array, and simply assigning undef to it.

    You will need a scalar counter to keep track of the number of valid elements, and if that hits zero, you are done.
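    A sketch of this undef-plus-counter idea, assuming the first field of each row has already been extracted (the @first_fields list below is hypothetical sample data). Matched elements are overwritten with undef so nothing has to shift; note that each row still costs a scan of @wanted, which is the cost the hash-based replies avoid.

```perl
use strict;
use warnings;

my @wanted    = qw(b d);        # hypothetical keys to look for
my $remaining = @wanted;        # count of elements not yet matched

# Hypothetical first fields of successive CSV rows.
my @first_fields = qw(a b c d e);

my $processed = 0;
ROW: for my $key (@first_fields) {
    $processed++;
    for my $i (0 .. $#wanted) {
        next unless defined $wanted[$i] && $wanted[$i] eq $key;
        $wanted[$i] = undef;    # mark as used; the array keeps its length
        $remaining--;
        last;                   # each key matches only one row
    }
    last ROW if $remaining == 0;    # every element matched: stop reading
}

print "processed $processed rows; \@wanted still has ",
      scalar(@wanted), " slots\n";
```

    With this data it prints "processed 4 rows; @wanted still has 2 slots": the loop stops after the fourth row, and the array never shrinks.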

Re: Deleting internal array elements
by crabbdean (Pilgrim) on Mar 04, 2004 at 09:45 UTC
    Given the size of the file you are parsing, you will have to weigh memory use against speed.

    You mentioned matching rows whose first element is found in an array. Stepping through your whole array for each row is a big expense: if your match array has 20 elements and you have 1,000,000 lines, that means up to 1,000,000 x 20 = 20,000,000 comparisons. That's expensive! For your match array you may want to make a hash instead and then check for the existence of the hash key:

        my %hash = ('match1' => 1, 'match2' => 1);
        foreach my $row (@file) {    # assumes each $row is an array ref of fields
            if (exists $hash{$$row[0]}) {
                ## do something
            }
        }
    Speed-wise, splicing an array can be expensive, so, as NetWallah suggests, try undef instead. That, however, can be expensive memory-wise for large arrays, since the array never shrinks. You have to consider your uses and what is best.

    Also, speed-wise it's faster to read a file into memory and then work on it, rather than process it as you read. A 3 GB file, though, is a LOT of data, and you will be hard-pressed to read all of that into memory.
        open(my $fh, '<', $file) or die "Cannot open $file: $!";
        my @file = <$fh>;    # reads every line into memory at once
        close $fh;
        foreach my $row (@file) {
            ## do something with $row
        }
    Just some things to think about.

    Dean

    Programming these days takes more than a lone avenger with a compiler. - sam
Re: Deleting internal array elements
by Anonymous Monk on Mar 04, 2004 at 07:46 UTC
    Every time you ask yourself "is there a Perl function for this?", grep perlfunc:

        perldoc perlfunc | grep -i array
        ...
        splice ARRAY,OFFSET,LENGTH,LIST
        splice ARRAY,OFFSET,LENGTH
        splice ARRAY,OFFSET
        splice ARRAY
        array, and replaces them with the elements of LIST, if any. In
        list context, returns the elements removed from the array. In
        no elements are removed. The array grows or shrinks as
        ...

      While the message you are delivering (i.e. "perldoc is your friend") may be good, I take issue with the way you delivered it. Surely your point could have been made just as effectively without flooding the discussion with 100 lines of output. I'd hesitate to quote perldocs directly at all, in fact. A pointer to the docs would be enough, and a sufficiently motivated person would look them up themselves. What you've done only serves to annoy people (like me).

Re: Deleting internal array elements
by Itatsumaki (Friar) on Mar 04, 2004 at 17:32 UTC

    Thanks everyone. I always thought I *knew* what splice did (replacing elements), and didn't read the perldocs.

    Tests on smaller datasets did indeed show that hashes were significantly faster than arrays here. Deleting the hash element or just setting it to 0 or undef made very little performance difference on 100 MB files. Deleting the element will prevent any problems with duplicates down the road, so I went that way.
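    A small sketch of how the three options differ, using a hypothetical three-key hash. A truth test such as $want{$key} fails for all three, but only delete removes the key: an exists test still sees keys set to undef or 0, and only delete makes keys %want shrink, so the key count itself can serve as the remaining-matches counter.

```perl
use strict;
use warnings;

my %want = (a => 1, b => 1, c => 1);   # hypothetical wanted keys

delete $want{a};     # the key is gone entirely
$want{b} = undef;    # the key remains, with an undefined value
$want{c} = 0;        # the key remains, with a false value

for my $k (qw(a b c)) {
    printf "%s: exists=%d true=%d\n",
        $k, (exists $want{$k} ? 1 : 0), ($want{$k} ? 1 : 0);
}
print "keys left: ", scalar(keys %want), "\n";
```

    This prints exists=0 for a, exists=1 for b and c, true=0 for all three, and "keys left: 2".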

    Thanks again monks!

    -Tats
Re: Deleting internal array elements
by TomDLux (Vicar) on Mar 04, 2004 at 18:09 UTC

    This is a commonly asked question, though I'm unsure whether it is listed in the FAQ:

    How do I do expensive operation XXX on my 30 GB file?

    The correct answer is: one line at a time.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Node Type: perlquestion [id://333797]
Approved by Roger