Testing if an array contains a value and then deleting it in most efficient way

karpatov has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Testing if an array contains a value and then deleting it in most efficient way by Punitha (Priest) on Feb 18, 2008 at 14:38 UTC
Hi karpatov Please refer these nodes,you can get better idea finding and deleting an element from array deleting elements of one array from another Deleting Elements from an Array without their Index Number? Punitha	[reply]
Re: Testing if an array contains a value and then deleting it in most efficient way by Skeeve (Parson) on Feb 18, 2008 at 14:40 UTC
If you say "contains" the answer is most often "hash". But you should explain a bit more. Maybe some code you've already done. It's a bit difficult to give an answer when you don't supply enough information. `s$$([},&%#}/&/]+}%&{});#$&&s&&$^X.($'^"%]=\&(\|?{%` `+`.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e	[reply] [d/l] [select]
Re^2: Testing if an array contains a value and then deleting it in most efficient way by karpatov (Beadle) on Feb 18, 2008 at 15:54 UTC
OK. I have 1. a list of chemical compound IDs (10.000 IDs) 2. a xml output of a database with the records for compounds identified by the ID (almost million records) splitted into .xml files by 25.000. What I want to get certain value buried in the xml tree for an individual compound when the compound`s ID is in my list. During the weekend I was trying to solve a similar problem, when I was parsing other type xml of output, I wanted buried value from records that were containing some value conforming regex. When regex pattern was found the record was parsed by XML:twig (building only the twigs with value of interest and children_text). This strategy (regex and only then xml:pareser), adviced me by Monks pc88mxer and graff, worked great and in fact the IDs I have now are the result. The only difference now is that the records to be xml-parsed should be selected by extracting relevant value for <compoundID>value</compoundID> by regex and then checking if this value is among 10.000 IDs. So I am looking for most efficient way how to look-up the array of IDs (and deleting already found to speed-up the process). Is there some other strategy? Tx for reply.	[reply]
Re^3: Testing if an array contains a value and then deleting it in most efficient way by kyle (Abbot) on Feb 18, 2008 at 16:03 UTC
I think Skeeve is correct. Your best way forward is a hash of the ids. `my %set_contains = map { $_ => 1 } @id_set; if ( $set_contains{$some_id} ) { # $some_id is in the set } # removes $some_id from the set: delete $set_contains{$some_id};` [download]	[reply] [d/l]
Re: Testing if an array contains a value and then deleting it in most efficient way by ikegami (Patriarch) on Feb 18, 2008 at 15:59 UTC
Deleting from the middle of an array is inherently slow, since all the subsequent elements need to be shifted. Locating elements is slow too. Is the data sorted by the search criteria? If so, the item can be located using a binary search which is much faster than a linear search. Can you use a hash indexed by the search criteria. Locating and deleting the element would become really cheap. If you're stuck with an array, you might want to consider undefing the element instead of deleting it. Just skip the undefs when it's time to save the structure.	[reply]
Re^2: Testing if an array contains a value and then deleting it in most efficient way by parv (Parson) on Feb 18, 2008 at 16:21 UTC
In code (while using array) ... `use List::MoreUtils qw/ firstidx /; for my $val ( generate() ) { my $i; # Deal w/ "uninitialized variable" warning as appropriate. while ( -1 < ( $i = firstidx { $_ == $val } @array ) ) { delete $array[ $i ]; } } # Later... # exists() here would go better with earlier delete(), but # then would need to generate list of indices. my @save = grep { defined $_ } @array;` [download]	[reply] [d/l]
Re^3: Testing if an array contains a value and then deleting it in most efficient way by ikegami (Patriarch) on Feb 18, 2008 at 16:30 UTC
ow! You transformed a O(N) problem into an O(N²) solution. `firstidx` starts at the start of the array every loop pass. Not only did you half the speed, you doubled the memory requirements. "`firstidx`" copies the entire array before processing it.	[reply] [d/l] [select]
Re^4: Testing if an array contains a value and then deleting it in most efficient way by parv (Parson) on Feb 18, 2008 at 16:40 UTC
Re^5: Testing if an array contains a value and then deleting it in most efficient way by ikegami (Patriarch) on Feb 18, 2008 at 16:50 UTC
Some notes below your chosen depth have not been shown here
Re^2: Testing if an array contains a value and then deleting it in most efficient way by karpatov (Beadle) on Feb 18, 2008 at 16:30 UTC
In fact the values to be looked-up will come numerically sorted (probably if the record IDs in db are really ordered by their numeric ID), so sorting the values in the array would mean that only first element is deleted. So it seems that I can only ask whether the value to be looked up equals just the first item of the array! How only could miss it? Tx for pointing the possibility of ordering. Undefs are ignored during the search, so their presence doesnt slowdown the procedure?	[reply]
Re^3: Testing if an array contains a value and then deleting it in most efficient way by ikegami (Patriarch) on Feb 18, 2008 at 16:37 UTC
Fortunately for you, deleting the first (or last) element of an array (`shift(@a)`) is very fast (O(1)) in Perl.	[reply] [d/l]
Re^4: Testing if an array contains a value and then deleting it in most efficient way by karpatov (Beadle) on Feb 18, 2008 at 16:45 UTC
Re^3: Testing if an array contains a value and then deleting it in most efficient way by parv (Parson) on Feb 18, 2008 at 16:43 UTC
A side note ... The records may be sorted but will not be returned as such when not asked, if database server is Sybase 15.0.x.	[reply]
Re^4: Testing if an array contains a value and then deleting it in most efficient way by mpeppler (Vicar) on Feb 19, 2008 at 11:03 UTC
OT: Sybase by parv (Parson) on Feb 19, 2008 at 11:24 UTC
Re: Testing if an array contains a value and then deleting it in most efficient way by dsheroh (Monsignor) on Feb 18, 2008 at 17:44 UTC
I'll put in another vote for "use a hash instead of an array" unless there's some reason that the order of the values needs to be preserved. If the order that you'll need the values is predictable, you may also be able to have the database pre-sort them for you so that you can just shift values out of the array. One sort will almost always be faster than 10,000 searches. (Especially since those searches will have to be done in extremely-ineffecient ways unless the data is sorted first...)	[reply]
Re: Testing if an array contains a value and then deleting it in most efficient way by jwkrahn (Abbot) on Feb 19, 2008 at 04:52 UTC
Use a hash. `@array = grep $_ != $value, @array;` Or a fairly efficient method: `for my $i ( 0 .. $#array ) {` `if ( $array[ $i ] == $value ) {` `@array[ $i, -1 ] = @array[ -1, $i ];` `$#array--;` `last;` `}` `}` Update: after my naive attempt at efficiency and some benchmarking this works better: `my $i` `for ( @array ) {` `if ( $_ == $value ) {` `$i = $_;` `last;` `}` `}` `splice @array, $i, 1;`	[reply] [d/l] [select]


Perl-Sensitive Sunglasses
	PerlMonks