Re: Testing if an array contains a value and then deleting it in most efficient way
by Punitha (Priest) on Feb 18, 2008 at 14:38 UTC
|
| [reply] |
Re: Testing if an array contains a value and then deleting it in most efficient way
by Skeeve (Parson) on Feb 18, 2008 at 14:40 UTC
|
| [reply] [d/l] [select] |
|
OK.
I have
1. a list of chemical compound IDs (10.000 IDs)
2. a xml output of a database with the records for compounds identified by the ID (almost million records) splitted into .xml files by 25.000.
What I want to get certain value buried in the xml tree for an individual compound when the compound`s ID is in my list.
During the weekend I was trying to solve a similar problem, when I was parsing other type xml of output, I wanted buried value from records that were containing some value conforming regex. When regex pattern was found the record was parsed by XML:twig (building only the twigs with value of interest and children_text). This strategy (regex and only then xml:pareser), adviced me by Monks pc88mxer and graff, worked great and in fact the IDs I have now are the result. The only difference now is that the records to be xml-parsed should be selected by extracting relevant value for <compoundID>value</compoundID> by regex and then checking if this value is among 10.000 IDs. So I am looking for most efficient way how to look-up the array of IDs (and deleting already found to speed-up the process). Is there some other strategy?
Tx for reply.
| [reply] |
|
my %set_contains = map { $_ => 1 } @id_set;
if ( $set_contains{$some_id} ) {
# $some_id is in the set
}
# removes $some_id from the set:
delete $set_contains{$some_id};
| [reply] [d/l] |
Re: Testing if an array contains a value and then deleting it in most efficient way
by ikegami (Patriarch) on Feb 18, 2008 at 15:59 UTC
|
Deleting from the middle of an array is inherently slow, since all the subsequent elements need to be shifted. Locating elements is slow too.
Is the data sorted by the search criteria?
If so, the item can be located using a binary search which is much faster than a linear search.
Can you use a hash indexed by the search criteria.
Locating and deleting the element would become really cheap.
If you're stuck with an array, you might want to consider undefing the element instead of deleting it. Just skip the undefs when it's time to save the structure.
| [reply] |
|
use List::MoreUtils qw/ firstidx /;
for my $val ( generate() )
{
my $i;
# Deal w/ "uninitialized variable" warning as appropriate.
while ( -1 < ( $i = firstidx { $_ == $val } @array ) )
{
delete $array[ $i ];
}
}
# Later...
# exists() here would go better with earlier delete(), but
# then would need to generate list of indices.
my @save = grep { defined $_ } @array;
| [reply] [d/l] |
|
ow! You transformed a O(N) problem into an O(N2) solution.
firstidx starts at the start of the array every loop pass.
Not only did you half the speed, you doubled the memory requirements.
"firstidx" copies the entire array before processing it.
| [reply] [d/l] [select] |
|
|
|
|
In fact the values to be looked-up will come numerically sorted (probably if the record IDs in db are really ordered by their numeric ID), so sorting the values in the array would mean that only first element is deleted.
So it seems that I can only ask whether the value to be looked up equals just the first item of the array! How only could miss it? Tx for pointing the possibility of ordering.
Undefs are ignored during the search, so their presence doesnt slowdown the procedure?
| [reply] |
|
Fortunately for you, deleting the first (or last) element of an array (shift(@a)) is very fast (O(1)) in Perl.
| [reply] [d/l] |
|
|
| [reply] |
|
|
Re: Testing if an array contains a value and then deleting it in most efficient way
by dsheroh (Monsignor) on Feb 18, 2008 at 17:44 UTC
|
I'll put in another vote for "use a hash instead of an array" unless there's some reason that the order of the values needs to be preserved.
If the order that you'll need the values is predictable, you may also be able to have the database pre-sort them for you so that you can just shift values out of the array. One sort will almost always be faster than 10,000 searches. (Especially since those searches will have to be done in extremely-ineffecient ways unless the data is sorted first...) | [reply] |
Re: Testing if an array contains a value and then deleting it in most efficient way
by jwkrahn (Abbot) on Feb 19, 2008 at 04:52 UTC
|
- Use a hash.
- @array = grep $_ != $value, @array;
- Or a fairly efficient method:
for my $i ( 0 .. $#array ) {
if ( $array[ $i ] == $value ) {
@array[ $i, -1 ] = @array[ -1, $i ];
$#array--;
last;
}
}
- Update: after my naive attempt at efficiency and some benchmarking this works better:
my $i
for ( @array ) {
if ( $_ == $value ) {
$i = $_;
last;
}
}
splice @array, $i, 1;
| [reply] [d/l] [select] |