a more efficient lexicographical sort?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is anyone aware of a more efficient method to sort 500,000 plus OIDs other then using the Net::SNMP method: oid_lex_sort. I’ve tried it using it with both hash keys and arrays but it still takes around 3 – 5 mins and 400 megs of memory.

Thanks

Comment on a more efficient lexicographical sort?

Replies are listed 'Best First'.
Re: a more efficient lexicographical sort? (grt) by demerphq (Chancellor) on Jan 18, 2006 at 19:29 UTC
I have a feeling using perl's sort might not be the greatest plan in this situation. I'm sure there is a module that does an inplace sort that will be faster (I've not needed it so i haven't looked). Pushing 500k items on the stack to sort them and then popping them back off is IMO going to really hurt for long lists. An inline sort would avoid that overhead. The actually oid_lex_sort() code actually does this an extra time. In my implementation I avoid part of it by passing the array in as a ref. With regard to the actual implementation of ST that oid_grt_sort uses, I think the GRT might win. Its hard to say. Avoiding the array creation I should think will help with speed and I know will help for memory footprint. Overall I'd guess that this would be faster, but I haven't benchmarked it at all. Actually for that matter I've minimally tested this, but it appears to work and should give you an idea how it could be done. Also I've included the code for oid_lex_sort() which really you should have done yourself. :-) use strict; use warnings; sub oid_lex_sort(@) { return @_ unless (@_ > 1); map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { my $oid = $_; $oid =~ s/^\.//o; $oid =~ s/ /\.0/og; [$_, pack('N', split('\.', $oid))] } @_; } sub oid_grt_sort { my $ary=shift; map { my $len=unpack('N',substr($_,-4)); substr($_,-$len-4,$len) } sort map { my $oid=$_; $oid =~ s/^\.//; $oid =~ s/ /.0/g; pack('N', split(/\./, $oid)) . $_ . pack('N', length($_)); } @$ary; } my @o=('.1.3.4.5.7','1.2 3.4.8','3.2 9.1 7','1 3 4 5 9'); print join "\n",qw(grt:),oid_grt_sort(\@o),""; print join "\n",qw(---- lex:),oid_lex_sort(@o),""; [download] HTH --- $world=~s/war/peace/g	[reply] [d/l]
Re^2: a more efficient lexicographical sort? (grt) by Anonymous Monk on Jan 18, 2006 at 21:57 UTC
Thanks...it looks like the oid_lex_sort might be faster. Thanks again!	[reply]
Re: a more efficient lexicographical sort? by Limbic~Region (Chancellor) on Jan 18, 2006 at 19:08 UTC
Anonymous Monk, The oid_lex_sort() function uses the Schwartzian Transform. Additionally, it uses regular expressions to eliminate leading periods and converts spaces to '.0' globally. If you know none of your OIDs require those regexes then the following might be better. Benchmarking will say for sure. `my @OID; # defined elsewhere @OID = @OID[ map { unpack "N", substr($_,-4) } sort map { pack('N', split('\.', $OID[$_])) . pack "N", $_ } 0..$#list ];` [download] It is a straight forward translation to tye's one true sort. Update:* Other improvements likely exist but I don't know what constitutes a lexicographical sort of OIDs. Cheers - L~R	[reply] [d/l]
Re: a more efficient lexicographical sort? by salva (Canon) on Jan 19, 2006 at 09:34 UTC
try using Sort::Key, it is faster and uses less memory than both the ST and the GRT: `use Sort::Key qw(keysort); @sorted = keysort { pack "N*" => split /\./, $_ } @oids;` [download]	[reply] [d/l]
Re^2: a more efficient lexicographical sort? by Anonymous Monk on Jan 19, 2006 at 17:30 UTC
WOW !!! Much faster, around 50% percent faster and the memory footprint was 75% less. Thanks!	[reply]
Re^3: a more efficient lexicographical sort? by salva (Canon) on Jan 19, 2006 at 18:24 UTC
I think that you can make it even faster and less memory hungry encoding the oid numbers as unicode chars: `@sorted = keysort { pack "U*" => split /\./, $_ } @oids;` [download]	[reply] [d/l]
Re: a more efficient lexicographical sort? by japhy (Canon) on Jan 18, 2006 at 19:10 UTC
What is an "OBJECT IDENTIFIER"? The docs keep stating "dotted", so is it a N.N.N(.N.N.N...) structure? Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart	[reply]
Re^2: a more efficient lexicographical sort? by jhourcle (Prior) on Jan 19, 2006 at 12:55 UTC
Yes. And it's of indeterminate length, as it uses delegation. For instance: Read more... (1499 Bytes)	[reply] [d/l] [select]

Back to Seekers of Perl Wisdom