Re: Bidirectional lookup algorithm? (Updated: further info.)

BrowserUk,

I came into this late, but I think using Perl's great 'index' function may give you the desired results. I copied your scripts to get a start, and then used a simple scalar to build the 'Bidirectional lookup' functions. I'm sure they can be improved!

Here goes:

#!/usr/local/bin/myperl

    use strict;
    use warnings;

    use Time::HiRes qw( gettimeofday );
    use Devel::Size qw(total_size);
    my ( $i, %intByStr, %strByInt, $Both );

    $i = 1;
    $intByStr{ $_ } = $i++ for 'aaaa' .. 'zzzz';;

    $strByInt { $intByStr{ $_ } } = $_ for keys %intByStr;;

    print "Hash1: ",total_size( \%intByStr ), "\nHash2: ", total_size(
+ \%strByInt ), "\n";;

    our $ksep = chr(253); our $isep = chr(254);
    foreach my $key ( keys %intByStr )
    {    $Both .= $ksep . $key . $isep . $intByStr{ $key } . "\n";
    }

    print "\$Both: ",length( $Both ), "\n\n";;

    my $testkey = "zabc"; my $testaddr = $intByStr{ $testkey };

    my $stime = gettimeofday;
    my $tkey = get_key( $testaddr );
    print "\tget_key:  ", gettimeofday - $stime, " seconds\n";
    $stime = gettimeofday;
    my $taddr = get_addr( $tkey );
    print "\tget_addr: ", gettimeofday - $stime, " seconds\n";
    print "\t|$tkey|$taddr|\n";
    exit;

sub get_key
{   our( $ksep, $isep );
    my $key = shift; my $result;
    my $si = index( $Both, "$isep$key\n" );
    if ( $si >= 0 )
    {    my $sj = rindex ( $Both, $ksep, $si );
        if ( $sj >= 0 ) { $result = substr( $Both, $sj+1, $si - ($sj+1
+) ); }
    }
    return ( $result );
}

sub get_addr
{   our( $ksep, $isep );
    my $key = shift; my $result;
    my $srch = "$ksep$key$isep"; my $len = length( $srch );
    my $si = index( $Both, "$ksep$key$isep" );
    if ( $si >= 0 )
    {    my $sj = index ( $Both, "\n", $si + $len );
        if ( $sj >= 0 ) { $result = substr( $Both, $si+$len, $sj - ($s
+i+$len) ); }
    }
    return ( $result );
}

1;

__END__

~/tmp:> perl  BUKtest.plx
=========================================================
Hash1: 37741336
Hash2: 43113919
$Both: 5829583
        get_key:  0.00113105773925781 seconds
        get_addr: 0.000988960266113281 seconds
        |zabc|439429|
=========================================================
Actual Time using Linux Debian 'time' command w/perl.5.14.2:
   real    0m1.591s
   user    0m1.560s
   sys     0m0.032s
[download]

I didn't build my $Both scalar from the raw input, so that I could verify I did it correctly, but $Both doesn't need to be in any order for this to work.

Please note the subs return 'undef' if the key or address is not in $Both. Also I would use 'pack' on the address, but that may or may not save space depending on the actual data supplied.

Regards...Ed

"Well done is better than well said." - Benjamin Franklin

Comment on Re: Bidirectional lookup algorithm? (Updated: further info.) Download Code

Replies are listed 'Best First'.
Re^2: Bidirectional lookup algorithm? (Updated: further info.) by BrowserUk (Patriarch) on Jan 06, 2015 at 02:17 UTC
If you look at my code in Re^2: Bidirectional lookup algorithm? (Updated: further info.), using a hash does lookups by name in: 0.976562/3655808 = 0.000000267 secs; and by addr in 1.065514/3655808 = 0.000000291. `C:\test>1112165-hash -S=4 Start: mem: 7,972 K 3655808 Hash built: mem: 651,160 K Fetch by Name took 0.976562 seconds Fetch by Addr took 1.065514 seconds Hash queried: check mem: 651,184 K` [download] And using an in-memory sqlite DB does them ~30 times slower at: by name in: 30.339751/3655808 = 0.0000083; and by addr in: 30.587817/3655808 = 0.0000084 for a memory reduction of 5/6ths. `C:\test>1112165-sql -S=4 Start: mem: 9,668 K DB created: mem: 11,084 K DB populated: mem: 88,800 K DB indexed: mem: 119,372 K Fetch by Name took 30.339751 seconds Fetch by Addr took 30.587817 seconds DB queried: mem: 119,400 K` [download] And using a disk-based sqllite db does them ~57 times slower at: by name in: 6.867170/456976 = 0.0000150; and by addr in: 6.949115/456976 = 0.0000152 seconds for a memory reduction of ~3/5ths: `C:\test>1112165-sql Start: mem: 9,620 K DB created: mem: 11,464 K DB populated: mem: 29,788 K DB indexed: mem: 45,540 K Fetch by Name took 6.867170 seconds Fetch by Addr took 6.949115 seconds DB queried: mem: 45,564 K` [download] The (single, and possible non representative depending upon where in the strings the chosen keys are (beginning middle or end)) timings you've posted are 800x slower again than the memoryDB, or nearly 5000 times slower than the hashes. The net result would be that my current 8 hour runs would require 4.5 years. And that's before I added the extra data that the space reduction would permit. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^3: Bidirectional lookup algorithm? (Updated: further info.) by flexvault (Monsignor) on Jan 06, 2015 at 11:10 UTC
BrowserUk, When programming my database engine, I found that 'index' did best with smaller scalars ( reason for mentioning 'database sharding' ). My times were close to the worst case ( worst case is 'not found' ). I found that with a 8-byte key, the 'index' technique I used was 4.7 times faster searching 512 byte records than searching 1,024 byte records. Naturally, 'index' searching a 5MM+ byte record is going to be real slow! My attempt was to give you another approach, not a final solution. Good Luck...Ed "Well done is better than well said." - Benjamin Franklin	[reply]


The stupid question is the question not asked
	PerlMonks