Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Two dimensional sets intersection

by menth0l (Monk)
on Apr 10, 2013 at 13:12 UTC ( #1027966=perlquestion: print w/ replies, xml ) Need Help??
menth0l has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I have two discontinuous sets of values and some pairs from each sets are "valid" and have assigned values, for example:
# subset 1 of (A..Z) set_1 = (A, B, C, X, Y, Z) # subset 2 of (a..z) set_2 = (q, w, r, t, u) # valid pairs valid = ((A,q) = 1, (A,u) = 6, (A,k) => 3, (C,t) = 10, (Z,u) = 30)
I have a problem with "translating" this to perl. I know I need some kind of map for storing information about valid pairs and their id (like python's map of tuples: (x,y) => z). But having two sets, each consisting of 100 elements requires 100x100=10000 combinations... Is there any quicker method for this?

If I'd have an index like this:
my %INDEX = { A => { q => 1, u => 6, k => 3 }, C => { t => 10 }, Z => { u => 30 }, M => { q => 100 } }
I could narrow search by creating intersection (e.g. using Set::Object):
# narrow by first set (set_1) x (keys %INDEX) = (set_1) x (A, C, M, Z) = (A, C, Z)
then i would be left with an array of hash-refs and do:
set_2 x (q, u, k) + set_2 x (t) + set_2 x (u) = (1, 6, 10, 30)

I don't like this idea though and I don't know if it would be faster than combinations hash checks. Is there a perl module that would help me with this?

Comment on Two dimensional sets intersection
Select or Download Code
Re: Two dimensional sets intersection
by daxim (Chaplain) on Apr 10, 2013 at 13:28 UTC
    This is a FAQ.
    use Hash::MultiKey; tie my %valid, 'Hash::MultiKey'; %valid = ( [qw(A q)] => 1, [qw(A u)] => 6, [qw(A k)] => 3, [qw(C t)] => 10, [qw(Z u)] => 30, );
Re: Two dimensional sets intersection
by LanX (Canon) on Apr 10, 2013 at 14:29 UTC
    first of all I have to say that I have the impression that you are trying to reinvent a wheel from graph theory.

    Your structure looks for me like a weighted bipartite graph, and you are trying to find a subgraph.

    Did you check CPAN for related modules?

    > (like python's map of tuples: (x,y) => z)

    don't know enough about Python, but this can easily be simulated in Perl with old-style multidim hashes and grepping with a split on the keys.

    DB<116> $index{"A","a"}=1 => 1 DB<117> $index{"A","b"}=2 => 2 DB<118> $index{"B","a"}=3 => 3 DB<119> $index{"B","b"}=4 => 4 DB<120> grep { my ($k1,$k2) = split /$;/,$_; $k1 eq "A" } keys %inde +x => ("A\34a", "A\34b") DB<121> %set1=(A=>1) => ("A", 1) DB<122> %set2=(a=>1) => ("a", 2) DB<124> grep { my ($k1,$k2) = split /$;/,$_; $set1{$k1} and $set2{$k +2} } keys %index => "A\34a"

    This brute force search might be elegantš, but for sure not as efficient as looping thru your hierarchical structure.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    1) especially when delegating the split to a function called by grep

    grep &splitted(\%set1,\%set2) , keys %index

    update

    added missing keys

Re: Two dimensional sets intersection
by Not_a_Number (Parson) on Apr 10, 2013 at 20:44 UTC

    No need to worry about narrowing your search or whatever. Hash lookups are fast.

    The code below creates a hash for a 100 x 100 matrix. It also assumes that each matrix pair has a value (worst case, given your 'spec'). To simplify validation, I make this value 1, but changing it to any other number makes no difference to the time required for lookup.

    Subsequently, I make another worst-case assumption, namely that your lists of search terms (what you call Subset 1 and Subset 2) are maximally large.

    On my machine, it finishes before I can blink. Run it and see:

    use strict; use warnings; # Create 100 x 100 datastructure: my @uc = ( 'AA' .. 'DV' ); my @lc = ( 'aa' .. 'dv' ); my %index; for my $k ( @uc ) { for my $v ( @lc ) { $index{$k}{$v} = 1; } } # Parse it: my $score; for my $k ( @uc ) { # or: for my $k ( @subset_1 ) for my $v ( @lc ) { # or: for my $v ( @subset_2 ) $score += $index{$k}{$v} || 0; } } print $score;

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1027966]
Approved by LanX
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (9)
As of 2014-07-14 07:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (256 votes), past polls