Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: counting instances of one array in another array

by JavaFan (Canon)
on Feb 23, 2012 at 11:53 UTC ( [id://955713]=note: print w/replies, xml ) Need Help??


in reply to counting instances of one array in another array

Why not use a hash?
my @observed = ("ab", "ab", "ad", "an", "bd", "bn", "dn"); my %permCounts; $permCounts{$_}++ for @observed;
The answer why your solution is so slow: the number of permutations is exponential. You'll be spending the majority of your time initializing the @permCounts array, and most of its elements will remain 0.

Replies are listed 'Best First'.
Re^2: counting instances of one array in another array
by jsmagnuson (Acolyte) on Feb 23, 2012 at 12:24 UTC

    Thank you very much for your reply. I get the general idea, but my problem is that I need to end up with an array of size @perms that has counts of each item from @perms that occur in @observed. The reason is that what I really will do is generate that coding for each of many words, generate the array giving the counts of observed patterns from the entire set of possible patterns, and then calculate similarity of those resulting vectors as an index of word similarity.

    Your solution provides a hash with counts for each item in @observed, but then I need to associate those counts with the position of the corresponding element in @perms in @permCounts (or %permCounts!).

    So if:

    @observed = ("ab", "ab", "ad", "an", "bd", "bn", "dn");

    and:

    @perms = ("aa", "ab", "ac", "ad", "ae", "af", "ag", "an", "ba", "bd", "bh", "bn", "dn"); # @perms would never be something like this, # but just to give an example

    Then the result I want is that @permCounts would be:

    (0, 2, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1)

    Thanks very much,

    jim
      Well, good luck then. Be aware that there are 456976 possible combinations with words of length 4 (using the 26 letters of the English alphabet), and 308915776 words of length 6. The latter will take about 5.75Gb to store. If you want ngrams up to 8 characters, look around for a box with 3.7Tb of memory. And if you need to go to ngrams of length 12, you'd be looking at more than 1.5 Eb.
      jsmagnuson Any chance that Math::Fleximal will allow you to treat your range of possible combinations as a number sequence and do math on the key to find the position? I like the hash-key solution and treating the keys as fleximals may solve your position question without building a full array of possibilites.
        For example

        Update: key value look-up was wrong - fixed

      my %seen; $seen{$_}++ for @observed; @permCounts = map { $seen{$_} || 0 } @perms;
        And then you're back to the OPs problem again: the array @perm. Which will be huge.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://955713]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2024-04-16 04:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found