laziness, impatience, and hubris PerlMonks

### Finding Combinations of Pairs

by zod (Scribe)
 on Jan 14, 2009 at 03:21 UTC Need Help??
zod has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have a file that has a list of words on each line, let's say:

```dog monkey cat
cat ball stone
monkey iron cat zoo
```

What I need to do is find out which two-word combinations (order does not matter) on each line are most common in the file.

So I need to generate all pair combinations for each line and then figure out which pair combo is the most common over all the lines.

So, the first line should generate:

```dog monkey
dog cat
monkey cat
```

I figured I should dump each line into an array and then generate the pair combos for each array. So, I started with tye's Finding all Combinations node:

But, alas, I'm stuck already as I'm not sure how to generate only _pair_ combinations rather than every possible combo.

Can anyone point me in the right direction? My spidey sense tells me I should use a hash. But not sure where to start...

Thanks much,

Zod

Replies are listed 'Best First'.
Re: Finding Combinations of Pairs
by kyle (Abbot) on Jan 14, 2009 at 03:46 UTC

I think your spidey sense serves you well.

This smells a little like homework, so I'm putting my solution in spoiler tags.

You should come up with your pairs, and for each pair keep a count (in a hash) of how many times that pair has been seen. Each key of the hash will be a pair, and each value will be the count of occurrences. Once you have that, you can sort the keys by the values and come up with the pair that appears the most. I suspect every one of these tasks has been the subject of some post you could find with Super Search.

```#! /usr/bin/perl -w
use strict;

my %pair_count;
for (<>) {
\$pair_count{\$_}++ for get_pairs(grep length \$_, split /\s/, \$_);
}

my \$comp = sub (\$\$) {-(\$pair_count{\$_[0]} <=> \$pair_count{\$_[1]})};
print((sort \$comp keys %pair_count)[0], \$/);

sub get_range_iterator {
my (\$start, \$end) = @_;
return sub {
if (\$start <= \$end) {
return \$start++;
}
else {
return;
}
}
}

sub get_pairs {
my \$outer_iter = get_range_iterator(0, \$#_);
my \$i = \$outer_iter->();
while (defined(\$i)) {
my \$inner_iter = get_range_iterator(\$i+1, \$#_);
my \$j = \$inner_iter->();
while (defined \$j) {
push @answer, join " ", sort @_[\$i, \$j];
\$j = \$inner_iter->();
}
\$i = \$outer_iter->();
}
}

Or else you offer an answer that appears to be well laid out and straight-forward, but scales abysmally...

• another intruder with the mooring in the heart of the Perl

tilly,

It doesn't look too much like homework to me but another idea would be to use code obviously beyond that of the course?

Cheers - L~R

I like this solution, because it's so simple and efficient:

```#!/usr/bin/perl -w
use strict;

my @lines = <DATA>;

my (%words, %count);
@words{map {split ' '} @lines} =1;
my @uniq = keys %words;                 # create list of unique words

foreach my \$i (0 .. \$#uniq - 1) {
foreach my \$j (\$i+1 .. \$#uniq) {

# count the pairs
do { \$count{"\$uniq[\$i] \$uniq[\$j]"}++ if /\$uniq[\$i]/ an
+d /\$uniq[\$j]/ } foreach @lines;

}
}

print   map "\$_ : \$count{\$_}\n",                # print the pairs in s
+orted order
sort {\$count{\$a} <=> \$count{\$b}}
keys %count;

__DATA__
dog monkey cat
cat ball stone
monkey iron cat zoo

(NB see parent node)

Thanks for the reply. I didn't consider combining the pairs into a single value. Next time I will. Thanks.
BTW - and being serious now - you don't actually need to combine them into a single value.

If you do something like:

```\$h->{\$word1}->{\$word2}++
```\$h->{"\$word1 \$word2"}++
in the centre of the loops, then you end up with a useful tree-like structure, which gives you a quick index of, for each word, how many times each other word appears in combination with it.

Try it and use Data::Dumper to print the output, and you'll see what I mean.

Although I appreciate that may not be what the homework called for. <grin>

Best wishes, andy.

Re: Finding Combinations of Pairs
by BrowserUk (Pope) on Jan 14, 2009 at 08:52 UTC

```#! perl -slw
use strict;

sub pairs {
map {
my \$i = \$_;
map [ @_[ \$i, \$_ ] ], \$i+1 .. \$#_;
} 0 .. \$#_;
}

my %pairCounts;

++\$pairCounts{ "@\$_" } for map pairs( sort split ), <DATA>;

print "@\$_" while @{ \$_ = [ each %pairCounts ] };

__DATA__
dog monkey cat
cat ball stone
monkey iron cat zoo

Gives:

```C:\test>junk4
cat stone 1
cat monkey 2
iron monkey 1
cat zoo 1
dog monkey 1
monkey zoo 1
cat dog 1
ball cat 1
cat iron 1
iron zoo 1
ball stone 1

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Finding Combinations of Pairs
by repellent (Priest) on Jan 14, 2009 at 18:23 UTC

Create A New User
Node Status?
node history
Node Type: perlquestion [id://736135]
Approved by kyle
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (7)
As of 2017-08-23 10:19 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Who is your favorite scientist and why?

Results (350 votes). Check out past polls.

Notices?