Is this sort of what you had in mind?
The Code
#!/usr/bin/env perl
use 5.014;
use warnings;

my %kw;   # $kw{keyword}{name} = number of times keyword appears in name.txt
my %name; # Inverse of %kw: $name{name}{keyword}

read_words($_) for <*.txt>;

# Now we can do all sorts of useful things with the two hashes:
say "$_ has " . (keys %{ $name{$_} }) . " unique words" for sort keys %name;
say '';

# Keywords ordered by occurrence count
for my $kw (sort { keys %{ $kw{$b} } <=> keys %{ $kw{$a} } } keys %kw) {
    my $count = keys %{ $kw{$kw} };
    printf "%10s appears in %2d file%s: %s\n",
        $kw, $count, $count > 1 ? 's' : ' ',
        join(', ', sort keys %{ $kw{$kw} });
}

# Pull in the word lists.
sub read_words {
    my $file = shift;
    open my $fh, '<', $file or die "Can't open $file: $!";
    my $name = $file =~ s/\.txt$//r;
    while (my $word = <$fh>) {
        chomp $word;
        $kw{$word}{$name}++;
        $name{$name}{$word}++;
    }
    close $fh;
}
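To illustrate the "useful things" the two hashes make possible, here is a minimal, self-contained sketch with hypothetical hard-coded data in the same %kw shape (keyword => { file => count }), showing a couple of typical queries:

```perl
#!/usr/bin/env perl
use 5.014;
use warnings;

# Hypothetical data mirroring the %kw structure:
# $kw{keyword}{file} = occurrence count.
my %kw = (
    fargo => { al => 1, bob => 2 },
    abel  => { al => 2 },
);

# Which files contain 'fargo'? One hash lookup, then keys.
my @files = sort keys %{ $kw{fargo} };
say "fargo: @files";    # fargo: al bob

# Does 'abel' appear in bob.txt?
say exists $kw{abel}{bob} ? 'yes' : 'no';    # no
```

Both queries are direct hash lookups, so neither depends on how many keywords or files were read.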
Input
Reads all *.txt files in the current directory. Each text file is expected to contain exactly one keyword per line. For example:
al.txt:
abel
abel
baker
camera
delta
edward
fargo
golfer
jerky
Output
al has 8 unique words
bob has 7 unique words
carmen has 6 unique words
don has 3 unique words
ed has 3 unique words
fargo appears in 5 files: al, bob, carmen, don, ed
jerky appears in 4 files: al, carmen, don, ed
icon appears in 3 files: carmen, don, ed
golfer appears in 3 files: al, bob, carmen
edward appears in 2 files: al, bob
camera appears in 2 files: al, bob
delta appears in 2 files: al, bob
hilton appears in 2 files: bob, carmen
baker appears in 2 files: al, bob
kappa appears in 1 file : carmen
abel appears in 1 file : al
Efficiency
Memory: O(nc), where n is the number of unique keyword/file pairs and c is the average keyword length. Each keyword string is stored in both hashes, so the real footprint is roughly double that of a single hash, but big-O absorbs the constant factor.
Execution: Most queries are single O(1) hash lookups, and counting keywords is also O(1), since keys in scalar context returns the stored key count without iterating the hash. Looping to display every keyword, as I have done, necessarily costs O(n); that is the best possible order for printing n items.
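A quick sketch of the O(1) count claim, using throwaway data (the keywords are arbitrary): keys %hash in scalar context yields the key count directly, so no loop is needed to count a file's unique words.

```perl
#!/usr/bin/env perl
use 5.014;
use warnings;

# Build a small hash of unique words, as %name holds per file.
my %words = map { $_ => 1 } qw(abel baker camera);

# In scalar context, keys returns the stored key count in
# constant time; it does not walk the hash.
my $unique = keys %words;
say "al has $unique unique words";    # al has 3 unique words
```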