PerlMonks  

Vow Triptych

by hashED (Novice)
on Dec 30, 2008 at 15:22 UTC (#733275=snippet)

Description: So I'm getting married in October, and I started thinking about wedding vows, so I wanted to get a better feel for what other people spend most of their vow-ing time talking about. Here's a little script that came out of that effort. It takes a text file full of wedding vows (which you'll have to provide yourself) and prints the text's triptychs: every word, most frequent first, followed by the two- and three-word phrases that begin with it, also ordered by frequency.
#!/usr/bin/perl
use strict;
use warnings;

# Read every line of input and collect its words, lowercased, in order.
my @wordsInOrder;
while (<>) {
    foreach ("$_" =~ m/\w+/g) {
        push @wordsInOrder, lc($_);
    }
}

# Count three-word sequences (the triptychs).
my $trypHash = {};
for (my $i = 0; $i < scalar(@wordsInOrder)-2; $i++) {
    $trypHash->{$wordsInOrder[$i]." ".$wordsInOrder[$i+1]." ".$wordsInOrder[$i+2]} += 1;
}
# Count two-word sequences.
my $dupeHash = {};
for (my $i = 0; $i < scalar(@wordsInOrder)-1; $i++) {
    $dupeHash->{$wordsInOrder[$i]." ".$wordsInOrder[$i+1]} += 1;
}
# Count single words.
my $oneHash = {};
for (my $i = 0; $i < scalar(@wordsInOrder); $i++) {
    $oneHash->{$wordsInOrder[$i]} += 1;
}

# Print each word, most frequent first, then the pairs and triples whose
# text begins with it (a plain prefix test), each sorted by frequency.
foreach my $one (sort {$oneHash->{$b} <=> $oneHash->{$a}} keys %{$oneHash}) {
    print "$one\n";
    foreach my $two (sort {$dupeHash->{$b} <=> $dupeHash->{$a}} keys %{$dupeHash}) {
        next unless $two =~ m/^$one/;
        print "\t$two\n";
        foreach my $three (sort {$trypHash->{$b} <=> $trypHash->{$a}} keys %{$trypHash}) {
            next unless $three =~ m/^$two/;
            print "\t\t$three\n";
        }
    }
}
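A note on the matching in the nested loops: m/^$one/ is a bare prefix test, so under the word "i" it also lists pairs such as "in love" or "it is". The sketch below reproduces that behaviour next to a whole-word test. It is written in Python (the language of the last reply), and the helper names children_prefix and children_whole_word and the sample phrases are made up for illustration.

```python
import re

def children_prefix(parent, phrases):
    # Mimics "next unless $two =~ m/^$one/": re.match anchors at the start,
    # so any phrase whose text merely begins with parent is accepted.
    return [p for p in phrases if re.match(re.escape(parent), p)]

def children_whole_word(parent, phrases):
    # Requires the parent followed by a space, so only whole-word extensions pass.
    return [p for p in phrases if p.startswith(parent + ' ')]

pairs = ['i love', 'in love', 'i promise', 'it is']
print(children_prefix('i', pairs))      # ['i love', 'in love', 'i promise', 'it is']
print(children_whole_word('i', pairs))  # ['i love', 'i promise']
```

In the Perl script the corresponding whole-word test would be m/^\Q$one\E / (and likewise for $two), where \Q...\E quotes any regex metacharacters in the interpolated phrase.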
Re: Vow Triptych
by jwkrahn (Monsignor) on Dec 30, 2008 at 17:14 UTC
    my @wordsInOrder;
    while (<>) {
        foreach ("$_" =~ m/\w+/g) {
            push @wordsInOrder, lc($_);
        }
    }

    Wow! You are copying $_ to a string before binding it to a match and then iterating over a list in a loop when you could just use the list directly:

    my @wordsInOrder;
    while (<>) {
        push @wordsInOrder, lc() =~ m/\w+/g;
    }
      Huh, didn't know you could do that. Thanks for the edge-mication.
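jwkrahn's one-liner pushes the list returned by m/\w+/g straight onto the array. The next reply tokenizes with split /\W+/ instead, and the two are not quite interchangeable: when a line starts with punctuation, splitting yields an empty leading field. The sketch below shows the difference with Python's re module (re.findall standing in for the match-all, re.split for split); the sample line is made up.

```python
import re

line = '"I do," she said.\n'

# Match-all, like m/\w+/g in list context: only the words themselves.
words_match = re.findall(r'\w+', line.lower())

# Split on runs of non-word characters, like split /\W+/. A line that starts
# with punctuation gives a leading empty string. (Python also keeps the
# trailing empty field; Perl's split drops trailing empty fields.)
words_split = re.split(r'\W+', line.lower())

print(words_match)  # ['i', 'do', 'she', 'said']
print(words_split)  # ['', 'i', 'do', 'she', 'said', '']
```

Filtering out empty strings (grep { length } in Perl) makes the split version agree with the match-all version.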
Re: Vow Triptych
by Arunbear (Parson) on Dec 30, 2008 at 18:56 UTC
    Additional simplifications are possible, e.g. there is no need to loop over the word list three times, and why use hashrefs and give yourself extra typing when a regular hash will do?
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @wordsInOrder;
    while (<>) {
        # grep drops the empty leading field split returns when a line starts with punctuation
        push @wordsInOrder, grep { length } split /\W+/, lc($_);
    }

    my (%single, %double, %triple);
    my $index = 0;
    foreach my $word (@wordsInOrder) {
        $single{$word}++;
        # defined() rather than truth, so a word like "0" is still counted
        my $next_word = $wordsInOrder[$index+1];
        if (defined $next_word) {
            $double{"$word $next_word"}++;
        }
        my $next_next_word = $wordsInOrder[$index+2];
        if (defined $next_next_word) {
            $triple{"$word $next_word $next_next_word"}++;
        }
        $index++;
    }

    foreach my $singlet (sort_by_frequency(\%single)) {
        print "$singlet\n";
        foreach my $doublet (sort_by_frequency(\%double)) {
            next unless $doublet =~ /^\Q$singlet\E\b/;
            print "\t$doublet\n";
            foreach my $triplet (sort_by_frequency(\%triple)) {
                next unless $triplet =~ /^\Q$doublet\E\b/;
                print "\t\t$triplet\n";
            }
        }
    }

    # Return the keys of a hash, highest count first.
    sub sort_by_frequency {
        my $h = shift;
        return sort { $h->{$b} <=> $h->{$a} } keys %$h;
    }
    This also only matches whole words rather than fragments. Best wishes with the wedding anyway!
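The single-pass idea above, filling all three tables during one walk over the word list, can be sketched in Python as follows. This is an illustration, not Arunbear's code: the helper count_ngrams and its max_n parameter are hypothetical, and it checks the list bounds explicitly instead of testing whether the next word is true.

```python
from collections import defaultdict

def count_ngrams(words, max_n=3):
    # One table per phrase length: counts[0] for single words, counts[1] for pairs, ...
    counts = [defaultdict(int) for _ in range(max_n)]
    for i in range(len(words)):
        for n in range(1, max_n + 1):
            # Only count a phrase if n words remain from position i.
            if i + n <= len(words):
                counts[n - 1][' '.join(words[i:i + n])] += 1
    return counts

single, double, triple = count_ngrams('i take you i take thee'.split())
print(single['i'], double['i take'], triple['i take you'])  # 2 2 1
```

Each word is visited once, and a list of n words yields n pairs minus one and n triples minus two, matching the three separate loops in the original post.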
Re: Vow Triptych
by Arunbear (Parson) on Dec 31, 2008 at 16:29 UTC
    For any comparative linguists, here is what it looks like in Python (it even works):
    import re
    import sys
    from collections import defaultdict

    wordsInOrder = []
    for line in sys.stdin:
        wordsInOrder.extend(re.findall(r'\w+', line.lower()))

    single = defaultdict(int)
    double = defaultdict(int)
    triple = defaultdict(int)

    for i, word in enumerate(wordsInOrder):
        single[word] += 1
        try:
            next_word = wordsInOrder[i+1]
            double[word + ' ' + next_word] += 1
            next_next_word = wordsInOrder[i+2]
            triple[word + ' ' + next_word + ' ' + next_next_word] += 1
        except IndexError:  # ran off the end of the word list
            pass

    def sort_by_frequency(d):
        # Keys ordered by their counts, highest first.
        return sorted(d, key=d.get, reverse=True)

    for singlet in sort_by_frequency(single):
        print(singlet)
        for doublet in sort_by_frequency(double):
            if not doublet.startswith(singlet + ' '):
                continue
            print("\t" + doublet)
            for triplet in sort_by_frequency(triple):
                if not triplet.startswith(doublet + ' '):
                    continue
                print("\t\t" + triplet)
    I needed amusement ;-)
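For comparison, here is a more compact Python 3 variant. It swaps in techniques not used anywhere in this thread: collections.Counter for the tallies and zip over staggered slices to generate the phrases. The helper names ngrams and build_tree are hypothetical, and the tie ordering described in the comments assumes Python 3.7 or later.

```python
from collections import Counter

def ngrams(words, n):
    # zip() over n staggered slices yields every run of n consecutive words.
    return (' '.join(gram) for gram in zip(*(words[i:] for i in range(n))))

def build_tree(words):
    # One Counter per phrase length; most_common() sorts by count, descending,
    # and keeps first-seen order among ties.
    single, double, triple = (Counter(ngrams(words, n)) for n in (1, 2, 3))
    lines = []
    for s, _ in single.most_common():
        lines.append(s)
        for d, _ in double.most_common():
            if d.startswith(s + ' '):
                lines.append('\t' + d)
                for t, _ in triple.most_common():
                    if t.startswith(d + ' '):
                        lines.append('\t\t' + t)
    return lines

for line in build_tree('i take you i take thee'.split()):
    print(line)
```

To read vows from standard input as the scripts above do, pass re.findall(r'\w+', sys.stdin.read().lower()) (with re and sys imported) in place of the sample word list.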
