PerlMonks  

Vow Triptych

by hashED (Novice)
on Dec 30, 2008 at 15:22 UTC (snippet #733275)

Description: So I'm getting married in October, and I started thinking about wedding vows. I wanted a better feel for what other people spend most of their vow time talking about, and this little script came out of that effort. It takes a text file full of wedding vows (which you'll have to provide yourself) and prints the text's triptychs: every word, most frequent first; under each word, the two-word phrases that start with it; and under each of those, the three-word phrases that start with that pair, each level sorted by frequency.
#!/usr/bin/perl
use strict;
use warnings;

# Read every line of input and collect its words, lowercased, in order.
my @wordsInOrder;
while (<>) {
    foreach ("$_" =~ m/\w+/g) {
        push @wordsInOrder, lc($_);
    }
}

# Count every three-word sequence.
my $trypHash = {};
for (my $i = 0; $i < scalar(@wordsInOrder) - 2; $i++) {
    $trypHash->{$wordsInOrder[$i] . " " . $wordsInOrder[$i+1] . " " . $wordsInOrder[$i+2]} += 1;
}
# Count every two-word sequence.
my $dupeHash = {};
for (my $i = 0; $i < scalar(@wordsInOrder) - 1; $i++) {
    $dupeHash->{$wordsInOrder[$i] . " " . $wordsInOrder[$i+1]} += 1;
}
# Count every single word.
my $oneHash = {};
for (my $i = 0; $i < scalar(@wordsInOrder); $i++) {
    $oneHash->{$wordsInOrder[$i]} += 1;
}

# Print each word (most frequent first), then the pairs and triples that
# start with it, each level also sorted by frequency.
# Note: m/^$one/ is a prefix match, so "i" also matches pairs like "in the".
foreach my $one (sort {$oneHash->{$b} <=> $oneHash->{$a}} keys %{$oneHash}) {
    print "$one\n";
    foreach my $two (sort {$dupeHash->{$b} <=> $dupeHash->{$a}} keys %{$dupeHash}) {
        next unless $two =~ m/^$one/;
        print "\t$two\n";
        foreach my $three (sort {$trypHash->{$b} <=> $trypHash->{$a}} keys %{$trypHash}) {
            next unless $three =~ m/^$two/;
            print "\t\t$three\n";
        }
    }
}
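The three counting loops in the script all do the same thing with a different window size: slide along the word list and use each run of consecutive words as a hash key. Here is a minimal sketch folding that into one sub. The name count_ngrams and the sample sentence are made up for illustration, and it assumes the words have already been lowercased as in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original snippet): count every run of
# $n consecutive words. Returns a hash reference mapping the phrase
# "w1 w2 ... wn" to the number of times it occurs.
sub count_ngrams {
    my ($words, $n) = @_;
    my %counts;
    # Windows start at 0 and the last one starts at @$words - $n;
    # with fewer than $n words the range is empty and nothing is counted.
    for my $i (0 .. @$words - $n) {
        $counts{ join ' ', @{$words}[$i .. $i + $n - 1] }++;
    }
    return \%counts;
}

# A made-up sample line standing in for a file of vows.
my @words = map { lc($_) } "I take you to be my wife, I take you" =~ m/\w+/g;

my $ones   = count_ngrams(\@words, 1);
my $twos   = count_ngrams(\@words, 2);
my $threes = count_ngrams(\@words, 3);

printf "%d distinct words, %d pairs, %d triples\n",
    scalar(keys %$ones), scalar(keys %$twos), scalar(keys %$threes);

# Most frequent pairs first; ties broken alphabetically.
for my $pair (sort { $twos->{$b} <=> $twos->{$a} || $a cmp $b } keys %$twos) {
    print "$pair\t$twos->{$pair}\n";
}
```

Calling it with 1, 2 and 3 reproduces the three hashes from the script; the printing loops can stay as they are.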
Re: Vow Triptych
by jwkrahn (Monsignor) on Dec 30, 2008 at 17:14 UTC
    my@wordsInOrder;
    while (<>) {
        foreach ("$_" =~ m/\w+/g) {
            push @wordsInOrder, lc($_);
        }
    }

    Wow! You are copying $_ into a new string before binding it to a match, and then iterating over the resulting list in a loop, when you could just push the list directly:

    my @wordsInOrder;
    while (<>) {
        push @wordsInOrder, lc() =~ m/\w+/g;
    }
      Huh, didn't know you could do that. Thanks for the edge-mication.
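The shorter form works because a match with /g in list context returns all of its matches at once, and push happily takes a whole list. A small sketch of the same idiom, assuming a couple of made-up lines held in an array instead of <> so it runs without an input file; it uses an explicit lc($_) for clarity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two made-up input lines standing in for what <> would read.
my @lines = ("I promise to love you,\n", "To honour you.\n");

my @wordsInOrder;
for (@lines) {
    # m/\w+/g in list context returns every run of word characters at once,
    # and push accepts the whole list, so no inner loop is needed.
    push @wordsInOrder, lc($_) =~ m/\w+/g;
}
print join(' ', @wordsInOrder), "\n";    # prints "i promise to love you to honour you"
```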
Re: Vow Triptych
by Arunbear (Parson) on Dec 30, 2008 at 18:56 UTC
    Additional simplifications are possible: for example, there is no need to loop over the word list three times, and why use hashrefs and make yourself do extra typing when you could use a regular hash?
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @wordsInOrder;
    while (<>) {
        # grep drops the empty leading field that split returns
        # when a line starts with a non-word character
        push @wordsInOrder, grep { length } split /\W+/, lc($_);
    }

    # One pass over the words fills all three counts.
    my (%single, %double, %triple);
    my $index = 0;
    foreach my $word (@wordsInOrder) {
        $single{$word}++;
        my $next_word = $wordsInOrder[$index+1];
        if (defined $next_word) {
            $double{"$word $next_word"}++;
        }
        my $next_next_word = $wordsInOrder[$index+2];
        if (defined $next_next_word) {
            $triple{"$word $next_word $next_next_word"}++;
        }
        $index++;
    }

    foreach my $singlet (sort_by_frequency(\%single)) {
        print "$singlet\n";
        foreach my $doublet (sort_by_frequency(\%double)) {
            next unless $doublet =~ /^$singlet\b/;
            print "\t$doublet\n";
            foreach my $triplet (sort_by_frequency(\%triple)) {
                next unless $triplet =~ /^$doublet\b/;
                print "\t\t$triplet\n";
            }
        }
    }

    # Keys of the given hash, highest count first.
    sub sort_by_frequency {
        my $h = shift;
        return sort { $h->{$b} <=> $h->{$a} } keys %$h;
    }
    This version also matches only whole words rather than fragments (that is what the \b in the patterns is for). Best wishes with the wedding anyway!
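To see the difference between the original m/^$one/ test and the /^$singlet\b/ test in this reply, here is a tiny sketch on a few made-up phrases; only the two patterns come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $one     = 'i';
my @phrases = ('i do', 'in sickness', 'i promise');

# Prefix test from the original post: any phrase whose text starts with "i".
my @prefix = grep { m/^$one/ } @phrases;

# Word-boundary test from the reply: "i" must be a complete word.
my @whole = grep { m/^$one\b/ } @phrases;

print "prefix: @prefix\n";
print "whole:  @whole\n";
```

Without the \b, "i" picks up "in sickness" as one of its pairs, which is the fragment matching mentioned above.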
Re: Vow Triptych
by Arunbear (Parson) on Dec 31, 2008 at 16:29 UTC
    For any comparative linguists, here is what it looks like in Python (it even works):
    import re
    import sys
    from collections import defaultdict

    wordsInOrder = []
    for line in sys.stdin:
        wordsInOrder.extend(re.findall(r'\w+', line.lower()))

    single = defaultdict(int)
    double = defaultdict(int)
    triple = defaultdict(int)
    for i, word in enumerate(wordsInOrder):
        single[word] += 1
        try:
            next_word = wordsInOrder[i+1]
            double[word + ' ' + next_word] += 1
            next_next_word = wordsInOrder[i+2]
            triple[word + ' ' + next_word + ' ' + next_next_word] += 1
        except IndexError:
            # ran off the end of the word list
            pass

    def sort_by_frequency(d):
        # keys of d, highest count first
        return sorted(d, key=d.get, reverse=True)

    for singlet in sort_by_frequency(single):
        print(singlet)
        for doublet in sort_by_frequency(double):
            if not doublet.startswith(singlet + ' '):
                continue
            print("\t" + doublet)
            for triplet in sort_by_frequency(triple):
                if not triplet.startswith(doublet + ' '):
                    continue
                print("\t\t" + triplet)
    I needed amusement ;-)
