Pig Latin

by vroom (His Eminence)
on Feb 17, 2000 at 01:25 UTC

vroom has asked for the wisdom of the Perl Monks concerning the following question:

For a programming exercise in one of my C/C++ classes we had to write a program that would translate from English to Pig Latin:

The rules for this translation are as follows:

a) For any word starting with one or more consonants, move the starting consonants to the end of the word and append ay

b) for any word starting with a vowel simply append way

c) any other characters can be ignored and all letters can be assumed to be lowercase

I'm curious to see just how little code someone can do this in.


this is the time for all good men to
come to the aid of their country.
Should become:
isthay isway ethay imetay orfay allway oodgay enmay otay
omecay otay ethay aidway ofway eirthay ountrycay
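
As a baseline for the golfed answers that follow, here is a minimal, un-golfed sketch of rules a) to c). The sub name `pig_latin` is made up for illustration, and unlike the sample output above it leaves punctuation in place rather than dropping it.

```perl
use strict;
use warnings;

# Hypothetical helper: translate one line of lowercase English.
sub pig_latin {
    my ($text) = @_;
    # Rule c: only runs of lowercase letters are touched.
    $text =~ s{([a-z]+)}{
        my $word = $1;
        # Rule a: leading consonants move to the end, followed by "ay".
        # Rule b: a word starting with a vowel just gets "way" appended.
        $word =~ /^([^aeiou]+)(.*)$/ ? "$2$1ay" : "${word}way";
    }ge;
    return $text;
}

print pig_latin("this is the time for all good men to"), "\n";
# isthay isway ethay imetay orfay allway oodgay enmay otay
```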

Replies are listed 'Best First'.
(Ovid) RE: Pig Latin
by Ovid (Cardinal) on Jul 23, 2000 at 20:43 UTC
    Here's the shortest that I could come up with:
    s/\b((qu|[bcdfghjklmnpqrstvwxyz]+)?([a-z]+))/$2?$3.$2."ay":$1."way"/eg;
    Points to note:
    • It handles multiple consonants at the start of the word (e.g. "this" becomes "isthay")
    • It handles 'qu'.
    • It's terribly inefficient, but then, I guess that wasn't the point :)
    Here's the full breakdown, if anyone's interested:
    #!/usr/bin/perl -w
    my $test = "that is the time for all good men to come to the aid of their country.";
    $test =~ s/
        \b                              # start of word
        (                               # capture all to $1
          (                             # this is $2
            qu                          # word starts with qu
            |                           # or
            [bcdfghjklmnpqrstvwxyz]+    # one or more consonants
          )?                            # but it's optional
          (                             # this is $3
            [a-z]+                      # rest of word
          )
        )
      /$2 ? $3.$2."ay" : $1."way" /xeg; # if $2 evaluates as true then
                                        # put it at end of word and add "ay"
                                        # otherwise, just add "way"
    print $test;
      I'm just playing through. I made your code case-insensitive and did a bit of regex tomfoolery.
      # updated (5 years later!)
      s/\b(qu|[^\W0-9aeiou_]+)?([a-z]+)/$1?"$2$1ay":"$2way"/ieg;
      I don't see the need to save 3 pieces of data. And using [^\W0-9_] is shorter than [] and [b-df-hj-np-tv-z] and it forces the reader to think for a second. ;). And I saved space with the quoting on the RHS.

      Score: 53.
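
The double-negative class can be checked directly. This is only a sketch: the sub name `japhy_pig` and the sample sentence are made up for illustration. Note that with /i a moved capital stays capital.

```perl
use strict;
use warnings;

# Which lowercase letters does the double-negative class accept?
my $consonants = join '', grep { /^[^\W0-9aeiou_]$/ } 'a' .. 'z';
print "$consonants\n";    # bcdfghjklmnpqrstvwxyz

# The updated substitution from above, wrapped in a hypothetical sub.
sub japhy_pig {
    my ($text) = @_;
    $text =~ s/\b(qu|[^\W0-9aeiou_]+)?([a-z]+)/$1?"$2$1ay":"$2way"/ieg;
    return $text;
}

print japhy_pig("This quiet street is old"), "\n";
# isThay ietquay eetstray isway oldway
```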

        japhy, originally, I was constructing a rather longer and optimized script to do the pig latin conversion. Then I went back and reread vroom's specs. First, I didn't use the /i modifier because he said we were to assume the data was lowercase and he wanted the shortest possible code.

        The reason I am using three backreferences is because the data saved to $2 is tricky. Your equivalent (ignoring the "qu" problem) is [^\W0-9_]. This allows you to match all alphabeticals but does no discrimination for vowels. However, you appeared to notice this when you mentioned [b-df-hj-np-tv-z]. Therefore, I suspect that you intended the following and (assuming you did intend this) I offer you kudos for a clever regex:

        I also noticed that, in this case, using the /i modifier ignored vroom's "lowercase" spec, but does result in a shorter regex.


      [bcdfghjklmnpqrstvwxyz]+ can become [bcdfghj-np-tv-z]+ to save six bytes without loss of accuracy.
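
A quick way to convince yourself that the ranges lose nothing is to run both classes over a..z and compare. A small sketch (variable names are arbitrary):

```perl
use strict;
use warnings;

my $long  = 'bcdfghjklmnpqrstvwxyz';
my $short = 'bcdfghj-np-tv-z';

# Letters accepted by each character class.
my @by_long  = grep { /^[$long]\z/ }  'a' .. 'z';
my @by_short = grep { /^[$short]\z/ } 'a' .. 'z';

print "same letters: ", ("@by_long" eq "@by_short" ? "yes" : "no"), "\n";
print "bytes saved: ", length($long) - length($short), "\n";    # 21 - 15 = 6
```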
Re: Pig Latin
by chromatic (Archbishop) on Feb 17, 2000 at 03:23 UTC
    s/\b([aeiou])(\w*)/$1$2way/g;
    s/\b([^aeiou .]+)([aeiou]\w*)/$2$1ay/g;
    This one handles spaces and periods, using greediness and word anchors to do the trick. It helps to handle rule b) first, when using two regexes.
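
To see why the order matters, here is a small sketch that applies the same two substitutions in both orders. The wrapper subs `vowel_rule` and `consonant_rule` are hypothetical names added for the comparison. Run the consonant rule first and every translated word then starts with a vowel, so the vowel rule fires on all of them.

```perl
use strict;
use warnings;

# The two substitutions from above, wrapped in hypothetical subs.
sub vowel_rule {
    my ($s) = @_;
    $s =~ s/\b([aeiou])(\w*)/$1$2way/g;
    return $s;
}

sub consonant_rule {
    my ($s) = @_;
    $s =~ s/\b([^aeiou .]+)([aeiou]\w*)/$2$1ay/g;
    return $s;
}

my $text = "this is the time";

print consonant_rule(vowel_rule($text)), "\n";
# isthay isway ethay imetay

print vowel_rule(consonant_rule($text)), "\n";
# isthayway isway ethayway imetayway
```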
RE: Pig Latin
by perlmonkey (Hermit) on Jul 24, 2000 at 03:59 UTC
    Score: 48
    $_ = <<EOF;
    this is the time for all good me no
    come to the aid of their country.
    test: quest
    EOF
    s/\b(qu|[^aeiou\s]*)(\w+)/$1?$2.$1.ay:$2.way/eg;
    print $_,"\n";
    isthay isway ethay imetay orfay allway oodgay emay onay
    omecay otay ethay aidway ofway eirthay ountrycay.
    esttay: estquay
    This is sort of a combination of what chromatic and japhy had. It keeps punctuation and handles the 'qu' case.
Re: Pig Latin
by Anonymous Monk on Feb 17, 2000 at 02:15 UTC
    (posted after the clarification) Okay. If you insist...
    s@#([^aeiou]+?)(\w+)@#$2.$1.'ay'@#egi||s@#(\w+)@#$1.'way'@#egi
    - the poster of the two previous versions
Re: Pig Latin
by Anonymous Monk on Feb 17, 2000 at 02:16 UTC

    It depends on how general you want to be. My first run attempt is:

    while(<>) {
        s{
            \b([bcdfghjklmnpqrstvwxyz]?)(\w+)    # optional single leading consonant, then the rest
        }
        {
            if($1) {$2.$1.'ay'} else {$2."way"}
        }egix;
        print;
    }

    The problem is that this doesn't correctly treat cases like th, sh, pr, etc. where there's really a compound starting consonant. Maybe this would be a bit more correct:

    while(<>) {
        s{
            \b([bcdfghjklmnpqrstvwxyz]*)([aeiou]+)(\w*)    # consonant cluster, vowels, rest
        }
        {
            if($1) {$2.$3.$1.'ay'} else {$2.$3."way"}
        }egix;
        print;
    }
      Adding qu| at the front of $1 and removing q from [...] provides support for the "qu" combination.   My first attempted tweak was to add (qu) inside $1's [...] but that hosed things for words like "jumped".

      Bonus points for guessing the cliche phrase I tested this with.  <grin>

      #!/usr/bin/perl -w
      use strict;
      while(<>) {
          s{
              \b(qu|[bcdfghjklmpnrstvwxyz]*)([aeiou]+)(\w*)
          }
          {
              if($1) {$2.$3.$1.'ay'} else {$2.$3."way"}
          }egix;
          print;
      }
      # END
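
Why (qu) inside the brackets hoses "jumped": within a character class the parentheses and letters are just literal characters, so u silently becomes a "consonant". A small sketch comparing the two variants (the helper `translate` and the variable names are made up for the comparison):

```perl
use strict;
use warnings;

# (qu) inside the brackets is just four extra literal characters: ( q u )
my $in_class    = qr/\b([bcdfghjklmnprstvwxyz(qu)]*)([aeiou]+)(\w*)/i;
# qu| as an alternative in front of the class, with q removed from the class
my $alternation = qr/\b(qu|[bcdfghjklmnprstvwxyz]*)([aeiou]+)(\w*)/i;

# Hypothetical helper applying either pattern with the same replacement.
sub translate {
    my ($re, $text) = @_;
    $text =~ s/$re/$1 ? "$2$3$1ay" : "$2$3way"/ge;
    return $text;
}

for my $word (qw(quick jumped)) {
    printf "%-7s in class: %-9s alternation: %s\n",
        $word, translate($in_class, $word), translate($alternation, $word);
}
# quick   in class: ickquay   alternation: ickquay
# jumped  in class: edjumpay  alternation: umpedjay
```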
Re: Pig Latin
by davisagli (Scribe) on Jun 19, 2001 at 18:41 UTC

    I had some fun with this. My shortest fully-functional attempt is this 60-character regexp:


    with these features:

    • Handles multiple consonants at start of word (and handles 'qu' correctly)
    • Correctly handles y-related idiosyncrasies: yummy becomes ummyyay, but yttrium becomes yttriumway and rhythm becomes ythmrhay
    • Handles numbers correctly (42 doesn't become 42ay)
    • Counts apostrophe as part of word, so "don't" becomes "on'tday" (and handles other punctuation correctly)

    Note that if we ignore two words (yttrium and ytterbium), we can safely bring it down to 52 chars:


    And if we decide to switch to the dialect of pig latin that doesn't add 'w' on vowel-words, it's down to 44:


    If I combine that with perlmonkey's attempt (losing a bit of functionality in the process, although it still works pretty well) I can reach 36:


    Also note that my (un-golfed) attempt at a pig latin converter in Visual Basic took up almost 3500 characters (!) without handling nearly all of the exceptions mentioned here. This is (one of the many reasons) why I love Perl!

    Thanks to everyone for their previous attempts, which helped me quite a bit. Comments are welcome.
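
The behaviours listed in this post (consonant clusters, 'qu', the y cases, numbers, apostrophes) can be illustrated with a longer, un-golfed sketch. To be clear, this is not davisagli's regexp; the sub name `pig_latin_y` and the exact pattern are assumptions chosen only to reproduce the examples quoted above.

```perl
use strict;
use warnings;

# Hypothetical un-golfed helper; not the golfed regexp from the post.
sub pig_latin_y {
    my ($text) = @_;
    $text =~ s{
        \b
        (                      # group 1: optional leading consonant cluster
            qu                 #   "qu" moves as a unit
          | y (?=[aeiou])      #   word-initial y before a vowel acts as a consonant
          | [^\W\d_aeiouy]+    #   otherwise consonants, with y counted as a vowel
        )?
        ( [a-z']+ )            # group 2: rest of the word, apostrophes included
    }{ $1 ? "$2$1ay" : "$2way" }gexi;
    return $text;
}

print pig_latin_y("yummy yttrium rhythm 42 don't"), "\n";
# ummyyay yttriumway ythmrhay 42 on'tday
```

Digits never match either group, so "42" is left alone.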


Re: Pig Latin
by stinkingpig (Sexton) on Jul 25, 2009 at 05:11 UTC

    It seems ungood that there's no CPAN module for Pig Latin, at least none that I could find... so I shamelessly munged Lingua::Bork and this thread.

    package Lingua::PigLatin;

    use strict;
    use warnings;

    require Exporter;
    use vars qw(@ISA %EXPORT_TAGS @EXPORT_OK $VERSION);

    @ISA = qw(Exporter);
    %EXPORT_TAGS = ( 'all' => [ 'piglatin' ] );
    @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
    $VERSION = '0.01';

    sub new {
        my $class = shift;
        bless {}, ref($class)||$class;
    }

    sub piglatin {
        local $_ = shift;
        local $_ = shift if ref($_);
        s/\b(qu|[cgpstw]h    # First syllable, including digraphs
          |[^\W0-9_aeiou])   # Unless it begins with a vowel or number
          ?([a-z]+)/         # Store the rest of the word in a variable
          $1?"$2$1ay"        # move the first syllable and add -ay
          :"$2way"           # unless it should get -way instead
          /iegx;
        return $_;
    }

    1;
    __END__

    =head1 NAME

    Lingua::PigLatin - Perl extension for Pig Latin

    =head1 SYNOPSIS

      use Lingua::PigLatin 'piglatin';
      print piglatin("Put the candle back.")

    =head1 DESCRIPTION

    igpay atinlay.

    =head1 EXPORT

    None by default. Can export piglatin function for convenience.

    =head1 AUTHOR

    Jack Coates, E<lt>jack@monkeynoodle.orgE<gt>

    =head1 SEE ALSO

    =head1 KNOWN PROBLEMS

    Contractions. This man page is not in Pig Latin.

    =cut

    I haven't uploaded it to CPAN yet, I think I'll mess around with it a little first. Maybe some tests...

    "Nothing was broken, and it's been fixed." -- Jon Carroll
RE: Pig Latin
by Anonymous Monk on Feb 17, 2000 at 01:51 UTC
    I'm assuming that you _did_ mean "starting with", not "ending with".
    s{
        (\w+)
    }{
        if ($1 =~ /^[aeiou]/) {
            $1 . 'way';
        } else {
            $1 . substr ($1, 0, 1) . 'ay';
        }
    }egix;
Re: Pig Latin
by Anonymous Monk on Feb 17, 2000 at 01:58 UTC
    (Sorry for the previous post. I forgot to add the CODE tag.) A more concise (not to mention obfuscated) way to do it:
    s@#(\w)(\w+)@#$1=~m#[aeiou]#i?$1.$2.'way':$1.$2.$1.'ay'@#egi;
Re: Pig Latin
by Crulx (Monk) on Feb 18, 2000 at 13:56 UTC
    tr/a-zA-Z/ /cs;
    @_ = split;    # assign explicitly: since Perl 5.12 a bare split no longer fills @_
    foreach $_ (@_){
        /^[aeiou]\S*/ ? $_ .="way " : s/(\S)(\S*)/$2$1ay /;
        print;
    }
    This strips newlines and doesn't handle the "th" type cases but it is different than the ones posted above. Post your C++ answer as a point of comparison.
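
The newline stripping comes from the tr/// on the first line: /c complements the search list (everything that is not a letter) and /s squeezes each run of replaced characters into one space. A tiny sketch on a made-up two-line string:

```perl
use strict;
use warnings;

my $text = "come to the aid,\nof their country.";

# c: complement the search list (everything that is NOT a letter)
# s: squeeze each run of replaced characters into a single space
(my $clean = $text) =~ tr/a-zA-Z/ /cs;

print "[$clean]\n";    # [come to the aid of their country ]
```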
Re: Pig Latin
by Anonymous Monk on Feb 17, 2000 at 01:57 UTC
    A more concise (not to mention obfuscated) way to do it: s@#(\w)(\w+)@#$1=~m#aeiou#i?$1.$2.'way':$1.$2.$1.'ay'@#egi;