Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Perl Golf 101

by chargrill (Parson)
on May 26, 2006 at 14:43 UTC ( #551876=perlmeditation: print w/ replies, xml ) Need Help??

I was recently trying to golf down some code to post as an obfuscation, and in a flash of inspiration happened upon some golfing techniques that I've never been able to find listed in a tidy package, so I thought I'd post my tips and tricks (relatively basic though they are) for anyone on a similar quest to shave characters.

This isn't meant as an exhaustive treatise on golfing, there are others much better at it than I. However, I was unable to find a good "howto" anywhere, and though I'd make a little one here. It's my hope that others better at this sport might share some tidbits of their own.

Note: It's my hope to polish this up and have it moved to the tutorial section. I'll leave this as a readmore until then.

Important: NO tip or technique presented here has ANY reason to show up in production code - golf is fine for fun and games, but can and will cause production code to fail to run properly, fail to be easy to maintain, etc.

  1. Reduce all variable and subroutine names down to single characters. Eliminate as much whitespace as possible. These might be obvious, but it's certainly a starting point. For purposes of readability of this tutorial, whitespace is NOT fully eliminated. And thanks to a reminder from liverpole, the return value of a subroutine is the last statement of the subroutine, so all those pesky returns should be able to be dropped as well.
  2. Do you really need strict compliance? Get rid of variable declarations where possible. Dropping this:
    my(%s,$y,$l,$t,$d);
    turns something like this:
    my $k = k(\%s);
    into this:
    $k = k();
    Limbic~Region reminds me that:
    $k=&k;
    ... is still shorter, and if sub k is defined before it's called, it can be shorter still:
    $k=k;
    and this:
    my $l = $d / int(length($t) / $k) / 100;
    into this:
    $l = $d / int(length($t) / $k) / 100;
    liverpole suggests that clever rearrangement of operands yields a few more savings:
    $l = $d / 100 / int length($t)/ $k;
    Also, any numerical value assigned to $* is implicitly an int, and whio suggests that y///c is a character short than length, so now we have:
    $* = $d / $t=~y///c / $k;
    Note: Yes, I did drop dividing by 100. My original code wouldn't work without it, my new golfed code won't work with it. I don't know the exact change that removed its necessity.
    More hints: $_ doesn't need to be declared as a my variable. $* removes the necessity for converting a value to an int, and whio suggests further that $- and $= have similar behavior, except $- can't be negative.
  3. Do you actually need lexically scoped variables? If you do, take as much advantage of any "my" declarations as you can:
    sub i{ my ($g,$l,$t) = @_; my @c; ... }
    1. $l and $t are now globals, so those get dropped, but I still need a lexically scoped $g. Also, the code calling this sub is shortened to also take advantage of the globals.
    2. I still need a lexically scoped array @c, and I know that sub i will never be called with more than one parameter, so I save some characters:
    sub i{ my ($g,@c) = @_; ... }
  4. my($o)=@_; is 1 less character than my$o=shift;. Furthermore, if you know that subroutine will always get called with a valid parameter, you can save even more: ($o)=@_;. liverpole points out that $o=pop; is shorter still.
  5. Rearrange while and for loops in postfix notation where possible (one statement per iteration), as I've demonstrated and whio explicitly state, the postfix notation removes the need for the parens and braces, and while's are generally spelled better as for:
    for(3..6){u($_)}
    becomes:
    u($_)for 3..6;
  6. Taking advantage of default expressions for functions (generally $_), remembering other defaults for operators and other perl idioms (<>, for example), along with the above hint turns this:
    while(<STDIN>){$t.=lc$_}
    into this:
    $t.=lc for<>;
    If you need to rotate an array, try it in a subroutine so you can take advantage of @_:
    push @_, shift;
    If it's an array of arrays that you want to rotate:
    push@$_, shift@$_ for @AoA;
  7. 'Inline' subroutine and function calls. Dropping a global $k out of the call and dropping "my" where it isn't necessary turn this:
    my $n = t($_,$k); my %n = f($n);
    into:
    %n = f(t($_));
    Define sub t and sub f ahead of time, and you're left with:
    %n = f t$_;
  8. When you can't postfix a for loop, try to use map in void context. This, combined with dropping globals, dropping lexical scoping, and adding inline functions, defining sub t, sub b, and sub i ahead of time (along with rearranging the inputs to sub b takes something like this:
    for(0..($k-1)){ my $n = t($_,$k); my %n = f($n); my @g = b(1,\%n); $y .= i(\@g,$l,\%t) }
    and turns it into this:
    map{ %n = f t$_; @g = b\%n,1; $y .= i\@g } 0..$k-1;
  9. Drop temporary variables where possible. Cast function/subroutine return types. Combining this idea with inline functions in a simple example:
    @array = routine1($param1,$param2); $result = join ':', @array;
    becomes:
    $result = join ':', @{routine1($param1,$param2)}
    Combining this idea with using map in a void context, inline functions, taking two suggestions from whio to eliminate an extra character in a comparison (turning >= into <), and the reminder that $x =~ /./g is shorter than split //, $x, and rearranging a few operands takes this code:
    for((split//,$t)){ my $l = (split //,$y)[$c]; my $s = join '', @$l; my $p = index($s,$_); $o .= $p >= 0 ? $a[$p] : $_; $c += $c < $k-1 ? 1 : -$k+1 }
    and turns it into:
    map{ $p = index((join '', @{($y=~/./g)[$c]} ), $_); $o .= $p < 0 ? $_ : $a[$p]; $c += $c < $k-1 ? 1 : 1-$k } $t =~ /./gs;
  10. Drop "if" checks when iterating when possible. Don't forget the ||= operator. Combining this with postfix for loop notation:
    for(a..z){ if( ! $d{$_} ){ $d{$_} = 0 } }
    becomes:
    # no, you don't need a space between the 0 and the for. $d{$_} ||= 0for a..z;
  11. Remember how your data is used. In the original fuller version, I was formatting the return value for display. In my golfed version, I no longer display intermediary values, so I can drop the formatting:
    my @k = map { $_->[1] } sort { $b->[0] <=> $a->[0] } @c; my $r = $k[0]; $r =~ s/([a-z])\s\(.*/$1/; return $r
    with dropping the intermediary variable @k, becomes:
    return( map{ $_->[1] } sort { $b->[0] <=> $a->[0] } @c )[0]

More examples:

  1. Taking advantage of subroutine parameter initialization to get extra lexical variables, postfix for notation, new globals, dropping return:
    sub b{ my( $e, $l ) = @_; my @g; for(sort keys %$l){ push @g, [ $_, '=', (split //,'#' x int($$l{$_} * $e))] } return @g }
    becomes:
    sub b { my( $e, $l, @g) = @_; push @g, [ $_, (split //,'#' x int($$l{$_} * $e))] for sort keys %$l +; @g }
  2. Populating a hash with a default values for all desired keys if the array passed in doesn't already contain them:
    sub f{ my %d; $d{$_}++ for grep /[a-z]/, split //, shift; for(a..z){ if( ! $d{$_}){ $d{$_} = 0 } } return %d }
    becomes:
    sub f{ my %d; $d{$_}++for grep /[a-z]/, shift=~/./g; $d{$_} ||= 0for a..z; %d }
  3. Using globals, taking advantage of subroutine parameter initialization, using map in void context instead of a for loop (twice), dropping temporary variables, ignoring display formatting, dropping return:
    sub i{ my ($g,$l,$t) = @_; my @c; for(0..25){ my $v = v($g,$l,$t); push @c, o($v,$$g[0][0]); w($g) } my @k = map{ $_->[1] } sort{ $b->[0]<=>$a->[0] } @c; my $r = $k[0]; $r =~ s/([a-z])\s\(.*/$1/; return $r }
    becomes:
    sub i{ my ($g,@c) = @_; map{ push @c, o(v($g), $$g[0][0]); w($g) } 0..25; ( map{ $_->[1] } sort { $b->[0] <=> $a->[0] } @c )[0] }
  4. Using a global (instead of passing a hashref), inlining functions, defining subroutines ahead of time:
    sub k{ my $s = shift; my @g; for( sort{ $$s{$b} <=> $$s{$a} || $a cmp $b } keys %$s ){ last if $$s{$_} < 3; next unless $_ =~ y/a-z// > 2; my @f; push @f, ( pos($t)-3 ) while $t =~ /$_/g; my $g = c(n(@f)); $g > 2 && push @g, $g } return c(@g) }
    becomes:
    sub k{ my @g; for( sort{ $s{$b} <=> $s{$a} } keys %s ){ last if $s{$_} < 3; next unless y/a-z// > 2; my @f; push @f, (pos($t)-3) while $t =~ /$_/g; my $g = c n@f; $g > 2 && push @g, $g } c@g }
  5. Rolling two for loops into maps shaved a few characters, and I let perl auto-vivify $c as a numeric scalar in context:
    sub o{ my ($g,$w) = @_; my $c = 0; for( @$g ){ for( @$_ ){ /\+/ && $c++; /\-/ && $c-- } } return [$c,$w] }
    becomes:
    sub o{ my ($g,$w,$c) = @_; map{ map{ /\+/ && $c++; /\-/ && $c--} @$_ } @$g; [$c,$w] }
  6. Using a global, rolling a for loop into a map, convert split's to /./g:
    sub t{ my ($o,$k) = @_; my $c = 0; my $r; for(split //,$t){ $r .= $_ unless(($c+($k-$o)) % $k); $c++ } $r =~ s/[^a-z]//g; return $r }
    becomes:
    sub t{ my ($o) = @_; my $c = 0; my $r; map{ $r .= $_ unless($k-$o+$c) % $k; $c++ } $t=~/./gs; $r =~ s/[^a-z]//g; $r }
  7. Using globals, skipping a my declaration (I know the sub only be called with a single, valid parameter, so $s will always be unique), rolling two for loops in maps in void context, skipping a long if(){}elsif(){} by using the ternary ?: with the same net effect, defining subs ahead of time:
    sub v { my ($m,$l,$t) = @_; my @g = b($l,$t); my $s = \@g; my $z = 0; for( @$m ){ my $x = 0; for( @$_ ){ if( $$m[$z][$x] eq '#' && $$s[$z][$x] eq '#' ){ $$s[$z][$x] = '+' } elsif( $$m[$z][$x] eq '#'&&$$s[$z][$x] ne '#' ){ $$s[$z][$x] = '-' } $x++ } $z++ } return $s }
    becomes:
    sub v{ $m = pop; my @g = b\%t,$*; $s = \@g; $z = 0; map{ $x=0; map{ $$s[$z][$x] = $$m[$z][$x] eq '#' && $$s[$z][$x] eq '#' ? '+ +' : '-'; $x++ } @$_; $z++ } @$m; $s }

Well, those are all the tips and tricks I have used in a recent obfuscation, where I shaved about 200 characters off of 1785 or so. Could it have been golfed further? Probably. Let me know if there's anything I've missed!

Updated: incorporated some suggestions from Limbic~Region, liverpole, and whio.



--chargrill
$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse s +plit//=>$* ){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$# +C]=$/;($#C >$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^ +$$C[$%++]}

Comment on Perl Golf 101
Select or Download Code
Re: Perl Golf 101
by blazar (Canon) on May 26, 2006 at 14:57 UTC
    This isn't meant as an exhaustive treatise on golfing, there are others much better at it than I. However, I was unable to find a good "howto" anywhere, and though I'd make a little one here. It's my hope that others better at this sport might share some tidbits of their own.

    There used to be a nice and very extensive reference at http://terje2.perlgolf.org/~golf-info/Book.html, but the domain seems to have expired. I should have a local copy at home.

      Yes, in fact I specifically recall that exact URL when I was looking for hints and tips, but couldn't even convince google to spit out a cache. If you could post your local copy somewhere, that would be awesome, as a relative newcomer to golf, I know I'm already starting at a handicap. :)



      --chargrill
      $,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse s +plit//=>$* ){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$# +C]=$/;($#C >$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^ +$$C[$%++]}
        There might be another site that caches that sort of stuff - I'm trying to remember the URL tho. It's kind of an archive of pages/changes to sites over the years. If I come up with the site I'll post it and let you know (might be in my notes from a class I took a few years ago).

        Update: You might look at this site, It's an archive of internet sites over the years.

      < mtve> we with km forgot to pay for it in time, then it was freed and now it's taken away by those mo***s.

      From a conversation at #perlgolf, talking about perlgolf.org yesterday.

Re: Perl Golf 101
by liverpole (Monsignor) on May 26, 2006 at 15:01 UTC
    A very nice guide!  chargrill++.

    One note, don't forget that the return of a subroutine is the last statement of the subroutine.  So instead of:

    sub o{ my ($g,$w) = @_; my $c = 0; map{ map{ /\+/ && $c++; /\-/ && $c--} @$_ } @$g; return [$c,$w] }
    You can just do:
    sub o{ my ($g,$w) = @_; my $c = 0; map{ map{ /\+/ && $c++; /\-/ && $c--} @$_ } @$g; [$c,$w] }

    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/

      You know, the funny thing is that I know that, and that was one of the first changes I made in going from my full, fairly properly coded version to the golfed version. However, I specifically recall that there was at least one of my subs that refused to function correctly after the change. Was it from removing the explicit return, or some other change I made? I don't know, but I do know that restoring the return (and probably some other subtle changes) restored the functionality, so from that point on, dropping the explicit return was left out of my golf bag.

      Update: Ok, it must've been some other subtle change that borked my golfed version, because without any other change, I removed all return's and it worked exactly as designed.

      Then again, I don't recall making the claim of presenting anything beyond perl golf 101 :)



      --chargrill
      $,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse s +plit//=>$* ){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$# +C]=$/;($#C >$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^ +$$C[$%++]}

        actually a few months ago I remember a discussion on p5p on return. with no explicit return and with control structures it's actually kind of a lottery to guess the actual *return*

        the sad thing was that it was no easy to fix... and if I recall correctly PBP says always use return ...orthogonal to golf

Re: Perl Golf 101
by whio (Sexton) on May 27, 2006 at 01:32 UTC
    A lot of times, golf is about shaving off a few characters here and there. I'm sure these have been pointed out before, but:
    • y///c is one character shorter than length
    • ($x=~/./g) is two characters shorter than (split//,$x)
    • bareword=> can save a character over 'bareword',
    For control flow, you want for, map, and punctuation. map and for are largely interchangeable. while is almost never necessary. Don't forget about the C-language-style for(;;), although it's rarely a gain. Postfix your operators; statement-modifier for doesn't require the parens or the braces that make a BLOCK. Replace branching if() statements with ternary or logical operators. Precedence rules will help here; also, remember that in Perl, true is '1' and false is the empty string. If you don't need the short-circuiting, you can use the bitwise flavor of & and |. Know which builtin variables are used and what special effects they have. ($- and $=, I'm looking at you.)

    From the snippets you have posted:

    $o .= $p >= 0 ? $a[$p] : $_;
    could become $o .= $p < 0 ? $_ : $a[$p] and
    $r =~ s/[^a-z]//g;
    could become $r =~ y/a-z//cd;

      Excellent tips! I'll have to incorporate those into my golfed obfu, though I shan't post an updated version because its format is quite well fixed with the existing number of characters it had prior to the application of some of the tips received since posting. Since the first round of feedback from this node, I've already cut down another ~60 characters, and with yours I'm sure I'll cut down quite a few as well (Update: around another 20 so far), as the original makes a fair amount of use of some of the idioms you're suggesting which can be golfed further.

      I sort of hinted at postfixing for not requiring the parens of braces without stating it explicitly, along with order of precedence.

      As far as builtin variables, while I know what $- and $= are (and default to), I don't have anything in my notes about special effects they have - just a note for $* making any numerical value assigned to it an implicit int (which gives me an idea... ;-)

      I will more than likely incorporate some of your suggestions (and especially your explanations!) into my original post, with of course your permission (and proper attribution on my part).



      --chargrill
      $,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse s +plit//=>$* ){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$# +C]=$/;($#C >$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^ +$$C[$%++]}
        As you point out, $* converts numerical values to int. $= is similar except that it converts everything to int. $- is the same as $= except it can't go negative, or higher than INT_MAX. ($= seems to be able to go up to UINT_MAX.)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://551876]
Approved by Limbic~Region
Front-paged by liverpole
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (4)
As of 2014-07-13 06:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (247 votes), past polls