Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

I was recently trying to golf down some code to post as an obfuscation, and in a flash of inspiration happened upon some golfing techniques that I've never been able to find listed in a tidy package, so I thought I'd post my tips and tricks (relatively basic though they are) for anyone on a similar quest to shave characters.

This isn't meant as an exhaustive treatise on golfing, there are others much better at it than I. However, I was unable to find a good "howto" anywhere, and though I'd make a little one here. It's my hope that others better at this sport might share some tidbits of their own.

Note: It's my hope to polish this up and have it moved to the tutorial section. I'll leave this as a readmore until then.

Important: NO tip or technique presented here has ANY reason to show up in production code - golf is fine for fun and games, but can and will cause production code to fail to run properly, fail to be easy to maintain, etc.

  1. Reduce all variable and subroutine names down to single characters. Eliminate as much whitespace as possible. These might be obvious, but it's certainly a starting point. For purposes of readability of this tutorial, whitespace is NOT fully eliminated. And thanks to a reminder from liverpole, the return value of a subroutine is the last statement of the subroutine, so all those pesky returns should be able to be dropped as well.
  2. Do you really need strict compliance? Get rid of variable declarations where possible. Dropping this:
    turns something like this:
    my $k = k(\%s);
    into this:
    $k = k();
    Limbic~Region reminds me that:
    ... is still shorter, and if sub k is defined before it's called, it can be shorter still:
    and this:
    my $l = $d / int(length($t) / $k) / 100;
    into this:
    $l = $d / int(length($t) / $k) / 100;
    liverpole suggests that clever rearrangement of operands yields a few more savings:
    $l = $d / 100 / int length($t)/ $k;
    Also, any numerical value assigned to $* is implicitly an int, and whio suggests that y///c is a character short than length, so now we have:
    $* = $d / $t=~y///c / $k;
    Note: Yes, I did drop dividing by 100. My original code wouldn't work without it, my new golfed code won't work with it. I don't know the exact change that removed its necessity.
    More hints: $_ doesn't need to be declared as a my variable. $* removes the necessity for converting a value to an int, and whio suggests further that $- and $= have similar behavior, except $- can't be negative.
  3. Do you actually need lexically scoped variables? If you do, take as much advantage of any "my" declarations as you can:
    sub i{ my ($g,$l,$t) = @_; my @c; ... }
    1. $l and $t are now globals, so those get dropped, but I still need a lexically scoped $g. Also, the code calling this sub is shortened to also take advantage of the globals.
    2. I still need a lexically scoped array @c, and I know that sub i will never be called with more than one parameter, so I save some characters:
    sub i{ my ($g,@c) = @_; ... }
  4. my($o)=@_; is 1 less character than my$o=shift;. Furthermore, if you know that subroutine will always get called with a valid parameter, you can save even more: ($o)=@_;. liverpole points out that $o=pop; is shorter still.
  5. Rearrange while and for loops in postfix notation where possible (one statement per iteration), as I've demonstrated and whio explicitly state, the postfix notation removes the need for the parens and braces, and while's are generally spelled better as for:
    u($_)for 3..6;
  6. Taking advantage of default expressions for functions (generally $_), remembering other defaults for operators and other perl idioms (<>, for example), along with the above hint turns this:
    into this:
    $t.=lc for<>;
    If you need to rotate an array, try it in a subroutine so you can take advantage of @_:
    push @_, shift;
    If it's an array of arrays that you want to rotate:
    push@$_, shift@$_ for @AoA;
  7. 'Inline' subroutine and function calls. Dropping a global $k out of the call and dropping "my" where it isn't necessary turn this:
    my $n = t($_,$k); my %n = f($n);
    %n = f(t($_));
    Define sub t and sub f ahead of time, and you're left with:
    %n = f t$_;
  8. When you can't postfix a for loop, try to use map in void context. This, combined with dropping globals, dropping lexical scoping, and adding inline functions, defining sub t, sub b, and sub i ahead of time (along with rearranging the inputs to sub b takes something like this:
    for(0..($k-1)){ my $n = t($_,$k); my %n = f($n); my @g = b(1,\%n); $y .= i(\@g,$l,\%t) }
    and turns it into this:
    map{ %n = f t$_; @g = b\%n,1; $y .= i\@g } 0..$k-1;
  9. Drop temporary variables where possible. Cast function/subroutine return types. Combining this idea with inline functions in a simple example:
    @array = routine1($param1,$param2); $result = join ':', @array;
    $result = join ':', @{routine1($param1,$param2)}
    Combining this idea with using map in a void context, inline functions, taking two suggestions from whio to eliminate an extra character in a comparison (turning >= into <), and the reminder that $x =~ /./g is shorter than split //, $x, and rearranging a few operands takes this code:
    for((split//,$t)){ my $l = (split //,$y)[$c]; my $s = join '', @$l; my $p = index($s,$_); $o .= $p >= 0 ? $a[$p] : $_; $c += $c < $k-1 ? 1 : -$k+1 }
    and turns it into:
    map{ $p = index((join '', @{($y=~/./g)[$c]} ), $_); $o .= $p < 0 ? $_ : $a[$p]; $c += $c < $k-1 ? 1 : 1-$k } $t =~ /./gs;
  10. Drop "if" checks when iterating when possible. Don't forget the ||= operator. Combining this with postfix for loop notation:
    for(a..z){ if( ! $d{$_} ){ $d{$_} = 0 } }
    # no, you don't need a space between the 0 and the for. $d{$_} ||= 0for a..z;
  11. Remember how your data is used. In the original fuller version, I was formatting the return value for display. In my golfed version, I no longer display intermediary values, so I can drop the formatting:
    my @k = map { $_->[1] } sort { $b->[0] <=> $a->[0] } @c; my $r = $k[0]; $r =~ s/([a-z])\s\(.*/$1/; return $r
    with dropping the intermediary variable @k, becomes:
    return( map{ $_->[1] } sort { $b->[0] <=> $a->[0] } @c )[0]

More examples:

  1. Taking advantage of subroutine parameter initialization to get extra lexical variables, postfix for notation, new globals, dropping return:
    sub b{ my( $e, $l ) = @_; my @g; for(sort keys %$l){ push @g, [ $_, '=', (split //,'#' x int($$l{$_} * $e))] } return @g }
    sub b { my( $e, $l, @g) = @_; push @g, [ $_, (split //,'#' x int($$l{$_} * $e))] for sort keys %$l +; @g }
  2. Populating a hash with a default values for all desired keys if the array passed in doesn't already contain them:
    sub f{ my %d; $d{$_}++ for grep /[a-z]/, split //, shift; for(a..z){ if( ! $d{$_}){ $d{$_} = 0 } } return %d }
    sub f{ my %d; $d{$_}++for grep /[a-z]/, shift=~/./g; $d{$_} ||= 0for a..z; %d }
  3. Using globals, taking advantage of subroutine parameter initialization, using map in void context instead of a for loop (twice), dropping temporary variables, ignoring display formatting, dropping return:
    sub i{ my ($g,$l,$t) = @_; my @c; for(0..25){ my $v = v($g,$l,$t); push @c, o($v,$$g[0][0]); w($g) } my @k = map{ $_->[1] } sort{ $b->[0]<=>$a->[0] } @c; my $r = $k[0]; $r =~ s/([a-z])\s\(.*/$1/; return $r }
    sub i{ my ($g,@c) = @_; map{ push @c, o(v($g), $$g[0][0]); w($g) } 0..25; ( map{ $_->[1] } sort { $b->[0] <=> $a->[0] } @c )[0] }
  4. Using a global (instead of passing a hashref), inlining functions, defining subroutines ahead of time:
    sub k{ my $s = shift; my @g; for( sort{ $$s{$b} <=> $$s{$a} || $a cmp $b } keys %$s ){ last if $$s{$_} < 3; next unless $_ =~ y/a-z// > 2; my @f; push @f, ( pos($t)-3 ) while $t =~ /$_/g; my $g = c(n(@f)); $g > 2 && push @g, $g } return c(@g) }
    sub k{ my @g; for( sort{ $s{$b} <=> $s{$a} } keys %s ){ last if $s{$_} < 3; next unless y/a-z// > 2; my @f; push @f, (pos($t)-3) while $t =~ /$_/g; my $g = c n@f; $g > 2 && push @g, $g } c@g }
  5. Rolling two for loops into maps shaved a few characters, and I let perl auto-vivify $c as a numeric scalar in context:
    sub o{ my ($g,$w) = @_; my $c = 0; for( @$g ){ for( @$_ ){ /\+/ && $c++; /\-/ && $c-- } } return [$c,$w] }
    sub o{ my ($g,$w,$c) = @_; map{ map{ /\+/ && $c++; /\-/ && $c--} @$_ } @$g; [$c,$w] }
  6. Using a global, rolling a for loop into a map, convert split's to /./g:
    sub t{ my ($o,$k) = @_; my $c = 0; my $r; for(split //,$t){ $r .= $_ unless(($c+($k-$o)) % $k); $c++ } $r =~ s/[^a-z]//g; return $r }
    sub t{ my ($o) = @_; my $c = 0; my $r; map{ $r .= $_ unless($k-$o+$c) % $k; $c++ } $t=~/./gs; $r =~ s/[^a-z]//g; $r }
  7. Using globals, skipping a my declaration (I know the sub only be called with a single, valid parameter, so $s will always be unique), rolling two for loops in maps in void context, skipping a long if(){}elsif(){} by using the ternary ?: with the same net effect, defining subs ahead of time:
    sub v { my ($m,$l,$t) = @_; my @g = b($l,$t); my $s = \@g; my $z = 0; for( @$m ){ my $x = 0; for( @$_ ){ if( $$m[$z][$x] eq '#' && $$s[$z][$x] eq '#' ){ $$s[$z][$x] = '+' } elsif( $$m[$z][$x] eq '#'&&$$s[$z][$x] ne '#' ){ $$s[$z][$x] = '-' } $x++ } $z++ } return $s }
    sub v{ $m = pop; my @g = b\%t,$*; $s = \@g; $z = 0; map{ $x=0; map{ $$s[$z][$x] = $$m[$z][$x] eq '#' && $$s[$z][$x] eq '#' ? '+ +' : '-'; $x++ } @$_; $z++ } @$m; $s }

Well, those are all the tips and tricks I have used in a recent obfuscation, where I shaved about 200 characters off of 1785 or so. Could it have been golfed further? Probably. Let me know if there's anything I've missed!

Updated: incorporated some suggestions from Limbic~Region, liverpole, and whio.

$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse s +plit//=>$* ){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$# +C]=$/;($#C >$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^ +$$C[$%++]}

In reply to Perl Golf 101 by chargrill

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others making s'mores by the fire in the courtyard of the Monastery: (7)
    As of 2019-11-13 14:45 GMT
    Find Nodes?
      Voting Booth?
      Strict and warnings: which comes first?

      Results (72 votes). Check out past polls.