http://www.perlmonks.org?node_id=995190

eval ['#@%??!!'].pack('u69').tr'>-^','a-} \nGT'

-- "dmd" casually utilizes uuencode pack compression in Ruby golf

WTF. Who ordered that? Hmmm ... looks like some sort of Ruby golf compression trick. Wait a minute, Perl has essentially the same pack functionality as Ruby, so ... you should be able to employ this same compression trick in Perl golf too! You should be able to employ this same trick in Perl golf!! How come I'd never heard of this before?

After picking myself up off the floor, I went straight to mtve's famous golf history book, the tome Jasper was so blissfully unaware of throughout the Fonality Christmas Golf Challenge. Curiously, this "pack u" trick appears only twice in the entire book! And only in the obscure "Artistic/Unorthodox" section of TPR(0,4). What gives?

Well, it seems that during TPR(0,4), while everyone was distracted by the protracted and gripping duel between celebrity golfers Ton Hospel and Eugene van der Pijll, two less conspicuous contestants quietly submitted entries employing a new golfing technique: "pack u" compression.

Who was the first golfer to use "pack u" compression?

It seems there were two: Marko Nippula of Finland and Stefan 'Sec' Zehl of Germany.

In the TPR(0,4) "Interlinear Counts" golf competition, played during the first week of June 2002, Marko submitted a 169-stroke "pack u" compressed solution and 'Sec' a 244-stroke one, both well behind Ton's (uncompressed) winning score of 150. It seems that hardly anyone noticed at the time. At least I didn't. Yet Ton "the alien" Hospel did. For in TPR(0,6), Ton introduced a new rule designed to block this new compression technique: "At least 55% of the code must consist of normal ASCII characters".

Though Marko and Sec were unsuccessful with their first "pack u" adventure in this game, their novel compression technique, optimally applied, should be able to shave around ten strokes from a typical 150-stroke Perl golf solution, as we shall see later. That they missed the best way to apply their new stratagem is hardly surprising; it takes time and experience to refine complex golfing techniques.

I've had a go at applying this "pack u" compression technique to both the Saving Time challenge and the classic 99 Bottles of Beer. The previous best Jasper-esque 96-stroke Saving Time solution was reduced by a single stroke (update: three strokes now), while around ten were slurped from the 160 that formed the previously shortest known Perl bottle golf solution, thus demonstrating that the "pack u" club is definitely worth a place in your Perl golf bag.

Further to those two examples, this series forms a general introduction to the difficult topic of compression in golf. Surprisingly, I could not find any previous publications in this field. None. So if you know of any, please respond away. It seems that the few who have mastered this obscure craft have kept it a secret. Shed light on this dark art, this node will.

Should compression be banned in golf?

The new "55% of the code must be normal ASCII" rule introduced in TPR(0,6) effectively outlawed the new "pack u" golfing technique before it had a chance to be refined and improved. Was this a wise decision?

I don't think so because applying this trick effectively is an interesting challenge requiring considerable skill, as we shall see later. Note that both codegolf and shinh's golf site allow "pack u" compression.

Though allowing "pack u" compression, codegolf has banned some other compression techniques, such as Python's built-in compression.

You may be surprised to learn that a high level of skill is required to effectively apply Python's built-in compression, as I discovered recently when I asked leading Python golfer hallvabo about it:

There certainly is a lot of skill in using the built-in compression! First, there is the trade-off between length and compressibility (making the code uniform by re-using builtins, similar loops etc), and then there is the question of avoiding string escapes (\0, \n etc). The second task can be automated by a script trying out different combinations of non-semantic changes (' vs ", \n vs ;, ...) and by randomizing variable names and checking the compressed string for escapes. I was tipped by Mark Byers about the variable-randomization by script trick. ;) He also seems to be a master in the art of writing the source code to be the most compressible, as witnessed by the results in several challenges on golf.shinh.org, for example Evil C Compiler. My no-binary solution is shorter than his, but they compress to the same length!
So I feel that banning Python built-in compression is a bit harsh. Perhaps you could have two categories: one for the shortest non-compressed solution and another for the shortest solution with no restrictions.

Introduction to Perl "pack u" uuencode compression

We'll start with an overview of Perl "pack u" compression so as to make the game solutions, presented later, easier to understand.

As described at Uuencoding (wikipedia), each line in a uuencoded file consists of:

<length character><formatted characters><newline>
The length character indicates the number of bytes encoded on that line; it is an ASCII character formed by adding 32 to the actual byte count. For example, the length byte corresponding to a length of 45 is 32 + 45 = 77, the ASCII code for the letter M.

All lines, except possibly the last one, have the same length, which can be specified via pack's first parameter; for example, pack u51 specifies a length of 51. The default length, corresponding to pack u, is 45.

It is important to note that this length value, 45 for example, is the length of the bytes to be encoded. More useful when playing golf is the length of the output line of encoded bytes, which is the pack length multiplied by 4/3 plus one for the length byte. When playing golf, you often find yourself referring to the following table:

ordpacklenchr
3301!
3402"
350305#
3604$
3705%
380609&
3907'
4008(
410913)
4210*
4311+
441217,
4513-
4614.
471521/
48160
49171
5018252
51193
52204
5321295
54226
55237
5624338
57259
5826:
592737;
6028<
6129=
623041>
6331?
6432@
653345A
6634B
6735C
683649D
6937E
7038F
713953G
7240H
7341I
744257J
7543K
7644L
774561M
7846N
7947O
804865P
8149Q
8250R
835169S
8452T
8553U
865473V
8755W
8856X
895777Y
9058Z
9159[
926081\
9361]
9462^
956385_

From this table, the M line above is the most important because it corresponds to the default length of 45. This table tells us for the default pack length of 45, each line (except possibly the last one) must start with the letter M and must be exactly 61 characters in length.

As for the characters that follow the length byte, the uuencoding algorithm repeats the following for every three bytes:

  • Convert the three bytes to four 6-bit groupings: bits 00-05, 06-11, 12-17, 18-23
  • Evaluate the decimal values (range: 0-63) of each of the four 6-bit groupings
  • Add 32 to each of the four values (range: 32-95)
  • Output the ASCII equivalent of these numbers
If the source is not divisible by three, the last 4-byte section contains padding bytes to make it cleanly divisible.

To illustrate, "Cat" is uuencoded like so:

Input | C | a | t ord | 67 | 97 | 116 Bits | 0 1 0 0 0 0 1 1 | 0 1 1 0 0 0 0 1 | 0 1 1 1 0 1 0 0 6-bit chars | 16 | 54 | 5 | 52 +32 -> ord | 48 | 86 | 37 | 84 Output | 0 | V | % | T

Some Perl examples should further clarify:

$ perl -e 'print pack("u","Cat")' #0V%T $ perl -e 'print pack("u","@\tA")' #0`E! $ perl -e 'print pack("u","C")' !0P`` $ perl -e 'print pack("u","Ca")' "0V$` $ perl -e 'print pack("u",chr(0))' !```` $ perl -e 'print pack("u",chr(1))' !`0`` $ perl -e 'print pack("u",chr(4))' !!``` $ perl -e 'print pack("u","Cat"x15)' M0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T $ perl -e 'print pack("u","Cat"x30)' M0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T M0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T $ perl -e 'print pack("u","Cat"x31)' M0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T M0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T #0V%T $ perl -e 'print pack("u",("Cat"x31)."C")' M0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T M0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T0V%T $0V%T0P``
Some points to note:
  • Perl uses backtick ` (ord 96), not space, for a 6-bit chunk of zero. Backtick is also used for padding bytes at the end when the source string is not divisible by three.
  • If the source string is divisible by 45, all lines start with M; otherwise, the last line uses the number of remaining bytes as the length byte of the last line.
  • If you add just one character to a source string that is divisible by three, the output string increases by four because it is padded with backtick ` characters (see also the "Quantum Golf" section below).

What is the best way to employ "pack u"?

As described in the previous section, "pack u" constrains the packed string to characters in a limited range, namely ord 33-96 (!-`). It is this limited range that gives you a 3/4 compression ratio because four characters are encoded as three.

The price you pay for this compression is that the source Perl program must use characters in the ord 33-96 (!-`) range only. Note that this range excludes newline, space, lower case characters, and ord 123-126 ({ | } ~).

Pioneers Marko and Sec overcame this constraint by employing the tr (aka y) operator as shown below:

# Marko $_=pack+u,'packed-source-string';y/1-Z/7 ^A9-@5Fa-}/;eval # Sec ($_=pack u,q{packed-source-string})=~y/'A-Z`/ a{-}e-i\n\t1 n-yFm/;eval
The "dmd" Ruby program, given in the opening quote of this node, took a similar approach.

Yet all that translation claptrap is awfully long. Can we get rid of it somehow? At first glance, this looks hopeless. After all, lower case characters are not in the !-` range, so your source program could not even use any Perl keywords! However, by utilizing the two-stroke lc function, we can improve on the Marko/Sec approach via:

eval lc pack u,'packed-source-string'
Though way shorter, there is a high price to pay for this: you are now constrained to write the source Perl program without using any space, newline, upper case, or { | } ~ characters. While this constraint can be annoying in the extreme, it is usually well worth it, as we shall see later.

For completeness, here is a list of all possible ways I could think of to employ pack "u" (if I have missed any, please let me know):

eval lc pack u,'' [17] s//lc pack u,''/ee [18] /(?{lc pack u,''})/ [19] print eval lc pack u,''x99 [26] print eval lc pack(u,'')x99 [27] eval lc(j x 165^pack$<^C,'') [28] update: dmd special, s +ee [id://1026287] $_=pack u,'';y/A-^/a-z NGT/;eval [32] (N represents hard new +line) $_=lc pack u,'';y/[-^/ NGT/;eval [32] eval lc pack u,''=~y/[-^/ NGT/r [31] perl 5.12+ s//lc pack u,''x99/ee;y/v_/ N/;print [36] s//lc pack(u,'')x99/ee;y/v_/ N/;print [37] s//lc pack u,''x99/ee;y/[-^/ NGT/;print [39] s//lc pack(u,'')x99/ee;y/[-^/ NGT/;print [40]
The x99 solutions above might be attempted when writing a program that loops 99 times, such as 99 bottles of beer. We'll see some of these alternatives in action in later installments when we tackle 99 bottles of beer. For this introductory node though, we'll focus only on the shortest alternative above.

Instead of using "pack u", you might try "pack unn", where "nn" is a number, as dmd's opening Ruby snippet did. Note that pack "u" is identical to pack "u45". Though dmd's "u69" is outside the valid uuencode range (see table above), it works in Ruby, producing a length byte of "e". That seems to be a Ruby bug that was cleverly exploited by dmd. He wanted a lowercase character and found a way to coax pack to give him one, even though it is not supposed to!

In perl 5.8, going above "u63", as dmd did above, produces a nasty memory read past the end of the (PL_uudmap[]) array -- this perl bug was fixed in perl 5.9.2 by, wait for it ... Ton Hospel! Intriguingly, it is possible to exploit this perl 5.8 memory bug, as indicated by primo below. Exploiting language implementation bugs is an important -- and strangely satisfying -- part of the serious golfer's arsenal.

The Rule of 69

Any Perl golf solution greater than 69 bytes in length can potentially be shortened via uuencode (pack u) compression

To demonstrate this rule, notice that:

eval lc pack u,'packed-source-string'
adds a fixed overhead of 17 bytes. Adding that to the uuencoding compression ratio of three quarters, it is a simple matter to calculate the minimum length of a Perl golf solution that may potentially be shortened by this method:

X17 + X * 3 / 4
6666.5
6968.75
7271

Of course, 69 is the best you could possibly hope for and you would have to be extremely lucky, in practice, to reduce a golf solution even of 90 bytes in length. To understand why, let's try to reduce the previously best known Saving Time solution of 96 bytes.

Compressing Saving Time

You are nothing if not perseverous, which is a word I have just invented

-- Jasper, after I suggested revisiting the Saving Time challenge

As you may recall, Jasper and I formed an "odd couple" team in the original Saving Time Challenge. And it is Jasper's brilliant trigonometric solution that forms the basis for a "pack u" compressed solution because my magic formula solutions all require a (typically 12-byte) magic string containing too many characters outside the ord 33-96 range required by "pack u".

The previous best solution to the Saving Time challenge is 96 strokes (as described in detail at The golf course looks great, my swing feels good, I like my chances (Part V)) as follows:

$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($_^$`%12?o:x)&($_^$'/5?o:'}')for<> +!~/:/..11;print"@$_ "for@c
To prepare this solution for "pack u" compression, we must eliminate any characters outside the ord 33-96 range. Mercifully, there are only three such characters:
  • The } in '}' can be eliminated at the cost of a single stroke simply by replacing '}' with v125.
  • The hard new line in the print statement can easily be replaced with \n, again at the cost of a single stroke.
  • The ~ in <>!~/:/ is harder; we'll come back to that one later.
Making these simple changes gives us a new 98-stroke solution:
$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($_^$`%12?o:x)&($_^$'/5?o:v125)for< +>!~/:/..11;print"@$_\n"for@c

Now, to use the default "pack u", a saving of two strokes over "pack unn", we must start the solution with the letter "m" because that is the required length byte for "pack u". Crudely adding a leading "m" to the above solution is costly though because "m" is the pattern match operator, so instead of adding a leading "m;" (as you could with most other letters) you must resort to something like "mx;" at a cost of three strokes:

mx;$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($_^$`%12?o:x)&($_^$'/5?o:v125)f +or<>!~/:/..11;print"@$_\n"for@c
Can we do better? Yes! Checking perlfunc shows there are very few promising functions starting with the letter "m". Doesn't matter, we only need one lucky function, just one. And map looks perfect because it is usually straightforward to convert a "for" loop to "map". So it proved here, again at the cost of a single stroke:
map$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($_^$`%12?o:x)&($_^$'/5?o:v125), +<>!~/:/..11;print"@$_\n"for@c
Up to 99 strokes. Getting closer now.

The final obstacle is to remove that annoying "~" character. An obvious way to do it is:

map$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($_^$`%12?o:x)&($_^$'/5?o:v125), +($_=<>,!/:/)..11;print"@$_\n"for@c
Yet that costs a whopping five strokes. Surely we can do better! Yes, we can save two precious strokes via the tactical trick of forming the number 11 by concatenating two ones together (the return values of s//<>/e and /:/) like so:
map$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($_^$`%12?o:x)&($_^$'/5?o:v125), +0..s//<>/e./:/;print"@$_\n"for@c
Yay! Unfortunately though, the generated string from this solution turned out to (unluckily) contain a single quote character, meaning, of course, that it cannot be embedded within single quotes. Mercifully, we can reorganize this solution until we find one that (luckily) does not contain a single quote in the generated string:
map$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($`%12^$_?o:x)&($_^$'/5?o:v125), +0..s//<>/e./:/;print"@$_\n"for@c
Almost there now! We just need to fiddle with this solution to make it exactly fit the required shape like so:
# 1 2 3 4 5 6 #234567890123456789012345678901234567890123456789012345678901 map$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($`%12^$_?o:x)&($'/+5^$ a.$_?o:v125),0..s//<>/e./:/;print"@$_\n"for@c
Because the last line is of length 45, we must start it with the letter "a" (as indicated in the table above). As you can see, we had to waste a few characters to get an exact fit. Re-jigging the code like this feels more like obfu than golf and actually reminded me of when I was fiddling with the code in Saturn to get an exact fit.

Finally, we can generate the compressed solution by running this little program:

my $source = <<'PERSEVEROUS'; map$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($`%12^$_?o:x)&($'/+5^$ a.$_?o:v125),0..s//<>/e./:/;print"@$_\n"for@c PERSEVEROUS my $out = unpack 'u',uc($source); print "eval lc pack u,'" . $out . "'";
Thankfully, all this effort proved worthwhile because the generated solution turns out to be 95 strokes in length, one less than the original.

Quantum Golf

If all this damned quantum jumping were really here to stay, I should be sorry I ever got involved with quantum theory.

-- Erwin Schrodinger

Can we reduce this solution further? Maybe (update: yes from 95 to 93). But it will be very difficult because we need to reduce it by four bytes in order to instantaneously quantum jump our golf score down by three.

Compressing 99 Bottles of Beer

The shortest solution presented in Drunk on golf: 99 Bottles of Beer was 160 strokes:

@c=(@b=(++$n,bottle.'s'x@-,of,beer),on,the,wall),s//Take one down and +pass it around, @c. @c, @b. /until/, 99\D+/;print$'."Go to the store and buy some more$&"
I found the compression aspects of those damned beer bottles to be an order of magnitude more complex than for Saving Time. So much so that compressing this bottle golf code warrants its own node, the subject of the next installment in this series. Until then, please feel free to have a go at reducing it using the techniques described above.

References

References Added Later (2020)

Acknowledgement: I'd like to thank mtve and hallvabo for their help in preparing this node.

Replies are listed 'Best First'.
Re: Compression in Golf: Part I
by eyepopslikeamosquito (Archbishop) on Oct 01, 2012 at 02:48 UTC

    I just realized we can reduce my original packed 95-stroker:

    map$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($`%12^$_?o:x)&($'/+5^$ a.$_?o:v125),0..s//<>/e./:/;print"@$_\n"for@c
    by two strokes:
    map$c[$_*=.52,5.5-4.7*cos][8+7.4*sin]=($`%12^$_?o:x)&($'/5^$_ ?o:v125),0..s//<>/e./:/;print"@$_\n"for@c#```
    As you can see from the table below, the "?" character, by pure fluke, is two before "A" in the ASCII table:

    ordpacklenchr
    6331?
    6432@
    653345A

    So we can hijack pack's "?" length byte as part of our ternary expression! Note that we padded the source program with #``` at the end to keep its length at 45, the required multiple of four.

Re: Compression in Golf: Part I
by primo (Scribe) on Jul 20, 2013 at 10:12 UTC

    Just for fun, I decided to map out a bit of the region beyond pack's normal range. The results were... interesting.

    ord pack len chr
    064
    11265p
    976689a
    11067n
    10568i
    996993c
    5870:
    3271
    1097297m
    10173e
    10974m
    11175101o
    11476r
    12177y
    3278105
    11979w
    11480r
    9781109a
    1182p
    083
    084113
    085
    086
    087117
    088
    089
    090121
    091
    092
    093125
    094
    095
    6796129C
    9797a
    11098n
    3999133'
    116100t
    32101
    108102137l
    111103o
    99104c
    97105141a
    108106l
    105107i
    122108145z
    101109e
    32110
    116111149t
    104112h
    114113r
    111114153o
    117115u
    103116g
    104117157h
    32118
    97119a
    32120161
    114121r
    101122e
    102123165f
    101124e
    114125r
    101126169e
    110127n
    99128c
    101129173e
    0130
    0131
    0132177

    Besides the panic: memory wrap and Can't localize through a reference, one other thing stands out: pack u78 uses a space as the delimeter. In cases where you can't quite get your code to fit into a u45 format (starting with an m), it's a viable alternative, and has actually been the shortest construction I've been able to find for more than one challenge.

    The run of chr(0) from u83 - u95 is unfortunate, though. An eval'd string beginning with the null character seems to terminate immediately, without error, for whatever reason.