Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

RFC: Compress::LZW

by meredith (Friar)
on Jun 20, 2003 at 00:28 UTC ( #267408=perlquestion: print w/replies, xml ) Need Help??

meredith has asked for the wisdom of the Perl Monks concerning the following question:

I threw this module together using a perl implementation of LZW from Compress::SelfExtracting. As this will be my first published module, I thought I'd RFC it before it goes live. If you're wondering how this module came about, check here -> Any news on Compress::LZW?. Obviously, there are improvements in the works, like using blocking, making a best-guess decision on table size, and auto-detection of table size through magic headers, but this is just the first. :) Currently, using a 16-bit codeword table, the compression croaks at 65,536 codewords. Happy birthday, free LZW!

I believe I want to make this module move toward a format compatible with compress -- currently it is not...
############################################################ package Compress::LZW; require Exporter; use Carp; use vars qw/@ISA @EXPORT $VERSION/; use warnings; use strict; @EXPORT = qw/compress decompress/;; @ISA = qw/Exporter/; $VERSION = 0.01; my (%LZ, %UNLZ); %LZ = (12 => sub { my $v = ''; for my $i (0..$#_) { vec($v, 3*$i, 4) = $_[$i]/256; vec($v, 3*$i+1, 4) = ($_[$i]/16)%16; vec($v, 3*$i+2, 4) = $_[$i]%16; } $v; }, 16 => sub { pack 'S*', @_ }); %UNLZ = (12 => sub { my $code = shift; my @code; my $len = length($code); my $reallen = 2*$len/3; foreach (0..$reallen - 1) { push @code, (vec($code, 3*$_, 4)<<8) | (vec($code, 3*$_+1, 4)<<4) | (vec($code, 3*$_+2, 4)); } @code; }, 16 => sub { unpack 'S*', shift; }); sub compress { my ($str, $bits) = @_; $bits = $bits ? $bits : 16; my $p = ''; my %d = map{(chr $_, $_)} 0..255; my @o = (); my $ncw = 256; for (split '', $str) { if (exists $d{$p.$_}) { $p .= $_; } else { push @o, $d{$p}; $d{$p.$_} = $ncw++; $p = $_; } } push @o, $d{$p}; if ($bits != 16 && $ncw < 1<<12) { $bits = 12; return $LZ{12}->(@o); } elsif ($ncw < 1<<16) { $bits = 16; return $LZ{16}->(@o); } else { croak "Sorry, code-word overflow"; } } sub decompress { my ($str, $bits) = @_; $bits = $bits ? $bits : 16; my %d = (map{($_, chr $_)} 0..255); my $ncw = 256; my $ret = ''; my ($p, @code) = $UNLZ{$bits}->($str); $ret .= $d{$p}; for (@code) { if (exists $d{$_}) { $ret .= $d{$_}; $d{$ncw++} = $d{$p}.substr($d{$_}, 0, 1); } else { my $dp = $d{$p}; unless ($_ == $ncw++) { carp "($_ == $ncw)?! Check your ta +ble size!" }; $ret .= ($d{$_} = $dp.substr($dp, 0, 1)); } $p = $_; } $ret; } ############################################################ 1; __END__ =head1 NAME Compress::LZW -- Pure perl implementation of LZW =head1 SYNOPSIS use Compress::LZW; my $compressed = compress($fatdata); my $fatdata = decompress($compressed); my $smallcompressed = compress($thindata, 12); my $thindata = decompress($smallcompressed, 12); =head1 DESCRIPTION C<Compress::LZW> it a perl implementation of the newly free LZW compression algorithm. It defaults to building a 16-bit codeword table, but provides the ability to choose a 12-bit table also. Depending on the size of your data, the 12-bit table may provide better compression. =head2 Functions =over =item C<compress> Takes a string as its first argument, and returns the compressed result. You can also specify the size of your codeword table in C<@_[1]>, choosing either 12 or 16. 16 is the default. C<compress> will =item C<decompress> Takes a string as its first argument, and returns the decompressed result. You can also specify the size of your codeword table in @_[1], choosing either 12 or 16. 16 is the default. =back =head1 EXPORTS C<Compress::LZW> exports: C<compress> C<decompress> That's all. =head1 SEE ALSO Other Compress::* modules, especially Compress::LZV1, Compress::LZF an +d Compress::Zlib. =head1 AUTHOR Sean O'Rourke, E<lt>seano@cpan.orgE<gt> - Original author, C<Compress: +:SelfExtracting> Matt Howard E<lt>mhoward@hattmoward.orgE<gt> - C<Compress::LZW> Bug reports welcome, patches even more welcome. =head1 COPYRIGHT Copyright (C) 2003 Sean O'Rourke & Matt Howard. All rights reserved, +some wrongs reversed. This module is distributed under the same terms as Perl itself. Let me know if you actually find it useful. =cut

mhoward - at -

Replies are listed 'Best First'.
Re: RFC: Compress::LZW
by dash2 (Hermit) on Jun 20, 2003 at 00:53 UTC
    My god, it's like Open Source 0-day warez! :-)

    This will be great, perfect for packaging perl software to people in a nice, self-extracting file with no library requirements.

    A massive flamewar beneath your chosen depth has not been shown here

      ??? I think your comment was meant for Compress::SelfExtracting, but that would have requirements anyway... namely a perl interpreter. I don't believe it applies the SelfExtracting to any of the libraries anyway. Also, it's okay to publish the code, it's just that you guys can't use it... until tomorrow ;)

      mhoward - at -
        Well actually, I was thinking of doing the self-extracting bit myself - just a perl script with some __DATA__ .... And yes, it's for perl scripts on the server, I didn't think that you could create a binary with it or something. (And I was joking about the warez.)
        A massive flamewar beneath your chosen depth has not been shown here
Re: RFC: Compress::LZW
by diotalevi (Canon) on Jun 20, 2003 at 02:10 UTC

    Please, please don't publish this until it handles .Z files correctly. That's the whole point to having this library (from my perspective).

      I understand your perspective, but there isn't much more to do to get it throwing out .Z's right and left. This is open-source, the idea is that someone else can look at my code and contribute. If someone doesn't heed the version number, the alpha status on CPAN, or the noted incompatibility with .Z files; aren't they asking for trouble? Development, man, development. Sure, someone CAN use the module for some nefarious perpose right now, but why do that when it will get better?

      Anyway, I'm looking into the compress format. Lots to get sorted out though. It will take a little tweaking for the adaptive blocking to work as well as compress.

      I have noted your desire for a compress-compatible format; trust me, I'm working. :)

      mhoward - at -
Re: RFC: Compress::LZW
by Anonymous Monk on Jun 20, 2003 at 00:51 UTC
    Agh! you've doomed us all! You just had to go and post it before the patent runs out... quick somebody change the date on it to tommorrow before 'they' see.

    hehehe. Alright I've had my fun :).
Re: RFC: Compress::LZW
by meredith (Friar) on Jun 20, 2003 at 12:10 UTC
    Before I upload this, I'm going to add a very obvious note that this isn't compatible with .Z files, nor is it currently stabilized in any way.

    Also, I'm going to move this module to an OO interface (with some exported functions for the hardcore). This will provide a more plausible method for adaptive blocking, instead of a program having to feed _all_ of its data at once, we can "stream", just like compress does. OO will also provide for XS (well, more easily:) in the future. Thanks for all the input so far everyone!

    mhoward - at -
      Might be also worth noting that it's only the US patent that has expired - IIRC, the international patent (e.g., the one affecting us elsewhere in the world ;) is in effect for another year yet. While it's unlikely to have any real legal implications, it should be noted.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://267408]
Front-paged by broquaint
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (3)
As of 2020-10-21 05:35 GMT
Find Nodes?
    Voting Booth?
    My favourite web site is:

    Results (212 votes). Check out past polls.