Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

Database lookup, or Huffman coding

by dave0 (Friar)
on Mar 11, 2005 at 22:46 UTC ( #438836=note: print w/replies, xml ) Need Help??

in reply to Compressing a string, but leaving it as a string

If you're already storing the unique URL in a database, why not use a unique key from the db for the URL portion? That would let you turn
my $string = '/h/plkr/3/';
my $string = '/h/plkr/3/426933';
where 426933 would be the unique ID for retrieving that URI from the database. It has the added bonus of letting the end user tweak things like depth and output format without having to decode/re-encode the string.

If you really want to encode the string as-is, without having to store anything in a DB, you might be able to use something like Algorithm::Huffman in CPAN to generate a short bit vector for your data, and then use some other method (encode_base64 or somesuch) to make it safe for use in a URL. You'll need to grab a good sample of the sort of strings you want to encode, and do a frequency count on them so that Algorithm::Huffman can build up a good dictionary. Here's an untested example:

my $token_counts = { '/' => 10000, '.' => 3000, 'w' => 2342, 'e' => 2000, 't' => 2000, # ... etc, for any character that might appear in the input strin +g }; my $string = '/h/plkr/3/'; my $huff = Algorithm::Huffman->new($token_counts); my $huff_encoded = $huff->encode($string); my $printable = encode_base64($huff_encoded);
I have no idea what sort of size savings this will get you, though. Base-64 encoding costs you one character of output for every 6 bits of input, so as long as your Huffman encoding saves you more bits than Base 64 encoding wastes, you should end up with a URL-safe encoding that's shorter than the input.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://438836]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2021-04-15 07:42 GMT
Find Nodes?
    Voting Booth?

    No recent polls found