Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re^4: Fast - Compact That String

by Limbic~Region (Chancellor)
on Feb 10, 2012 at 14:49 UTC ( #953033=note: print w/ replies, xml ) Need Help??


in reply to Re^3: Fast - Compact That String
in thread Fast - Compact That String

mbethke,
Very nice. I had to slightly modify it in order to get the sort order I wanted:

my @c3 = map { my $c = $_; (map { "$c$_" } @c1) } @c1; sub myFromB37 { my $n = shift; my $s = ' '; substr( $s, 4, 2, $c3[$n % 1369] ); $n /= 1369; substr( $s, 2, 2, $c3[$n % 1369] ); $n /= 1369; substr( $s, 0, 2, $c3[$n] ); $s; } my @c4; $c4[unpack 'S', $c3[$_]] = $_ foreach 0 .. 1368; sub myToB37 { 1874161 * $c4[unpack 'S', substr($_[0], 0, 2)] + 1369 * $c4[unpack 'S', substr($_[0], 2, 2)] + $c4[unpack 'S', substr($_[0], 4, 2)]; }
If you have time, I would really appreciate an in-depth explanation.

Cheers - L~R


Comment on Re^4: Fast - Compact That String
Download Code
Re^5: Fast - Compact That String
by mbethke (Hermit) on Feb 10, 2012 at 16:59 UTC

    Yeah, that's what I meant with reversing the bytes: BrowserUK's version generates "little-endian" strings, I just made mine produce the same to more easily check that I'm getting the correct results. To be a proper base37 number they should really have their most significant digit left.

    If you understand BrowserUK's version, mine is a relatively simple extension. fromB37 is probably easiest: first preallocate a string of six characters (faster than to keep appending characters if you know the final length anyway), then take the remainder of the current number divided by 37 and look up the digit to represent this with in @c1. Do this six times and you've converted the number.

    For the reverse conversion, you need a lookup table that has, under the index of a character code, the associated decimal value (ord(' ') => 0, ord('A') => 1, etc.). Then you have to split up the string into characters (the unpack is just a fast alternative to "split //"), and for each characters multiply the old value of $n by 37 and add the current character's "worth".

    What my version does different is just that it processes two characters at a time. @c3 is the cross product @c1x@c1, i.e. every possible 2-character string from ' ' to 'ZZ'. That way I don't have to modulo and divide by 37 six times but only thrice by 37^2. Surprisingly, not using a loop but three separate substr()s was quite a bit faster. Apparently Perl does some clever optimizations there if the string indexes are known at compile time.

    My unpack 'S' ... is just a "wider" version of ord() that works for two characters at a time by converting them to an unsigned short. So I use unpack to convert the 2-char string to an array index, and just like in BrowserUK's version, @c4 gives me the "worth" of that snippet. According to its position in the string I still have to multiply it by 37**4 and 37**2 respectively—for clarity I should probably have used these expressions instead of the "magic" numbers.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://953033]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (4)
As of 2014-12-27 09:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (176 votes), past polls