PerlMonks  


MimisIVI, if you mean vec, Tachyon-II pointed you at vec 5 days ago in Re^6: Compress positive integers. It's right there in the last line. Maybe you missed it?
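In case vec is unfamiliar, here is a minimal sketch of single-bit access with it. The variable name and the bit positions are made up for illustration; only vec and unpack themselves are real built-ins.

```perl
use strict;
use warnings;

my $buf = '';              # vec extends the string with zero bytes as needed
vec( $buf, 0, 1 ) = 1;     # set bit 0 (lowest bit of the first byte)
vec( $buf, 3, 1 ) = 1;     # set bit 3
vec( $buf, 9, 1 ) = 1;     # set bit 9, which grows $buf to 2 bytes

# 'b*' lists the bits of each byte in ascending order, matching vec's numbering
printf "%d bytes: %s\n", length $buf, unpack( 'b*', $buf );
```

Reading a bit back is just vec( $buf, $pos, 1 ) used as an rvalue, and that pair of operations is all a pure-Perl bit packer needs.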

Anyhow, here is Elias Gamma coded in pure Perl:

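    # floor( log2( $n ) ); the epsilon guards against rounding just below an exact power of two
    sub log2 { int( ( log( $_[0] ) + 1e-15 ) / log( 2 ) ) }

    sub PP_EliasPack {
        my $packed = '';
        my $out    = 0;    # next bit position to write
        for my $num ( @_ ) {
            my $len = log2( $num );
            $out += $len;                     # $len zero bits (vec zero-fills as it extends)
            vec( $packed, $out++, 1 ) = 1;    # the 1-bit marker
            # the low $len bits of $num, least significant first
            vec( $packed, $out++, 1 ) = ( $num & ( 1 << $_ ) ) ? 1 : 0 for 0 .. $len - 1;
        }
        return $packed;
    }

    sub PP_EliasUnpack {
        my $packed = shift;
        my $bits   = length( $packed ) * 8;
        my $in     = 0;    # next bit position to read
        my @unpacked;
        while( $in < $bits ) {
            my( $len, $num, $marker ) = ( 0 ) x 3;
            # count zero bits up to the 1-bit marker
            $len++ while $in < $bits && !( $marker = vec( $packed, $in++, 1 ) );
            # Only zero padding was left. Testing the marker rather than
            # $in == $bits keeps a final 1 that ends exactly on a byte boundary.
            last unless $marker;
            vec( $packed, $in++, 1 ) and $num |= ( 1 << $_ ) for 0 .. $len - 1;
            $num |= ( 1 << $len );            # restore the implicit top bit
            push @unpacked, $num;
        }
        return @unpacked;
    }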
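To see what that packer lays down for a single number, here is a small sketch that builds the same bit sequence as a string of 0s and 1s. gamma_bits is a hypothetical helper written for this illustration, not part of the benchmarked code, and it finds the length by shifting rather than with log. Note that the payload bits come out least significant first, as in the packer above, whereas the textbook Elias gamma code writes them most significant first.

```perl
use strict;
use warnings;

# Hypothetical helper: the code for one positive integer, as a string
# of '0'/'1' characters in stream order.
sub gamma_bits {
    my $num = shift;
    my $len = 0;
    $len++ while ( $num >> ( $len + 1 ) ) > 0;         # floor( log2( $num ) )
    my $code = '0' x $len . '1';                       # $len zeros, then the marker
    $code .= ( $num >> $_ ) & 1 for 0 .. $len - 1;     # low $len bits, LSB first
    return $code;
}

printf "%2d => %s\n", $_, gamma_bits( $_ ) for 1, 2, 3, 5, 8;
```

Each code is 2 * $len + 1 bits long, so small numbers are cheap and the cost roughly doubles the bit length of large ones.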

And for reference, here are the results of my benchmark with that included:

    C:\test>678848
    Run with 15000 unique words in 1000 documents (Ave: 554 words/doc)
    ASCII uses 4755336 bytes
    W-BER uses 3196819 bytes
    Binary uses 3279128 bytes
    Elias uses 3980063 bytes
    PP_Elias uses 3980063 bytes
    1 trial of Packing ascii (10.203s total)
    1 trial of Unpacking ascii (3.159s total)
    1 trial of Packing W-BER (18.159s total)
    1 trial of Unpacking W-BER (1.516s total)
    1 trial of Packing binary (9.910s total)
    1 trial of Unpacking binary (1.455s total)
    1 trial of Packing Elias (13.613s total)
    1 trial of Unpacking Elias (2.739s total)
    1 trial of Packing PP_Elias (31.128s total)
    1 trial of Unpacking PP_Elias (18.094s total)

Notice that PP_Elias achieves identical compression to Elias in C (as you'd expect :), but that it's more than twice as slow at packing (31.1s vs 13.6s), and over 6 times slower when unpacking (18.1s vs 2.7s).

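If you want to continue using Elias and can't build the C version yourself, I could let you have it pre-built (for win32). But I have to wonder why you would, when W-BER achieves better compression (3196819 vs 3980063 bytes), unpacks faster than even the C Elias (1.5s vs 2.7s), and beats PP_Elias at both packing and unpacking?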
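For comparison, a minimal sketch of the BER route, on the assumption that W-BER here means Perl's built-in pack 'w' template (BER compressed integers, 7 payload bits per byte). The sample values are made up to show where the byte count steps up.

```perl
use strict;
use warnings;

my @nums   = ( 1, 127, 128, 16383, 16384 );    # illustrative values only
my $packed = pack 'w*', @nums;                 # 1 + 1 + 2 + 2 + 3 bytes
my @back   = unpack 'w*', $packed;

printf "%d integers in %d bytes\n", scalar @nums, length $packed;
print "round trip ok\n" if "@back" eq "@nums";
```

Because each value ends on a byte boundary, there is no bit-level bookkeeping to do in Perl at all, which is a plausible reason for the unpacking speed in the figures above.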

Also, I wonder if you saw Re^8: Byte allign compression in Perl.. where I demonstrated that you can have the DB do the selection for you, using the schema I suggested way back when, in 0.312 seconds? For the record, I found a small optimisation in the schema that reduces that by a factor of 10, to 31 milliseconds.

So the DB does the selection, sends you just the data you need to do your proximity calculations, and does it all faster than you could pack a single integer. Interested?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^2: Do you really want to use an array there? by BrowserUk
in thread Do you really want to use an array there? by deprecated
