PerlMonks  

Print Number With Implied Decimal Point

by j355ga (Initiate)
on Apr 18, 2012 at 13:50 UTC (#965715=perlquestion)
j355ga has asked for the wisdom of the Perl Monks concerning the following question:

I need to have a very efficient method of placing an implied decimal point after converting a binary field.

The binary file is extracted from a very large database. The file could contain 500 million rows, so extreme efficiency is key. Converting the data from within the database is not an option.

example:

The extracted database file contains a 16-byte binary number equivalent to hex "000000000000d3c21bcecceda0ffff9c", which converts to decimal 999999999999999999999900.

The format of this field is decimal(24,2) thus I want to print 9999999999999999999999.00

Substr would work but is tedious and I don't believe printf can insert an implied decimal point. Any other ideas?

Re: Print Number With Implied Decimal Point
by tobyink (Abbot) on Apr 18, 2012 at 14:17 UTC

    Regular expressions should work.

    Alternatively, try Math::BigFloat.

        use 5.010;
        use Math::BigFloat;

        my $number = "999999999999999999999900";
        my $decimal_places = 2;
        Math::BigFloat->precision(-$decimal_places);
        say Math::BigFloat->new($number)/(10**$decimal_places);
        __END__
        9999999999999999999999.00

    substr will be faster though.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      substr will be faster though.

      Yes, by several orders of magnitude...

          use Benchmark qw(cmpthese);
          use Math::BigFloat;

          Math::BigFloat->precision(-2);
          my $number = "999999999999999999999900";

          cmpthese -1, {
              substr => sub {
                  my $num=$number;
                  substr($num,-2,0) = '.';
              },
              BigFloat => sub {
                  my $num=$number;
                  Math::BigFloat->new($num)/100;
              },
          };
          __END__
                        Rate BigFloat   substr
          BigFloat    3794/s       --    -100%
          substr   1668189/s   43874%       --

      And that doesn't even involve accessing or printing out the constructed BigFloat again.

Re: Print Number With Implied Decimal Point
by choroba (Abbot) on Apr 18, 2012 at 14:23 UTC
    I am not sure whether this answers your question or whether the benchmark is correct, but I hope it can help you:
        #!/usr/bin/perl
        use warnings;
        use strict;
        use feature 'say';
        use bigint;
        use Benchmark qw/cmpthese/;

        my @chars = ('0' .. '9', 'a' .. 'f');
        my @list = map {join q(), map $chars[rand @chars], 1 .. 32} 1 .. 100;
        say for @list;
        $_ = hex $_ for @list;
        say for @list;

        cmpthese(0, {
            substr => sub {
                my @l = @list;
                substr $_, -2, 0, '.' for @l;
            },
            regex => sub {
                my @l = @list;
                s/(..)$/.$1/ for @l;
            }
        });
        __END__
                Rate substr  regex
        substr 327/s     --   -14%
        regex  380/s    16%     --
    Update: See the replies for more correct benchmarks and better solutions. Thanks, kennethk and Eliya.

      If you want to make the regex solution run much more quickly, use Look Around Assertions in place of the capture and reinsert.

          cmpthese(0, {
              substr => sub {
                  my @l = @list;
                  substr $_, -2, 0, '.' for @l;
              },
              regex => sub {
                  my @l = @list;
                  s/(..)$/.$1/ for @l;
              },
              regex2 => sub {
                  my @l = @list;
                  s/(?=..)$/./ for @l;
              },
          });

      yields

                   Rate substr  regex regex2
          substr 73.1/s     --   -39%   -71%
          regex   121/s    65%     --   -52%
          regex2  250/s   242%   107%     --

      Update: As per JavaFan's comment, I had a typo in my sub. Replaced s/(?=..)$/./ with s/(?=..$)/./ which returns the correct result, but at substantially poorer performance.

      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

        These numbers look too good to be true:
            $_ = "9876543";
            s/(?=..)$/./;
            say;
            __END__
            9876543

        You cannot actually replace a zero-width assertion.

        And before benchmarking, you should always check whether you're producing the correct results. No one cares for bogus solutions, even if they're fast.
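The advice above can be turned into a small sanity check that runs before any timing. This is only a sketch: the `%candidate` table and the `check_candidates` helper are hypothetical names, not something posted in this thread, and the `typo` entry is the pattern with the misplaced `$`, kept deliberately to show it being caught.

```perl
use strict;
use warnings;

# Each candidate takes a digit string and returns it with a decimal
# point inserted before the last two digits (or tries to).
my %candidate = (
    substr => sub { my $n = shift; substr $n, -2, 0, '.'; $n },
    regex  => sub { my $n = shift; $n =~ s/(..)$/.$1/;    $n },
    regex2 => sub { my $n = shift; $n =~ s/(?=..$)/./;    $n },
    typo   => sub { my $n = shift; $n =~ s/(?=..)$/./;    $n },  # never matches
);

# Hypothetical helper: returns a hash ref mapping candidate name to
# 1 (output matched $expected) or 0 (it did not).
sub check_candidates {
    my ($input, $expected) = @_;
    my %ok;
    for my $name (sort keys %candidate) {
        $ok{$name} = $candidate{$name}->($input) eq $expected ? 1 : 0;
    }
    return \%ok;
}

my $ok = check_candidates('9876543', '98765.43');
printf "%-7s %s\n", $_, ($ok->{$_} ? 'ok' : 'WRONG') for sort keys %$ok;
```

Only the candidates reported as ok are worth passing on to cmpthese.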

      use bigint applied globally severely distorts the results.

      As both of your test cases work on strings anyway, you'd get a more useful comparison when restricting the scope of bigint to where it's required.

      With your original version, I get on my machine:

                  Rate substr  regex
          substr 481/s     --   -14%
          regex  558/s    16%     --

      while with the restricted scope of bigint, I get

          #!/usr/bin/perl
          use warnings;
          use strict;
          use feature 'say';
          use Benchmark qw/cmpthese/;

          my @list;
          {
              use bigint;
              my @chars = ('0' .. '9', 'a' .. 'f');
              @list = map {join q(), map $chars[rand @chars], 1 .. 32} 1 .. 100;
              say for @list;
              $_ = (hex $_)."" for @list;
              say for @list;
          }

          cmpthese(0, {
              substr => sub {
                  my @l = @list;
                  substr $_, -2, 0, '.' for @l;
              },
              regex => sub {
                  my @l = @list;
                  s/(..)$/.$1/ for @l;
              }
          });
          __END__
                    Rate  regex substr
          regex   5672/s     --   -80%
          substr 28469/s   402%     --
        Limiting the bigint scope makes a HUGE difference in my tests. 2,100 rows per second vs 700 per sec. Thanks for the tip!
Re: Print Number With Implied Decimal Point
by JavaFan (Canon) on Apr 18, 2012 at 14:24 UTC
    substr tedious? substr $number, -2, 0, "." does the trick, I don't think that's very tedious. A little faster may be:
    print substr($number, 0, -2), ".", substr($number, -2);
    as that doesn't have to write back a value. But you'll have to benchmark that yourself.
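For anyone who wants to run that comparison, here is a minimal sketch using the core Benchmark module. The sub names `with_insert` and `with_slices` are made up for illustration, the slices are concatenated into a string rather than printed so that both subs do comparable work, and the timings will depend on your perl and machine, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $number = '999999999999999999999900';

# Copies the string, then inserts the point with 4-argument substr.
sub with_insert {
    my $n = $number;
    substr $n, -2, 0, '.';
    return $n;
}

# Builds the result from two slices without writing back into a copy.
sub with_slices {
    return substr($number, 0, -2) . '.' . substr($number, -2);
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { insert => \&with_insert, slices => \&with_slices });
```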
Re: Print Number With Implied Decimal Point
by Anonymous Monk on Apr 18, 2012 at 15:18 UTC
        use bigint;
        my $n = hex(unpack('H*', "\x00\x00\x00\x00\x00\x00\xd3\xc2\x1b\xce\xcc\xed\xa0\xff\xff\x9c"));
        substr $n, -2, 0, '.';

    What's tedious about this?

Re: Print Number With Implied Decimal Point
by j355ga (Initiate) on Apr 19, 2012 at 13:52 UTC
    Thanks to all for the detailed analysis. Looks like substr is the best approach after all!
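Putting the thread's pieces together, a per-field conversion might look roughly like the sketch below. It assumes the field is an unsigned, big-endian integer (the thread does not say how negative values are stored), and `implied_decimal` is a hypothetical helper name. It uses Math::BigInt directly instead of a file-wide `use bigint`, and pads short values so that substr always has enough digits to work with.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: convert an unsigned big-endian binary field to a
# decimal string with $scale implied decimal places.
sub implied_decimal {
    my ($bytes, $scale) = @_;

    # Hex-encode the raw bytes and let Math::BigInt parse the "0x..." string.
    my $digits = Math::BigInt->new('0x' . unpack('H*', $bytes))->bstr;
    return $digits unless $scale > 0;

    # Left-pad with zeros so at least one digit precedes the point
    # (for example "5" becomes "005" for a scale of 2).
    my $missing = $scale + 1 - length $digits;
    $digits = ('0' x $missing) . $digits if $missing > 0;

    # Insert the decimal point $scale digits from the end.
    substr $digits, -$scale, 0, '.';
    return $digits;
}

my $field = pack 'H*', '000000000000d3c21bcecceda0ffff9c';
print implied_decimal($field, 2), "\n";   # 9999999999999999999999.00
```

The big-integer conversion is likely to cost far more per row than the substr call, so that is probably the part worth profiling on a 500-million-row file.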

Node Type: perlquestion [id://965715]
Approved by Corion