PerlMonks  

code optimization

by arivu198314 (Sexton)
on Nov 03, 2011 at 10:55 UTC
arivu198314 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I have an input file like
225 2198 374 315 420 1149 57 2611

Using the above input, I need to find out which pair of values gives the least result when divided. For example, 225/2198 = 0.102365787 and 374/315 = 1.187301587; from these, I need to know which two values give the least quotient.

I have tried the code below. It works fine, but I need to optimize it.

    open(INP, $ARGV[0]);
    $minimumAmount = 100;
    while (<INP>) {
        if (/(\d+) (\d+)/) {
            if ($minimumAmount > ($1/$2)) {
                $minimumAmount = ($1/$2);
                $amount = $1;
                $weight = $2;
            }
        }
    }
    close(INP);
    print "$amount\t$weight";

What I'm looking for is to increase the speed.

If it is possible with a Unix command itself, that would be best.
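Since a plain Unix solution was requested, here is a hedged sketch: a single-pass awk one-liner that tracks the running minimum itself, so no sort is needed. It assumes one "amount weight" pair per line (as the Perl regex implies); the file name pairs.txt is made up for illustration.

```shell
# Hypothetical input file: one "amount weight" pair per line.
printf '225 2198\n374 315\n420 1149\n57 2611\n' > pairs.txt

# Keep the smallest ratio seen so far in m; print the winning pair at the end.
awk 'NR == 1 || $1/$2 < m { m = $1/$2; a = $1; w = $2 } END { print a "\t" w }' pairs.txt
```

For the sample data this prints `57	2611`, since 57/2611 is the smallest quotient.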

Re: code optimization
by choroba (Abbot) on Nov 03, 2011 at 11:21 UTC
    I do not see a way to make it much faster, but still: do not compute the division several times. Also, if you comment out the "defined $a and defined $w and" part in my code, the sub will run about 5% faster, but then you should be sure your input does not contain any invalid lines.
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Benchmark 'cmpthese';

    sub arivu {
        my $input = shift;
        open my $FH, '<', $input or die;
        my $minimumAmount = 1e12;
        my ($amount, $weight);
        while (<$FH>) {
            if (/(\d+) (\d+)/) {
                if ($minimumAmount > ($1/$2)) {
                    $minimumAmount = $1 / $2;
                    $amount = $1;
                    $weight = $2;
                }
            }
        }
        close $FH;
        return "$amount\t$weight\n";
    } # arivu

    sub choroba {
        my $input = shift;
        open my $FH, '<', $input or die;
        my $minimumAmount = 1e12;
        my ($amount, $weight);
        while (<$FH>) {
            my ($a, $w) = split;
            if (defined $a and defined $w
                and (my $ratio = $a / $w) < $minimumAmount) {
                $minimumAmount = $ratio;
                $amount = $a;
                $weight = $w;
            }
        }
        close $FH;
        return "$amount\t$weight\n";
    } # choroba

    my $input;
    $input .= (join ' ', map int(1 + rand 1000), 1 .. 2) . "\n" for 1 .. 1000;

    cmpthese(-1, {arivu   => sub {arivu \$input},
                  choroba => sub {choroba \$input},
                 });
    print arivu \$input;
    print choroba \$input;
    __END__
    Output:
               Rate   arivu choroba
    arivu     466/s      --    -17%
    choroba   565/s     21%      --
    8       953
    8       953
      do not compute the division several times.
      That's debatable. The trade-off is: arivu calculates the division a second time only if it's actually smaller than the previous minimum value, whereas you store the result in a variable *every* time. Your benchmark is biased in your favour because Perl actually keeps the variable skeleton around for each iteration of the benchmark. You're also using split instead of a pattern with backreferences. If both "arivu" and "choroba" use split, the smallest ratio is at the beginning of the list, and the benchmark uses strings instead of subs (to avoid costs being amortized over runs), arivu's solution is actually marginally faster. (On my machine, and by 5% +/- 6%.)

      So, what can we conclude?

      • Benchmarking is harder than you think.
      • The real gain is in the split vs backref pattern.
      • Doing a "low cost" operation all the time isn't always faster than doing a slightly "less low cost" one only sometimes.
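      For the record, the two parsing styles under discussion can be compared in isolation with a small Benchmark sketch (the sample line is made up, and any timing difference will vary by machine and perl version, which is rather the point of the thread):

      ```perl
      use strict;
      use warnings;
      use Benchmark 'cmpthese';

      my $line = "225 2198";

      # Both styles must extract the same values before timing means anything.
      my ($ca, $cw) = $line =~ /(\d+) (\d+)/;   # capture/backreference style
      my ($sa, $sw) = split ' ', $line;         # split style
      die "parsers disagree" unless $ca == $sa and $cw == $sw;

      # Take any single run with a grain of salt, per the discussion above.
      cmpthese(-1, {
          capture => sub { my ($a, $w) = $line =~ /(\d+) (\d+)/ },
          split   => sub { my ($a, $w) = split ' ', $line },
      });
      ```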
        Interesting. I actually forgot to mention split in my comment, thanks for pointing it out.
Re: code optimization
by spx2 (Chaplain) on Nov 03, 2011 at 11:56 UTC
    Are your numbers bounded in some particular interval? There could be some nice optimizations if they were.

      No boundaries to the numbers

Re: code optimization
by pvaldes (Chaplain) on Nov 03, 2011 at 18:10 UTC
    I'd enter a while loop, split each line into two numbers, then do the division and push the result onto an array; then you can use List::Util qw/min/ on that array...
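    A hedged sketch of that approach (reading from the DATA handle here rather than a real file; note that min alone returns only the ratio, so a parallel hash is needed to recover the pair that produced it):

    ```perl
    use strict;
    use warnings;
    use List::Util 'min';

    my @ratios;
    my %pair_for;                  # map each ratio back to its pair
    while (my $line = <DATA>) {
        my ($a, $w) = split ' ', $line;
        next unless defined $w and $w != 0;   # skip invalid lines
        my $r = $a / $w;
        push @ratios, $r;
        $pair_for{$r} = "$a\t$w";
    }
    my $least = min @ratios;
    print $pair_for{$least}, "\n";   # the pair with the smallest quotient

    __DATA__
    225 2198
    374 315
    420 1149
    57 2611
    ```

    Note this keeps every ratio in memory, so the single-pass running-minimum versions elsewhere in the thread should still be cheaper on large inputs.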
Re: code optimization
by patcat88 (Deacon) on Nov 03, 2011 at 20:52 UTC
    Spent an hour or 2 optimizing. Used Concise to look at perl assembly: "perl -MO=Concise,-src,-exec,-stash="main" pmonks.pl > t.txt". I always roll my own benchmark; I don't trust benchmarking perl modules. Got rid of nextstates (the op that triggers debugger break points and line numbers in caller and die) and padsvs (which, I guess, copy pointers to scalars from the pad to stack positions) by compounding operations whenever possible. Got rid of all regular expressions and file handle operations: you can sysread the file yourself. Don't use buffered I/O or regular expressions; they are very slow. Do list-to-list assigns: fewer nextstates and less overhead. Don't declare/my variables in the loop body, or the perl engine must create and delete them on every trip around the loop.
    # 17: $minimumAmount = $1 / $2;
    14 <;> nextstate(main 941 pmonks.pl:17) v:*,&,$
    15 <#> gvsv[*1] s
    16 <#> gvsv[*2] s
    17 <2> divide[$minimumAmount:939,947] sK/TARGMY,2
    # 18: $amount = $1;
    18 <;> nextstate(main 941 pmonks.pl:18) v:*,&,$
    19 <#> gvsv[*1] s
    1a <0> padsv[$amount:940,947] sRM*
    1b <2> sassign vKS/2
    # 19: $weight = $2;
    1c <;> nextstate(main 941 pmonks.pl:19) v:*,&,{,$
    1d <#> gvsv[*2] s
    1e <0> padsv[$weight:940,947] sRM*
    1f <2> sassign vKS/2
    vs,
    20 <0> pushmark s
    21 <0> padsv[$div:961,963] l
    22 <0> padsv[$amount:961,963] l
    23 <0> padsv[$weight:961,963] l
    24 <0> pushmark sRM*
    25 <0> padsv[$minimumAmount:961,963] lRM*
    26 <0> padsv[$gamount:961,963] lRM*
    27 <0> padsv[$gweight:961,963] lRM*
    28 <2> aassign[t27] vKS
    Also use direct @_ slices, not shift, unless you have variable args or something. Compare
    # 69: my $input = ${$_[0]};
    1 <;> nextstate(main 970 pmonks.pl:69) v:*,&,$
    2 <#> aelemfast[*_] s
    3 <1> rv2sv sK/3
    4 <0> padsv[$input:970,973] sRM*/LVINTRO
    5 <2> sassign vKS/2
    to
    # 59: my $input = ${(shift)};
    1 <;> nextstate(main 965 pmonks.pl:59) v:*,&,$
    2 <#> gv[*_] s
    3 <1> rv2av[t3] sKRM/3
    4 <1> shift sKP/1
    5 <1> rv2sv sK/3
    6 <0> padsv[$input:965,968] sRM*/LVINTRO
    7 <2> sassign vKS/2
    4 heavy ops, compared to 2 heavy ops.

    For "my( $start, $minimumAmount, $end, $amount, $weight, $div, $gamount, $gweight) = (0, 1e12, length($input));" don't pad the right-hand list with undefs; those are extra ops. The scalars are undef to begin with.

    Here are my times. badarivu2 is there to get rid of a skewing delay in the C std lib doing the first console print, and maybe to get some CPU cache preloaded. arivu4 is the fastest. I should add that this is an excellent candidate for some C code, since we are processing character by character and using pointer arithmetic in Perl. The only remaining optimization I can think of, but don't know how to do, is to alias, not copy, $_[0] to $input. Might involve typeglobs, but I don't know how to use them.
    C:\Documents and Settings\Owner\Desktop>perl pmonks.pl
    badarivu2 0.921875
    arivu2 8.9530200958252
    arivu3 8.59366011619568
    arivu4 8.54677510261536
    choroba 22.8123989105225
    arivu 13.078125

    C:\Documents and Settings\Owner\Desktop>
    Full code.
    #!/usr/bin/perl
    use warnings;
    use strict;
    #use Benchmark 'cmpthese';
    use Time::HiRes 'time';

    sub arivu {
        my $input = shift;
        open my $FH, '<', $input or die;
        my $minimumAmount = 1e12;
        my ($amount, $weight);
        while (<$FH>) {
            if (/(\d+) (\d+)/) {
                if ($minimumAmount > ($1/$2)) {
                    $minimumAmount = $1 / $2;
                    $amount = $1;
                    $weight = $2;
                }
            }
        }
        close $FH;
        return "$amount\t$weight\n";
    } # arivu

    sub choroba {
        my $input = shift;
        open my $FH, '<', $input or die;
        my $minimumAmount = 1e12;
        my ($amount, $weight);
        while (<$FH>) {
            my ($a, $w) = split;
            if (defined $a and defined $w
                and (my $ratio = $a / $w) < $minimumAmount) {
                $minimumAmount = $ratio;
                $amount = $a;
                $weight = $w;
            }
        }
        close $FH;
        return "$amount\t$weight\n";
    } # choroba

    sub arivu2 {
        my $input = ${(shift)};
        my ($start, $minimumAmount, $end, $amount, $weight, $div, $gamount, $gweight)
            = (0, 1e12, length($input));
        do {
            $amount = substr($input, $start, index($input, ' ', $start) - $start);
            $start += length($amount) + 1;
            $weight = substr($input, $start, index($input, "\n", $start) - $start);
            $start += length($weight) + 1;
            ($minimumAmount, $gamount, $gweight) = ($div, $amount, $weight)
                if ($minimumAmount > ($div = $amount/$weight));
        } while ($start != $end);
        return "$gamount\t$gweight\n";
    } # arivu2

    sub arivu3 {
        my $input = ${(shift)};
        my ($start, $minimumAmount, $end, $amount, $weight, $div, $gamount, $gweight)
            = (0, 1e12, length($input));
        do {
            $start += length($amount = substr($input, $start, index($input, ' ', $start) - $start)) + 1;
            $start += length($weight = substr($input, $start, index($input, "\n", $start) - $start)) + 1;
            ($minimumAmount, $gamount, $gweight) = ($div, $amount, $weight)
                if ($minimumAmount > ($div = $amount/$weight));
        } while ($start != $end);
        return "$gamount\t$gweight\n";
    } # arivu3

    sub arivu4 {
        my $input = ${$_[0]};
        my ($start, $minimumAmount, $end, $amount, $weight, $div, $gamount, $gweight)
            = (0, 1e12, length($input));
        do {
            $start += length($amount = substr($input, $start, index($input, ' ', $start) - $start)) + 1;
            $start += length($weight = substr($input, $start, index($input, "\n", $start) - $start)) + 1;
            ($minimumAmount, $gamount, $gweight) = ($div, $amount, $weight)
                if ($minimumAmount > ($div = $amount/$weight));
        } while ($start != $end);
        return "$gamount\t$gweight\n";
    } # arivu4

    my $input;
    $input .= (join ' ', map int(1 + rand 1000), 1 .. 2) . "\n" for 1 .. 1000;

    #cmpthese(-1, {arivu    => sub {arivu \$input},
    #              choroba  => sub {choroba \$input},
    #              choroba2 => sub {choroba2 \$input},
    #             });

    my $time;
    $time = time;
    for (0..500) { arivu2 \$input; }  # ignore first time, arivu1 and arivu2 same body, diff times
    print "badarivu2 ".(time-$time)."\n";
    $time = time;
    for (0..5000) { arivu2 \$input; }
    print "arivu2 ".(time-$time)."\n";
    $time = time;
    for (0..5000) { arivu3 \$input; }
    print "arivu3 ".(time-$time)."\n";
    $time = time;
    for (0..5000) { arivu4 \$input; }
    print "arivu4 ".(time-$time)."\n";
    $time = time;
    for (0..5000) { choroba \$input; }
    print "choroba ".(time-$time)."\n";
    $time = time;
    for (0..5000) { arivu \$input; }
    print "arivu ".(time-$time)."\n";
    #print arivu \$input;
    #print choroba \$input;
    #print choroba2 \$input;

      I always roll my own benchmark. I dont trust benchmarking perl modules.

      Why?

        Profilers can lead to false optimizations because you're measuring the overhead of recording the timings (OS interrupts/OS calls/context switches/mutex hits/evals/method resolution), or only 1 run if the func is below the OS timer resolution. I took nytprof to these subs, and nytprof changed which sub is the fastest, which is no good.
Re: code optimization
by gsiems (Chaplain) on Nov 03, 2011 at 21:00 UTC

    For a command line version using clac (http://sourceforge.net/projects/clac/), how about:

    grep -P "\d\s+\d" test.lst | awk '{print $1 "/" $2}' | clac | sort -n | head -n 1
Re: code optimization
by spx2 (Chaplain) on Nov 04, 2011 at 10:29 UTC

    I was actually looking at this paper, which describes in its first 3-4 pages how to compare continued fractions, and I was wondering whether converting your fractions to continued fractions and then carrying out a continued-fraction comparison algorithm (which Flajolet describes in the first link in this post) would lead to faster running times.

    Except... it's pretty hard to compete with the cost of a division (which is rather low).

    In any case, if Perl doesn't cut it as far as execution time, I would go for XS, or a pure C version.

      Except... it's pretty hard to compete with the cost of a division (which is rather low).

      Agreed. Almost no matter how this is coded, the runtime is going to be dominated by the time taken to read the data from the file. It's doubtful that you could achieve a meaningful saving even by moving to C.

        Yes, I think the C version would, however, be faster by some factor not depending on the input size. That's because Perl has some overhead due to the data structures it uses, the garbage collection and so forth. There's also the silly version of comparing a/b < c/d by multiplying both sides by bd, which gives ad < cb. Now, are two multiplications faster than a division, even if the multiplication is carried out with Karatsuba's algorithm (the article says that "Karatsuba is usually faster when the multiplicands are longer than 320-640 bits" and also gives the complexity) or linear-time multiplication? I'm just wondering what the cost of a normal division is relative to the cost of two multiplications.
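        The cross-multiplication trick is easy to sketch; for positive integer inputs like the OP's it avoids floating point entirely (the sub and variable names here are made up for illustration):

        ```perl
        use strict;
        use warnings;

        # For positive b and d:  a/b < c/d  <=>  a*d < c*b.
        sub ratio_lt {
            my ($a, $b, $c, $d) = @_;
            return $a * $d < $c * $b;
        }

        # Single pass, keeping the current minimum as a (numerator, denominator) pair.
        my @pairs = ([225, 2198], [374, 315], [420, 1149], [57, 2611]);
        my ($min_a, $min_w) = @{ $pairs[0] };
        for my $p (@pairs[1 .. $#pairs]) {
            ($min_a, $min_w) = @$p if ratio_lt($p->[0], $p->[1], $min_a, $min_w);
        }
        print "$min_a\t$min_w\n";   # the pair with the smallest quotient
        ```

        Whether the two integer multiplications actually beat one floating-point division on a given CPU is exactly the open question above; the sketch only shows that the comparison is exact, not that it is faster.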

Node Type: perlquestion [id://935631]
Approved by Corion