Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Yeehah!

by claree0 (Hermit)
on Aug 22, 2001 at 17:00 UTC ( #106929=note: print w/ replies, xml ) Need Help??


in reply to Halve the difference
in thread Removing old records from log files

Well, I've made some mods to your sample code, and taken the trim time on my sample file from 2m26s to 0.4 seconds. Wow!

In the code below, I haven't included the subroutine to calculate the epoch-second date of each line 'cos it's longer htan the rest of the file!

Thank you, Tachyon!

#!/usr/local/perl -w use strict; my $file ='current.txt'; my $daystokeep = $ARGV[0]; my $secs_to_keep = $daystokeep * 3600 * 24; my $now = time(); my $earliest = $now - $secs_to_keep; my $file_size = -s $file; my $top = 0; my $bottom = $file_size; my $count = 0; my $max_tries = 100; open (OLD, "$file") or die $!; open (NEW, ">new.txt") or die $!; while (++$count) { my $middle = int (($top + $bottom) / 2); seek OLD, $middle, 0; my $partial = <OLD>; my $full = <OLD>; my $next = <OLD>; if (((linesecs($full)) < $earliest) && ((linesecs($next)) > $e +arliest)) { print NEW $next; print NEW while <OLD>; exit; } if ((linesecs($full)) < $earliest) { $top = $middle; } else { $bottom = $middle; } } close OLD; close NEW;


Comment on Yeehah!
Download Code
Re: Yeehah!
by tachyon (Chancellor) on Aug 22, 2001 at 19:17 UTC

    Wow, 36500% faster. That's a worthwhile saving. Glad it helped. It's always good to use a geometric search rather than a linear one when you have any form of sorted data that you can use the split the dif algoritm on. The number of tries to find the desired position is given by:

    print "Num items Geom avg Lin avg Lin:Geom\n"; for ( my $num_items = 2; $num_items < 2<<20; $num_items <<= 1 ) { # geometric my $geom_max = int(log($num_items)/log(2))+1; my $geom_avg = int(log(($num_items/2))/log(2))+1; # linear my $lin_max = $num_items; my $lin_avg = $num_items/2; printf "%8d %8d %8d %8d\n", $num_items, $geom_avg, $lin_avg, $lin_avg/$geom_avg; }
    Should wrap it in a module one day :-)

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://106929]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (7)
As of 2014-12-20 12:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (95 votes), past polls