Further to my last post, here is an implementation of the halve-the-difference method (a binary search on byte offsets). Uncomment the 3 commented-out lines at the start to generate an 8MB test.file. We find our desired reference point in about 20 tries (worst case scenario) in a few milliseconds and then dump the rest of the file (3 lines).

Assuming you are going to have to work with dates, you will of course need to modify this so you can compare whether you are before or after your desired start, but the principle holds. The total run time should be only a fraction over the time it takes to write your output file. Rename it and you are done.

You *will* get an infinite loop if your $find_this is not in the file, so we abort if $count > $max_tries. With $max_tries set to 100 you are OK for a file with up to 2**100 lines (10**30 in rough terms :-)
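use strict;
use warnings;

my $file = 'c:/test.file';
#open F, ">$file" or die $!;
#print F "$_\n" for 1..1000000;
#exit;
my $find_this = 999997;
my $file_size = -s $file;
my $top = 0;            # lowest byte offset still in play
my $bot = $file_size;   # highest byte offset still in play
my $count = 0;
my $max_tries = 100;
open OLD, '<', $file or die $!;
while (++$count) {
    my $middle = int(($top+$bot)/2);
    seek OLD, $middle, 0;
    my $partial = <OLD>;    # we probably landed mid-line, so throw this one away
    my $full_line = <OLD>;  # first complete line after $middle (undef at end of file)
    # NB the very first line of the file is always eaten as $partial, so it can never match
    chomp $full_line if defined $full_line;
    if (defined $full_line and $full_line eq $find_this) {
        print "Took $count tries\n";
        print while <OLD>;  # dump everything after the matching line
        exit;
    }
    if (defined $full_line and $full_line < $find_this) {
        $top = $middle;     # target is further down the file
    }
    else {
        $bot = $middle;     # overshot, or ran off the end of the file
    }
    die "Ark, bailing out of infinite loop" if $count > $max_tries;
}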
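For the date case mentioned above, here is a minimal sketch rather than a finished solution. It assumes every line starts with a fixed-width, sortable timestamp of the form YYYY-MM-DD HH:MM:SS and that the file is sorted by it; your log format may well differ. The sub names (next_line_start, key_at, find_start_offset) and the KEY_LEN constant are made up for illustration. Unlike the loop above, it looks for the first line at or after the target, so it should terminate even when the exact timestamp is not in the file.

```perl
use strict;
use warnings;

# ASSUMPTION: every line begins with a 19-character timestamp such as
# "2001-08-05 14:30:00" and the file is sorted by it. Timestamps in this
# format compare correctly as plain strings.
use constant KEY_LEN => 19;

# Byte offset of the first line that starts at or after $pos
# (the file size if there is no such line).
sub next_line_start {
    my ($fh, $pos) = @_;
    return 0 if $pos == 0;
    seek($fh, $pos - 1, 0) or die "seek failed: $!";
    my $discard = <$fh>;    # read through the next newline
    return tell($fh);
}

# Timestamp of the line starting at byte offset $start.
sub key_at {
    my ($fh, $start) = @_;
    seek($fh, $start, 0) or die "seek failed: $!";
    my $line = <$fh>;
    return defined $line ? substr($line, 0, KEY_LEN) : undef;
}

# Byte offset of the first line whose timestamp is >= $target
# (the file size if every line is earlier than $target).
sub find_start_offset {
    my ($fh, $target) = @_;
    seek($fh, 0, 2) or die "seek failed: $!";   # whence 2 = end of file
    my $size = tell($fh);
    my ($lo, $hi) = (0, $size);
    while ($lo < $hi) {
        my $mid   = int(($lo + $hi) / 2);
        my $start = next_line_start($fh, $mid);
        my $key   = $start < $size ? key_at($fh, $start) : undef;
        if (!defined $key or $key ge $target) {
            $hi = $mid;         # answer is at or before $mid
        }
        else {
            $lo = $mid + 1;     # answer is after $mid
        }
    }
    return next_line_start($fh, $lo);
}

# Demo on an in-memory "file" so nothing on disk is touched.
{
    my $log = join '', map { "2001-08-0$_ 00:00:00 event $_\n" } 1 .. 9;
    open(my $fh, '<', \$log) or die $!;
    my $offset = find_start_offset($fh, '2001-08-07 12:00:00');
    seek($fh, $offset, 0) or die "seek failed: $!";
    print while <$fh>;      # dumps the 2001-08-08 and 2001-08-09 lines
    close $fh;
}
```

On a real file you would open it (in raw mode, if line endings might be translated), seek to the returned offset and copy everything from there to the new file, much like the print while <OLD> line above. Because the search range shrinks on every pass, no $max_tries guard should be needed.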
Let us know how you get on.
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print