PerlMonks  

Re: Is too little too much? Coding under the microscope...

by bikeNomad (Priest)
on Jun 28, 2001 at 09:37 UTC ( [id://92188] )


in reply to Is too little too much? Coding under the microscope...

There's clearly a balance between readability and efficiency. It's probably best (especially if you're working with others) to err on the side of readability until you have to optimize. After all, no one cares how few characters you use.

This isn't to say that more is necessarily better; it can take longer to recognize several lines of code than one idiomatic one, if you know the idiom.

BTW, just out of curiosity, what's the sort doing? Its return value (the sorted list) isn't being used for anything...


Replies are listed 'Best First'.
Re: Re: Is too little too much? Coding under the microscope...
by snafu (Chaplain) on Jun 28, 2001 at 09:47 UTC
    Hehe, funny you should ask. That is exactly the part I am having problems with.

    Here is the purpose of the block. I am trying to get the newest logfile in the directory. The idea is to read the directory, sort the array from newest to oldest based on the age of the logs (hence the sort()) without losing my filenames (since those are what I need in the end), and then pop() off the newest filename, which by then should have been sorted to the top of the array.
    Am I on the right track? :)

    ----------
    - Jim

      You are on the right track, but missing a couple of critical steps, which leaves you throwing out data that you don't want to throw out.

      I'll walk through the first version you proposed, because I like it better (I personally find that style easier to read than the other one you mentioned, and it's also closer to being right):

      opendir(LOGS, ".");
      sort @val = map { -M $_ } grep(/.*\.log/, readdir(LOGS));

      Problem one: sort returns a (sorted) list, which you're discarding. So *if* everything else were right (we'll get there), you would want

      @val = sort map {-M $_} grep {...

      To see the bigger problem, we break it down a bit more, working from right to left down that line:

      @tmp1 = readdir LOGS;             # @tmp1 now contains the files listed in "."
      @tmp2 = grep {/.*\.log/} @tmp1;   # @tmp2 now contains files that contain ".log"
      @tmp3 = map {-M $_} @tmp2;        # @tmp3 now contains what?
      @val  = sort @tmp3;
      If you re-read map, you'll discover that the list it generates is simply the result of the block {-M $_} applied to each value in @tmp2: since -M returns the age of the file, you now have a list of the ages of the log files in the directory, but *not* their names. Sort that list, and you have a sorted list of ages, but with no names associated with them. This is not what you wanted. :-)
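      To make the loss concrete, here is a toy stand-in for those steps, with invented ages instead of real -M calls (nothing here touches the filesystem; the names and numbers are made up):

      ```perl
      use strict;
      use warnings;

      # Invented -M ages standing in for real files:
      my %age  = ('a.log' => 2.5, 'b.log' => 0.1, 'c.log' => 7.0);

      my @tmp2 = sort keys %age;            # the grep'd filenames
      my @tmp3 = map { $age{$_} } @tmp2;    # ages only -- the names are gone
      my @val  = sort { $a <=> $b } @tmp3;  # a sorted list of ages, nothing more

      print "@val\n";   # 0.1 2.5 7.0
      ```

      @val tells you the newest log is 0.1 days old, but not which file that is.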

      What you *do* want is to associate age information with each of the items in @tmp1, sort on that, and get back the sorted version of @tmp1, which can be done thus:

      use strict;   # you are using strict, aren't you? ;)
      my @val = map  { $_->[0] }
                sort { $a->[1] <=> $b->[1] }
                map  { [$_, -M $_] }
                grep { /\.log$/ }
                readdir LOGS;
      (assuming I didn't drop any critical concepts in there). You may note that I implemented chromatic's suggestion (well, almost) and tagged a $ into the regex to match the end of the file name (so "foo.logger" doesn't match).

      This is what's known as a Schwartzian Transform, a term you may or may not have encountered here before. It is very useful. It is also slightly strange to the naked eye--if it makes instant sense to you, congratulations! And if not, take some time to work it out (there are a couple of good explanations attached to that link), because it's a cool technique.
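      To get a feel for the decorate-sort-undecorate shape on something cheaper than the filesystem, here is a sketch on invented data (words sorted by length; none of this is from the thread's directory):

      ```perl
      use strict;
      use warnings;

      # Schwartzian Transform on made-up data: sort words by length
      # instead of files by age. Read it bottom-up, like the file version.
      my @words = qw(pear fig banana ox);

      my @by_length = map  { $_->[0] }              # 3. undecorate: keep the word
                      sort { $a->[1] <=> $b->[1] }  # 2. sort on the cached key
                      map  { [ $_, length $_ ] }    # 1. decorate: [word, key]
                      @words;

      print "@by_length\n";   # ox fig pear banana
      ```

      The point of step 1 is that length (or -M) runs exactly once per element, instead of every time the comparator fires.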



      If God had meant us to fly, he would *never* have given us the railroads.
          --Michael Flanders

        Nice post. However, this example is a bit overdone ;-}. You use a Schwartzian trans-/deform logic. That's fine, as it effectively implements pope's cache. However, you can leave it out by doing the -M in sort.

        I don't know how efficient perl's array/hash is versus the system calls. Let's try it!

        #!/usr/bin/perl -w
        use strict;
        use Benchmark;

        opendir LOGS, ".";
        my @logs = readdir LOGS;

        my $cmp = timethese( -1, {
            'Cached' => sub {
                my @sorted = map  { $_->[0] }
                             sort { $a->[1] <=> $b->[1] }
                             map  { [$_, -M $_] }
                             # grep { /\.log$/ }
                             @logs;
            },
            'Raw' => sub {
                my @sorted = sort { -M $a <=> -M $b }
                             # grep { /\.log$/ }
                             @logs;
            },
        } );
        Benchmark::cmpthese( $cmp );
        I had to comment out the greps and run it in /usr/bin just to get myself a decent number of files (1965). Run under Linux RH7 (.16 kernel)/ perl5.6/ ext2:
        Benchmark: running Cached, Raw, each for at least 1 CPU seconds...
            Cached:  2 wallclock secs ( 0.90 usr +  0.15 sys =   1.05 CPU) @ 18.10/s (n=19)
               Raw:  2 wallclock secs ( 0.45 usr +  0.64 sys =   1.09 CPU) @  8.26/s (n=9)
                   Rate    Raw Cached
        Raw      8.26/s     --   -54%
        Cached   18.1/s   119%     --

        # that's too few iterations...
        Benchmark: timing 1000 iterations of Cached, Raw...
            Cached: 105 wallclock secs (44.91 usr +  7.74 sys =  52.65 CPU) @ 18.99/s (n=1000)
               Raw: 258 wallclock secs (47.15 usr + 78.43 sys = 125.58 CPU) @  7.96/s (n=1000)
                   Rate    Raw Cached
        Raw      7.96/s     --   -58%
        Cached   19.0/s   139%     --
        As could be expected, the caching is more efficient, but the raw version is only about two times slower. All the extra time is spent in system time; the usr (perl) time hardly differs.

        It might have something to do with the other jobs my box is running (quite memory/disk demanding ones...). I will update after those are finished. Update: Doesn't matter. Same results with the system idling.

        I'm curious about other setups ... So if anyone'd like to run them....

        Cheers,

        Jeroen

        I couldn't have asked for a clearer explanation. Honestly, I do not quite grasp the Schwartzian Transform yet. Then again, I have only heard of it for the first time tonight. I intend to read up on it tomorrow. Chromatic did 'splain a lil to me as well. Your breakdown here answers a ton of questions I had. Thank you very much for your patience and wisdom.

        ----------
        - Jim

      Err, yes, the logic is correct. But why not just write it as simply as this?
      $newest = (sort { -M $a <=> -M $b } <*.log>)[0];
      or, if you have a large number of files, you'd better cache the result of -M:
      $newest = (sort { ($m{$a} ||= -M $a) <=> ($m{$b} ||= -M $b) } <*.log>)[0];
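      That ||= cache is the "Orcish Maneuver" (or-cache): the hash remembers each key the first time the comparator computes it. A minimal sketch of the same idea on made-up data (words and lengths, not files):

      ```perl
      use strict;
      use warnings;

      # Orcish Maneuver: %len caches each word's length on first use,
      # so length runs once per word, not once per comparison.
      my %len;
      my @sorted = sort { ($len{$a} ||= length $a) <=> ($len{$b} ||= length $b) }
                   qw(banana ox pear);

      print "@sorted\n";   # ox pear banana
      ```

      One caveat with ||=: a cached value that is legitimately false (say, a -M of exactly 0) gets recomputed on every comparison; on perls newer than this thread, //= sidesteps that.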
        Well, if you have a large number of files and you want to find the newest, the only reasonable optimization of your sort is the removal of said sort! You don't need more than a single pass (linear time) to find an extremum.
        my $extreme = [365_000, undef];
        foreach (<*.log>) {
            $extreme = [-M, $_] if $extreme->[0] > -M;
        }
        my $newest = $extreme->[1];
        And if you want to sort them, faster than an Orcish Maneuver or a Schwartzian Transform is the Guttman-Rosler Transform:
        @files = map  { substr $_ => 18 }
                 sort
                 map  { sprintf "%017.10f %s" => -M, $_ }
                 <*.log>;
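        The trick there is that sprintf packs each numeric key into a fixed-width, zero-padded string, so the default (string) sort orders the records correctly with no comparator block at all; the 17-character key plus one space means the payload starts at offset 18. A sketch of the same transform on invented numbers instead of -M ages:

        ```perl
        use strict;
        use warnings;

        # Guttman-Rosler Transform on made-up numbers: because the keys are
        # fixed-width, lexicographic order equals numeric order.
        my @nums = (3.25, 10, 0.5);

        my @sorted = map  { substr $_, 18 }                    # undecorate
                     sort                                      # plain string sort
                     map  { sprintf "%017.10f %s", $_, $_ }    # fixed-width key
                     @nums;

        print "@sorted\n";   # 0.5 3.25 10
        ```

        Skipping the comparator block is where the speed comes from: perl's default sort compares strings in C without calling back into your code.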

        -- Abigail
