PerlMonks  

Re: Is too little too much? Coding under the microscope...

by bikeNomad (Priest)
on Jun 28, 2001 at 09:37 UTC ( [id://92188] )


in reply to Is too little too much? Coding under the microscope...

There's clearly a balance between readability and efficiency. It's probably best (especially if you're working with others) to err on the side of readability until you have to optimize. After all, no one cares how few characters you use.

This isn't to say that more is necessarily better; it can take longer to recognize several lines of code than one idiomatic one, if you know the idiom.

BTW, just out of curiosity, what's the sort doing? Its return value (the sorted list) isn't being used for anything...


Replies are listed 'Best First'.
Re: Re: Is too little too much? Coding under the microscope...
by snafu (Chaplain) on Jun 28, 2001 at 09:47 UTC
    Hehe, funny you should ask. That is exactly the part I am having problems with.

    Here is the purpose of the block. I am trying to get the newest logfile in the directory. The idea is to read the directory, sort the array from newest to oldest based on the age of the logs (hence the sort()) without losing my filenames (since those are what I need in the end), and then pop() off the newest filename, which by then should have been sorted to the top of the array.
    Am I on the right track? :)

    ----------
    - Jim

      You are on the right track, but missing a couple of critical steps, which leaves you throwing out data that you don't want to throw out.

      I'll walk through the first version you proposed, because I like it better (I personally find that style easier to read than the other one you mentioned, and it's also closer to being right):

      opendir(LOGS, ".");
      sort @val = map { -M $_ } grep(/.*\.log/, readdir(LOGS));

      Problem one: sort returns a (sorted) list, which you're discarding. So *if* everything else were right (we'll get there), you would want

      @val = sort map {-M $_} grep {...

      To see the bigger problem, we break it down a bit more, working from right to left down that line:

      @tmp1 = readdir LOGS;             # @tmp1 now contains the files listed in "."
      @tmp2 = grep {/.*\.log/} @tmp1;   # @tmp2 now contains files that contain ".log"
      @tmp3 = map {-M $_} @tmp2;        # @tmp3 now contains what?
      @val  = sort @tmp3;
      If you re-read map, you'll discover that the list it generates is simply the result of the block {-M $_} applied to each value in @tmp2: since -M returns the age of the file, you now have a list of the ages of the log files in the directory, but *not* their names. Sort that list, and you have a sorted list of ages, but with no names associated with them. This is not what you wanted. :-)
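      To make the loss concrete, here is a toy stand-in for those steps, with invented ages instead of real -M calls (nothing here touches the filesystem; the names and numbers are made up):

      ```perl
      use strict;
      use warnings;

      # Invented -M ages standing in for real files:
      my %age  = ('a.log' => 2.5, 'b.log' => 0.1, 'c.log' => 7.0);

      my @tmp2 = sort keys %age;            # the grep'd filenames
      my @tmp3 = map { $age{$_} } @tmp2;    # ages only -- the names are gone
      my @val  = sort { $a <=> $b } @tmp3;  # a sorted list of ages, nothing more

      print "@val\n";   # 0.1 2.5 7.0
      ```

      @val tells you the newest log is 0.1 days old, but not which file that is.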

      What you *do* want is to associate age information with each of the items in @tmp1, sort on that, and get back the sorted version of @tmp1, which can be done thus:

      use strict;   # you are using strict, aren't you? ;)
      my @val = map  { $_->[0] }
                sort { $a->[1] <=> $b->[1] }
                map  { [$_, -M $_] }
                grep { /\.log$/ }
                readdir LOGS;
      (assuming I didn't drop any critical concepts in there). You may note that I implemented chromatic's suggestion (well, almost) and tagged a $ into the regex to match the end of the file name (so "foo.logger" doesn't match).

      This is what's known as a Schwartzian Transform, a term you may or may not have encountered here before. It is very useful. It is also slightly strange to the naked eye--if it makes instant sense to you, congratulations! And if not, take some time to work it out (there are a couple of good explanations attached to that link), because it's a cool technique.
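      To get a feel for the decorate-sort-undecorate shape on something cheaper than the filesystem, here is a sketch on invented data (words sorted by length; none of this is from the thread's directory):

      ```perl
      use strict;
      use warnings;

      # Schwartzian Transform on made-up data: sort words by length
      # instead of files by age. Read it bottom-up, like the file version.
      my @words = qw(pear fig banana ox);

      my @by_length = map  { $_->[0] }              # 3. undecorate: keep the word
                      sort { $a->[1] <=> $b->[1] }  # 2. sort on the cached key
                      map  { [ $_, length $_ ] }    # 1. decorate: [word, key]
                      @words;

      print "@by_length\n";   # ox fig pear banana
      ```

      The point of step 1 is that length (or -M) runs exactly once per element, instead of every time the comparator fires.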



      If God had meant us to fly, he would *never* have given us the railroads.
          --Michael Flanders

        Nice post. However, this example is a bit overdone ;-}. You use a Schwartzian trans-/deform logic. That's fine, as it effectively implements pope's cache. However, you can leave it out by doing the -M in sort.

        I don't know how efficient perl's array/hash is versus the system calls. Let's try it!

        #!/usr/bin/perl -w
        use strict;
        use Benchmark;

        opendir LOGS, ".";
        my @logs = readdir LOGS;

        my $cmp = timethese( -1, {
            'Cached' => sub {
                my @sorted = map  { $_->[0] }
                             sort { $a->[1] <=> $b->[1] }
                             map  { [$_, -M $_] }
                             # grep { /\.log$/ }
                             @logs;
            },
            'Raw' => sub {
                my @sorted = sort { -M $a <=> -M $b }
                             # grep { /\.log$/ }
                             @logs;
            },
        } );
        Benchmark::cmpthese( $cmp );
        I had to comment out the greps and run it in /usr/bin just to get myself a decent number of files (1965). Run under Linux RH7 (.16 kernel)/ perl5.6/ ext2:
        Benchmark: running Cached, Raw, each for at least 1 CPU seconds...
            Cached:  2 wallclock secs ( 0.90 usr +  0.15 sys =   1.05 CPU) @ 18.10/s (n=19)
               Raw:  2 wallclock secs ( 0.45 usr +  0.64 sys =   1.09 CPU) @  8.26/s (n=9)
                   Rate    Raw Cached
        Raw      8.26/s     --   -54%
        Cached   18.1/s   119%     --

        # that's too few iterations...
        Benchmark: timing 1000 iterations of Cached, Raw...
            Cached: 105 wallclock secs (44.91 usr +  7.74 sys =  52.65 CPU) @ 18.99/s (n=1000)
               Raw: 258 wallclock secs (47.15 usr + 78.43 sys = 125.58 CPU) @  7.96/s (n=1000)
                   Rate    Raw Cached
        Raw      7.96/s     --   -58%
        Cached   19.0/s   139%     --
        As could be expected, the caching is more efficient, but the raw version is only about two times slower. All the extra time is spent in system time; the usr (perl) time hardly differs.

        It might have something to do with the other jobs my box is running (quite memory/disk demanding ones...). I will update after those are finished. Update: Doesn't matter. Same results with the system idling.

        I'm curious about other setups ... So if anyone'd like to run them....

        Cheers,

        Jeroen

        I couldn't have asked for a clearer explanation. Honestly, I do not quite grasp the Schwartzian Transform yet. Then again, I have only heard of it for the first time tonight. I intend to read up on it tomorrow. Chromatic did 'splain a lil to me as well. Your breakdown here answers a ton of questions I had. Thank you very much for your patience and wisdom.

        ----------
        - Jim

      Err, yes, the logic is correct. But why not just write it as simply as this?
      $newest = (sort { -M $a <=> -M $b } <*.log>)[0];
      or, if you have a large number of files, you'd better cache the result of -M:
      $newest = (sort { ($m{$a} ||= -M $a) <=> ($m{$b} ||= -M $b) } <*.log>)[0];
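      That ||= cache is the "Orcish Maneuver" (or-cache): the hash remembers each key the first time the comparator computes it. A minimal sketch of the same idea on made-up data (words and lengths, not files):

      ```perl
      use strict;
      use warnings;

      # Orcish Maneuver: %len caches each word's length on first use,
      # so length runs once per word, not once per comparison.
      my %len;
      my @sorted = sort { ($len{$a} ||= length $a) <=> ($len{$b} ||= length $b) }
                   qw(banana ox pear);

      print "@sorted\n";   # ox pear banana
      ```

      One caveat with ||=: a cached value that is legitimately false (say, a -M of exactly 0) gets recomputed on every comparison; on perls newer than this thread, //= sidesteps that.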
        Well, if you have a large number of files and you want to find the newest, the only reasonable optimization of your sort is the removal of said sort! You don't need more than a single pass (linear time) to find an extremum.
        my $extreme = [365_000, undef];
        foreach (<*.log>) {
            $extreme = [-M, $_] if $extreme->[0] > -M;
        }
        my $newest = $extreme->[1];
        And if you want to sort them, faster than an Orcish Maneuver or a Schwartzian Transform is the Guttman-Rosler Transform:
        @files = map  { substr $_ => 18 }
                 sort
                 map  { sprintf "%017.10f %s" => -M, $_ }
                 <*.log>;
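        The trick there is that sprintf packs each numeric key into a fixed-width, zero-padded string, so the default (string) sort orders the records correctly with no comparator block at all; the 17-character key plus one space means the payload starts at offset 18. A sketch of the same transform on invented numbers instead of -M ages:

        ```perl
        use strict;
        use warnings;

        # Guttman-Rosler Transform on made-up numbers: because the keys are
        # fixed-width, lexicographic order equals numeric order.
        my @nums = (3.25, 10, 0.5);

        my @sorted = map  { substr $_, 18 }                    # undecorate
                     sort                                      # plain string sort
                     map  { sprintf "%017.10f %s", $_, $_ }    # fixed-width key
                     @nums;

        print "@sorted\n";   # 0.5 3.25 10
        ```

        Skipping the comparator block is where the speed comes from: perl's default sort compares strings in C without calling back into your code.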

        -- Abigail
