
How smart is 'seek $fh, $pos, 0'?

by Monk::Thomas (Friar)
on May 27, 2015 at 12:36 UTC ( #1127985=perlquestion )
Monk::Thomas has asked for the wisdom of the Perl Monks concerning the following question:

Hello

If the filehandle is at position X and I want to go to position Y: does 'seek $fh, $pos, 0' rewind to the beginning and then skip ahead to $pos, or does it optimize automatically and just move from the current position to the intended position?

In other words:

a) Does it actually make a difference if I calculate the difference and use 'seek $fh, $delta, 1' or if I simply use 'seek $fh, $abs, 0'?

b) Is there a difference between going back (Y<X) and skipping ahead (Y>X)?
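
To make the two variants in (a) concrete, here is a minimal sketch. It assumes a small scratch file created with File::Temp (not part of the original question) and the positions 700 and 250 picked arbitrarily. It only shows that both calls land on the same offset; it says nothing about which one is faster.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);          # SEEK_SET (0), SEEK_CUR (1), SEEK_END (2)
use File::Temp qw(tempfile);

# Hypothetical scratch file of 1000 bytes, used only for illustration.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} 'x' x 1000;

my ($x, $y) = (700, 250);     # current position X, target position Y

# Absolute: jump straight to Y, wherever the handle currently is.
seek $fh, $x, SEEK_SET or die "seek failed: $!";
seek $fh, $y, SEEK_SET or die "seek failed: $!";
my $abs_pos = tell $fh;

# Relative: compute the delta from the current position (negative = backwards).
seek $fh, $x, SEEK_SET or die "seek failed: $!";
my $delta = $y - tell($fh);
seek $fh, $delta, SEEK_CUR or die "seek failed: $!";
my $rel_pos = tell $fh;

print "absolute: $abs_pos, relative: $rel_pos\n";
```

Note that the relative variant needs the current position (here via tell) before the delta can be computed, unless the caller already tracks it.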

Replies are listed 'Best First'.
Re: How smart is 'seek $fh, $pos, 0'?
by BrowserUk (Pope) on May 27, 2015 at 13:05 UTC

    Absolute forward seeks are faster (roughly 1.5 times in the timings below) than relative seeks from the current position. (Maybe it has to do a tell to find out the current position before a relative seek?)

    Absolute backward seeks from the end are about 3 times slower than absolute forward seeks.

    open O, '+<:raw', '1GBx8.bin' or die $!;;

    seek O, 0, 0; $t=time; seek O, $_*1000, 0 for 0 .. 8589934; print time-$t;;
    7.61572408676147

    seek O, 0, 0; $t=time; seek O, 1000, 1 for 0 .. 8589934; print time-$t;;
    11.6447620391846

    seek O, 0, 2; $t=time; seek O, -1000, 1 for 0 .. 8589934; print time-$t;;
    11.6476919651031

    seek O, 0, 2; $t=time; seek O, $_*-1000, 2 for 0 .. 8589934; print time-$t;;
    23.074695110321
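
    For anyone who wants to repeat the measurement as a standalone script, the following is a rough sketch of the same four cases. The scratch file, the much smaller iteration count and the %elapsed hash are assumptions made for illustration; a file this small will sit entirely in cache, so the absolute numbers are not comparable to the ones above.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use Time::HiRes qw(time);
use File::Temp qw(tempfile);

my $step  = 1000;
my $count = 10_000;            # far fewer seeks than the original 8_589_934

# Hypothetical scratch file, $step * $count bytes long.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "\0" x ($step * $count);

# Each case: [label, where to start, code performing one seek for index $_[0]].
my @cases = (
    ['absolute forward',  SEEK_SET, sub { seek $fh,  $_[0] * $step, SEEK_SET }],
    ['relative forward',  SEEK_SET, sub { seek $fh,  $step,         SEEK_CUR }],
    ['relative backward', SEEK_END, sub { seek $fh, -$step,         SEEK_CUR }],
    ['backward from end', SEEK_END, sub { seek $fh, -$_[0] * $step, SEEK_END }],
);

my %elapsed;
for my $case (@cases) {
    my ($label, $whence, $code) = @$case;
    seek $fh, 0, $whence or die "seek failed: $!";   # start of file or end of file
    my $start = time();
    $code->($_) for 0 .. $count - 1;
    $elapsed{$label} = time() - $start;
    printf "%-18s %.4f s\n", $label, $elapsed{$label};
}
```

    To get closer to the original conditions, point it at a file larger than RAM and raise $count accordingly.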

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      I'm getting 7, 8, 5, 12 on cygwin Perl 5.14.4, 25MB file.

      Update: Interestingly, I'm getting 7, 8, 6, 12 with 1GB file, too.

      Update #2: 7, 8, 9, 14 with 10GB.

      Update #3: back at home. My Linux desktop, 24GB file:

      1.68571901321411 1.60926795005798 1.60942387580872 1.69600319862366
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        I'm getting 7, 8, 5, 12 on cygwin Perl 5.14.4, 25MB file.

        I did it on an 8GB file to (attempt to) prevent caching from mucking with the numbers.

        And I'm sure that the values will vary depending upon the file (fragmented or not), device (disk/SSD/etc.), version of perl, and compiler/CRT, but the numbers are pretty consistent across all my devices and have been for several versions of perl.


        Update: Interestingly, I'm getting 7, 8, 6, 12 with 1GB file, too.

        Still probably too small to prevent the entire file being cached. (Unless you have less than about 2GB of RAM?)

        Even so, I can't think of an explanation for why a relative seek backwards would be faster than one forwards.

        If you have the ability to run that utility that traces system calls (strace?), that might reveal something.


      Win32 strawberry-perl 5.18.2.1 - machine uses SSD:
      10.0880000591278 11.9660680294037 4.43941283226013 9.69746780395508

      v5.8.8 built for PA-RISC2.0 - older, sloooowwwweeerrr machine, NAS:
      53.091873884201 57.2255661487579 42.3175899982452 50.95623087883

      --MidLifeXis

        As your relative seek backwards is faster than the absolute seek forwards, I'm guessing that the file you used fitted entirely in cache.



      Wow. Thanks for the benchmark.

Re: How smart is 'seek $fh, $pos, 0'?
by MidLifeXis (Monsignor) on May 27, 2015 at 13:01 UTC

    Doing this from memory - sorry, cannot find a source reference (which would be the ideal authority for your question). Given some of the grey hair now showing on my head, I have to ask - what device do you have open and are calling seek() against? If it were an old tape drive (perhaps even new ones?) or another sequential-access medium, you would need to seek to the beginning of the file and then to the position. Given a random-access medium, you can "simply" (with a bit of additional housekeeping around buffers and such) reset the current file pointer on the handle. So, "It Depends™". [update] Additionally, some devices do not allow seeking backwards and will throw an error if it is attempted.
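
    A small sketch of the "it depends on the device" point: seek returns a false value when the handle cannot be repositioned, so the result is worth checking. The example assumes a pipe from a child perl process as a stand-in for a non-seekable device (no tape drive at hand); the exact error text in $! depends on the platform.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# A pipe stands in for a non-seekable device; $^X is the running perl binary.
open my $pipe, '-|', $^X, '-e', 'print "hello"'
    or die "cannot start child: $!";

# seek returns 1 on success and a false value on failure, setting $!.
my $ok  = seek $pipe, 0, SEEK_SET;
my $msg = $ok ? 'seek succeeded' : "seek refused: $!";
print "$msg\n";

close $pipe;
```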

    --MidLifeXis

      This question is not backup-related, so thankfully I don't have to concern myself with tape drives. The code is expected to handle a specific file format (Bethesda .esm and .esp files) and therefore the filehandle is expected to point to a file on a random-access medium (disk drive).

      The most common case would be sequentially processing the records, but I'd like to also be able to process the records out of order (e.g. skip uninteresting ones).
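
      A sketch of that access pattern, under loud assumptions: the record layout below (4-byte tag, 4-byte little-endian length, payload) is made up for illustration and is NOT the real .esm/.esp format, and the tag names and %wanted filter are hypothetical. It reads each header sequentially and uses a relative seek to jump over payloads that are not of interest.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

# Hypothetical record layout (NOT the real .esm/.esp format):
# 4-byte tag, 4-byte little-endian payload length, then the payload.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
for my $rec (['AAAA', 'first'], ['SKIP', 'x' x 5000], ['BBBB', 'second']) {
    my ($tag, $payload) = @$rec;
    print {$fh} pack('a4 V', $tag, length $payload), $payload;
}
seek $fh, 0, SEEK_SET or die "seek failed: $!";

my %wanted = map { $_ => 1 } qw(AAAA BBBB);   # tags worth reading
my @seen;

while (read($fh, my $header, 8) == 8) {
    my ($tag, $len) = unpack 'a4 V', $header;
    if ($wanted{$tag}) {
        read($fh, my $payload, $len) == $len or die "truncated record $tag";
        push @seen, "$tag=$payload";
    }
    else {
        # Skip the payload: relative seek from the current position.
        seek $fh, $len, SEEK_CUR or die "seek failed: $!";
    }
}

print join(', ', @seen), "\n";
```

      Here the relative form is the natural one, because the header gives the distance to skip rather than an absolute offset.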

Re: How smart is 'seek $fh, $pos, 0'?
by sundialsvc4 (Abbot) on May 27, 2015 at 15:49 UTC

    Very interesting results, BrowserUk, and most certainly unexpected. Every operating system that I know of provides an API call that is equivalent to fseek(). I presume (but have not verified ...) that the Perl internals simply use that call. I also have not verified whether the underlying OS implementation (in any particular OS ...) has any sort of dramatically different execution time for any of the (usually three) variations of that call.

    Certainly, “seek from the end” would require a little bit of extra work, since one would need to be sure that the current size of the file was atomically and correctly known, even for a shared file.   But I would not expect the OS, nor Perl, to “cache the entire file” in order to determine that!
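
    As a small illustration of the "seek from the end" variant (a sketch on a hypothetical 4096-byte scratch file, not a statement about how any OS implements it): the offset is resolved against the file size at the time of the call, and the resulting position matches the equivalent absolute seek.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

# Hypothetical scratch file of 4096 bytes.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} 'x' x 4096;

# Offset relative to the end of the file; tell() then reports size - 100.
seek $fh, -100, SEEK_END or die "seek failed: $!";
my $from_end = tell $fh;

# The same target expressed as an absolute offset, using the size from -s.
my $size = -s $fh;
seek $fh, $size - 100, SEEK_SET or die "seek failed: $!";
my $absolute = tell $fh;

print "size: $size, from end: $from_end, absolute: $absolute\n";
```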

    Anyhow ... “very interesting.”   I guess you learn something new every day.   Thanks for sharing.

    Now, as for what the OP in this case should do, my instinct would be to tell him to “just be perfectly clear”: do not be overly concerned about milliseconds unless those milliseconds actually matter. (As I well know, in your line of work, BrowserUk, they often do ... very much so.) Use the system call that most closely matches the way you would describe your application’s intentions over the water-cooler, and assume (hope?) that the call is “smart enough for peace work.”
