http://www.perlmonks.org?node_id=1127985

Monk::Thomas has asked for the wisdom of the Perl Monks concerning the following question:

Hello

If the filehandle is at position X and I want to go to position Y, does 'seek $fh, $pos, 0' rewind to the beginning and then skip ahead to $pos, or does it optimize automatically and just move from the current position to the intended position?

In other words:

a) Does it actually make a difference if I calculate the difference and use 'seek $fh, $delta, 1' or if I simply use 'seek $fh, $abs, 0'?

b) Is there a difference between going back (Y<X) and skipping ahead (Y>X)?
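Whichever form turns out faster, the absolute and relative forms (and the whence=2 form used in the benchmarks) should land on the same byte, so the question is only about cost. Here is a minimal sketch to check that. It assumes a throwaway 100-byte file made with File::Temp and uses the Fcntl names SEEK_SET, SEEK_CUR and SEEK_END in place of the bare 0, 1 and 2. $X, $Y and byte_here are illustrative names, not from the question:

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_CUR SEEK_END);
use File::Temp qw(tempfile);

# Scratch file (an assumption for the demo): 100 bytes, byte N has value N.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} pack('C*', 0 .. 99);

my $X = 70;   # where the handle currently is
my $Y = 30;   # where we want to go (Y < X, i.e. backwards)

# Read one byte at the current position and return its value.
sub byte_here { read($fh, my $byte, 1) == 1 or die "read: $!"; return ord $byte }

# a) absolute seek: whence 0 (SEEK_SET)
seek $fh, $X, SEEK_SET or die "seek: $!";
seek $fh, $Y, SEEK_SET or die "seek: $!";
my $abs = tell $fh;

# b) relative seek: whence 1 (SEEK_CUR); the delta comes from tell()
seek $fh, $X, SEEK_SET or die "seek: $!";
my $delta = $Y - tell($fh);          # -40 here: negative means backwards
seek $fh, $delta, SEEK_CUR or die "seek: $!";
my $rel = tell $fh;

# c) relative to the end: whence 2 (SEEK_END), offset <= 0
seek $fh, 0, SEEK_END or die "seek: $!";
my $size = tell $fh;                 # 100
seek $fh, $Y - $size, SEEK_END or die "seek: $!";
my $end = tell $fh;

printf "SET=%d CUR=%d END=%d, byte there=%d\n", $abs, $rel, $end, byte_here();
```

On an ordinary disk file, lseek(2) only updates the kernel's file offset and does no I/O, so an absolute seek does not read through the file from the start. Any cost difference comes from the layers on top of it (PerlIO buffering and the like).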

Re: How smart is 'seek $fh, $pos, 0'?
by BrowserUk (Patriarch) on May 27, 2015 at 13:05 UTC

    Absolute seeks (forward) are considerably faster than relative seeks from the current position (7.6s vs 11.6s below, roughly 1.5 times). (Maybe it has to do a tell to find out the current position before a relative seek?)

    Backward seeks relative to the end (whence 2) are about 3 times slower than absolute forward seeks.

    open O, '+<:raw', '1GBx8.bin' or die $!;;

    seek O, 0, 0; $t=time; seek O, $_*1000, 0 for 0 .. 8589934; print time-$t;;
    7.61572408676147

    seek O, 0, 0; $t=time; seek O, 1000, 1 for 0 .. 8589934; print time-$t;;
    11.6447620391846

    seek O, 0, 2; $t=time; seek O, -1000, 1 for 0 .. 8589934; print time-$t;;
    11.6476919651031

    seek O, 0, 2; $t=time; seek O, $_*-1000, 2 for 0 .. 8589934; print time-$t;;
    23.074695110321
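This is a self-contained variant of the benchmark above, for anyone who wants to rerun it without the '1GBx8.bin' file or BrowserUk's interactive shell. It is a sketch with assumed parameters: a 64 MB sparse scratch file made with File::Temp and truncate, the same 1000-byte stride, and Time::HiRes for sub-second timing (which the fractional times above suggest). It also uses a hypothetical helper named bench. The four rows follow the same order as the four timings above:

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_CUR SEEK_END);
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Assumed parameters (not from the thread): a 64 MB sparse scratch file
# and a 1000-byte stride. A file this small will probably sit in the page
# cache, so this mostly times the seek calls rather than the disk.
my $STEP = 1000;
my $SIZE = 64 * 1024 * 1024;
my $N    = int($SIZE / $STEP) - 1;   # every target stays inside the file

my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
truncate $fh, $SIZE or die "truncate: $!";

# Position the handle with $setup, then time the seeks done by $body.
sub bench {
    my ($label, $setup, $body) = @_;
    $setup->();
    my $t0 = time();
    $body->();
    my $elapsed = time() - $t0;
    printf "%-26s %.3fs\n", $label, $elapsed;
    return $elapsed;
}

my %r;
$r{abs_fwd} = bench('absolute forward (0)',
    sub { seek $fh, 0, SEEK_SET },
    sub { seek $fh, $_ * $STEP, SEEK_SET for 0 .. $N });
$r{cur_fwd} = bench('relative forward (1)',
    sub { seek $fh, 0, SEEK_SET },
    sub { seek $fh, $STEP, SEEK_CUR for 0 .. $N });
$r{cur_back} = bench('relative backward (1)',
    sub { seek $fh, 0, SEEK_END },
    sub { seek $fh, -$STEP, SEEK_CUR for 0 .. $N });
$r{end_back} = bench('from end, backward (2)',
    sub { seek $fh, 0, SEEK_END },
    sub { seek $fh, -$_ * $STEP, SEEK_END for 0 .. $N });
```

The ratios between rows are the interesting part, not the absolute times. File size versus RAM, the device and the perl build can all shift them.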

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      I'm getting 7, 8, 5, 12 on Cygwin Perl 5.14.4 with a 25MB file.

      Update: Interestingly, I'm getting 7, 8, 6, 12 with a 1GB file, too.

      Update #2: 7, 8, 9, 14 with a 10GB file.

      Update #3: back at home. My Linux desktop, 24GB file:

      1.68571901321411 1.60926795005798 1.60942387580872 1.69600319862366
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        I'm getting 7, 8, 5, 12 on cygwin Perl 5.14.4, 25MB file.

        I used an 8GB file to (attempt to) prevent caching from skewing the numbers.

        And I'm sure the values will vary depending on the file (fragmented or not), the device (disk/SSD/etc.), the version of perl, and the compiler/CRT, but the numbers are pretty consistent across all my devices and have been for several versions of perl.


        Update: Interestingly, I'm getting 7, 8, 6, 12 with 1GB file, too.

        Still probably too small to prevent the entire file from being cached. (Unless you have less than about 2GB of RAM?)

        Even so, I can't think of a reason why a relative seek backwards would be faster than one forwards.

        If you can run that utility that traces system calls (strace?), it might reveal something.


      Win32 Strawberry Perl 5.18.2.1, machine uses an SSD:
      10.0880000591278 11.9660680294037 4.43941283226013 9.69746780395508

      v5.8.8 built for PA-RISC2.0, older, sloooowwwweeerrr machine, NAS:
      53.091873884201 57.2255661487579 42.3175899982452 50.95623087883

      --MidLifeXis

        As your relative seek backwards is faster than the absolute seek forwards, I'm guessing that the file you used fitted entirely in cache.



      Wow. Thanks for the benchmark.

Re: How smart is 'seek $fh, $pos, 0'?
by MidLifeXis (Monsignor) on May 27, 2015 at 13:01 UTC

    I'm doing this from memory; sorry, I cannot find a source reference (which would be the ideal authority for your question). Given some of the grey hair now showing on my head, I have to ask: what device do you have open and are calling seek() against? If it were an old tape drive (perhaps even new ones?) or another sequential-access medium, you would need to rewind to the beginning of the file and then move forward to the position. On a random-access medium, you can "simply" (with a bit of additional housekeeping around buffers and such) reset the current file pointer on the handle. So, "It Depends™".

    Update: Additionally, some devices do not allow seeking backwards and will throw an error if you attempt it.

    --MidLifeXis
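MidLifeXis's last point, that some handles refuse to seek, is easy to guard against because seek returns false and sets $! on failure. Here is a minimal sketch, assuming a Unix-like system and using a pipe as a stand-in for a non-seekable device (the thread itself does not mention pipes):

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# A pipe stands in for a non-seekable device: it is a one-way stream
# with no file offset that could be moved.
pipe(my $reader, my $writer) or die "pipe: $!";

# seek returns true on success; on failure it returns false and sets $!.
my $ok = seek $reader, 0, SEEK_SET;
my ($errno, $msg) = ($! + 0, "$!");   # capture before $! can change

if ($ok) {
    print "seek succeeded\n";
} else {
    print "seek failed (errno $errno): $msg\n";
}
```

On Linux this should report something like "Illegal seek" (ESPIPE). The exact message is platform-dependent.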

      This question is not backup-related, so thankfully I don't have to concern myself with tape drives. The code is expected to handle a specific file format (Bethesda .esm and .esp files), and therefore the filehandle is expected to point to a file on a random-access medium (a disk drive).

      The most common case would be sequentially processing the records, but I'd like to also be able to process the records out of order (e.g. skip uninteresting ones).
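For the record-skipping case, here is a minimal sketch. It assumes a made-up tag/length/payload layout (a 4-byte ASCII tag, then a 4-byte little-endian length via pack 'V') rather than the real .esm/.esp record header, which has more fields. The first pass walks the file in order and skips unwanted payloads with a relative seek. The second pass jumps straight to a remembered offset with an absolute seek. The tags, the %wanted set and the scratch file are all hypothetical:

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_CUR);
use File::Temp qw(tempfile);

# HYPOTHETICAL layout, for illustration only (NOT the real .esm/.esp
# header): 4-byte ASCII tag, 4-byte little-endian payload length, payload.
my @records = ([ 'BOOK', 'read me' ], [ 'JUNK', 'x' x 5000 ], [ 'NPC_', 'Bob' ]);

my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} pack('a4 V', $_->[0], length $_->[1]), $_->[1] for @records;
seek $fh, 0, SEEK_SET or die "seek: $!";

# Pass 1: sequential walk. Remember where each record starts, keep the
# payloads we want and skip the rest with a relative seek.
my %wanted = map { $_ => 1 } qw(BOOK NPC_);
my (%offset, %payload);
while (read($fh, my $hdr, 8) == 8) {
    my $start = tell($fh) - 8;
    my ($tag, $len) = unpack 'a4 V', $hdr;
    $offset{$tag} = $start;
    if ($wanted{$tag}) {
        read($fh, $payload{$tag}, $len) == $len or die "short read";
    } else {
        seek $fh, $len, SEEK_CUR or die "seek: $!";   # skip the payload
    }
}

# Pass 2: out-of-order access, jumping straight to a known record.
seek $fh, $offset{NPC_}, SEEK_SET or die "seek: $!";
read($fh, my $hdr2, 8) == 8 or die "short read";
my ($tag2, $len2) = unpack 'a4 V', $hdr2;
read($fh, my $data, $len2) == $len2 or die "short read";
print "$tag2: $data\n";
```

An index of offsets built in one sequential pass is one way to support the "process out of order" case without rescanning the file each time.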
