PerlMonks  

4Gb filesize limit

by ruffing (Acolyte)
on Dec 04, 2001 at 03:35 UTC (#129236=perlquestion)
ruffing has asked for the wisdom of the Perl Monks concerning the following question:

I've looked around the internet and have not been able to find any solution to this problem.

Right now, I have a 5Gb file that has thousands of headers in it. I have an indexing program that attempts to write a separate file that contains the seek position for each unique header. The program already has a kludge that lets it work in the 2Gb-4Gb range by seeking backwards if the file size is above 2Gb. This gets me up to 32-bit numbers, but for a 5Gb file I need slightly more. Suppose, for this situation, that breaking up the 5Gb file into smaller 2Gb chunks is not an option.

My question is: is there any way in Perl to create a pointer larger than 32 bits to the right place in the file, and if there is, will Windows NT/2000 actually accept, say, a 35-bit value in its own API call?

Thanks for the help in advance. Scott

(tye)Re: 4Gb filesize limit
by tye (Cardinal) on Dec 04, 2001 at 03:56 UTC

    See SetFilePointer() in Win32API::File (which comes standard with Win32 Perl).

    Note that using Perl file handles with this may present a problem. You might be able to simply seek before each call to SetFilePointer():

        use Win32API::File qw( SetFilePointer GetOsFHandle );

        sub BigTell {
            my( $fh )= @_;
            seek( $fh, 0, 1 );    # Flush buffers
            my $osf= GetOsFHandle( $fh );
            my $hi= 0;
            my $lo= SetFilePointer( $osf, 0, $hi, 1 );
            return pack "NN", $hi, $lo;
        }

        sub BigSeek {
            my( $fh, $pos, $whence )= @_;
            my $osf= GetOsFHandle( $fh );
            seek($fh,0,1);        # Flush buffers
            my( $posHi, $posLo )= unpack "NN", $pos;
            $posLo= SetFilePointer( $osf, $posLo, $posHi, $whence )
                or  return;
            return pack "NN", $posHi, $posLo;
        }

    Note that I chose a format for the "big pointer" such that string comparisons are meaningful.
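
To illustrate the point about string comparisons: the "N" template packs each half as a big-endian unsigned 32-bit integer, so comparing two packed pointers with lt, gt or cmp orders them the same way as the file positions they stand for. A minimal sketch; big_ptr is a hypothetical helper name used only for illustration, not part of Win32API::File:

```perl
use strict;
use warnings;

# Hypothetical helper: build an 8-byte "big pointer" from the
# high and low 32-bit halves of a file position.
sub big_ptr {
    my( $hi, $lo )= @_;
    return pack "NN", $hi, $lo;
}

my $below4gb = big_ptr( 0, 0xFFFFFFFF );  # last byte below the 4GB mark
my $at4gb    = big_ptr( 1, 0 );           # exactly at the 4GB mark
my $past4gb  = big_ptr( 1, 500 );         # 500 bytes past the 4GB mark

# Plain string comparison sorts the pointers by file position.
my @sorted = sort { $a cmp $b } ( $past4gb, $below4gb, $at4gb );
print join( " ", map { unpack "H*", $_ } @sorted ), "\n";
```

This means an index of such pointers can be sorted or range-checked without ever turning them back into numbers.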

    Unfortunately I don't have time at the moment to test this. However, if you have problems with it, reply and I'll likely be able to help.

    On the other hand, please reply if this turns out to work well as it'd make a good addition to Win32API::File. (:

    Updated to fix a typo.

            - tye (but my friends call me "Tye")
      I'm still confused. What's the purpose of these two functions? BigTell seems to move the file pointer to the beginning and return 00000000, while BigSeek seeks a file pointer to a specific location based on the $pos you pass in, for which I try to use something along the lines of "00010000" for the beginning of the next 4Gb segment. What read call do I use from here to actually read in that segment?

        BigTell seems to move the file pointer to the beginning
        No, in BigTell(), the last argument to both seek and SetFilePointer() is 1, not 0, so both calls seek 0 bytes from the current position; that is, neither call should reset the file position. The call to seek is there for the side effect of flushing read/write buffers, and the call to SetFilePointer() is there for the side effect of obtaining the current position, even if it is past the 4GB mark.

        In other words, BigTell() should act just like tell except that it works far past the 4GB mark and so returns the current position as an 8-byte string instead of as an integer.
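
Because an 8-byte string is awkward to print, it can be turned back into an ordinary number for display. A sketch, assuming a hypothetical helper named big_to_number; on a Perl built without 64-bit integers the result is a floating-point value, which stays exact only up to 2**53:

```perl
use strict;
use warnings;

# Hypothetical helper: convert an 8-byte "big pointer" (as returned by
# BigTell) into a plain number of bytes from the start of the file.
sub big_to_number {
    my( $big )= @_;
    my( $hi, $lo )= unpack "NN", $big;
    return $hi * 4294967296 + $lo;    # 4294967296 is 2**32
}

# High half 1, low half 705032704: the 5,000,000,000th byte.
my $pos = pack "NN", 1, 705032704;
printf "%.0f\n", big_to_number( $pos );    # prints 5000000000
```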

        And thanks for replying. It got me to test the code and find a typo in BigSeek() [I forgot to rename one instance of $posHigh to $posHi] and to update the original code to fix that.

        I try to use something along the lines of "00010000" for the beginning of the next 4Gb segment.
        Actually it would be pack( "NN", 1, 0 ) which is "\0\0\0\01\0\0\0\0" for the start of the next 4GB segment.
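
Going the other way, a byte offset can be split into its two halves before packing. A sketch with a hypothetical helper named number_to_big; it assumes the offset is a non-negative whole number that Perl can represent exactly (below 2**53 on a Perl without 64-bit integers):

```perl
use strict;
use warnings;

# Hypothetical helper: turn a byte offset into the 8-byte "big pointer"
# format that BigSeek expects.
sub number_to_big {
    my( $offset )= @_;
    my $hi = int( $offset / 4294967296 );    # whole multiples of 2**32
    my $lo = $offset - $hi * 4294967296;     # remainder below 2**32
    return pack "NN", $hi, $lo;
}

my $start_of_next = number_to_big( 4294967296 );
print unpack( "H*", $start_of_next ), "\n";    # prints 0000000100000000
print "same as pack( 'NN', 1, 0 )\n"
    if $start_of_next eq "\0\0\0\x01\0\0\0\0";
```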

        What read call do I use from here to actually read in that segment?
        I was thinking you could just use normal Perl I/O operations. But now that you mention it I guess I do recall hearing that those don't work at least some of the time. I'd think that at least sysread would work, but I've certainly been wrong before and it would take me a while to generate a 5GB file in order to test that. It makes some sense that buffered I/O operations could fail, but I was surprised to hear that just reading a huge file like a stream (even with buffering) fails.

        Though you could certainly use ReadFile() [and WriteFile()] from Win32API::File. Note that if you do need to use those, then you might also need to comment out the calls to seek in both BigSeek() and BigTell() as those might also break in such cases.
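
For the ReadFile() route, a rough, untested sketch of reading a block at a position past 4GB using only Win32API::File calls, with no Perl file handle involved. The helper name read_at is hypothetical, and the module is loaded at run time only so that the file still compiles on other platforms:

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $len bytes starting at the 64-bit
# position whose high and low 32-bit halves are $hi and $lo. Win32 only.
sub read_at {
    my( $path, $hi, $lo, $len )= @_;
    require Win32API::File;

    my $h = Win32API::File::CreateFile(
        $path,
        Win32API::File::GENERIC_READ(),
        Win32API::File::FILE_SHARE_READ(),
        [],                                  # default security attributes
        Win32API::File::OPEN_EXISTING(),
        0,                                   # no special flags
        [],                                  # no template handle
    ) or die "Can't open $path: $^E\n";

    # A final argument of 0 means "from the start of the file".
    # Position zero comes back as "0 but true", so false means failure.
    Win32API::File::SetFilePointer( $h, $lo, $hi, 0 )
        or die "Can't seek in $path: $^E\n";

    my( $buf, $got );
    Win32API::File::ReadFile( $h, $buf, $len, $got, [] )
        or die "Can't read $path: $^E\n";
    Win32API::File::CloseHandle( $h );
    return $buf;
}
```

Something like read_at( $file, 1, 0, 4096 ) would then fetch the first 4096 bytes of the second 4GB segment, where $file stands for whatever path the big file lives at.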

                - tye (but my friends call me "Tye")
Re: 4Gb filesize limit
by chromatic (Archbishop) on Dec 04, 2001 at 03:58 UTC
    Since 5.6, Perl can be compiled with large file support. Unfortunately, you'll probably have to be the one to compile it. Type 'perl -V' at the command line and look for "uselargefiles=" in the output. If it's set ("uselargefiles=define"), you're in business. If not, you're in for a compile. :)
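
The same setting can be checked from inside a script through the standard Config module. A small sketch; as far as I know the value appears as "define" when large file support is compiled in, and lseeksize gives the size in bytes of a file offset:

```perl
use strict;
use warnings;
use Config;

# %Config holds the settings that 'perl -V' prints.
my $large = ( defined $Config{uselargefiles}
              && $Config{uselargefiles} eq 'define' ) ? 1 : 0;
my $bytes = $Config{lseeksize} || 0;    # 8 means 64-bit file offsets

print "large file support: ", ( $large ? "yes" : "no" ), "\n";
print "file offset size: $bytes bytes\n";
```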
      I'm on a Win32 machine, however, and the solution you offered seems to be valid only for Unix systems. I downloaded the source code and only came across that flag in the non-Win32 makefiles.
