
Re: Re (tilly) 1: processing large files

by sierrathedog04 (Hermit)
on Jul 04, 2001 at 18:38 UTC (#93868)

in reply to Re (tilly) 1: processing large files
in thread processing large files

A lot of languages claim to have "no arbitrary size limits" for strings and files, or claim that the sizes of their thingies are limited only by available memory or hard disk space.

What I have learned from Tilly's post is that we Perl advocates cannot make such a claim. Perl programs can break when confronted with a file larger than 2 GB. And since hard disks often come with much more space than that, and since the poster obviously has a need to work with such files, this deficiency is not a trivial one.

"Nothing is difficult for the man who doesn't have to do it himself." All hail the worthy work of the Perl Porters who got us where we are today (with a little help from ActiveState and thus B. Gates). I intend no disparagement of their magnificent work.

Re (tilly) 3: processing large files
by tilly (Archbishop) on Jul 04, 2001 at 21:25 UTC
    I wouldn't take this particular one too badly. When Perl 5 came out it wasn't clear how the industry would handle the 32-bit barrier in file size, so there was no way to write Perl support for it. You can hardly blame people for not writing support for what didn't yet exist.

    According to Dominic Dunlop, Perl had limited support for 64-bit files in 5.005_03, and it is (as noted above) a compile-time option in 5.6. But that compile-time option will not work on all platforms, and not everyone on a platform that does support it has turned it on. Note also that support for 64-bit files needs to be present in the operating system. If you are running Linux, that support is first present in 2.4. If you are running FreeBSD, it has been there for a few years now.
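
    Since large file support is a compile-time choice, one way to find out what a particular perl binary was built with is to ask the core Config module. What follows is a minimal sketch, not an authoritative test: it assumes the build records the uselargefiles and lseeksize configuration variables (a perl old enough not to have them will simply report "no" and "unknown"), and has_large_file_support is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;
use Config;   # core module exposing perl's build-time configuration

# Returns true if this perl was compiled with large file support.
# (has_large_file_support is a hypothetical helper name.)
sub has_large_file_support {
    return defined $Config{uselargefiles}
        && $Config{uselargefiles} eq 'define';
}

# lseeksize is the size in bytes of the file offset type;
# 8 means 64-bit offsets, 4 means 32-bit offsets.
my $offset_bytes = $Config{lseeksize};
$offset_bytes = 'unknown' unless defined $offset_bytes;

printf "large file support: %s\n", has_large_file_support() ? 'yes' : 'no';
printf "file offset size:   %s bytes\n", $offset_bytes;
```

    Even a "yes" here only describes how perl was built. The operating system and filesystem underneath still have to support large files for it to matter.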

    Anyways all 32-bit computer applications have arbitrary limits imposed on them by the hardware. And the above question is the leading edge of a trainwreck we will see in slow motion over the next few years. The problem is that if your naming scheme is 32-bits, then it only has about 4 billion names. Waste a bit here or there, and you are limited to 1 or 2 billion. Segment your architecture in some way, and you find that real world limits tend to hit at 1, 2, 3, or 4 GB. Often with a hack (such as large file support or Intel's large RAM support) you can push that off in particular places. But, for instance, Perl on a 32-bit platform will never support manipulating a string of length 3 GB. It isn't going to happen. And Perl is not alone.
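
    The arithmetic behind those 1, 2, and 4 GB figures can be worked through directly. This is only an illustrative sketch, assuming one name per byte; names_for_bits is a hypothetical helper name.

```perl
use strict;
use warnings;

# Number of distinct values ("names") an N-bit unsigned integer can hold.
# (names_for_bits is a hypothetical helper name.)
sub names_for_bits {
    my ($bits) = @_;
    return 2 ** $bits;
}

my $GB = 2 ** 30;   # one gigabyte, counted in bytes

# 32 bits is the full word; 31 and 30 model "wasting" one or two bits.
for my $bits (32, 31, 30) {
    my $names = names_for_bits($bits);
    printf "%2d usable bits: %10.0f names = %d GB of addressable bytes\n",
        $bits, $names, $names / $GB;
}
```

    With all 32 bits in use there are 4294967296 names, or 4 GB of bytes. Giving up one bit (for a sign, say) halves that to 2 GB, and giving up a second halves it again to 1 GB.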

    But thanks to Moore's law, it is only a question of time before people want to do exactly that. And so as users' needs keep crossing the magic threshold, people will at first find their workarounds, and then will have to switch to 64-bit platforms. Which won't be pretty, but it will happen. And the trillion-dollar question is whose 64-bit chip is going to win. Right now people tend to use Alphas. AMD's proposal is (I have heard) technically worse but makes for the easiest upgrade from x86. Intel has a huge amount of marketing muscle. In 5 years the answer will seem obvious in retrospect and everyone else is going to be playing catch-up. And playing catch-up for a very long time: the 128-bit conversion is decades off, and there is no guarantee that Moore's law will continue until then.

      The problem is that if your naming scheme is 32-bits, then it only has about 4 billion names.
      I agree with the point that you make, but what is this about "names?" I thought we were discussing the number of available addresses (and thus bytes) in a file, not the number of names.
        I consider a pointer to be the name of a byte in memory.

        Thus my mental model of the problem of seeking to a random spot in a large file is that you don't know how to name the location that you want. YMMV.
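
        To make the "can't name the location" idea concrete, here is a small sketch of what becomes of a 3 GB offset when it has to fit in a signed 32-bit field. It illustrates the wrap-around only and is not a description of how any particular perl build stores file offsets; as_signed_32 is a hypothetical helper name.

```perl
use strict;
use warnings;

# Reinterpret an unsigned 32-bit value as a signed 32-bit value, which is
# what happens when an offset is squeezed into a signed 32-bit field.
# (as_signed_32 is a hypothetical helper name.)
sub as_signed_32 {
    my ($offset) = @_;
    return unpack('l', pack('L', $offset));
}

my $max_signed = 2 ** 31 - 1;   # 2147483647: the last offset such a field can name
my $three_gb   = 3 * 2 ** 30;   # 3221225472: an ordinary spot in a 4 GB file

printf "largest nameable offset:      %.0f\n", $max_signed;
printf "3 GB offset as signed 32-bit: %d\n", as_signed_32($three_gb);
```

        The 3 GB offset comes back as -1073741824, which is not a position in any file. The byte exists on disk, but a signed 32-bit offset has no name for it.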
