http://www.perlmonks.org?node_id=286648

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

In the process of patching Win32.c and Win32sck.c to allow me to build perl using Borland C (the runtime of which doesn't support files >4 GB) with USE_LARGE_FILES and USE_PERLIO enabled, I noticed that there are numerous calls to the C-runtime routines fgetpos() and fsetpos(). These routines use a typedef fpos_t to get and set the file pointer position. The problem is that in the normal way of things, fpos_t is a 32-bit value. Obviously, when USE_LARGE_FILES is in effect, this requires a 64-bit value. I've coded a solution to bypass this for Borland, by going directly to the OS using SetFilePointer() and that works fine.
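For readers wondering what "going directly to the OS" involves: SetFilePointer() takes a 64-bit distance as two 32-bit halves, so the offset has to be split before the call and rejoined afterwards. The following is only a minimal sketch of that idea, not the actual patch. The names `split_offset`, `join_offset` and `seek64` are hypothetical, and the Win32 part is compiled only when `_WIN32` is defined (older headers may not define `INVALID_SET_FILE_POINTER`, in which case `(DWORD)-1` is the value to compare against).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two halves SetFilePointer() expects. */
typedef struct {
    int32_t low;   /* passed as lDistanceToMove */
    int32_t high;  /* passed through lpDistanceToMoveHigh */
} split_off_t;

/* Split a 64-bit offset into its low and high 32-bit halves. */
split_off_t split_offset(int64_t off)
{
    split_off_t s;
    s.low  = (int32_t)(uint32_t)((uint64_t)off & 0xFFFFFFFFu);
    s.high = (int32_t)(uint32_t)((uint64_t)off >> 32);
    return s;
}

/* Rebuild a 64-bit position from the low half returned by the call
 * and the high half written back through the pointer argument. */
int64_t join_offset(uint32_t low, int32_t high)
{
    return (int64_t)(((uint64_t)(uint32_t)high << 32) | low);
}

#ifdef _WIN32
#include <windows.h>

/* Seek to an absolute 64-bit offset; returns the new position or -1. */
int64_t seek64(HANDLE h, int64_t off)
{
    split_off_t s;
    LONG high;
    DWORD low;

    assert(off >= 0);              /* absolute offsets only */
    s = split_offset(off);
    high = s.high;
    low = SetFilePointer(h, (LONG)s.low, &high, FILE_BEGIN);
    /* 0xFFFFFFFF can be a valid low half, so GetLastError() decides. */
    if (low == INVALID_SET_FILE_POINTER && GetLastError() != NO_ERROR)
        return -1;
    return join_offset(low, high);
}
#endif
```

For a 10 GB offset (10737418240 bytes) the split yields a high half of 2 and a low half of 0x80000000, which is why the low half alone cannot be used as an error indicator.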

The problem is that when I looked through the sources to see how these two calls were handled for VC++, I failed to find anything that specified that they should use a 64-bit value. The VC++ runtime docs suggest that fpos_t can use a 64-bit value "depending upon the target platform", but I've been unable to work out how this change is effected.

So my question is, how does perl get away with using fgetpos() and fsetpos() on huge files when there is no apparent code to indicate that a 64-bit value should be used? Does the VC++ C-runtime decide to switch to using an fpos_t that is 64 bits automagically? If so, upon what criteria? If not, is this a hole in the Win32/VC++ USE_LARGE_FILES support?

If anyone has an answer to any of these questions, or has a way of checking whether these calls work correctly when used by a perl built using VC++ and USE_LARGE_FILES, I'd be very grateful for their wisdom or assistance. Thanks.
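One way to start checking these calls is a small round-trip test: save a position with fgetpos(), move elsewhere, restore it with fsetpos(), and confirm the next byte read is the expected one. The sketch below only exercises a tiny scratch file, so it says nothing about positions beyond 4 GB; it assumes the caller supplies a stream opened for update, and the function name `fpos_roundtrip_ok` is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if a position saved with fgetpos() is restored by fsetpos(),
 * 0 if it is not, and -1 if the scratch data could not be set up.
 * The stream must be open for both reading and writing. */
int fpos_roundtrip_ok(FILE *fp)
{
    fpos_t mark;
    int ch;

    assert(fp != NULL);

    /* Write ten known bytes and remember the position of byte '5'. */
    if (fseek(fp, 0L, SEEK_SET) != 0 || fputs("0123456789", fp) == EOF)
        return -1;
    if (fseek(fp, 5L, SEEK_SET) != 0 || fgetpos(fp, &mark) != 0)
        return -1;

    /* Move somewhere else, then jump back using the saved fpos_t. */
    if (fseek(fp, 0L, SEEK_END) != 0 || fsetpos(fp, &mark) != 0)
        return 0;

    ch = getc(fp);
    return ch == '5';
}
```

A stream from tmpfile() is enough to try it. Extending the same idea to a file larger than 4 GB would need a 64-bit seek to get there in the first place, which is exactly the part that differs between runtimes.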

BTW. If anyone else is interested in building perl for Win32 using Borland (a free compiler), send me a message. The required patches are minimal; only 3 files are affected.

I'll probably submit a patch for this, but there is no telling how long it might be before it shows up in the build tree, always assuming that it is accepted. Testing so far is fairly minimal, but 99.77% of the test suite passes and the failing tests are unrelated to the modifications that I have made. I'm still trying to track down the causes and would appreciate some help if anyone is interested.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.

Replies are listed 'Best First'.
Re: Perl / Win32 / VC++ / USE_LARGE_FILES
by halley (Prior) on Aug 26, 2003 at 13:03 UTC
    From Microsoft's Visual C++ 6.0's <stdio.h>, an excerpt posted for illustrative purposes according to fair use.
    #ifndef _FPOS_T_DEFINED
    #undef _FPOSOFF
    #if defined (_POSIX_)
    typedef long fpos_t;
    #else /* _POSIX_ */
    #if !__STDC__ && _INTEGRAL_MAX_BITS >= 64
    typedef __int64 fpos_t;
    #define _FPOSOFF(fp) ((long)(fp))
    #else
    typedef struct fpos_t {
        unsigned int lopart;
        int hipart;
    } fpos_t;
    #define _FPOSOFF(fp) ((long)(fp).lopart)
    #endif
    #endif /* _POSIX_ */
    #define _FPOS_T_DEFINED
    #endif
    The fpos_t type's implementation chooses between three varieties, depending on ambient integer size options, where both non-POSIX varieties offer 64 bit signed lengths. I would check the command-line compiler switches to find any switch that controls (even indirectly) the integral word length.
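A quick way to see which of the three varieties a given set of compiler switches selects is to repeat the same preprocessor tests in a tiny probe. This is a sketch under assumptions: the function names are invented, and on non-Microsoft runtimes sizeof(fpos_t) may include more than a file offset, so the bit count is only meaningful for the header quoted above. As far as I know, _INTEGRAL_MAX_BITS describes the widest integer type the compiler offers (which includes __int64 on 32-bit targets) rather than the natural word size, but that is worth confirming against the compiler documentation.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Mirrors the preprocessor tests in the excerpt and names the branch
 * that would be taken under the current compiler settings. */
const char *fpos_branch(void)
{
#if defined(_POSIX_)
    return "long";
#elif !__STDC__ && _INTEGRAL_MAX_BITS >= 64
    return "__int64";
#else
    return "struct { lopart, hipart }";
#endif
}

/* Size in bits of whatever fpos_t this compiler's <stdio.h> provides. */
unsigned fpos_bits(void)
{
    return (unsigned)(sizeof(fpos_t) * CHAR_BIT);
}

/* Print both findings to the given stream. */
void print_fpos_report(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "branch: %s, sizeof(fpos_t): %u bits\n",
            fpos_branch(), fpos_bits());
}
```

Calling `print_fpos_report(stdout)` from a program built with the same switches as perl would show which definition is actually in effect.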

    --
    [ e d @ h a l l e y . c c ]

      Thanks halley++.

      That confirms my conclusions drawn from the limited discussion I could find on the matter at MSDN.

      The problem I still have is that AS 802 is built for 32-bit integers but supports huge files...

      perl58 -V
      --- snip ---
        hint=recommended, useposix=true, d_sigaction=undef
        usethreads=undef use5005threads=undef useithreads=define usemultiplicity=define
        useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
      --- snip ---

      As you can see, use64bitint and use64bitall are turned off, but uselargefiles is turned on. I can confirm that 802 does handle files up to 10 GB.

      The question becomes, how? Given 32-bit "ambient integers", and no discernible other mechanism for indicating that fpos_t should be a 64-bit value, how does the compiler decide to switch to using a 64-bit fpos_t with fgetpos() and fsetpos()?

      I guess it is possible that perl never uses fgetpos() and fsetpos() to satisfy calls to tell(), seek(), and stat(), and so the problem doesn't arise? I have to say though that from what I can make out from the sources, this is not the case. However, I'm not a compiler and trying to keep track of all the conditional compilation through myriad header files and source files is a job best left to a compiler:)

      I guess it could be that fgetpos() and fsetpos() only ever get used by perl for its own file handling and not when doing IO on behalf of the user programs. This could mean that if I tried to run a perl script of > 4GB it might not be able to find the __DATA__ section.

      Big deal:) I guess we'll cross that bridge when someone reports a problem with their 4 GB script. I just hope they remember to use <readmore> tags :)



Re: Perl / Win32 / VC++ / USE_LARGE_FILES
by Courage (Parson) on Aug 26, 2003 at 13:23 UTC
    The problem with Borland C++ is simple: it does not support 64-bit file access functions, and that is all there is to it.

    I've asked this on p5p list, and Jarkko answered:
     "Well, if Borland doesn't have the APIs, we cannot fix that.."

    Indeed, _fstati64, _lseeki64 and _telli64 are all missing from *.lib

    To summarize:
       there's nothing to worry about, nothing at all,
       there's nothing to worry about nothing I can do.

    To answer your question of how MSVC does that: it has such functions. Those could be reimplemented using the Win32 API, but that belongs in the CRT, not in Perl.

    And, BrowserUk, given such enthusiasm, maybe it is worth posting your worries to p5p *and* filing a bug with Borland?
    Posting a bug report to Borland is possible, but it requires tons of registrations, and the last time I tried, the Web interface did not accept my password (although another of their systems accepted it beautifully).

    I hope you'll have better luck with this than I did :)
    Besides, what exactly are you trying to achieve with a Borland build?

    Best regards,
    Courage, the Cowardly Dog

      That is exactly why I brought this up here rather than on p5p.

      From the p5p perspective, Borland doesn't support huge files - so don't use that. Trouble is, the separation of the build-time options isn't good enough, and even trying to build with PERLIO and not LARGE_FILES breaks, because one or more of the three calls you mention is used by PERLIO!

      From Borland's perspective, the compiler is free and doesn't support huge files. If you want that then buy their professional compiler suite.

      That's the correct view from their (p5p and Borland) point of view, but I still want to build perl with Borland. I don't have, and have no wish to purchase VC++! Given that the resultant code is going to run on the same target, the OS obviously has the capability and given that PERLIO already isolates all the CRT IO components into win32_* wrappers, it becomes a fairly trivial matter to bypass the CRT for those three calls and go direct to the OS. This is what I have done, and it's working great!

      The point of this post is that in addition to the various variants of tell(), seek() and stat() that must be transmuted to their 64-bit equivalents, the perl sources also use fgetpos() and fsetpos(). They use the definition of fpos_t, which by default is 32-bit. It was when I went looking to see how these calls handle the transition to using an __int64 when using VC++ (so as to get a clue as to the best way for me to handle the same under Borland) that I came up empty.

      The VC++ documentation (and now confirmed by halley above. Thanks!) indicates that the selection of which definition of fpos_t is used is determined by the target platform. If you're building for a 64-bit platform, then the natural integer size is 64 bits and so that gets used. The problem is, the majority of Win32 platforms are 32-bit, but they still support huge files. The question that arose was: how do you persuade the VC++ compiler to select 64-bit semantics for fgetpos and fsetpos whilst building for a target where the natural integer size is still 32-bit?

      This question is still unanswered! If I am right, then it is possible that when building perl using VC++ with USE_LARGE_FILES and/or USE_PERLIO, these two calls are still using the 32-bit definition of fpos_t. If that is the case, given the number of uses of fgetpos/fsetpos that appear in the sources, including their use in the implementation of one or more of the _*i64() routines you mentioned above, then it would appear that there is a hole in perl's support for huge files even when using VC++. I have patched these two calls also for use with Borland, and I now have a fully specified, working version of perl built with Borland that correctly handles tell(), seek(), stat() (and -s) on files up to at least 10 GB, so I am a happy bunny.

      However, if this was a bug with VC++, then that would be a fairly major bug and would be very much a matter for the attention of p5p. It is to this end that I have posted this here in the hope of verifying my suspicions with the help of other monks who have VC++ (& experience) before sending the very busy p5p guys on a wild goose chase.

      That said, I have used AS 802 (built with VC++) to test this and it also handles my 10 GB file correctly. The only question I am now trying to clarify is how does it do this.
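To make concrete what a 32-bit file position would do to a file of this size, here is a small worked example. It is only an illustration of the arithmetic, not anything taken from the perl sources, and the function names are invented: a 10 GB position is 10 * 2^30 = 10737418240 bytes, and keeping only the low 32 bits leaves 2147483648 (2 GB), so a truncated position would point somewhere quite different rather than fail outright.

```c
#include <assert.h>
#include <stdint.h>

/* What a 64-bit file position becomes if it is squeezed through a
 * 32-bit unsigned value: everything above bit 31 is lost. */
uint32_t truncate_pos(uint64_t pos)
{
    return (uint32_t)(pos & 0xFFFFFFFFu);
}

/* True if the position survives a trip through 32 bits unchanged. */
int fits_in_32_bits(uint64_t pos)
{
    return (uint64_t)truncate_pos(pos) == pos;
}

/* Worked example for a 10 GB (10 * 2^30 byte) position. */
void check_10gb_example(void)
{
    const uint64_t ten_gb = UINT64_C(10) << 30;   /* 10737418240 */

    assert(!fits_in_32_bits(ten_gb));
    assert(truncate_pos(ten_gb) == UINT32_C(2147483648)); /* wraps to 2 GB */
}
```

This is why correct results from tell() and seek() near the end of a 10 GB file are reasonable evidence that the positions involved are being carried in more than 32 bits somewhere along the way.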



        Dear BrowserUK,

        Your answer proves that this is exactly what you should bring up on p5p rather than here :)

        Of course it is not good that 5.8.1-pre no longer compiles with Borland C++ without touching the source tree (some *.h patching is needed), yet 5.8.0 builds fine with Borland: no USE_LARGE_FILES, but the source tree builds. Hence an incompatibility was introduced somewhere in between. At the very least this needs immediate patches on p5p; maybe you'll be in time for the RC5 snapshot.
        And 5.10.0-to-be also needs to be updated.

        It is wrong to state that p5p is not aware of the Borland compiler. It really is supported, but sometimes there is no Borland wisdom at hand. You're very welcome to add your efforts. Honestly.

        I also must disagree with your "From Borland's perspective, the compiler is free and doesn't support huge files. If you want that then buy their professional compiler suite". Our department paid for the professional compiler suite, as of Borland C++ Builder 4, and it uses *exactly the same compiler* as the free compiler suite. If you buy the commercial suite, you pay for the IDE and VCL (another Borland CRT, but also lacking _*i64()). It has absolutely the same, interchangeable compiler inside!

        Now I'll make a humble attempt to answer your question, but don't expect much from a stupid dog :)

        You see many wrapper functions in the ./win32/ sources. You can't even tell which of those are used and what they are wrapped to. The only thing I can say right now with 99% probability is that the fgetpos function from win32.c is *not* used; instead there is a forest of wrappers and ifdefs, and it is far from obvious what exactly gets called. Look at the occurrences of fgetpos in miscellaneous files, especially this from perl.h:

        #ifdef USE_64_BIT_STDIO
        ....
        #   if defined(USE_FGETPOS64)
        #       define fgetpos fgetpos64
        #   endif

        and

        #ifdef USE_64_BIT_STDIO
        #   ifdef HAS_FPOS64_T
        #       undef Fpos_t
        #       define Fpos_t fpos64_t
        #   endif

        yet config_H.vc contains

        #define Fpos_t fpos_t /* File position type */

        So the underlying API will finally be called with fpos64_t, which is no longer 32-bit.

        Anyhow, this gets more and more complicated, and I gave up when I last looked into it; I just believed that "all works if the API has 64-bit support and does not work otherwise".

        However, the source code still needs fixing to revive the Borland build. The problem exists because many people propose patches to the ./win32 files without checking the Borland build, so they break it from time to time.

        And please do not mystify p5p; go and discuss your thoughts there. You have your patches, so go and propose them.
        Even I speak there. Hence anyone can speak there (provided there is something on-topic to say).
        www.perl.com clearly suggests subscribing and discussing on p5p, so why not just go and do this?

        Courage, the Cowardly Dog

      Addendum: I just re-read your post, and realised that I had left one of your questions unanswered.

      Besides, what exactly are you trying to achieve with a Borland build?

      What I am trying to achieve is parity! Parity with all the *nix users who build their own copies of perl using free compilers. I want to have all the same benefits and privileges that they enjoy.

      • I want to be able to use Inline::C, Inline::asm et al.
      • I want to have the ability to build my own modules with XS and C components and not be dependent on others. PodMaster is a great guy, but he is easily pissed off:)
      • I want to be able to use the latest versions of perl as they become available and not wait 6 months for AS to do their thing.
      • I want to be able to apply patches that the p5p guys come up with for the bugs I raise and try them out so that I can do the right thing by their efforts and confirm that their patches work for me.

      Most of all, just because I choose (and others are forced) to use Win32 as my OS, I don't see why my only choices for using perl are:

      1. Use a binary distribution with all of the limitations that implies.
      2. Have to adopt Cygwin in order to build using gcc. If I wanted to use *nix, I would install Linux. Cygwin, from my perspective is the worst of both worlds. Like driving a left-hand drive car in a right-hand drive country, but doing it from the passenger seat.
      3. Put more money into the coffers of MS by buying their compiler. I choose to use Win32, but my copy of NT was paid for a very long time ago. My choice of preferred platform does not translate into a love for that company. I am as aware of their business practices as the next man, and for personal reasons, perhaps more aware than most. I have no desire to add to their wealth or condone their business practices by making new purchases from them, even if I could afford to which is debatable.

      For my purposes, the compiler that Borland kindly make available to the community for free is just fine. It does everything I need it to quite competently with the exception of large file support, which I have patched my way around for perl and will probably get around to fixing at the libraries level at some point.

      I did try using MinGW for a while, but the build process for that seems even more broken than for Borland.

      I will be trying to get my patches accepted back into the perl build tree so that others might also benefit (if only I can find a version of diff that can handle diff'ing two 5000+ line files without forcing my system into swapping by consuming 200+ MB of memory before I got fed up and killed it :().



        Have to adopt Cygwin in order to build using gcc.
        ...

        I did try using MinGW for a while, but the build process for that seems even more broken than for Borland.

        Actually, perl builds nicely with mingw32. I built the 5.8.0 several times using the mingw-gcc that comes with Dev-C++, when I didn't wanna wait for the ActiveState, and I didn't want to buy VC++ to be able to test embedding. I also built extensions, and I've embedded the Perl interpreter with no trouble at all. All I did was read and follow the instructions (I had done neither anywhere before). However, for some reason it wants to use dmake, which seems to be a make version not compatible with anything else this side of the sun. But using that to build Perl, then nmake for everything else as soon as it is built works just fine.

        However, I have no idea what the take is on huge files in that build. So maybe you can't use it for that reason. And if Borland is fine for you, no reason to switch of course.


        You have moved into a dark place.
        It is pitch black. You are likely to be eaten by a grue.
        In case you don't know, you can also get the standard edition of the Microsoft VC++ compiler for free if you install the .NET Framework SDK (or even just the redistributable runtime, a 20MB download). It doesn't include the Visual Studio environment, and the optimizations of the professional edition are disabled, but it is a fully functional commandline compiler. The generated code runs about the same speed as code generated by GCC. It builds Perl just fine.
        I really wish you success.

        I can't be of much help because I can't spend much time investigating Borland builds, but I will try to be helpful anyway, so feel free to ask.

        (BTW from time to time I use BorlandC++ IDE to catch core dumps and then perform step-by-step debugging of perl code, it appears to be quite comfortable in Borland IDE)

        Also, after reading README.win32 (there is some Borland wisdom there) you will notice that the file ./win32/sync_ext.pl should be used when you're dealing with BC++ -- do not neglect it, otherwise you will have some files always recompiled when doing "dmake /target/".

        Courage, the Cowardly Dog