djw has asked for the wisdom of the Perl Monks concerning the following question:

I have written a small application to collect file/path information recursively from a root path given at run time. I collect this information for each file (using File::stat and File::Find): size, ctime, mtime, and atime.

I store that information in a MySQL database. For some reason File::stat's size method returns a negative value for specific files (the test system is Win32, and the same files come back negative every time I run the program). One of the files is a system file (pagefile.sys), but the rest are simply MPEGs. Each file is over 500MB in size. There are other files of 500MB or more whose sizes are recorded correctly, so it doesn't seem to be a file size problem. Other files in the same directories as the problem files also have correctly recorded sizes. There isn't a permission problem on these files.
Here is a quick snippet:
use File::stat qw(:FIELDS);
use File::Find qw( finddepth );

# stuff...

finddepth \&gatherData, $dir;

# stuff...

sub gatherData {
    if (-f) {
        stat($_);

        # -----------
        # get the current directory
        # and file names
        my $cdir = $File::Find::dir;
        my $file = $_;

        # -----------
        # put that into our data
        # hash for input later
        $data{$cdir}{$file}{size}  = $st_size;
        $data{$cdir}{$file}{ctime} = $st_ctime;
        $data{$cdir}{$file}{mtime} = $st_mtime;
        $data{$cdir}{$file}{atime} = $st_atime;
    }
}
Then I just iterate over the hash to enter all the data into the database (no modification or calculations etc). Also, there are no errors or warnings during or after execution.
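For comparison, here is a minimal sketch of the same collection step using File::stat's object interface instead of the :FIELDS globals. The function name gather_data and the returned hash layout are illustrative assumptions, not the original program's code; only the four fields from the snippet are kept.

```perl
use strict;
use warnings;
use File::Find qw(finddepth);
use File::stat;

# Hypothetical helper: walk $root and return a hash reference of
# { directory => { file => { size, ctime, mtime, atime } } }.
sub gather_data {
    my ($root) = @_;
    my %data;

    finddepth(sub {
        return unless -f $_;

        # File::stat's stat() returns an object, or undef on failure.
        my $st = stat($_);
        unless ($st) {
            warn "stat failed for $File::Find::name: $!\n";
            return;
        }

        $data{$File::Find::dir}{$_} = {
            size  => $st->size,
            ctime => $st->ctime,
            mtime => $st->mtime,
            atime => $st->atime,
        };
    }, $root);

    return \%data;
}
```

Checking the return value of stat makes a failed call visible instead of silently recording stale values. It would not by itself change what size is reported for a large file; that depends on how perl was built.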

Here is what the data looks like in the db:

mysql> select file_id, file, size from files where size < 0;
+---------+--------------+-------------+
| file_id | file         | size        |
+---------+--------------+-------------+
|  635609 | fooo 3.mpg   |  -773913892 |
|  635608 | baaar 1.mpg  |  -529282490 |
|  635603 | foooo 3.mpg  | -2035912248 |
|  611851 | pagefile.sys | -2147483648 |
+---------+--------------+-------------+
4 rows in set (0.32 sec)
I have around 194K files listed in the files table (from two different systems), and all of the problem files are from the same system.

Any suggestions on how I should solve this?


Replies are listed 'Best First'.
Re: File::stat's size method returns negative values
by vladb (Vicar) on May 06, 2003 at 21:19 UTC
    Appears to me like a case of some freakish number conversion issue. Could you provide us with the files table definition? What type is the size field? What Perl code do you use to insert the values?

    I tend to think that there might be something happening between your code and mysql database that causes negative numbers. Could it be that the 'size' of the... ehem... size field isn't adequate and can't fit size values of some of the larger files?


    I'm not sure if you've stumbled over this newsgroup (comp.lang.perl.misc) post. The author of the post refers to file sizes 'overflowing' the 32-bit number limit in Perl. From your post, this doesn't appear to be the deal, but I still found his workaround 'creative' ;).

    By the way, I didn't mention this initially, but did you also verify whether the stat() method returned a negative value on any of the files? If all come out to be positives, then for sure look at the mysql<->perl link.

    update 2: In reply to your update...

    Being somewhat at a loss for any other 'good' suggestions, the alternative I see is to check if the negative size values follow any particular pattern. For example, do negative sized files appear in the same directory? shared drive/resource? Are their attributes at all similar?

    I'll test this on my win32 box shortly when I get home (I work on Solaris at work ;)) and see if I can dig anything up.

    # Under Construction
      I thought about that as well, but like I said, the size field in the table contains numbers larger than the ones I displayed. To be sure though, I altered the column to use BIGINT and the problem still occurred (after I re-ran my program).

      Here is the insertion code:

      sub writeDB {
          # stuff ...
          $sth2 = $dbh->prepare("
              insert into files (
                  client_id, path_id, file, size, ctime, mtime, atime
              ) values ( ?, ?, ?, ?, ?, ?, ? )
          ");

          foreach my $path (keys %data) {
              my $pathInsert = qq`insert into path ( client_id, Path ) values ( ?, ? )`;
              $sth3 = $dbh->prepare($pathInsert);
              $sth3->execute($client_id, $path);
              $sth3->finish();
              my $path_id = $sth3->{'mysql_insertid'};

              foreach my $file (keys %{$data{$path}}) {
                  my $size  = $data{$path}{$file}{size};
                  my $ctime = $data{$path}{$file}{ctime};
                  my $atime = $data{$path}{$file}{atime};
                  my $mtime = $data{$path}{$file}{mtime};

                  # bind order matches the column list: ctime, mtime, atime
                  $sth2->execute( $client_id, $path_id, $file, $size, $ctime, $mtime, $atime );
              }
          }
          $sth2->finish();
          $dbh->disconnect();
      }
      Update: I did check whether any of the sizes were negative before the perl->mysql code, and they were already negative at the time stat was performed (stat($_)).

        not an answer, but a means of reconciling - could you also do a qx(stat $_) and see how those values compare to Perl's built-in stat (or File::stat's override)?
        #!/usr/bin/perl -w
        use strict;

        my $file = $0;    # how BIG am I ?
        my $pstat_size = (stat($file))[7];
        (my $qstat_size) = qx(stat $file) =~ m/Size: (-?\d+)/;
        print "$pstat_size, $qstat_size\n";    # not that big

        output is 175, 175
        I ran this against some 2G+ oracle dbf's, no problem

        (code assumes *nix or MS system with a visible stat exe somewhere)

Re: File::stat's size method returns negative values
by BrowserUk (Pope) on May 07, 2003 at 03:34 UTC

    I couldn't reproduce the error, but I don't have any file much over a couple of hundred megs.

    It might help you to track down whether the problem lies in the values reported by the OS, the build of perl, or the File::stat module if you query the file size of the affected files directly from the OS. You can use Win32::API to get at the GetFileSize OS API, which may eliminate one part of the equation.

    #! perl -slw
    use strict;
    use Win32API::File 0.08 ':ALL';
    use Win32::API;

    my $GetFileSize = Win32::API->new( 'Kernel32.dll', 'GetFileSize', 'NN', 'N' )
        or die "Win32::API->new: $!, $^E";

    open my $FH, '<', $ARGV[0] or die "Couldn't open $ARGV[0]: $!";
    my $nativeFH = GetOsFHandle( $FH ) or die "GetOsFHandle: $^E";

    my $OSSize = $GetFileSize->Call( $nativeFH, 0 );
    die "GetFileSize error:$^E" if $OSSize == 0xFFFFFFFF; # See msdn docs.

    print "$ARGV[0]: $OSSize";

    The call as shown will only correctly report file sizes up to 4 GB - 2 bytes. Getting at larger sizes requires some extra (twisted!) logic using the second parameter to the call. See the doc link above for details.
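As a rough illustration of that extra logic: GetFileSize returns the low-order 32 bits of the size and can hand back the high-order 32 bits through its second parameter. The sketch below shows only the arithmetic for joining the two halves. The helper name combine_size is hypothetical, and retrieving the high half through Win32::API (which needs a pointer argument) is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper: join the two 32-bit halves of a 64-bit file size.
# The multiplication is done in floating point when perl's integers are
# only 32 bits wide, which stays exact for sizes below 2**53 bytes.
sub combine_size {
    my ($high, $low) = @_;
    return $high * 2**32 + $low;
}

# A high half of 1 and a low half of 0 is exactly 4 GB.
printf "%.0f\n", combine_size(1, 0);
```

With a non-zero high half, a low half of 0xFFFFFFFF is no longer necessarily an error, so the error check in the script above would also need revisiting.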

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
Re: File::stat's size method returns negative values
by PodMaster (Abbot) on May 07, 2003 at 08:56 UTC
    Which version of File::stat are you using?
    File::stat is very little code, so I can't see how it'd be giving you negative values.
    Just use stat and see what happens.
    BTW, what does perl -V:uselargefiles report?
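The same setting can also be read from inside a program through the core Config module. This is a small sketch; the function name largefile_report is made up for illustration, and lseeksize (the size of a file offset in bytes) is included only as an extra hint.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: summarise this perl's large-file configuration.
# %Config returns undef for options that were not defined at build time.
sub largefile_report {
    my $large = defined $Config{uselargefiles} ? $Config{uselargefiles} : 'undef';
    my $lseek = defined $Config{lseeksize}     ? $Config{lseeksize}     : 'unknown';
    return "uselargefiles=$large lseeksize=$lseek";
}

print largefile_report(), "\n";
```

A build with large file support would normally be expected to report uselargefiles=define and an lseeksize of 8.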

    MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
    I run a Win32 PPM repository for perl 5.6x+5.8x. I take requests.
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      I think you have found something here PodMaster. Here is the output of perl -V:uselargefiles:

      >perl -V:uselargefiles
      uselargefiles='undef';
      After a little more investigation it seems that any file of 2GB (2,147,483,648 bytes) or more is giving the negative results. I have a file that is 2,004,792,320 bytes that is being reported properly. I guess I will have to figure out how to tell perl to use large files, then try again.

      Thanks for all the help.

      UPDATE: I am an idiot and here's why: 107391. I'll just go away now *hangs head in shame*.
      UPDATE 2: I did some searching and since I can't ./configure on a win32 system, I'm not sure how I can add 'uselargefiles'. Does anyone know for sure if this can't be done on a win32 system? Thanks.
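The 2GB threshold is what one would expect if the size is being squeezed into a signed 32-bit integer. Assuming that is what happened, and that each file is really smaller than 4GB, the true sizes can be recovered from the negative values in the table by adding 2**32. The helper name unwrap_size32 is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: undo a signed 32-bit wraparound.
# Only meaningful if the true size is below 2**32 bytes (4GB).
sub unwrap_size32 {
    my ($size) = @_;
    return $size < 0 ? $size + 2**32 : $size;
}

# The four negative sizes from the files table.
for my $reported (-773913892, -529282490, -2035912248, -2147483648) {
    printf "%12d -> %.0f bytes\n", $reported, unwrap_size32($reported);
}
```

Under that assumption pagefile.sys comes out at exactly 2,147,483,648 bytes. This is only a stopgap for data already collected, not a substitute for a perl that handles large files.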

Re: File::stat's size method returns negative values
by djw (Vicar) on May 07, 2003 at 21:46 UTC
    Ok, after a lot of searching/reading, I finally sent an email to my local (Winnipeg) Perl Mongers group and they mentioned that Perl 5.8 from ActiveState supports large files "out of the box". I was using 5.6.1 from AS previously, so I installed the new version and the modules I needed, and VOILA! I can now properly stat files larger than 2GB in Perl.