bullet proof SLURP file

by leszekdubiel (Sexton)
on Jan 09, 2018 at 19:01 UTC (#1207003)
leszekdubiel has asked for the wisdom of the Perl Monks concerning the following question:

It's easy to read a whole file into a variable: open it, check for errors on opening, undef $/, read <>.

    #!/usr/bin/perl -CSDA
    use utf8;
    use strict;
    use warnings;
    use Carp;

    my $whole_file = do {
        open my $f, "<", "mydata.txt" or croak "can't open mydata.txt: $!";
        local $/;
        <$f>;
    };
    print "data from file is: $whole_file\n";

But it checks for errors on opening only. What happens if an error occurs while the file is being read? How can I check that the whole read operation succeeded? We have to be careful when reading and also when closing the file. What is the best way to read a file and fail on any errors? Would you do it like this:

    #!/usr/bin/perl -CSDA
    use utf8;
    use strict;
    use warnings;
    use Carp;

    my $whole_file = do {
        open my $f, "<", "mydata.txt" or croak "can't open mydata.txt: $!";
        local $/;
        my $x = <$f>;
        $! and croak "can't read mydata.txt: $!";
        close $f or croak "can't close mydata.txt: $!";
        $x;
    };
    print "data from file is: $whole_file\n";

Replies are listed 'Best First'.
Re: bullet proof SLURP file
by haukex (Abbot) on Jan 09, 2018 at 19:47 UTC
    my $x = <$f>; $! and croak "can't read mydata.txt: $!";

    This is incorrect, since Perl does not guarantee that $! is cleared on a successful read. You'll have to check for errors by checking the return value of readline:

        open my $fh, '<', 'mydata.txt' or die $!;
        defined( my $data = do { local $/; <$fh> } ) or die $!;
        close $fh or die $!;
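
The three checks above can be wrapped in a reusable sub. This is only a minimal sketch: `slurp_checked` is a hypothetical name, and the extra `$fh->error` test (the `error` method from IO::Handle) is an addition that nobody in this thread proposed. It is there on the assumption that a read could fail after some data has already arrived, in which case the returned value would still be defined.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the error() method on filehandles

# Hypothetical helper: slurp a file, dying on open, read, or close failure.
sub slurp_checked {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    my $data = do { local $/; <$fh> };
    # undef means readline returned nothing at all because of an error
    defined $data or die "can't read $path: $!";
    # error() is true if the handle has recorded an error since it was opened
    $fh->error and die "error flag set while reading $path";
    close $fh or die "can't close $path: $!";
    return $data;
}
```

Usage would be `my $whole_file = slurp_checked('mydata.txt');`. In slurp mode an empty file yields an empty string on the first read, so an empty file is not treated as an error here.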

    But may I ask why you need this level of error checking? Are you reading from some kind of network resource, for example? Because if it's just a regular file on a local hard disk, failures on e.g. close are pretty rare.

    If you really do need this level of security, then one possible solution is to use read and make sure the number of bytes read matches the file size. However, the following makes sense only if this is a plain ASCII or binary file and the file is not too huge. Note also that it has a theoretical race condition if someone else happens to be writing to the file at the same time as you're reading from it:

        open my $fh, '<:raw', 'mydata.txt' or die $!;
        my $size = -s $fh;
        read($fh, my $data, $size) == $size or die $!;
        close $fh or die $!;
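
For files too large to request in a single read call, the same size comparison can be done with a loop. A rough sketch, assuming raw bytes and the same race-condition caveat as above; `slurp_chunked` and its default chunk size are made up for illustration. It relies on read returning undef on error and 0 at end of file.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file in fixed-size chunks and
# compare the total against the size reported when the file was opened.
sub slurp_chunked {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;
    open my $fh, '<:raw', $path or die "can't open $path: $!";
    my $expected = (-s $fh) || 0;    # -s is false for an empty file
    my $data = '';
    while (1) {
        # the fourth argument appends at the current end of $data
        my $got = read($fh, $data, $chunk_size, length $data);
        defined $got or die "can't read $path: $!";
        last if $got == 0;           # end of file
    }
    close $fh or die "can't close $path: $!";
    length($data) == $expected
        or die "short read on $path: expected $expected bytes, got " . length($data);
    return $data;
}
```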
      This is a program that computes data for manufacturing in company. I have to be sure that when data read from local file is broken in the middle of the process of reading, then the whole program fails. I will do that with "defined" as you showed. If I read from a pipe (pipe open), then it fails on close if pipe is broken -- this is why I check exit status of close.
        If I read from a pipe (pipe open), then it fails on close if pipe is broken -- this is why I check exit status of close.

        Yes, checking the return value of close is very important on piped opens - as per its documentation:

        close $fh or die $! ? "Error closing pipe: $!" : "Piped open exit status: $?";
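
Put together for a command's output, the close check might look like the sketch below. `slurp_pipe` is a hypothetical helper name, and the list form of the piped open (which avoids the shell) is an assumption about how the command is started; the close idiom is the one quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp the output of a command given as a list,
# dying if the read fails or the command exits with a non-zero status.
sub slurp_pipe {
    my (@cmd) = @_;
    open my $fh, '-|', @cmd or die "can't start @cmd: $!";
    my $data = do { local $/; <$fh> };
    defined $data or die "can't read from @cmd: $!";
    # on a piped open, close also waits for the child and sets $?
    close $fh
        or die $! ? "Error closing pipe: $!"
                  : "Piped open exit status: $?";
    return $data;
}
```

For example, `slurp_pipe($^X, '-e', 'print "hello"')` runs the current perl as the child process.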
        when data read from local file is broken in the middle of the process of reading

        It depends a bit on what "broken" means - if you are always reading from a pipe, then yes, you should be able to detect this condition. However, if you're reading from a regular file, then I don't think the reading program will be able to detect a failure in the writer, if that is what you mean. If the file format allows for any kind of sanity checks (like headers with record lengths or other well-formedness checks), or even checksums, then I would use those.

        This is a program that computes data for manufacturing in company. I have to be sure that when data read from local file is broken in the middle of the process of reading, then the whole program fails.
        I have worked with many HDs that have either hardware errors or file system errors (usually both types of errors occur at the same time). Your original Perl program will abend with a fatal error if the complete HD file cannot be read. It is possible to open a file which cannot be successfully read to the EOF.

        close() does a lot for a "writer" -> flushes unwritten cached stuff to the HD. Not so much is done for a reader. A close() on a read handle does not modify the actual file that is being read. Again, that is not true on a write handle.

        Your original program will fail if the file is corrupted. Let's say that happens, then what is your "Plan"? A redundant disk system is probably what is needed.

Re: bullet proof SLURP file
by AnomalousMonk (Chancellor) on Jan 09, 2018 at 20:50 UTC
    What is [a] way to read file and fail on any errors?

    If you don't need fancy error messages, how about autodie? It's lexical, so it can be used in the narrowest possible scope, globally, etc. E.g. (untested):

        my $filename = '...';
        ...
        my $whole_nine_yards = do {
            use autodie;
            open my $fh, '<', $filename;
            local $/;
            <$fh>;
        };
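
A slightly expanded sketch of the autodie idea, showing how a failure surfaces. `slurp_autodie` and the file name `no-such-file.txt` are hypothetical. As far as I know autodie does not wrap the `<$fh>` operator itself, so the read is still checked by hand here; treat that as an assumption to verify against the autodie documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: open and close are checked by autodie,
# the read itself is checked by hand.
sub slurp_autodie {
    my ($filename) = @_;
    use autodie qw(open close);    # lexical: applies only inside this sub
    open my $fh, '<', $filename;
    my $data = do { local $/; <$fh> };
    defined $data or die "can't read $filename: $!";
    close $fh;
    return $data;
}

# autodie throws autodie::exception objects that stringify to a message.
my $content = eval { slurp_autodie('no-such-file.txt') };
print "caught: $@" unless defined $content;
```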


    Give a man a fish:  <%-{-{-{-<

Re: bullet proof SLURP file
by davido (Archbishop) on Jan 10, 2018 at 07:13 UTC

    Do you have control of the process that writes to the file? If so, ensure that it implements file locking, and have it also create a checksum that is saved in another file alongside the primary file. After writing, verify the checksum matches what got written.

    Then the reader can obtain a lock, read the checksum, read the target file, compare, and derive a strong assurance that the entire file was read.
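
The locking and checksum scheme described above could be sketched roughly as follows. Everything named here is an assumption for illustration: the sub names, the `.sha256` sidecar suffix, the choice of SHA-256 via Digest::SHA, and advisory locks via flock (which only help if every writer and reader cooperates).

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Digest::SHA qw(sha256_hex);

# Hypothetical writer: store raw bytes plus a checksum sidecar file.
sub write_with_checksum {
    my ($path, $data) = @_;
    # append mode, so nothing is truncated before the lock is held
    open my $fh, '>>:raw', $path or die "can't open $path: $!";
    flock($fh, LOCK_EX) or die "can't lock $path: $!";
    truncate($fh, 0) or die "can't truncate $path: $!";
    print {$fh} $data or die "can't write $path: $!";
    # write the sidecar while the lock is still held
    open my $sum, '>', "$path.sha256" or die "can't open $path.sha256: $!";
    print {$sum} sha256_hex($data) or die "can't write $path.sha256: $!";
    close $sum or die "can't close $path.sha256: $!";
    close $fh or die "can't close $path: $!";    # also releases the lock
    return;
}

# Hypothetical reader: lock, slurp, and compare against the sidecar.
sub read_with_checksum {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "can't open $path: $!";
    flock($fh, LOCK_SH) or die "can't lock $path: $!";
    my $data = do { local $/; <$fh> };
    defined $data or die "can't read $path: $!";
    open my $sum, '<', "$path.sha256" or die "can't open $path.sha256: $!";
    my $expected = <$sum>;
    defined $expected or die "can't read $path.sha256: $!";
    close $sum or die "can't close $path.sha256: $!";
    close $fh or die "can't close $path: $!";
    chomp $expected;
    sha256_hex($data) eq $expected or die "checksum mismatch for $path";
    return $data;
}
```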

    Otherwise, check file size before the read and after the read. Check the mtime before the read and after. If it changed, that's bad. Use the bytes pragma and get the length of the scalar holding the input. Compare that length to the file size detected before the read. And check for eof after the read.
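
The size, mtime and eof checks from the paragraph above might be combined like this. A sketch only: `slurp_stable` is a hypothetical name, and the handle is opened with `:raw` so that plain `length` already counts bytes, which stands in for the bytes pragma suggestion.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp raw bytes and compare size and mtime
# taken before and after the read.
sub slurp_stable {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "can't open $path: $!";
    my ($size_before, $mtime_before) = (stat $fh)[7, 9];
    defined $size_before or die "can't stat $path: $!";
    my $data = do { local $/; <$fh> };
    defined $data or die "can't read $path: $!";
    eof($fh) or die "not at end of $path after reading";
    my ($size_after, $mtime_after) = (stat $fh)[7, 9];
    defined $size_after or die "can't stat $path: $!";
    close $fh or die "can't close $path: $!";
    die "$path changed while it was being read"
        if $size_before != $size_after or $mtime_before != $mtime_after;
    # the handle is :raw, so length() counts bytes here
    die "read " . length($data) . " bytes from $path, expected $size_before"
        if length($data) != $size_before;
    return $data;
}
```

Note that mtime has limited resolution on many file systems, so a very quick rewrite of the same size could still slip past these checks.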


    Dave

      Perhaps with something like this?:

          #!/usr/bin/env perl
          use strict;
          use warnings;
          use Path::Tiny;
          use feature qw(say);
          use open ':encoding(UTF-8)';

          my $file = q(data.txt);
          path($file)->spew_utf8(q(Lorem ipsum kizuaheli));
          my $digest = qq($file\.digest);
          path($digest)->spew_utf8( path($file)->digest );
          # path ($digest)->append_utf8("nose");
          if ( path($file)->digest ne path($digest)->slurp_utf8 ) {
              say q(Something went wrong!);
          }
          else {
              my $fh = path($file)->filehandle( { locked => 1 }, "<" );
              say <$fh>;
          }
          __END__

      Not really tested. Just an idea.

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: bullet proof SLURP file
by karlgoethebier (Monsignor) on Jan 09, 2018 at 21:14 UTC

    Shouldn't something like this catch them all?

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Path::Tiny;
        use Try::Tiny;
        use autodie qw(:all);
        use feature qw(say);

        my $file = shift;
        try {
            say path($file)->slurp_utf8;
        }
        catch {
            print $_;
        };
        __END__

    I'm not sure. And even worse: I have no idea for the moment/in a hurry how to test it.

    Best regards, Karl


      Shouldn't something like this catch them all?

      Doesn't look like it, sorry :-( A look at the Path::Tiny source shows it seems to do no error checking on the reads at all.

        Too bad ;-[. And I need a while to digest the code you provided.

        Thanks and best regards, Karl

