PerlMonks  

Jumping out of a partially read file

by licking9Volts (Pilgrim)
on May 09, 2002 at 19:34 UTC

licking9Volts has asked for the wisdom of the Perl Monks concerning the following question:

I've got over a thousand files in a directory. At the top of each file is a multi-line header. The number of lines in each header varies, but there is a marker signifying the end of the section I need. Below the header are thousands of lines of data that I don't need to look at. My question is this: Is there a way to jump out of an open file at a specified point, and then go on to the next file? Reading through the entire file when I only need the top few lines isn't very efficient when many of these files are around 40 MB. Below is a sample snippet of what I'm using to open the files. Thanks in advance.
    while ($file = <*.las>) {
        open(FILE, "$file") || die "Couldn't open $file for reading.\n";
        while (<FILE>) {
            $foo = $bar;
        }
        close(FILE);
    }

Replies are listed 'Best First'.
Re: Jumping out of a partially read file
by perlplexer (Hermit) on May 09, 2002 at 20:00 UTC
    You are not forced to read the whole file. You can read a few bytes and close it; there is nothing wrong with that.
    In your case, since the information that you're looking for is always at the beginning of each file, and you have a way of identifying where each header ends, you can do the following.
        local $/ = "HEADER END\n";  # set input record separator
        while ($file = <*.las>) {
            open(FILE, $file) || die "Couldn't open $file : $!\n";
            my $header = <FILE>;
            close(FILE);
            # process $header here
        }
    One thing I should mention here: setting $/ simplifies things greatly. But if you're processing large files (you mentioned that many are around 40 MB), you had better be sure they all contain the header marker you're talking about. If not, the whole file will be slurped into memory.

    If you want to avoid that, you can do something along these lines
        my $maxHeader = 50;
        OUTER: while ($file = <*.las>) {
            open FILE, $file or die "Couldn't open $file : $!\n";
            my $header = '';
            while (<FILE>) {
                $header .= $_;
                last if $_ eq "HEADER END\n";
                if ($. > $maxHeader) {
                    print "Invalid file format : $file\n";
                    close FILE;
                    next OUTER;
                }
            }
            close FILE;
            # process $header here
        }
    Hope this helps. --perlplexer
Re: Jumping out of a partially read file
by thelenm (Vicar) on May 09, 2002 at 20:05 UTC
    Yep, what BUU said. Even though you're jumping out early, be sure and close the filehandle before moving on.

    Since you say that there is a special marker signifying the end of the header section, you may also be able to save yourself some time and trouble by setting $/ to that marker. Then you won't even need to loop. To read the header section and nothing else, do this:

        local $/ = '<END_OF_HEADERS>';  # change this to what the marker really is
        open FILE, $file or die "Couldn't open '$file': $!\n";
        my $header = <FILE>;
        close FILE;
    Now your header section is contained in $header. Hope this helps!
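One way to keep the $/ change from leaking into the rest of a program, and to notice a file that lacks the marker, is to wrap the read in a sub. This is only a sketch: the sub name read_header and the sample header text are made up for illustration. It relies on chomp removing a trailing copy of $/ and returning the number of characters it removed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read up to and including $marker from $path.
# Returns the header with the marker stripped, or undef if the
# marker never appears (in which case the whole file was read).
sub read_header {
    my ($path, $marker) = @_;
    open my $fh, '<', $path or die "Couldn't open '$path': $!\n";
    local $/ = $marker;           # restored automatically when the sub returns
    my $chunk = <$fh>;
    close $fh;
    return undef unless defined $chunk;      # empty file
    # chomp strips a trailing $/ and returns how many characters it removed
    return chomp($chunk) ? $chunk : undef;
}

# Demo on a throwaway file with invented contents
my ($tmp, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp} "VERSION 2.0\nWELL A-1\n<END_OF_HEADERS>\n1 2 3\n4 5 6\n";
close $tmp;

my $header = read_header($tmp_path, "<END_OF_HEADERS>\n");
print defined $header ? $header : "no marker found\n";
```

A return value of undef signals that the marker was missing, which is the case where the whole file ends up in memory.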
Re: Jumping out of a partially read file
by BUU (Prior) on May 09, 2002 at 19:43 UTC
    Inside the while (<FILE>) {} loop, you would use the normal loop-control statements, namely next and last. In this case you probably want last: something like last if $_ eq $marker, or whatever test identifies your marker.
      BUU you nailed it! Thanks also to particle for the tip on warning instead of dying on the file open. All the other suggestions were great too! Oh, thelenm, when it jumps out, it jumps to a close(FILE), but thanks for confirming it before I had to ask about it =]. Thanks again everyone for the very helpful responses!
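For reference, the last approach could look roughly like the following. It is a sketch under assumptions: the sub name header_lines and the '~END' marker are placeholders, and it uses a lexical filehandle with three-argument open rather than the bareword FILE from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the lines of $path that come before
# the first line equal to $marker.
sub header_lines {
    my ($path, $marker) = @_;
    my @lines;
    open my $fh, '<', $path or die "Couldn't open $path for reading: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        last if $line eq $marker;   # eq compares; a single = would assign
        push @lines, $line;
    }
    close $fh;                      # no further lines are requested
    return @lines;
}

for my $file (glob '*.las') {
    my @header = header_lines($file, '~END');   # '~END' is a placeholder
    print "$file: ", scalar(@header), " header lines\n";
}
```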

Re: Jumping out of a partially read file
by particle (Vicar) on May 09, 2002 at 20:06 UTC
    while the use of $/ will get just the header, it's stuck in a scalar. this lets you do line-by-line processing inside the header. also, it uses labels and a precompiled regex, which i think is pretty nifty.

    oh, and do you really want to die if you can't process a single file? i changed this to warn on open troubles, and die on close troubles. your mileage may vary.

        #!/usr/bin/perl -w
        use strict;
        $|++;

        my $marker = qr/^__END_HEAD__$/;

        FILE: while( my $file = <*.las> ) {
            open(FH, $file)
                or warn "Warning: can't open $file, skipping...\n" and next FILE;
            LINE: while( <FH> ) {
                chomp;    # strip the newline from $_
                last LINE if /$marker/;
                # ...
            }
            close(FH) or die "ERROR: can't close $file, exiting...\n";
        }

    ~Particle *accelerates*

Re: Jumping out of a partially read file
by thunders (Priest) on May 09, 2002 at 20:25 UTC
    you can pretty much treat a file handle as a list of lines. Any old loop structure will do. I'm partial to foreach. The following code prints the first six lines of a file.
        open(FILE,'<some_file.txt');
        for((<FILE>)[0..5]){
            print;
        }
    Now let's say your special delimiter is a weird tie fighter thing like this :o:
    We can slurp every line up to and including that like so.
        open(FILE,'<some_file.txt') or die("cant open file: $!");
        my $data;
        for(<FILE>){
            if($_ !~ /:o:/){
                $data .= $_;
            }else{
                $data .= $_;
                last;
            }
        }
        close FILE;
    Of course we should also check only the first however many lines for the special character, to avoid slurping up the entire file if we don't find that character.
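One caveat worth checking when choosing between foreach and while here: foreach evaluates the readline operator in list context, so every line is read before the loop body runs once, whereas while reads one line per iteration. The sketch below (file contents invented for the demo) uses eof to show where each loop leaves the filehandle after bailing out on the first line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway 1000-line file for the comparison
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 1000;
close $tmp;

# foreach: <$fh> is in list context, so all lines are read up front
open my $fh, '<', $path or die "can't open $path: $!";
for my $line (<$fh>) { last }
my $for_hit_eof = eof($fh) ? 1 : 0;
close $fh;

# while: <$fh> is in scalar context, one line per iteration
open $fh, '<', $path or die "can't open $path: $!";
while (my $line = <$fh>) { last }
my $while_hit_eof = eof($fh) ? 1 : 0;
close $fh;

print "for reached end of file:   $for_hit_eof\n";
print "while reached end of file: $while_hit_eof\n";
```

The foreach version reports 1 and the while version reports 0, so for 40 MB files with a short header the while form is the one that actually stops early.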
Re: Jumping out of a partially read file
by mephit (Scribe) on May 09, 2002 at 21:46 UTC
        while (<FILE>) {
            last unless 1 .. /^MARKER/;
            $foo = $bar;
        }
    Or whatever pattern you use to signify the end of the header. HTH.
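A note on how the range operator behaves in this idiom, with a small sketch (the sub name header_via_range is made up). In scalar context, a constant left operand is compared against $., so 1 opens the range on the first line only. A left operand that matches every line (such as /^/) would re-open the range on the line right after the marker, and the loop would never exit early.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $path from line 1 through
# the first line that starts with MARKER.
sub header_via_range {
    my ($path) = @_;
    my @kept;
    open my $fh, '<', $path or die "Couldn't open $path: $!\n";
    while (<$fh>) {
        # Range is true from line 1 ($. == 1) through the marker line,
        # then false on the following line, which triggers last.
        last unless 1 .. /^MARKER/;
        push @kept, $_;
    }
    close $fh;
    return @kept;
}

for my $file (glob '*.las') {
    my @header = header_via_range($file);
    print "$file: ", scalar(@header), " lines kept\n";
}
```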

Node Type: perlquestion [id://165475]
Approved by derby