Help needed in reading a very large file line by line

by Anonymous Monk
on Feb 28, 2012 at 10:32 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need help reading a very large file line by line.

I checked out Re: help reading from large file needed, but that doesn't seem to be working.

Is there any other way to open a large file and read it line by line?

Replies are listed 'Best First'.
Re: Help needed in reading a very large file line by line
by tobyink (Canon) on Feb 28, 2012 at 10:50 UTC

    The example you linked to deals with files with fixed-length records - i.e. generally binary files. As you're talking about "line by line", I assume you're talking about a text-based file - e.g. plain text, HTML, CSV, etc.

    # Set the character which will be used to indicate the end of a line.
    # This defaults to the system's end of line character, but it doesn't
    # hurt to set it explicitly, just in case some other part of your code
    # has altered it from the default.
    local $/ = "\n";

    # Open the file for read access:
    open my $filehandle, '<', 'myfile.txt'
        or die "Cannot open myfile.txt: $!";

    my $line_number = 0;

    # Loop through each line:
    while (defined(my $line = <$filehandle>)) {
        # The text of the line, including the linebreak
        # is now in the variable $line.

        # Keep track of line numbers
        $line_number++;

        # Strip the linebreak character at the end.
        chomp $line;

        # Do something with the line.
        do_something($line);

        # Perhaps bail out of the loop
        if ($line =~ m/^ERROR/) {
            warn "Error on line $line_number - skipping rest of file";
            last;
        }
    }

    But we can make the above more concise, because Perl usefully defines a variable called $_ which is used as a default variable in many cases; and a variable called $. which keeps track of the current line number.

    # Set the character which will be used to indicate the end of a line.
    local $/ = "\n";

    # Open the file for read access:
    open my $filehandle, '<', 'myfile.txt'
        or die "Cannot open myfile.txt: $!";

    # Loop through each line:
    while (<$filehandle>) {
        # The text of the line, including the linebreak
        # is now in the variable $_.

        # Strip the linebreak character at the end.
        chomp;

        # Do something with the line.
        do_something($_);

        # Perhaps bail out of the loop
        if (m/^ERROR/) {
            warn "Error on line $. - skipping rest of file";
            last;
        }
    }

      The example you linked to deals with files with fixed-length records...

      NB: Re: help reading from large file needed begins by briefly alluding to processing files with fixed-length records, but then continues with a detailed discussion, with example, of indexing a variable-length record file for rapid random access.
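
For readers who found the linked node hard to follow, here is a minimal sketch of the general idea of indexing a variable-length record file for random access. It is not the code from that node; the sub names `build_line_index` and `fetch_line` are made up for illustration, and it assumes lines end in "\n".

```perl
use strict;
use warnings;

# Build an index of byte offsets, one per line, in a single sequential pass.
# Only the offsets are kept in memory, never the line contents.
sub build_line_index {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @offsets = (0);                 # line 1 starts at byte 0
    while (<$fh>) {
        push @offsets, tell($fh);      # start of the following line
    }
    pop @offsets;                      # the last entry points at end of file
    close $fh;
    return \@offsets;
}

# Fetch line number $n (1-based) by seeking straight to its offset.
# Returns undef if $n is out of range.
sub fetch_line {
    my ($path, $offsets, $n) = @_;
    return undef if $n < 1 || $n > @$offsets;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek($fh, $offsets->[$n - 1], 0) or die "Cannot seek in $path: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}
```

The index costs one array element per line, so for a file with a very large number of lines it may be preferable to store the offsets in a separate file (for example with pack) instead of in an array.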

      while(<FILEHANDLE>) is giving an out of memory error.

        Just post your script (between <c> </c> tags) and we can tell you what is wrong.

        Chances are that either $/ is set to something silly instead of "\n", or your file has no line break characters in it (or at least, very long lines).
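
One way to check the second possibility without risking another out-of-memory error is to scan the file in fixed-size blocks and count the line-break bytes. The sketch below is only an illustration: `scan_line_endings` is a made-up name and the 1 MiB block size is arbitrary.

```perl
use strict;
use warnings;

# Scan a file in fixed-size blocks. Reports how many LF and CR bytes it
# contains and the longest run of bytes between two LFs. Memory use is
# bounded by the block size, whatever the file size or line length.
sub scan_line_endings {
    my ($path, $block_size) = @_;
    $block_size ||= 1 << 20;           # 1 MiB; an arbitrary choice
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my ($lf, $cr, $run, $longest) = (0, 0, 0, 0);
    while (read($fh, my $buf, $block_size)) {
        $lf += ($buf =~ tr/\n//);      # tr with an empty replacement only counts
        $cr += ($buf =~ tr/\r//);
        my $pos = 0;
        while ((my $nl = index($buf, "\n", $pos)) >= 0) {
            $run += $nl - $pos;        # a run may have started in an earlier block
            $longest = $run if $run > $longest;
            $run = 0;
            $pos = $nl + 1;
        }
        $run += length($buf) - $pos;   # carry the unfinished run forward
    }
    $longest = $run if $run > $longest;
    close $fh;
    return { lf => $lf, cr => $cr, longest => $longest };
}
```

If `lf` comes back as 0 while `cr` is large, the file probably uses CR-only line endings, and reading it with $/ set to "\r" may help. If `longest` is close to the file size, the file effectively has no lines, and a line-by-line loop will try to load all of it at once.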

        Hi, I see this is an old thread, but I would still like to share that I have experienced something similar. I wanted to do a very simple search and replace on a huge ASCII file (around 4GB) using the magic filehandle <>. The thing is that I cannot use seek or any other method that requires fixed-length records. Also, my $/ is set to "\n" and I know that the lines are not incredibly long. Any ideas?

        Here is a piece of code:

        my $fh = new FileHandle;
        @ARGV = ($file);
        open $fh, ">test.txt";
        while ($line = <>) {
            $line =~ s/$search/$replace/g;
            print $fh $line;
        }
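
This is not a diagnosis of the problem above, but a sketch of the same search and replace written with explicit lexical filehandles, checked opens, and a locally set $/, which takes the magic <> handle and any changed separator out of the picture. The sub name `replace_stream` is made up for illustration, and unlike the snippet above it treats the search string as literal text (via \Q...\E) rather than as a regex.

```perl
use strict;
use warnings;

# Copy $in_path to $out_path one line at a time, replacing every literal
# occurrence of $search with $replace. Returns the number of lines changed.
sub replace_stream {
    my ($in_path, $out_path, $search, $replace) = @_;
    open my $in,  '<', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";
    local $/ = "\n";                   # guard against a changed separator
    my $changed = 0;
    while (my $line = <$in>) {         # only one line is held at a time
        $changed++ if $line =~ s/\Q$search\E/$replace/g;
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $changed;
}
```

If this version also runs out of memory on the same file, the input most likely does not contain the "\n" separators it is assumed to have.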
Re: Help needed in reading a very large file line by line
by choroba (Cardinal) on Feb 28, 2012 at 10:41 UTC
    To read a file line by line, just use
    while (<>) {
        # process the line contained in $_
    }
    What do you mean by "does not seem to be working"?
      Opening the big file with the open statement and then using while(<FILEHANDLE>) gives an Out of Memory! error.

        So the file doesn't contain lines?

        It is giving "Out of memory" because in your environment (I suppose it is a UNIX-based one) the "data(kbytes)" setting shown by "ulimit -a" is smaller than the file's size. Try raising the data limit to a value larger than the file you are processing. If you can't do that, use Tie::File. It is slower, but not a memory hog.
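
A minimal read-only sketch of the Tie::File suggestion follows. The sub name `count_matching_lines` is made up for illustration. Note that Tie::File still splits the file on a record separator, so it relies on the file having ordinary line breaks, and it keeps a table of record offsets, so its memory use still grows with the number of lines.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Count the lines of a file that match $pattern, using Tie::File so that
# lines are fetched from disk on demand instead of being slurped.
sub count_matching_lines {
    my ($path, $pattern) = @_;
    tie(my @lines, 'Tie::File', $path, mode => O_RDONLY)
        or die "Cannot tie $path: $!";
    my $count = 0;
    for my $i (0 .. $#lines) {
        # Each element is one line, without its trailing line break.
        $count++ if $lines[$i] =~ $pattern;
    }
    untie @lines;
    return $count;
}
```

A call such as `count_matching_lines('myfile.txt', qr/^ERROR/)` would count the lines starting with ERROR.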
Re: Help needed in reading a very large file line by line
by trizen (Hermit) on Feb 28, 2012 at 15:06 UTC
    For very long lines, you can try something like this:
    open my $fh, '<', $filename
        or die "Cannot open $filename: $!";
    my $line = '';
    my $track = 0;
    my $max_line_length = 1024; # or whatever
    while (defined(my $char = getc $fh)) {
        $line .= $char;
        if (++$track == $max_line_length or $char eq "\n") {
            print $line;
            $line = '';
            $track = 0;
        }
    }
    print $line if length $line; # flush a final partial line
    close $fh;
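
A related technique, sketched here as an alternative rather than as an equivalent of the loop above, is to set $/ to a reference to an integer, which makes the readline operator return fixed-size records instead of lines. Unlike the getc loop, the pieces do not end early at a "\n". The sub name `read_in_chunks` is made up for illustration.

```perl
use strict;
use warnings;

# Read a file in pieces of at most $max_bytes and pass each piece to
# $callback. Memory use is bounded by $max_bytes even if the file
# contains no line breaks at all.
sub read_in_chunks {
    my ($path, $max_bytes, $callback) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/ = \$max_bytes;    # reference to an integer: record-sized reads
    while (defined(my $chunk = <$fh>)) {
        $callback->($chunk);
    }
    close $fh;
    return;
}
```

For example, `read_in_chunks($filename, 1024, sub { print $_[0] })` copies the file to standard output 1024 bytes at a time. A pattern that straddles a chunk boundary would be missed by a per-chunk search, so this suits copying or counting better than search and replace.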
Re: Help needed in reading a very large file line by line
by CountZero (Bishop) on Feb 28, 2012 at 17:26 UTC
    How is a "line" defined in this file? It seems the definition of "line" in your file is different from what Perl expects a line to be.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Help needed in reading a very large file line by line
by Anonymous Monk on Feb 28, 2012 at 10:37 UTC
    Also, can somebody please explain the code in Re: help reading from large file needed in the form of a more readable program? I am new to Perl, and that node looked to have a lot of deep Perl in it. I couldn't understand any of it.
