http://www.perlmonks.org?node_id=1007563


in reply to Reading files n lines a time

Depending on the way your files-which-need-to-be-read-four-lines-at-a-time (henceforward, "fwn") are formatted/organized and the consistency thereof, you might consider setting your $/ record separator...

For example, if the data looks like:

name date bill name1 date1 bill1 ...

then setting local $/ = "\n\n" tells Perl that you want to read a paragraph at a time from the fwn, where a paragraph is defined as something ending in two consecutive newlines. Better yet, the special case $/ = "" defines a paragraph somewhat more broadly (any run of blank lines ends a record) and may be a better fit for your data.
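
A minimal sketch of what that looks like in practice (the filename and the three-fields-per-record layout are invented for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # 'records.txt' is a hypothetical blank-line-delimited fwn
    open my $fh, '<', 'records.txt' or die "Can't open records.txt: $!";

    local $/ = "";                    # paragraph mode
    while ( my $para = <$fh> ) {
        chomp $para;                  # strips the trailing newlines
        my ( $name, $date, $bill ) = split ' ', $para;
        print "$name owes $bill as of $date\n";
    }
    close $fh;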

You'll find many examples here if you merely SuperSearch for $/

Re^2: Reading files n lines a time
by naturalsciences (Beadle) on Dec 06, 2012 at 14:00 UTC
    Thanks. I already know about the record separator. Unfortunately, my fwn-s don't contain anything else as useful as a newline for marking off meaningful blocks. At least not to my eye.
      Perhaps you can post a real (or bowdlerized) sample snippet of your actual data. It's amazing what a bit of exposure to regular expressions can help one spot, and here you'll have many such well-educated eyes looking for proxy-para-markers.

      I'm actually surprised -- no, very surprised -- that this request hasn't been posted higher in the thread.

        Right now it is simply a FASTA file. FASTA files store DNA sequence information and are formatted as follows.

        >nameofsequence\n

        ATCGTACGTTGCTE\n

        >anothername\n

        GTCTGT\n

        so a line starting with > and containing a sequence name is followed by a line containing that sequence's nucleotide information.

        I am thinking of working through them four lines at a time, because I have reason to suspect that, due to certain previous operations, there might be sequences directly following each other with different names (on the >sequencename\n lines) but exactly the same sequence information (on the following ATGCTGT\n lines). Right now I'm looking to identify and remove such duplicates, but I might also make use of scripts that compare, extract, etc. neighbouring sequences in my files. (Two neighbours means four lines.)
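
        Something like this rough sketch is what I have in mind (it assumes strict two-line records and that duplicates are directly adjacent, as described above):

            #!/usr/bin/perl
            use strict;
            use warnings;

            # Print each ">name" / sequence pair unless its sequence repeats
            # the sequence of the record immediately before it.
            my $prev_seq = '';
            while ( my $header = <> ) {
                my $seq = <>;
                last unless defined $seq;    # ignore a trailing odd header line
                chomp( my $bare = $seq );
                print $header, $seq unless $bare eq $prev_seq;
                $prev_seq = $bare;
            }

        Run as, e.g., perl dedup.pl input.fasta > deduped.fasta (the filenames are just placeholders). Note it keeps the first record of each run of identical sequences and drops the rest.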