PerlMonks  

Re: Reading files n lines a time

by ww (Archbishop)
on Dec 06, 2012 at 13:28 UTC


in reply to Reading files n lines a time

Depending on the way your files-which-need-to-be-read-four-lines-at-a-time (henceforward, "fwn") are formatted/organized and the consistency thereof, you might consider setting your $/ record separator...

For example, if the data looks like:

name date bill

name1 date1 bill1
...

then setting local $/="\n\n" will tell Perl that you want to read a paragraph from the fwn, where a paragraph is defined as something ending in two consecutive newlines. Better yet, the special case $/="" defines a paragraph somewhat more broadly (any run of one or more blank lines counts as a single separator) and may be better suited to your data.
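A minimal sketch of paragraph mode, assuming blank-line-separated records like the example above (the filename and record layout are hypothetical):

```perl
use strict;
use warnings;

# Read blank-line-separated records from a filehandle. Localizing $/
# to "" turns on paragraph mode: each <$fh> read returns one whole
# block, no matter how many blank lines divide the blocks.
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = "";              # paragraph mode
    my @records;
    while ( my $para = <$fh> ) {
        chomp $para;            # in paragraph mode, chomp strips all trailing newlines
        push @records, $para;
    }
    return @records;
}

# Usage with an in-memory filehandle standing in for a real file:
open my $fh, '<', \"name date bill\n\nname1 date1 bill1\n" or die $!;
my @recs = read_paragraphs($fh);
```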

You'll find many examples here if you merely Super Search for $/.

Replies are listed 'Best First'.
Re^2: Reading files n lines a time
by naturalsciences (Beadle) on Dec 06, 2012 at 14:00 UTC
Thanks. I already know about the record separator. Unfortunately my fwn-s don't contain anything else as useful as a newline for marking off useful blocks. At least, not to my senses.
Perhaps you can post a real (or bowdlerized) snippet of your actual data. It's amazing what a bit of exposure to regular expressions can help one spot, and here you'll have many such well-educated eyes looking for proxy paragraph markers.

      I'm actually surprised -- no, very surprised -- that this request hasn't been posted higher in the thread.

Right now it is simply a FASTA file. FASTA files are for storing DNA sequence information and they are formatted as follows.

        >nameofsequence\n

        ATCGTACGTTGCTE\n

        >anothername\n

        GTCTGT\n

so that a line starting with > containing a sequence name is followed by a line containing that sequence's nucleotide information.

I am thinking of reading them in four lines at a time, because I have reason to suspect that, due to certain previous operations, there might be sequences directly following each other with different names (on the >sequencename\n line) but exactly the same sequence information (on the following ATGCTGT\n line). Right now I'm looking to identify and remove such duplicates, but I might also make use of scripts for comparison, extraction, etc. of neighbouring sequences in my files. (Two neighbours means four lines.)
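One way to sketch that neighbour check, assuming (as described above) every record is exactly one ">name" line plus one sequence line; the function name and the dedup policy (keep the first of two identical neighbours) are my assumptions, not the poster's code:

```perl
use strict;
use warnings;

# Walk a two-line-per-record FASTA stream, reading a name line and its
# sequence line together, and drop any record whose sequence is
# identical to the record immediately before it (different name, same
# sequence). The first of the duplicate pair is kept.
sub dedup_neighbours {
    my ($fh) = @_;
    my @kept;
    my $prev_seq = '';
    while ( my $name = <$fh> ) {
        my $seq = <$fh>;
        last unless defined $seq;   # ignore a trailing half-record
        chomp( $name, $seq );
        next if $seq eq $prev_seq;  # same sequence as the neighbour above: skip
        push @kept, [ $name, $seq ];
        $prev_seq = $seq;
    }
    return @kept;
}
```

The same loop body (read two lines, peek at the previous pair) generalizes to any "two neighbours means four lines" comparison, not just dedup.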
