
Re: Reading files n lines a time

by ww (Archbishop)
on Dec 06, 2012 at 13:28 UTC (#1007563)

in reply to Reading files n lines a time

Depending on how your files-which-need-to-be-read-four-lines-at-a-time (henceforward, "fwn") are formatted and organized, and how consistent that formatting is, you might consider setting the $/ input record separator.

For example, if your data looks like:

name
date
bill

name1
date1
bill1
...

then setting local $/ = "\n\n" tells Perl that each read should return a paragraph from the fwn, where a paragraph is anything ending in two consecutive newlines. Better yet, the special case $/ = "" (paragraph mode) defines a paragraph somewhat more broadly: it treats two or more consecutive empty lines as a single separator, so it may suit your data better.
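A minimal sketch of paragraph mode, assuming records are separated by one or more blank lines as in the sample above. The helper name read_paragraphs and the sample data are illustrative only; reading from an in-memory filehandle (open on a scalar reference) just keeps the example self-contained, and a real script would open the fwn instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split text into paragraph-mode records,
# each returned as an array ref of its lines.
sub read_paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records;
    {
        local $/ = "";    # paragraph mode: one or more blank lines end a record
        while (my $para = <$fh>) {
            chomp $para;  # in paragraph mode, chomp removes all trailing newlines
            push @records, [ split /\n/, $para ];
        }
    }
    close $fh;
    return @records;
}

# Sample data: note the extra blank line between the two records.
my $data = "name\ndate\nbill\n\n\nname1\ndate1\nbill1\n";

for my $rec (read_paragraphs($data)) {
    print join(' | ', @$rec), "\n";
}
```

With $/ = "\n\n" instead, the run of three newlines after bill would leave a stray leading newline on the next record; paragraph mode absorbs it.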

You'll find many examples here if you simply SuperSearch for $/.

Re^2: Reading files n lines a time
by naturalsciences (Beadle) on Dec 06, 2012 at 14:00 UTC
    Thanx. I already know about the input record separator ($/). Unfortunately, my fwns don't contain anything as useful as a newline for delimiting meaningful blocks. At least not to my eyes.
      Perhaps you can post a real (or bowdlerized) sample snippet of your actual data. It's amazing what a bit of exposure to regular expressions can help one spot, and here you'll have many such well-educated eyes looking for proxy paragraph markers.

      I'm actually surprised -- no, very surprised -- that this request hasn't been posted higher in the thread.

        Right now it is simply a FASTA file. FASTA files store DNA sequence information and are formatted as follows:

            >sequencename
            ATGCTGT

        That is, a line starting with > and containing a sequence name is followed by a line containing that sequence's nucleotide information.
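Since every record starts with >, that character can serve as the kind of proxy paragraph marker asked about earlier in the thread. A minimal sketch, assuming well-formed FASTA input with Unix line endings: setting $/ to "\n>" makes each read return one whole record. The helper parse_fasta and the sample sequences are made up for illustration. Joining the sequence lines also copes with files that wrap long sequences over several lines, which the strict two-line layout does not need.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA from a filehandle into [name, sequence]
# pairs, using "\n>" (newline followed by the next header's '>') as $/.
sub parse_fasta {
    my ($fh) = @_;
    my @records;
    local $/ = "\n>";
    while (my $record = <$fh>) {
        chomp $record;              # removes the trailing "\n>", if present
        $record =~ s/^>//;          # only the first record still starts with '>'
        my ($name, @seq_lines) = split /\n/, $record;
        next unless defined $name;  # skip an empty record, if any
        push @records, [ $name, join '', @seq_lines ];
    }
    return @records;
}

# Sample input in the two-line layout described above.
my $fasta = ">seq1\nATGCTGT\n>seq2\nGGCATTA\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
for my $rec (parse_fasta($fh)) {
    my ($name, $seq) = @$rec;
    print "$name: $seq\n";
}
close $fh;
```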

        I am thinking of reading them four lines at a time, because I have reason to suspect that, due to certain earlier processing steps, there may be sequences directly following each other that have different names (on the >sequencename\n line) but exactly the same sequence information (on the following ATGCTGT\n line). Right now I'm looking to identify and remove such duplicates, but I might also use scripts for many kinds of comparison, extraction, etc. of neighbouring sequences in my files. (Two neighbours means four lines.)
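A minimal sketch of the duplicate check, assuming the strict two-line layout above (one > header line followed by exactly one sequence line) and that "duplicate" means an exact, case-sensitive match with the immediately preceding record. dedup_adjacent is a hypothetical helper: it reads two lines per iteration and compares each sequence with the previous kept one, so it only looks at direct neighbours, as described.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy FASTA records from $in to $out, dropping any
# record whose sequence equals that of the record directly before it.
# Returns the number of records dropped.
sub dedup_adjacent {
    my ($in, $out) = @_;
    my $prev_seq;
    my $dropped = 0;
    while (defined(my $header = <$in>)) {
        my $seq = <$in>;    # second line of the two-line record
        die "Header without a sequence line: $header" unless defined $seq;
        chomp($header, $seq);
        die "Expected a '>' header line, got: $header\n" unless $header =~ /^>/;
        if (defined $prev_seq && $seq eq $prev_seq) {
            $dropped++;     # same sequence as its neighbour: skip it
            next;
        }
        print {$out} "$header\n$seq\n";
        $prev_seq = $seq;
    }
    return $dropped;
}

# Usage with in-memory handles; a real script would open files instead.
my $fasta = ">a\nATGCTGT\n>b\nATGCTGT\n>c\nGGCATTA\n>d\nATGCTGT\n";
open my $in,  '<', \$fasta or die "Cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "Cannot open output: $!";
my $n = dedup_adjacent($in, $out);
close $in;
close $out;
print $result;
print "Dropped $n duplicate(s)\n";
```

In the sample, >b is dropped because it directly repeats the sequence of >a, while >d is kept because >c sits between them. Removing duplicates anywhere in the file would need a hash of seen sequences instead of a single previous value.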
