PerlMonks  

Re: Reading files n lines a time

by ww (Archbishop)
on Dec 06, 2012 at 13:28 UTC


in reply to Reading files n lines a time

Depending on the way your files-which-need-to-be-read-four-lines-at-a-time (henceforward, "fwn") are formatted/organized and the consistency thereof, you might consider setting your $/ record separator...

For example, if the data looks like:

name date bill

name1 date1 bill1
...

then setting local $/="\n\n" will tell Perl that you want to read a paragraph from the fwn, where a paragraph is defined as something ending in two consecutive newlines. Better yet, the special case $/="" defines a paragraph somewhat more broadly (any run of one or more blank lines counts as a single separator) and may be better suited to your data.
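A minimal sketch of paragraph mode, assuming blank-line-separated records like the example above (the filename and record layout are hypothetical):

```perl
use strict;
use warnings;

# Read blank-line-separated records from a filehandle. Localizing $/
# to "" turns on paragraph mode: each <$fh> read returns one whole
# block, no matter how many blank lines divide the blocks.
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = "";              # paragraph mode
    my @records;
    while ( my $para = <$fh> ) {
        chomp $para;            # in paragraph mode, chomp strips all trailing newlines
        push @records, $para;
    }
    return @records;
}

# Usage with an in-memory filehandle standing in for a real file:
open my $fh, '<', \"name date bill\n\nname1 date1 bill1\n" or die $!;
my @recs = read_paragraphs($fh);
```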

You'll find many examples here if you merely Super Search for $/.

Replies are listed 'Best First'.
Re^2: Reading files n lines a time
by naturalsciences (Beadle) on Dec 06, 2012 at 14:00 UTC
Thanks. I already know about the record separator. Unfortunately my fwn-s don't contain anything else as useful as a newline for marking off useful blocks. At least, not to my senses.
Perhaps you can post a real (or bowdlerized) snippet of your actual data. It's amazing what a bit of exposure to regular expressions can help one spot, and here you'll have many such well-educated eyes looking for proxy paragraph markers.

      I'm actually surprised -- no, very surprised -- that this request hasn't been posted higher in the thread.

Right now it is simply a FASTA file. FASTA files are for storing DNA sequence information and they are formatted as follows.

        >nameofsequence\n

        ATCGTACGTTGCTE\n

        >anothername\n

        GTCTGT\n

so that a line starting with > containing a sequence name is followed by a line containing that sequence's nucleotide information.

I am thinking of reading them in four lines at a time, because I have reason to suspect that, due to certain previous operations, there might be sequences directly following each other with different names (on the >sequencename\n line) but exactly the same sequence information (on the following ATGCTGT\n line). Right now I'm looking to identify and remove such duplicates, but I might also make use of scripts for comparison, extraction, etc. of neighbouring sequences in my files. (Two neighbours means four lines.)
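One way to sketch that neighbour check, assuming (as described above) every record is exactly one ">name" line plus one sequence line; the function name and the dedup policy (keep the first of two identical neighbours) are my assumptions, not the poster's code:

```perl
use strict;
use warnings;

# Walk a two-line-per-record FASTA stream, reading a name line and its
# sequence line together, and drop any record whose sequence is
# identical to the record immediately before it (different name, same
# sequence). The first of the duplicate pair is kept.
sub dedup_neighbours {
    my ($fh) = @_;
    my @kept;
    my $prev_seq = '';
    while ( my $name = <$fh> ) {
        my $seq = <$fh>;
        last unless defined $seq;   # ignore a trailing half-record
        chomp( $name, $seq );
        next if $seq eq $prev_seq;  # same sequence as the neighbour above: skip
        push @kept, [ $name, $seq ];
        $prev_seq = $seq;
    }
    return @kept;
}
```

The same loop body (read two lines, peek at the previous pair) generalizes to any "two neighbours means four lines" comparison, not just dedup.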
