PerlMonks  

Re: Reading files n lines a time

by ww (Bishop)
on Dec 06, 2012 at 13:28 UTC (#1007563)


in reply to Reading files n lines a time

Depending on the way your files-which-need-to-be-read-four-lines-at-a-time (henceforward, "fwn") are formatted/organized and the consistency thereof, you might consider setting your $/ record separator...

for example, if the data looks like:

    name
    date
    bill

    name1
    date1
    bill1
    ...

then setting local $/ = "\n\n" will tell Perl that you want to read a paragraph from the fwn, where a paragraph is defined as something ending in two consecutive newlines. Better yet, the special case $/ = "" (paragraph mode) defines a paragraph somewhat more forgivingly: any run of one or more blank lines ends a record, and leading blank lines are skipped, so it may be better suited to your data.

You'll find many examples here, if you merely SuperSearch for $/
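To make the suggestion concrete, here is a minimal sketch of paragraph mode. The data and the in-memory filehandle are inlined stand-ins for the poster's actual file, chosen so the snippet is self-contained:

```perl
use strict;
use warnings;

# Stand-in for the fwn: two records, each three lines, separated by a blank line.
my $data = "name\ndate\nbill\n\nname1\ndate1\nbill1\n";
open my $fh, '<', \$data or die $!;

local $/ = "";              # paragraph mode: a run of blank lines ends a record
my @records;
while ( my $para = <$fh> ) {
    chomp $para;            # in paragraph mode, chomp strips all trailing newlines
    push @records, [ split /\n/, $para ];
}
close $fh;

print scalar(@records), " records read\n";    # prints: 2 records read
```

Each element of @records is then an array ref of the lines in one block, so the "read n lines at a time" problem reduces to one read per record.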


Re^2: Reading files n lines a time
by naturalsciences (Beadle) on Dec 06, 2012 at 14:00 UTC
Thanx. I already know about the record separator. My fwn-s unfortunately don't contain anything as useful as a blank line to mark off useful blocks. At least not to my senses.
Perhaps you can post a real (or bowdlerized) snippet of your actual data. It's amazing what a bit of exposure to regular expressions can help one spot, and here you'll have many such well-educated eyes looking for proxy paragraph markers.

      I'm actually surprised -- no, very surprised -- that this request hasn't been posted higher in the thread.

Right now it is simply a fasta file. Fasta files are for storing DNA sequence information and they are formatted as follows.

        >nameofsequence\n

        ATCGTACGTTGCTE\n

        >anothername\n

        GTCTGT\n

so a line starting with > and containing a sequence name is followed by a line containing that sequence's nucleotide information.

I am thinking of reading them in four lines at a time because I have reason to suspect that, due to certain previous operations, there might be sequences directly following each other with different names (on the >sequencename\n line) but exactly the same sequence information (on the following ATGCTGT\n line). Right now I'm looking to identify and remove such duplicates, but I might also make use of scripts that compare, extract, etc. neighbouring sequences in my files. (Two neighbours means four lines.)
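The de-duplication idea above can be sketched as follows, assuming records are strictly two lines (a > header, then one sequence line). The inlined data and the skip-consecutive-duplicates rule are illustrative assumptions, not the poster's actual files or code:

```perl
use strict;
use warnings;

# Stand-in fasta data: seqB repeats seqA's sequence and should be dropped.
my $fasta = ">seqA\nATCGT\n>seqB\nATCGT\n>seqC\nGTCTGT\n";
open my $fh, '<', \$fasta or die $!;

my $prev_seq = '';
my @kept;
while ( my $header = <$fh> ) {
    my $seq = <$fh>;                 # each record is header line + sequence line
    last unless defined $seq;        # guard against a truncated final record
    chomp( $header, $seq );
    next if $seq eq $prev_seq;       # consecutive duplicate sequence: skip it
    push @kept, "$header\n$seq\n";
    $prev_seq = $seq;
}
close $fh;

print @kept;    # seqA and seqC survive; seqB is dropped
```

Reading a pair of lines per loop iteration keeps neighbouring records (four lines) in scope at once, which also suits the comparison/extraction scripts mentioned above.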
