PerlMonks  

print lines between a delimiter

by anasuya (Novice)
on May 21, 2012 at 16:20 UTC ( #971654=perlquestion )
anasuya has asked for the wisdom of the Perl Monks concerning the following question:

I have a file (seqid.txt) which looks like this.
>10GS:A
>11BA:A
>11BG:A
>121P:A
>12GS:A
Correspondingly, I have another file (sequences.txt) which looks like this:
>10GS:A
PPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLYG
KDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLL
IHEVLAPGCLDAFPLLSAYVGRLSARPKLKAFLASPEYVNLPINGNGKQ
>10GS:B
PPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLYG
KDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLL
IHEVLAPGCLDAFPLLSAYVGRLSARPKLKAFLASPEYVNLPINGNGKQ
>11BA:A
KESAAAKFERQHMDSGNSPSSSSNYCNLMMCCRKMTQGKCKPVNTFVHESLADVKAVCSQKKVTCKNGQTNCYQSKSTMR
ITDCRETGSSKYPNCAYKTTQVEKHIIVACGGKPSVPVHFDASV
>11BA:B
KESAAAKFERQHMDSGNSPSSSSNYCNLMMCCRKMTQGKCKPVNTFVHESLADVKAVCSQKKVTCKNGQTNCYQSKSTMR
ITDCRETGSSKYPNCAYKTTQVEKHIIVACGGKPSVPVHFDASV
>11BG:A
KESAAAKFERQHMDSGNSPSSSSNYCNLMMCCRKMTQGKCKPVNTFVHESLADVKAVCSQKKVTCKNGQTNCYQSKSTMR
ITDCRETGSSKYPNCAYKTTQVEKHIIVACGGKPSVPVHFDASV
>11BG:B
KESAAAKFERQHMDSGNSPSSSSNYCNLMMCCRKMTQGKCKPVNTFVHESLADVKAVCSQKKVTCKNGQTNCYQSKSTMR
ITDCRETGSSKYPNCAYKTTQVEKHIIVACGGKPSVPVHFDASV
>121P:A
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLC
VFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLV
REIRQH
>12GS:A
MPPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLY
GKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLL
LIHEVLAPGCLDAFPLL
What I want is to print into another file only the lines that follow each of the seqids listed in seqid.txt, i.e. the output file should look something like this:
>10GS:A
PPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLYG
KDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLL
IHEVLAPGCLDAFPLLSAYVGRLSARPKLKAFLASPEYVNLPINGNGKQ
>11BA:A
KESAAAKFERQHMDSGNSPSSSSNYCNLMMCCRKMTQGKCKPVNTFVHESLADVKAVCSQKKVTCKNGQTNCYQSKSTMR
ITDCRETGSSKYPNCAYKTTQVEKHIIVACGGKPSVPVHFDASV
>11BG:A
KESAAAKFERQHMDSGNSPSSSSNYCNLMMCCRKMTQGKCKPVNTFVHESLADVKAVCSQKKVTCKNGQTNCYQSKSTMR
ITDCRETGSSKYPNCAYKTTQVEKHIIVACGGKPSVPVHFDASV
>121P:A
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLC
VFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLV
REIRQH
>12GS:A
MPPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLY
GKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLL
LIHEVLAPGCLDAFPLL
The logic is to match the seqid from the list provided and to keep on printing whatever occurs till the delimiter '>' is encountered. Please help.

Re: print lines between a delimiter
by davido (Archbishop) on May 21, 2012 at 17:07 UTC

    Let's call the first file your search keys, and the second file, the values mapped to those search keys. Let's also assume that the mapping file is larger and more expensive to work with than the simple search key file.

    First, open and read your search key file into a hash, stripping away the > and newline characters (you might just capture m/^>(\w+:\w)/, for example). Each search key becomes a hash key. Go ahead and close your search key file, but keep that hash.

    Second, open the mapping file for input, and an output file. For your mapping file set the input record separator (search perlvar for $/ for an explanation) to >, so that you're only dealing with complete records, and have no need of worrying about newlines.

    Now iterate over each record in the mapping file. chomp (removing the trailing >). Discard records that are empty (this takes care of the first >, for example). Then match m/^(\w+:\w)\n/, and use exists to check whether that key exists in your hash of search keys.

    If you've got a match, print to your output file the current record prepended with a '>'.
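The steps above could be sketched roughly as follows. This is only one possible reading of them, not code posted in the thread: the sub name filter_records is made up for illustration, the ID pattern \w+:\w+ is a slightly looser assumption than the one suggested above, and the demo data is shortened, invented sequence text held in in-memory filehandles so the block runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): copy to $out_fh every
# record from $seq_fh whose ID is listed in $key_fh. Returns the number of
# records written.
sub filter_records {
    my ($key_fh, $seq_fh, $out_fh) = @_;

    # Step 1: load the search keys into a hash. The //g loop picks up every
    # ">ID" on a line, so one-per-line and space-separated lists both work.
    my %wanted;
    while (my $line = <$key_fh>) {
        $wanted{$1} = 1 while $line =~ />(\w+:\w+)/g;
    }

    # Step 2: read the sequence file one '>'-terminated record at a time.
    local $/ = '>';
    my $count = 0;
    while (my $record = <$seq_fh>) {
        chomp $record;                  # strips the trailing '>' (the value of $/)
        next unless $record =~ /\S/;    # skips the empty record before the first '>'

        # Step 3: the ID is the first token of the record.
        my ($id) = $record =~ /^(\w+:\w+)\s/ or next;
        next unless exists $wanted{$id};

        # Step 4: put back the '>' that chomp removed and print the record.
        print {$out_fh} '>', $record;
        $count++;
    }
    return $count;
}

# Demo on shortened, made-up data held in strings.
my $keys = ">10GS:A\n>11BA:A\n";
my $seqs = ">10GS:A\nPPYTVV\nYFPVRG\n>10GS:B\nPPYTVV\n>11BA:A\nKESAAA\n";

open my $key_fh, '<', \$keys or die "keys: $!";
open my $seq_fh, '<', \$seqs or die "seqs: $!";
my $output = '';
open my $out_fh, '>', \$output or die "output: $!";

my $n = filter_records($key_fh, $seq_fh, $out_fh);
close $out_fh;

print $output;    # the 10GS:A and 11BA:A records; 10GS:B is dropped
```

For the real files, the same sub should work with filehandles opened on seqid.txt and sequences.txt for reading and on an output file of your choosing for writing. Because the records are delimited by '>' rather than by newlines, a sequence that spans several lines stays in one piece.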

    That's one way to do it. When you get stuck in the actual code let us know which part is presenting difficulty. Filling in the rest of the blanks shouldn't be too much different from the solutions you obtained in some of your previous questions.


    Dave

Re: print lines between a delimiter
by Anonymous Monk on May 22, 2012 at 03:23 UTC
