Finding first block of contiguous elements in an array
by FamousLongAgo (Friar) on Dec 21, 2002 at 05:16 UTC
FamousLongAgo has asked for the wisdom of the Perl Monks concerning the following question:
Hello, fellow monks!
I have been writing a parser for some Protein Data Bank files, for a bioinformatics project. I have no problem extracting the sequences I need, but I am stumped by the titles. Here's the problem:
The files start out in this format:
The lines beginning with TITLE are the ones I'm interested in grabbing. One caveat: on every line after the first, the line number is prepended to the title fragment. So in this example, the actual title is "Acutolysin A from snake venom of agkistrodon acutus at pH 7.5".
So far so dull. But later in the file, sometimes much later, there may be lines that also begin with TITLE. We want to ignore those.
Assuming the following constraints:
I know how to do this with regular expressions on a scalar, and how to do it in a very inelegant way by setting flags in a loop, but I suspect there is greater wisdom out there and can't wait to learn.
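For concreteness, here is a minimal sketch of the behaviour I am after, without an explicit flag variable. The sample records in `@pdb` are hypothetical stand-ins that I made up to match the description above, and `first_title` is just an illustrative name. It assumes the file has already been read into an array of lines and that a continuation number is always followed by whitespace.

```perl
use strict;
use warnings;

# Assemble the title from the first contiguous run of TITLE lines.
# Later TITLE lines, after the run has ended, are ignored.
sub first_title {
    my @lines = @_;
    my @parts;
    for my $line (@lines) {
        if ($line =~ /^TITLE\s+(.*?)\s*$/) {
            my $text = $1;
            # Continuation lines carry a leading line number; strip it.
            $text =~ s/^\d+\s+// if @parts;
            push @parts, $text;
        }
        elsif (@parts) {
            last;    # first block is over, stop scanning
        }
    }
    return join ' ', @parts;
}

# Hypothetical sample data, not copied from a real PDB entry.
my @pdb = (
    "HEADER    TOXIN",
    "TITLE     ACUTOLYSIN A FROM SNAKE VENOM OF",
    "TITLE    2 AGKISTRODON ACUTUS AT PH 7.5",
    "COMPND    MOL_ID: 1;",
    "TITLE     A LATER LINE THAT SHOULD BE IGNORED",
);

print first_title(@pdb), "\n";
```

The non-empty `@parts` array stands in for the flag: once something has been collected, the first non-TITLE line ends the loop.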
Special bonus to anyone who can tell me what an Agkistrodon acutus is, and how deadly its bite is.