Hello, fellow monks!
I have been writing a parser for some Protein Data Bank files, for a bioinformatics project. I have no problem extracting the sequences I need, but I am stumped by the titles. Here's the problem:
The files start out in this format:
HEADER METAL BINDING PROTEIN 31-AUG-98 1BSW
TITLE ACUTOLYSIN A FROM SNAKE VENOM OF AGKISTRODON ACUTUS AT PH
TITLE 2 7.5
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: ACUTOLYSIN A;
The lines beginning with TITLE are the ones I'm interested in grabbing. There's a little caveat: on every line after the first, a line number is prepended to the title fragment. So in this example, the actual title is "Acutolysin A from snake venom of Agkistrodon acutus at pH 7.5".
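To make the fragment rule concrete, here is a rough sketch of how the two TITLE lines above could be cleaned up and joined. The sub name title_fragment is made up for illustration, and the pattern assumes whitespace-separated fields as shown in this post, not the fixed columns of a real PDB file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text carried by one TITLE line.
# $is_first is true for the first line of the block, which has no
# line number; on later lines the leading number is dropped.
sub title_fragment {
    my ($line, $is_first) = @_;
    my ($text) = $line =~ /^TITLE\s+(.*?)\s*$/ or return;
    $text =~ s/^\d+\s+// unless $is_first;
    return $text;
}

my @sample = (
    'TITLE ACUTOLYSIN A FROM SNAKE VENOM OF AGKISTRODON ACUTUS AT PH',
    'TITLE 2 7.5',
);

my @parts;
for my $i (0 .. $#sample) {
    push @parts, title_fragment($sample[$i], $i == 0);
}
my $title = join ' ', @parts;
print "$title\n";
```

The number is only stripped on continuation lines, so a first line whose text happens to start with a digit is left alone.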
So far so dull. But later in the file, sometimes much later, there may be lines that also begin with TITLE. We want to ignore those.
Assuming the following constraints:
- We treat the file as an array (no slurping into a scalar)
- There is no way to distinguish the later TITLE elements by pattern matching.
Can anyone think of an elegant way to grab the first block of 1+ contiguous TITLE lines, and stop?
I know how to do this with regular expressions on a scalar, and how to do it in a very inelegant way by setting flags in a loop, but I suspect there is greater wisdom out there and can't wait to learn.
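For reference, the flag-in-a-loop approach I mean is roughly the following. This is only a sketch, not my actual code: the sub name first_title_block is made up, and the last sample line is invented to stand in for a later TITLE line that should be ignored.

```perl
use strict;
use warnings;

# Hypothetical sub: collect the first run of contiguous TITLE lines
# from a file that has already been read into an array.
sub first_title_block {
    my @lines = @_;
    my @block;
    my $in_block = 0;    # the flag I would like to get rid of
    for my $line (@lines) {
        if ($line =~ /^TITLE\b/) {
            $in_block = 1;
            push @block, $line;
        }
        elsif ($in_block) {
            last;        # first non-TITLE line after the block: stop
        }
    }
    return @block;
}

my @lines = (
    'HEADER METAL BINDING PROTEIN 31-AUG-98 1BSW',
    'TITLE ACUTOLYSIN A FROM SNAKE VENOM OF AGKISTRODON ACUTUS AT PH',
    'TITLE 2 7.5',
    'COMPND MOL_ID: 1;',
    'TITLE SOMETHING LATER THAT SHOULD BE IGNORED',    # invented line
);

my @block = first_title_block(@lines);
print "$_\n" for @block;
```

It works, since the loop exits as soon as the first block ends, but the bookkeeping is what feels clumsy to me.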
Special bonus to anyone who can tell me what an Agkistrodon acutus is, and how deadly its bite is.