Re: More efficient?

by jkahn (Friar)
on Apr 30, 2003 at 20:19 UTC (#254452)


in reply to Improve code to parse genetic record

Here's my take on it. I'm not sure why the keys within $c at the top of the routine are being used the way they are, so I guessed that you really only want $c to be a hashref keyed by $pkey values.

I factored the record processing out into its own routine for clarity. The way I would handle this is to keep a short buffer (here it's @cognate_rows) of the strings you expect to pair up.
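To show the buffering idea on its own, here is a minimal sketch that is separate from the routine posted in this node. The helper name pair_up and the sample lines are made up for illustration; the real input format may need different handling.

```perl
use strict;
use warnings;

# Hypothetical helper: collect non-blank lines two at a time
# and return each completed pair as an array reference.
sub pair_up {
    my @lines = @_;
    my (@buffer, @pairs);
    for my $line (@lines) {
        next if $line =~ /^\s*$/;     # skip blank lines
        push @buffer, $line;
        if (@buffer == 2) {
            push @pairs, [@buffer];   # copy the completed pair
            @buffer = ();             # dump the buffer
        }
    }
    # anything still in @buffer here is an unpaired leftover
    return @pairs;
}

# Made-up sample lines, not real tool output.
my @pairs = pair_up("Sbjct AAA 1", "", "Sbjct TTT 2", "Sbjct CCC 3");
printf "%d complete pair(s)\n", scalar @pairs;
```

The buffer never holds more than two lines, so a record of any length is processed in a single pass.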

By the way, it's probably wise for you to investigate bioperl -- I bet this is a standard format.

    use English qw(-no_match_vars);   # provides $INPUT_RECORD_SEPARATOR

    sub _parse_paired {
        my $this = shift;
        my $pkey = 1;

        # don't know why these keys are here...
        my $c = {
            comments       => '',
            left_instance  => '',
            right_instance => '',
            match          => '',
        };

        ### build up each record and place in the collection ###
        # localized so the caller's separator is restored on return
        local $INPUT_RECORD_SEPARATOR = "\n\n\n";
        while (my $record = $this->{handle}->getline()) {
            my $rec_href = _build_record($record, $pkey);
            $c->{$pkey} = $rec_href;   # key on the value of $pkey, not the literal 'pkey'
            ++$pkey;
        }
        return $c;
    }

    # here's the routine I factored out:
    sub _build_record {
        my ($record, $key) = @_;
        my %data = ();   # keys will be left_sequence and right_sequence
        my @rows = split /\n/, $record;
        my @cognate_rows;
        my $curr_cognate_matches = 1;

        while (@rows) {
            local $_ = shift @rows;
            chomp;
            if (/^\s*$/) {
                next;   # skip blanks
            }
            if (/^\s+\d+/) {
                # you may have to adjust how _load_stats
                # works, or pass in $c to this routine.
                _load_stats($key, $_, \%data);
            }
            elsif (/^Sbjct/) {
                push @cognate_rows, $_;
                if (@cognate_rows == 2) {
                    # we've found two rows that we expect to go together here.
                    # if it matters, we know whether $curr_cognate_matches when we
                    # reach this point
                    my ($l, $r);
                    (undef, $l, undef) = split /\s+/, $cognate_rows[0];
                    (undef, $r, undef) = split /\s+/, $cognate_rows[1];
                    $data{left_sequence}  .= $l;
                    $data{right_sequence} .= $r;
                    # reset match to true
                    $curr_cognate_matches = 1;
                    # dump the buffer
                    @cognate_rows = ();
                }
            }
            elsif (/!/) {
                # we know the current @cognate_rows *don't* match
                $curr_cognate_matches = 0;
                next;   # discard this line
            }
        }   # end while rows
        return \%data;
    }

Code is completely untested. It compiles under strict, provided you have use English in effect (it is needed for $INPUT_RECORD_SEPARATOR). That's as far as I've gone to check this.
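For anyone unsure what the English dependency buys you: $INPUT_RECORD_SEPARATOR is just the long name for $/. Below is a small, self-contained sketch of reading records separated by two blank lines. The helper name read_records and the in-memory filehandle are assumptions for demonstration only; the original code reads from $this->{handle} instead.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

# Hypothetical helper: split a string into records that are
# separated by two blank lines ("\n\n\n").
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text
        or die "cannot open in-memory handle: $OS_ERROR";
    # local restores the previous separator when the sub returns
    local $INPUT_RECORD_SEPARATOR = "\n\n\n";
    my @records;
    while (my $record = <$fh>) {
        chomp $record;   # strips a trailing "\n\n\n" if one is present
        push @records, $record;
    }
    close $fh;
    return @records;
}

my @records = read_records("a\nb\n\n\nc\nd\n");
printf "%d record(s)\n", scalar @records;
```

Using local here keeps the changed separator from leaking into unrelated code that reads line by line.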


Re: Re: More efficient?
by Anonymous Monk on Apr 30, 2003 at 22:58 UTC
    "By the way, it's probably wise for you to investigate bioperl -- I bet this is a standard format."

    It isn't. In fact when I asked the bioperl list if a parser exists for this tool, and if they would be interested in one, I got no answer.
