I want to help, but based on what you've posted, it's hard to. First, you didn't post a complete XML file, so I have to wonder whether anything else is missing besides a bunch of close tags at the end of your sample data.
Then, your command line doesn't really give us enough info. What would be reasonable values for the "-n number_of_hits_to_keep" and "-b bit_score_cutoff" in order to get relevant results? Also, your command line uses a "-d" where I think the script is expecting a "-t".
And I'm sorry to seem picky, but you should be able to find an easy way to get your indentation right (Emacs? vi? some decent IDE or other programmer-savvy editor? perltidy?). Trust me, it really helps.
Anyway, I did manage to get the output you reported, but I'm a stranger to Bio::SearchIO, and I really can't tell which line or part of the code actually deals with the "Hit_def" element in the XML file.
A straightforward XPath extraction of "//Hit_def" on your (fixed) XML file does indeed return the full string you want: "43989.cce_0262 (Cyanothece ATCC 51142)".
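If you want to repeat that check without installing anything, here is a rough core-Perl sketch. It uses a regex as a stand-in for the XPath query, and the XML below is a hypothetical, trimmed `<Hit>` element rebuilt from the values in the dump further down, not your actual file. A regex won't decode entities such as `&amp;` and is only safe on simple, well-formed BLAST output; if XML::LibXML is installed, `XML::LibXML->load_xml(location => $file)->findnodes('//Hit_def')` is the proper XPath route.

```perl
use strict;
use warnings;

# Hypothetical, trimmed BLAST XML fragment built from the values in the
# Data::Dumper output in this post (not the poster's actual file).
my $xml = <<'XML';
<Hit>
  <Hit_id>gnl|BL_ORD_ID|1515029</Hit_id>
  <Hit_def>43989.cce_0262 (Cyanothece ATCC 51142)</Hit_def>
  <Hit_accession>1515029</Hit_accession>
  <Hit_len>65</Hit_len>
</Hit>
XML

# Regex stand-in for the XPath query //Hit_def: collect the text of every
# <Hit_def> element. Does not decode XML entities; use a real parser for
# anything beyond a quick check.
my @hit_defs = $xml =~ m{<Hit_def>(.*?)</Hit_def>}gs;

print "$_\n" for @hit_defs;
```

To run it on a real file, slurp the file into `$xml` instead of using the heredoc.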
In order to figure it out, I had to add this at the top:
use Data::Dumper 'Dumper';
Then I stepped through it with the debugger (perl -d) until I was inside this block:
while (my $hit = $result->next_hit) {
Then my next debugger command was:
p Data::Dumper::Dumper($hit)
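Two standard Data::Dumper settings make that kind of dump easier to scan: sorted keys and a shallower indent. The sketch below uses a hypothetical `Demo::Hit` object as a stand-in for `$hit`; `$Data::Dumper::Sortkeys` and `$Data::Dumper::Indent` are real Data::Dumper package variables. Inside the debugger, the `x` command (e.g. `x $hit`) also prints nested structures, which may save the `p Dumper(...)` step.

```perl
use strict;
use warnings;
use Data::Dumper 'Dumper';

# Sorted keys make dumps easy to compare between runs; Indent = 1 uses a
# fixed two-space indent instead of Dumper's default deep alignment.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Hypothetical stand-in for a hit object: a blessed hash with nested HSP
# data, shaped loosely like the real dump in this post.
my $hit = bless {
    _name        => '43989.cce_0262',
    _description => '(Cyanothece ATCC 51142)',
    _hsps        => [ { '-hit_desc' => '43989.cce_0262 (Cyanothece ATCC 51142)' } ],
}, 'Demo::Hit';

my $dump = Dumper($hit);
print $dump;
```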
Looking through the resulting output, I found the missing string -- see if you can find it too... Once you do, you should be able to figure out how to print it to your output file as desired:
$VAR1 = bless( {
  '_hsps' => [
    {
      '-query_start' => 253,
      '-algorithm' => 'BLASTX',
      '-gaps' => '0',
      '-hit_seq' => 'ITGAVCLMDYLEKVLEKLRELAQKLIETLLGPQ',
      '-hit_length' => '65',
      '-query_length' => '508',
      '-query_desc' => 'HKUN3Y301D9XQX',
      '-query_frame' => -1,
      '-rank' => 1,
      '-hit_desc' => '43989.cce_0262 (Cyanothece ATCC 51142)',
      '-query_end' => 155,
      '-hit_name' => 'gnl|BL_ORD_ID|1515029',
      '-identical' => '17',
      '-query_name' => 'Query_1',
      '-evalue' => '0.00664016',
      '-score' => '92',
      '-conserved' => '27',
      '-hit_frame' => 0,
      '-hsp_length' => '33',
      '-query_seq' => 'LRGAICSMEHIEEALGKLKDWARKLIELLLGPR',
      '-hit_start' => '12',
      '-homology_seq' => '+ GA+C M+++E+ L KL++ A+KLIE LLGP+',
      '-hit_end' => '44',
      '-bits' => '40.0466'
    }
  ],
  '_iterator' => 0,
  '_description' => '(Cyanothece ATCC 51142)',
  '_significance' => '0.00664016',
  '_query_length' => '508',
  '_accession' => '1515029',
  '_length' => '65',
  '_psiblast_iteration' => '1',
  '_name' => '43989.cce_0262',
  '_rank' => 1,
  '_algorithm' => 'BLASTX',
  '_root_verbose' => 0,
  '_hashes' => {
    '0' => 1
  },
  '_hsp_factory' => bless( {
    'interface' => 'Bio::Search::HSP::HSPI',
    'type' => 'Bio::Search::HSP::GenericHSP',
    '_loaded_types' => {
      'Bio::Search::HSP::GenericHSP' => 1
    },
    '_root_verbose' => 0
  }, 'Bio::Factory::ObjectFactory' )
}, 'Bio::Search::Hit::GenericHit' );
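One way to act on what the dump shows: the hit's `_name` and `_description` values look like the Hit_def string split at its first space. The sketch below assumes exactly that and rejoins the two parts; `full_hit_def` is a hypothetical helper, while `name` and `description` are the standard accessors on BioPerl hit objects (Bio::Search::Hit::HitI). It runs on plain strings so it needs no BioPerl install.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the original <Hit_def> text from the two
# pieces a hit object exposes. Assumes the parser split Hit_def at the
# first whitespace, as the '_name' and '_description' values suggest.
sub full_hit_def {
    my ($name, $description) = @_;
    # Drop undefined or empty parts so a hit with no description does
    # not end up with a trailing space.
    return join ' ', grep { defined $_ && length $_ } $name, $description;
}

# In the real loop this would be called with BioPerl's accessors:
#   my $hit_def = full_hit_def($hit->name, $hit->description);
my $hit_def = full_hit_def('43989.cce_0262', '(Cyanothece ATCC 51142)');
print "$hit_def\n";
```

If your Hit_def strings can contain runs of several spaces, check the rejoined text against the XML before relying on it.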