PerlMonks  

Re^7: multiple XML fields in one line

by poj (Abbot)
on Aug 10, 2014 at 06:41 UTC [id://1096887]


in reply to Re^6: multiple XML fields in one line
in thread multiple XML fields in one line

Look in the XML file for places where a <Hit> contains more than one <Hit_hsps> tag, or a <Hit_hsps> contains more than one <Hsp> tag.

This test data replicates your error

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_def>Uncultured Sulfuricurvum sp. RIFRC-1, complete genome</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_identity>16</Hsp_identity>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_def>Neosartorya fischeri NRRL 181 conserved hypothetical protein (NFIA_106270) partial mRNA</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_identity>16</Hsp_identity>
            </Hsp>
          </Hit_hsps>
          <Hit_hsps>
            <Hsp>
              <Hsp_identity>16a</Hsp_identity>
            </Hsp>
            <Hsp>
              <Hsp_identity>16b</Hsp_identity>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
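To make the failure concrete, here is a hand-built approximation (an assumption, not captured XMLin output) of the structure XML::Simple typically returns for the two <Hit> elements in this test data when ForceArray is not set: a child that occurs once becomes a hash reference, while a child that repeats becomes an array reference. Code written against the first hit's shape then dies on the second.

```perl
use strict;
use warnings;

# Hand-built approximation (assumption) of un-forced XMLin output:
# a single child element -> hash ref, repeated child elements -> array ref.
my @hits = (
    {   Hit_num  => '1',
        Hit_hsps => { Hsp => { Hsp_identity => '16' } },   # one Hit_hsps, one Hsp
    },
    {   Hit_num  => '2',
        Hit_hsps => [                                       # two Hit_hsps -> array ref
            { Hsp => { Hsp_identity => '16' } },             # one Hsp -> hash ref
            { Hsp => [ { Hsp_identity => '16a' },            # two Hsp -> array ref
                       { Hsp_identity => '16b' } ] },
        ],
    },
);

# Walk every hit assuming hash refs all the way down, as if every
# hit looked like hit 1; trap the error instead of dying.
my @errors;
for my $hit (@hits) {
    my $id = eval { $hit->{Hit_hsps}{Hsp}{Hsp_identity} };
    if ($@) {
        push @errors, "Hit $hit->{Hit_num}: $@";
    }
    else {
        print "Hit $hit->{Hit_num}: $id\n";
    }
}
print "Failed: $_" for @errors;    # $@ already ends with a newline
```

Hit 1 prints its identity; hit 2 fails with "Not a HASH reference", because its Hit_hsps value is an array reference.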
Update: Try this:
#!perl
use strict;
use warnings;
use XML::Simple;
use Data::Dump 'pp';

# force these tags into arrays, even when only one of them occurs
my $blast = XMLin('BLAST2.XML', ForceArray => ['Hit', 'Hit_hsps', 'Hsp']);
my $hits  = $blast->{BlastOutput_iterations}->{Iteration}->{Iteration_hits}->{Hit};

my $ret;
foreach my $hit (@{$hits}) {
    # collect Hsp_identity from every <Hsp> in every <Hit_hsps> of this <Hit>
    my @Hsp_identity;
    for my $Hit_hsps ( @{$hit->{Hit_hsps}} ) {
        for ( @{$Hit_hsps->{'Hsp'}} ) {
            push @Hsp_identity, $_->{'Hsp_identity'};
        }
    }
    # one line per hit: definition|number|identity1|identity2|...
    push @{$ret}, join '|', $hit->{Hit_def}, $hit->{Hit_num}, @Hsp_identity;
}
pp $ret;
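If you would rather not list tags in ForceArray, a minimal alternative sketch is to normalize each value as you walk it. The as_list helper below is hypothetical (it is not part of XML::Simple), and the data is a hand-built approximation of un-forced XMLin output for the test file, so treat this as an illustration rather than a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical helper: always return a list, whether the value is
# undef, a single hash ref, or an array ref of hash refs.
sub as_list {
    my ($v) = @_;
    return ()  unless defined $v;
    return @$v if ref $v eq 'ARRAY';
    return ($v);
}

# Hand-built approximation (assumption) of XMLin output without ForceArray.
my @hits = (
    {   Hit_num  => '1',
        Hit_def  => 'Uncultured Sulfuricurvum sp. RIFRC-1, complete genome',
        Hit_hsps => { Hsp => { Hsp_identity => '16' } },
    },
    {   Hit_num  => '2',
        Hit_def  => 'Neosartorya fischeri NRRL 181 conserved hypothetical protein (NFIA_106270) partial mRNA',
        Hit_hsps => [
            { Hsp => { Hsp_identity => '16' } },
            { Hsp => [ { Hsp_identity => '16a' }, { Hsp_identity => '16b' } ] },
        ],
    },
);

my @rows;
for my $hit (@hits) {
    # Hit_hsps -> Hsp -> Hsp_identity, tolerating either shape at each level
    my @ids = map { $_->{Hsp_identity} }
              map { as_list($_->{Hsp}) }
              as_list($hit->{Hit_hsps});
    push @rows, join '|', $hit->{Hit_def}, $hit->{Hit_num}, @ids;
}
print "$_\n" for @rows;
```

ForceArray is still the simpler fix when you control the XMLin call; a helper like this is mainly useful when the data structure comes from code you can't change.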
poj

Re^8: multiple XML fields in one line
by smice (Initiate) on Aug 11, 2014 at 08:54 UTC

    Sorry I couldn't access the net yesterday.

    Aaand... YES! That works. Fanfare and fireworks! :)

    So, as you also found out, the problem was caused by some hits that have multiple <Hsp>s within one <Hit>. Funny how it was right in front of my eyes, yet I didn't notice it the first time. Actually, the first two hits I sent in the sample file just happened to be exceptions.

    I've edited the original script to include your code, and now it processes every file without any problem.

    Excellent work, thank you very much for doing my job and saving me a lot of headaches! It was also nice to learn a bit about Perl. Thanks a lot again!
