Re^7: multiple XML fields in one line

by poj (Monsignor)
on Aug 10, 2014 at 06:41 UTC

in reply to Re^6: multiple XML fields in one line
in thread multiple XML fields in one line

Look in the XML file for instances where you have multiple <Hit_hsps> tags within a <Hit>, or multiple <Hsp> tags within a <Hit_hsps>.

This test data replicates your error:

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://ww">
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_def>Uncultured Sulfuricurvum sp. RIFRC-1, complete genome</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_identity>16</Hsp_identity>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_def>Neosartorya fischeri NRRL 181 conserved hypothetical protein (NFIA_106270) partial mRNA</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_identity>16</Hsp_identity>
            </Hsp>
          </Hit_hsps>
          <Hit_hsps>
            <Hsp>
              <Hsp_identity>16a</Hsp_identity>
            </Hsp>
            <Hsp>
              <Hsp_identity>16b</Hsp_identity>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
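If it helps to locate the offending hits in a large file, here is a rough diagnostic using core Perl only. It is a sketch, not a real XML parser: it assumes the tags carry no attributes and that <Hit> elements are never nested. The sub name find_multi_hits and the cut-down sample are made up for illustration.

```perl
#!perl
use strict;
use warnings;

# Rough diagnostic: report every <Hit> that holds more than one
# <Hit_hsps> block or more than one <Hsp>.
# Assumes no attributes on these tags and no nested <Hit> elements.
sub find_multi_hits {
    my ($xml) = @_;
    my @found;
    while ( $xml =~ m{<Hit>(.*?)</Hit>}gs ) {
        my $body  = $1;
        my ($num) = $body =~ m{<Hit_num>(.*?)</Hit_num>}s;
        my $hsps  = () = $body =~ m{<Hit_hsps>}g;   # count of <Hit_hsps>
        my $hsp   = () = $body =~ m{<Hsp>}g;        # count of <Hsp>
        push @found, { num => $num, hit_hsps => $hsps, hsp => $hsp }
            if $hsps > 1 || $hsp > 1;
    }
    return @found;
}

# Cut-down sample in the same shape as the test data
my $sample = <<'XML';
<Iteration_hits>
  <Hit><Hit_num>1</Hit_num>
    <Hit_hsps><Hsp><Hsp_identity>16</Hsp_identity></Hsp></Hit_hsps>
  </Hit>
  <Hit><Hit_num>2</Hit_num>
    <Hit_hsps><Hsp><Hsp_identity>16</Hsp_identity></Hsp></Hit_hsps>
    <Hit_hsps>
      <Hsp><Hsp_identity>16a</Hsp_identity></Hsp>
      <Hsp><Hsp_identity>16b</Hsp_identity></Hsp>
    </Hit_hsps>
  </Hit>
</Iteration_hits>
XML

for my $h ( find_multi_hits($sample) ) {
    print "Hit $h->{num}: $h->{hit_hsps} <Hit_hsps>, $h->{hsp} <Hsp>\n";
}
```

For a real file you would read its contents into a string and pass that to the sub instead of $sample.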
Update: Try this:

#!perl
use strict;
use warnings;
use XML::Simple;
use Data::Dump 'pp';

# force these tags into arrays, even when there is only one of them
my $blast = XMLin('BLAST2.XML', ForceArray => ['Hit','Hit_hsps','Hsp']);
my $hits = $blast->{BlastOutput_iterations}->{Iteration}->{Iteration_hits}->{Hit};

my $ret;
foreach my $hit (@{$hits}) {
  my @Hsp_identity;
  # collect every Hsp_identity from every Hit_hsps block of this hit
  for my $Hit_hsps ( @{$hit->{Hit_hsps}} ){
    for ( @{$Hit_hsps->{'Hsp'}} ) {
      push @Hsp_identity, $_->{'Hsp_identity'}
    }
  };
  # one line per hit: Hit_def|Hit_num|identity|identity|...
  push @{$ret}, join '|', $hit->{Hit_def}, $hit->{Hit_num}, @Hsp_identity;
};
pp $ret;
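To illustrate why forcing arrays matters: without ForceArray, XML::Simple hands back a hash ref for an element that occurs once and an array ref for one that repeats, so code written for one shape breaks on the other. The sketch below uses hand-built structures as stand-ins for those two shapes (they are my assumption of the parsed result, not actual parser output), and the helpers as_list and identities are hypothetical names. It needs no modules.

```perl
#!perl
use strict;
use warnings;

# Hand-built stand-ins for the two shapes you get WITHOUT ForceArray
# (assumed for illustration, not produced by a parser here):
my $one_hsp  = { Hit_hsps => { Hsp => { Hsp_identity => '16' } } };
my $many_hsp = { Hit_hsps => [
    { Hsp => { Hsp_identity => '16' } },
    { Hsp => [ { Hsp_identity => '16a' }, { Hsp_identity => '16b' } ] },
] };

# Hypothetical helper: treat a lone hash ref like a one-element list
sub as_list {
    my ($node) = @_;
    return () unless defined $node;
    return ref $node eq 'ARRAY' ? @{$node} : ($node);
}

# Collect every Hsp_identity of a hit, whatever shape it arrived in
sub identities {
    my ($hit) = @_;
    return map { $_->{Hsp_identity} }
           map { as_list( $_->{Hsp} ) }
           as_list( $hit->{Hit_hsps} );
}

print join( '|', identities($one_hsp) ),  "\n";   # 16
print join( '|', identities($many_hsp) ), "\n";   # 16|16a|16b
```

ForceArray does the same normalisation at parse time, which is why the loops in the script above can always dereference with @{...}.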

Replies are listed 'Best First'.
Re^8: multiple XML fields in one line
by smice (Initiate) on Aug 11, 2014 at 08:54 UTC

    Sorry I couldn't access the net yesterday.

    Aaand... YES! That works. Fanfare and fireworks! :)

    So, as you also found out, the problem was caused by some hits that have multiple <Hsp>s within one <Hit>. Funny how it was right in front of my eyes, yet I didn't notice it at first. The first two hits I sent as a sample file just happened to be exceptions.

    I've edited the original script to include your code, and now it processes every file without any problem.

    Excellent work, thank you very much for doing my job and saving me a lot of headache! It was also nice to learn about Perl. Thanks a lot again!
