PerlMonks  

Re^3: multiple XML fields in one line

by poj (Priest)
on Aug 08, 2014 at 17:02 UTC (#1096784)


in reply to Re^2: multiple XML fields in one line
in thread multiple XML fields in one line

Can you provide a small sample of the XML file that fails?
poj


Re^4: multiple XML fields in one line
by smice (Initiate) on Aug 08, 2014 at 21:22 UTC

    Yep, here is the first part. After <Iteration_hits> there are lots of <Hit> elements, all with exactly the same structure, so I kept only the first two. The point is to search in the <Hit_def> of these <Hit>s, but in the output we'd also like to see the <Hit_num> and <Hsp_identity> values of the matching <Hit>s.
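A minimal sketch of that search, assuming the file has already been parsed into roughly the shape XML::Simple's XMLin() produces by default when every <Hit> has exactly one <Hsp>: an array of hit hashes whose text-only children are plain scalars. The hard-coded $hits, its Hit_def strings and identity values, and the /kinase/ pattern are hypothetical placeholders, not taken from the real file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ...->{Iteration_hits}->{Hit} as XMLin()
# might return it when every <Hit> has exactly one <Hsp>.
my $hits = [
    { Hit_num  => 1,
      Hit_def  => 'protein kinase A (example)',
      Hit_hsps => { Hsp => { Hsp_identity => 120 } } },
    { Hit_num  => 2,
      Hit_def  => 'hypothetical protein (example)',
      Hit_hsps => { Hsp => { Hsp_identity => 85 } } },
];

my $pattern = qr/kinase/i;    # hypothetical search term

# Keep only hits whose description matches; report "Hit_num|Hsp_identity".
my @matches;
foreach my $hit (@{$hits}) {
    next unless $hit->{Hit_def} =~ $pattern;
    push @matches, join '|',
        $hit->{Hit_num},
        $hit->{Hit_hsps}->{Hsp}->{Hsp_identity};
}
print "$_\n" for @matches;    # prints: 1|120
```

The match is done on the already-parsed data, so the same loop works whatever the search term is; only $pattern changes.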

      This test program works against your sample data; try running it against the complete file.

      #!perl
      use strict;
      use warnings;
      use XML::Simple;
      use Data::Dump 'pp';

      my $blast = XMLin('BLAST1.XML');
      my $hits = $blast->{BlastOutput_iterations}->{Iteration}->{Iteration_hits}->{Hit};
      my $ret;
      #push @ret, $_->{Hit_def} foreach (@{$hits});
      foreach (@{$hits}) {
        push @{$ret},join '|',
          $_->{Hit_def},
          $_->{Hit_num},
          $_->{Hit_hsps}->{Hsp}->{Hsp_identity};
      }
      pp $ret;
      poj

        Ah, it's killing me. I tried your test program with my original XML file. Same error as before: 'Not a HASH reference at line 12' (which is: push @{$ret},join '|',).

        However, I also tried it with the partial file that I sent you. Wow! It works perfectly! I then ran my original program again, with your modification, on the partial XML file. Again it works perfectly: I get exactly the results I hoped for.

        So is it related to the input file? Maybe my XML file is somehow messed up. To test that, I generated a few more XML files with the appropriate software, but all of them caused the same 'Not a HASH reference' error. I compared the complete XML files with the partial XML I sent you and went over them about a thousand times, but I couldn't find any difference except, of course, the number of <Hit>s and consequently the size. Oh, there was one other thing: in the complete XMLs the lines ended with a single newline character (\n), but in the partial XML the EOL was a carriage return plus a newline (\r\n). I replaced every \n with \r\n and still got the error, so the EOL seems to be irrelevant. Conversely, the partial XML still worked correctly after I replaced every \r\n with \n.
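One structural difference a line-by-line comparison can easily miss is how many <Hsp> elements each <Hit> contains: BLAST can report several HSPs for a single hit, and a two-hit excerpt may happen to contain none of those. The sketch below lists every <Hit_num> whose <Hit> holds more than one <Hsp>. It uses plain regular expressions instead of a real XML parser, an assumption that only holds for simple, regularly formatted output like BLAST's, and the inline $xml string is a made-up miniature rather than the poster's data.

```perl
use strict;
use warnings;

# Made-up miniature of a BLAST XML report: hit 2 has two <Hsp> blocks.
my $xml = <<'XML';
<Iteration_hits>
  <Hit>
    <Hit_num>1</Hit_num>
    <Hit_hsps>
      <Hsp><Hsp_identity>120</Hsp_identity></Hsp>
    </Hit_hsps>
  </Hit>
  <Hit>
    <Hit_num>2</Hit_num>
    <Hit_hsps>
      <Hsp><Hsp_identity>85</Hsp_identity></Hsp>
      <Hsp><Hsp_identity>40</Hsp_identity></Hsp>
    </Hit_hsps>
  </Hit>
</Iteration_hits>
XML

# For a real file, replace the heredoc above with a slurp, e.g.:
#   my $xml = do { local $/; open my $fh, '<', 'BLAST1.XML' or die $!; <$fh> };

my @multi;    # Hit_num values whose <Hit> contains more than one <Hsp>
while ($xml =~ m{<Hit>(.*?)</Hit>}gs) {
    my $hit = $1;
    my ($num) = $hit =~ m{<Hit_num>(\d+)</Hit_num>};
    my $hsp_count = () = $hit =~ m{<Hsp>}g;    # count <Hsp> start tags
    push @multi, $num if $hsp_count > 1;
}
print "Hits with several <Hsp>: @multi\n";    # prints: Hits with several <Hsp>: 2
```

The literal tags <Hit> and <Hsp> do not match <Hit_num>, <Hit_hsps> or <Hsp_identity>, which is what keeps the regex shortcut usable here.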

        I also shamelessly hacked at your code with my limited Perl knowledge, trying different ways of dereferencing, but it only got worse (as expected :))

        So all in all, I am totally clueless. I don't get why it should be a HASH reference in the first place; @{$ret} is an array, right? Not a hash. Then I don't get how the input file influences the reference, especially since line 12 contains nothing related to the input file: it only says that we push values onto the end of the empty @{$ret} array (after joining some of them). And finally, I don't get what the key difference between the 'good' and 'bad' XML files is. Why does only the partial file work? If the program runs properly for 2 hits, why doesn't it for 99 hits?
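A likely explanation, hedged because the complete file isn't shown: by default XML::Simple's XMLin() turns an element that appears once into a hash reference but an element that repeats into an array reference. If some <Hit> in the complete file has two or more <Hsp> children, $_->{Hit_hsps}->{Hsp} is an array reference for that hit, and ->{Hsp_identity} then dies with 'Not a HASH reference'. The failing dereference is part of the same push statement, which is why the message points at that line even though @{$ret} itself is fine. The sketch below reproduces the error on a hand-built structure and shows one workaround, a small helper with the hypothetical name as_list that accepts either shape. XML::Simple's ForceArray option (for example ForceArray => ['Hit', 'Hsp']) is the usual way to get a consistent shape at parse time; it is left out here so the example runs with core Perl only.

```perl
use strict;
use warnings;

# Hand-built stand-in for XMLin() output (hypothetical values): hit 1 has
# one <Hsp>, folded into a hash; hit 2 has two, folded into an array.
my $hits = [
    { Hit_num  => 1, Hit_def => 'example one',
      Hit_hsps => { Hsp => { Hsp_identity => 120 } } },
    { Hit_num  => 2, Hit_def => 'example two',
      Hit_hsps => { Hsp => [ { Hsp_identity => 85 },
                             { Hsp_identity => 40 } ] } },
];

# Reproduce the failure: using hit 2's array reference as a hash dies.
my $ok  = eval { my $id = $hits->[1]->{Hit_hsps}->{Hsp}->{Hsp_identity}; 1 };
my $err = $ok ? '' : $@;
print "error: $err" if $err;    # error: Not a HASH reference at ...

# Hypothetical helper: return a list whether XMLin gave one hash or an array.
sub as_list {
    my ($node) = @_;
    return ()       unless defined $node;
    return @{$node} if ref $node eq 'ARRAY';
    return ($node);
}

# Same output format as the test program, one row per <Hsp>.
my $ret;
foreach my $hit (@{$hits}) {
    foreach my $hsp (as_list($hit->{Hit_hsps}->{Hsp})) {
        push @{$ret}, join '|',
            $hit->{Hit_def}, $hit->{Hit_num}, $hsp->{Hsp_identity};
    }
}
print "$_\n" for @{$ret};
```

Emitting one row per <Hsp> is a design choice; if only one identity per hit is wanted, you would have to decide which <Hsp> to keep.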

        Mysterious. So much for today; tomorrow I will start removing hits from a complete XML file one by one, to see whether there is a size limit somewhere, or whether the number of hits has any effect at all...

        Thank you for your selfless help again!
