
Behaviour of parsed XML

by dalgetty (Acolyte)
on Feb 26, 2020 at 11:08 UTC (#11113429)

dalgetty has asked for the wisdom of the Perl Monks concerning the following question:

Dear Brethren,

I know that Perl is not inconsistent, so it must be me.

For years I have been using XML::Simple to parse several RSS feeds, and I have 1500 lines of code running nicely, except on the rare occasions when the RSS feed contains only one item.

In this case the script fails, because $data->{channel}->{item}->[0] does not exist. Since there is only one entry, XML::Simple does not create an array at {item}; it puts the item's hash reference straight into $data->{channel}->{item}.
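To make the two shapes concrete, here is a minimal sketch using hand-built data as a stand-in for XMLin() output (nothing is parsed, and the field values are hypothetical). With several <item> elements, XML::Simple's default folding leaves an array reference under {item}; with exactly one, it leaves a plain hash reference, and indexing that with ->[0] dies.

```perl
use strict;
use warnings;

# Hand-built stand-ins for what XMLin() returns by default (hypothetical values).
my $several = {
    channel => {
        item => [ { title => 'First' }, { title => 'Second' } ],
    },
};
my $single = {
    channel => {
        item => { title => 'Only' },    # a HASH ref, not an ARRAY ref
    },
};

for my $feed ($several, $single) {
    my $item = $feed->{channel}{item};
    printf "{item} holds a reference of type %s\n", ref $item;
}

# Indexing the single-item shape as an array fails at run time.
eval { my $first = $single->{channel}{item}->[0]; 1 }
    or print "error: $@";
```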

So I adjust the hash table as follows, and I can access the information I need:
$mydata=$data->{channel}->{item};
my $data->{channel}->{item}->[0]=$mydata;
if ($data->{channel}->{item}->[$y]) {
    while (($data->{channel}->{item}->[$y])&&($y>-1)) {
        $keyword=$data->{channel}->{item}->[$y]->{epfl_keywords};
        ...
The data is then correctly constructed:
$VAR1 = {
          'channel' => {
                         'item' => [
                                     {
                                       'epfl_is_internal' => 'False',
                                       'link' => '...st-cubesat/',
                                       'epfl_organizer' => 'eSpace ',
                                       'pubDate' => 'Mon, 16 Mar 2020 14:00:00 +0100',
                                       'description' => "Incl ...
However, this code needs to run when there are several items in the RSS feed too, so I only want to apply the above operation in cases of one item. In order to test for this I use the following code:
if (exists($data->{channel}->{title})) {
    $mydata=$data->{channel}->{item};
    my $data->{channel}->{item}->[0]=$mydata;
    print "Only one item is present";
}
if ($data->{channel}->{item}->[$y]) {
    while (($data->{channel}->{item}->[$y])&&($y>-1)) {
        $keyword=$data->{channel}->{item}->[$y]->{epfl_keywords};
        ...

"title" is one of many keys that always exists in the RSS feed entries. If it exists directly within "channel" that means that there is only one RSS entry in the feed, and the message prints out "Only one item is present". So far, so good.
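A sketch of an alternative one-item test, assuming the XMLin() shapes described above: rather than probing for a key such as title, ask Perl what kind of reference {item} holds and wrap a lone hash in an array. The helper name normalize_items is hypothetical. Note that it modifies the existing $data in place and never declares a new $data with my.

```perl
use strict;
use warnings;

# Hypothetical helper: make $data->{channel}{item} an ARRAY ref in place,
# whatever shape XMLin() produced (missing, a single hash, or several).
sub normalize_items {
    my ($data) = @_;
    my $item = $data->{channel}{item};
    if (!defined $item) {
        $data->{channel}{item} = [];          # no <item> at all
    }
    elsif (ref($item) eq 'HASH') {
        $data->{channel}{item} = [$item];     # single <item>: wrap it
    }
    return $data->{channel}{item};            # now always an ARRAY ref
}

my $data = { channel => { item => { title => 'Only', epfl_keywords => 'space' } } };
normalize_items($data);

# No "my" on $data here: we read the same variable we just normalized.
for my $entry (@{ $data->{channel}{item} }) {
    print "keyword: $entry->{epfl_keywords}\n";
}
```

Because the array case is left untouched, the same call is safe to make unconditionally, whether the feed holds one item or many.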

However, I then get a "Not an ARRAY reference" error at the second "if" statement, as if the restructuring had not happened.

This seemed strange to me, since the single entry case had clearly been identified correctly. So I tried the following:

if (1) {
    $mydata=$data->{channel}->{item};
    my $data->{channel}->{item}->[0]=$mydata;
}
if ($data->{channel}->{item}->[$y]) {
    while (($data->{channel}->{item}->[$y])&&($y>-1)) {
        $keyword=$data->{channel}->{item}->[$y]->{epfl_keywords};
        ...

I fully expected this code to run smoothly, like the first attempt did. Of course, I would never have attempted to make Perl look inconsistent. But I get the "not an array" error again.

All I am doing is testing for something, not changing anything. But just making an if(1) statement is enough to stop my code working correctly. What is even more confusing is that a Dumper print of the data shows that the data is correctly structured, as in the first statement, whether I apply the if statement or not.

Can any of you please tell me where my inconsistency lies? Thanks to all

Re: Behaviour of parsed XML
by haukex (Chancellor) on Feb 26, 2020 at 11:22 UTC

    An SSCCE would be helpful here; for example we don't know if you're using the ForceArray option. In any case, XML::Simple is extremely brittle (see the warning at the top of its documentation) and I would very strongly recommend trying to move away from it. I've shown how to use XML::Rules as a replacement several times, see this node and the links therein.

Re: Behaviour of parsed XML
by cavac (Curate) on Feb 26, 2020 at 14:05 UTC


    my $xml = XMLin($filename, ForceArray => ['item']);
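A minimal sketch of that option in use, assuming XML::Simple and one of its backend parsers (XML::SAX or XML::Parser) are installed; the inline feed is a made-up example, passed as a string instead of $filename. With ForceArray => ['item'], {item} is an array reference even when the feed holds a single <item>.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Made-up single-item feed; XMLin() treats a string containing markup as XML text.
my $xml = <<'XML';
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Only item</title>
    </item>
  </channel>
</rss>
XML

# ForceArray => ['item'] keeps <item> in an array even when there is only one.
my $data = XMLin($xml, ForceArray => ['item']);

my $items = $data->{channel}{item};
printf "%d item(s), first title: %s\n", scalar @$items, $items->[0]{title};
```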




      That just worked. Like a turnaround jumpshot. Thanks to everybody for your help, it's all good now...

        It might solve this particular issue, but in general moving away from XML::Simple will serve you better in the long term. The module is conceptually broken: it treats XML as basically weird-looking JSON, but XML's data model is entirely different from JSON's. I'm not going to get into a debate about which data model is better, but treating them the same is like writing flight-planning software that will only generate flight routes that follow highways.

        Somebody else recommended XML::Rules, which is probably the easiest thing to transition to from XML::Simple. It still allows you to treat the XML like glorified JSON, but requires you to give it rules about how to translate between the models, with some sane defaults.

        The other thing I'd recommend looking at is XML::LibXML, which provides a full DOM API for XML; it will feel pretty familiar if you've done any client-side JavaScript programming.
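A sketch of the DOM approach, assuming XML::LibXML is installed from CPAN; the feed is again a made-up example. An XPath query through findnodes() returns a list of nodes, so a feed with one <item> and a feed with many go through the same loop with no special case.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up single-item feed.
my $xml = <<'XML';
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Only item</title></item>
  </channel>
</rss>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# findnodes() yields zero, one or many nodes; never a bare hash.
my @titles;
for my $item ($doc->findnodes('/rss/channel/item')) {
    push @titles, $item->findvalue('title');    # XPath relative to this <item>
}
print "$_\n" for @titles;
```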

Variable scope issue - effect of "my" on hash ref keys, values
by parv (Vicar) on Feb 26, 2020 at 11:29 UTC

    I hate software that does not keep its return value consistent. However, short of (somehow) asking the parser for an array reference regardless of the number of items present, I do not see how a parser would always return an array reference in the situation described. With that out of the way ...

    Your problem seems to be one of variable scope. Why do you have "my" in ...

    if (exists($data->{channel}->{title})) { ... my $data->{channel}->{item}->[0]=$mydata; ... }

    ...? After the end of the if-block, $data->{channel}->{item}->[0] ceases to exist due to the my operator.

    After tobyink & haukex had set me straight, please ignore the above code; try this ...

    use strict;
    use warnings;
    use Data::Dumper;

    my $x;
    { my $x->{a}->[0] = "no you don't!"; }
    { $x->{b}->[0] = "now you see"; }
    print Dumper $x;

    __END__
    $VAR1 = {
              'b' => [
                       'now you see'
                     ]
            };

      Why would the "extra fun" bit be in the output? You declared two separate $x variables and they're each references to different hashes. If you put data into one hash, it shouldn't appear in the other one.

      The only arguable improvement Perl could make here is that my $x->{a}->[0] is kinda weird and might be worth warning about. Really that's just a precedence thing, though: my is higher precedence than the deref operator, so it just means (my $x)->{a}->[0].
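A small core-Perl sketch of that precedence point: my $x->{a}->[0] parses as (my $x)->{a}->[0], so inside the block a brand-new lexical $x is declared and autovivified, while the outer $x is never touched. The variable names and values are arbitrary.

```perl
use strict;
use warnings;

my $x;    # outer variable

{
    # Parsed as (my $x)->{a}->[0]: declares a NEW $x that exists only in this block.
    my $x->{a}->[0] = 'inner';
    print "inside the block: ", join(',', keys %$x), "\n";
}

# The inner $x is gone; the outer one was never assigned.
print "outside the block: ", (defined $x ? 'defined' : 'undef'), "\n";

{
    # No "my": this autovivifies the OUTER $x.
    $x->{b}->[0] = 'outer';
}

print "outer keys: ", join(',', sort keys %$x), "\n";
```

Running it prints "inside the block: a", then "outside the block: undef", then "outer keys: b".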

        Ha! Right you are. I had missed the fact a spanking new variable was created inside the block (when assigning to the hash reference keys). No "extra fun" for me, it's obvious now.

        I was mistakenly thinking that the my operator was affecting the existence of keys and values (in that the {a} element was autovivified and only the [0] element was localized to the block) but not the $x created earlier outside the block. Much thanks to you both for the clue bat.

      I am disturbed by the fact that "extra fun" does not make into the Dumper output

      I'm pretty sure that's because my $x->{a}->[0] = "no you don't!"; is mostly equivalent to my $x; $x->{a}->[0] = "no you don't!";. From Private Variables via my():

      All listed elements must be legal lvalues. ... The my is simply a modifier on something you might assign to.
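The same rule plausibly explains why the OP's first attempt appeared to work: there, the my $data->... statement was not inside a block, so the new $data it declares stays in scope to the end of the enclosing file or block and masks the original $data for every later line. Wrapping it in if (1) { ... } confines the new variable to that block. A minimal core-Perl sketch with hypothetical data; under use warnings, Perl also reports that the second file-scope my masks an earlier declaration.

```perl
use strict;
use warnings;

my $data = { channel => { item => { title => 'Only' } } };    # item is a HASH ref

{
    # Inside a block: this new $data vanishes at the closing brace.
    my $mydata = $data->{channel}{item};
    my $data->{channel}{item}->[0] = $mydata;
}
my $after_block = ref $data->{channel}{item};    # still 'HASH'

# Outside any inner block: the new $data masks the old one for the rest of the file.
# (Perl warns: "my" variable $data masks earlier declaration in same scope.)
my $mydata = $data->{channel}{item};
my $data->{channel}{item}->[0] = $mydata;
my $after_file_scope = ref $data->{channel}{item};    # now 'ARRAY'

print "after block: $after_block; after file-scope my: $after_file_scope\n";
```

The warning is a useful hint in its own right: it points at exactly the line where a second variable with the same name was created.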

      In perl 5.30.1, I am disturbed by the fact that "extra fun" does not make into the Dumper output (but not disturbed enough to file a problem report myself).

      Not 100% sure what you mean, but to me that makes sense. You still have my $x->... declared in the scope of the first block; the next statement just adds to that locally scoped var inside the same block.

Node Type: perlquestion [id://11113429]
Approved by marto