http://www.perlmonks.org?node_id=1038255


in reply to One more parsing ATOM question

With the other problems solved, let's look at the rules.

1. The _default => sub {$_[0] => $_[1]->{_content}}, is better written as _default => 'content',. There are several builtin rules, so if what you need to do with a tag matches one of them, it's better to use the builtin instead of a custom rule.

2. If the rule for a tag is the same as the _default rule, you don't need the tag-specific rule.

3. The rule for the <updated> tag should be updated => sub {print "$_[1]->{_content}\n";},. You want to print the contents of the tag, not the value of its (nonexistent) attribute named "updated".

4. Once you need to work with more of the data, you'll probably replace the rule for <updated> with the builtin rule "content" and specify a custom rule for the <entry> tag. In that rule all the contents of the child tags will be available in the $_[1] hashref as $_[1]->{childtagname}.
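
To make points 1 to 4 concrete, here is a minimal sketch of how the rules might end up looking. The feed below is a simplified, made-up sample (no namespace declaration, invented titles and timestamps), and the @entries array and the keys of the hashes pushed onto it are only illustrative names, not something from your script.

```perl
use strict;
use warnings;
use XML::Rules;

# Simplified, made-up Atom-like sample; a real feed has more tags and a namespace.
my $xml = <<'XML';
<feed>
  <title>Sample feed</title>
  <entry><title>M 2.1, Alaska</title><updated>2013-06-14T20:00:00Z</updated></entry>
  <entry><title>M 3.4, Nevada</title><updated>2013-06-14T21:30:00Z</updated></entry>
</feed>
XML

my @entries;    # filled by the <entry> rule below (illustrative name)

my $parser = XML::Rules->new(
    rules => {
        # Builtin rule: the tag's text is added to the parent's hash
        # under the tag's name, so <title> and <updated> need no own rule.
        _default => 'content',

        # Custom rule: by the time </entry> is reached, the contents of
        # the child tags are in $_[1], keyed by the child tag name.
        entry => sub {
            my ($tag, $data) = @_;
            push @entries, {
                title   => $data->{title},
                updated => $data->{updated},
            };
            return;    # nothing to hand over to the parent <feed>
        },
    },
);

$parser->parse($xml);

print "$_->{updated}  $_->{title}\n" for @entries;
```

Collecting into an outer array keeps the sketch independent of what parse() returns for the root tag; returning the data from the rule and letting the parent rule assemble it would work as well.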

Jenda
Enoch was right!
Enjoy the last years of Rome.

Re^2: One more parsing ATOM question
by mcoblentz (Scribe) on Jun 14, 2013 at 23:06 UTC

    Jenda,

    Thanks for your suggestions on the rules. I've gotten to the point where I'm trying to extract values out of the CDATA field. I've tried a lot of different ideas (HTML tables, simple HTML extraction, stripping tags, RegExp, etc.) but I think that using the ::Rules engine would simply be the most straightforward. I've read your CPAN writeup on ::Rules (are you the author? Very cool) and studied it, but I'm not quite sure how best to proceed.

    I can extract the CDATA content and end up with a resultant set of tags and values. Your comment leads me to believe that I can create a hash of the tags and values, then pick the ones I want. That seems to be the exact discussion in the ::Rules section about addresses, streets, Larry Wall, multiple tags and hashrefs. But I don't understand the discussion in that section. Can you expand further?

    Your XML::Rules section, quoted below, would seem to be the relevant part.

    our %states = (
      AL => 'Alabama',
      AK => 'Alaska',
      ...
    );
    ...
    state => sub {return 'state' => $states{$_[1]->{_content}}; }

    or

    address => sub {
      if (exists $_[1]->{id}) {
        $sthFetchAddress->execute($_[1]->{id});
        my $addr = $sthFetchAddress->fetchrow_hashref();
        $sthFetchAddress->finish();
        return 'address' => $addr;
      } else {
        return 'address' => $_[1];
      }
    }

      In XML, these two are equivalent: <foo>&lt;bar/&gt;</foo> and <foo><![CDATA[<bar/>]]></foo>. Thus the content of the <summary> tag is just a string, the one starting with <p class="quicksummary"><a href="http://earthquake.usgs... If you want to split that into pieces, you have to pass that string to another HTML or XML parser. It's like a box that, among other things, contains another box: after you've opened the outer box, you have to take out the inner box and open it as well.
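
A small sketch of both points, under some assumptions: tag_content() is a hypothetical helper written just for this example, the <summary> snippet and its example.com link are made up, and the inner fragment is assumed to be well-formed XML with a single root tag. If the real summary is looser HTML, the second step would need an HTML parser instead.

```perl
use strict;
use warnings;
use XML::Rules;

# Hypothetical helper: parse $xml and return the text content of <$tag>.
sub tag_content {
    my ($xml, $tag) = @_;
    my $content;
    XML::Rules->new(
        rules => {
            _default => 'content',
            $tag     => sub { $content = $_[1]->{_content}; return },
        },
    )->parse($xml);
    return $content;
}

# The escaped form and the CDATA form give the very same string.
my $escaped = tag_content('<foo>&lt;bar/&gt;</foo>', 'foo');
my $cdata   = tag_content('<foo><![CDATA[<bar/>]]></foo>', 'foo');
print "same content: $escaped\n" if $escaped eq $cdata;

# Outer box: the <summary> tag holds markup as plain text (made-up sample).
my $outer_xml = '<summary><![CDATA[<p class="quicksummary">'
              . '<a href="http://example.com/quake">M 2.1, Alaska</a></p>]]></summary>';
my $inner_xml = tag_content($outer_xml, 'summary');

# Inner box: feed that string to a second parser with its own rules.
my %link;
XML::Rules->new(
    rules => {
        _default => 'content',
        # Attributes and text of <a> both arrive in $_[1].
        a => sub {
            %link = (href => $_[1]->{href}, text => $_[1]->{_content});
            return;
        },
    },
)->parse($inner_xml);

print "$link{text} -> $link{href}\n";
```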

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.