http://www.perlmonks.org?node_id=529442

senik148 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm having problems parsing this XML document.
I have parsed other documents before, but I cannot parse this one correctly.

I want to pull all the news items in one run.
For example, the feed contains 5 news items (noticias).

I also want to pull the 5 photos (foto).

The elements I need are pie and fotogrande only.
pie = caption for the photo; fotogrande = the photo

A loop is no problem as long as I get these elements into some sort of scalar variable.
This is the current code I have:

#!/usr/bin/perl -w
use strict;
use warnings;
use XML::Simple;
use LWP::Simple;

# initialize object
# get RSS data
my $raw = get('http://www.fremontvirtualoffice.com/rmmame-dep.xml');

# create object
my $xml = new XML::Simple;
my $data = $xml->XMLin($raw);

my @items = @{$data->{noticia}};
for my $hash ( @items ) {
    my $title    = $hash->{titulo};
    my $fulltext = $hash->{texto};
    my $date     = $hash->{fecha};
    my $time     = $hash->{hora};
    print "TITLE: $title\n";
    print "ARTICLE: $fulltext\n";
    print "DATE: $date\n";
    print "TIME: $time\n";
}


Here is the XML file:

http://www.fremontvirtualoffice.com/rmmame-dep.xml

I want to send them by SMTP using Perl. I already have the SMTP code working. In one run I want to parse all the news items and send them one by one using Net::SMTP.

I know my code is sloppy. I just started working on this.
There is more than one way to do it!
Thanks

Replies are listed 'Best First'.
Re: XML Parse, Spanish Elements
by misterb101 (Sexton) on Feb 10, 2006 at 20:55 UTC
    Hi Senik,
    Personally I'm not such a big fan of XML::Simple, since the way it parses into different types is very uncomfortable. For example, if a node contains one subnode, the subnode is a hash element in the node itself; with more children it becomes an array. This is just one example, but it shows that the parse tree is difficult to navigate.
    I prefer to use XML::XPath. With it you can use XPath queries to navigate to the nodes you want, and the result is very straightforward.
    For you this would become something like:
    use XML::XPath;
    use XML::XPath::XMLParser;

    my $xp = XML::XPath->new(filename => 'test.xhtml');
    my $nodeset = $xp->find('/noticias/noticia');

    foreach my $noticia ($nodeset->get_nodelist) {
        my ($title,$fulltext,$date,$time) = (
            $noticia->find('titulo')->string_value,
            $noticia->find('texto')->string_value,
            $noticia->find('fecha')->string_value,
            $noticia->find('hora')->string_value
        );
    }
    Hope this helps you, it saved my life more than once :)
    --
    Cheers,
    Rob
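To get at the nested photo fields the original question asks about (pie and fotogrande), the same XPath approach can be extended with relative paths. This is only a minimal sketch: the inline XML is a hypothetical sample shaped like the feed described in this thread (a noticias root, with foto nested inside each noticia), and the example.com URLs and the @photos array are made up for illustration.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample; the real feed layout is an assumption.
my $xml_text = <<'XML';
<noticias>
  <noticia>
    <titulo>Titular uno</titulo>
    <foto>
      <pie>Pie de foto uno</pie>
      <fotogrande>http://example.com/uno.jpg</fotogrande>
    </foto>
  </noticia>
  <noticia>
    <titulo>Titular dos</titulo>
    <foto>
      <pie>Pie de foto dos</pie>
      <fotogrande>http://example.com/dos.jpg</fotogrande>
    </foto>
  </noticia>
</noticias>
XML

# Parse from a string instead of a file.
my $xp = XML::XPath->new(xml => $xml_text);

my @photos;
foreach my $noticia ($xp->find('/noticias/noticia')->get_nodelist) {
    # The second argument makes each path relative to this <noticia>.
    push @photos, {
        title   => $xp->find('titulo',          $noticia)->string_value,
        caption => $xp->find('foto/pie',        $noticia)->string_value,
        photo   => $xp->find('foto/fotogrande', $noticia)->string_value,
    };
}

print "$_->{title}: $_->{photo} ($_->{caption})\n" for @photos;
```

Each entry in @photos is a plain hash of scalars, so it can be handed to whatever mail-sending loop is already in place.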
      I'm not such a big fan of XML::Simple, since the way it parses into different types is very uncomfortable. For example, if a node contains one subnode, the subnode is a hash element in the node itself,

      You can use the ForceArray option to overcome that problem:
      my $xs = new XML::Simple(ForceArray => 1);
      Now all nested elements are read as arrays.
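A small sketch of what that option changes, using a made-up one-item document (the element names come from this thread; the content is hypothetical). KeyAttr => [] is added here only to switch off XML::Simple's key folding so the shape stays predictable.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical feed containing a single <noticia>.
my $xml_text = <<'XML';
<noticias>
  <noticia>
    <titulo>Primera</titulo>
  </noticia>
</noticias>
XML

# Default behaviour: a lone child element collapses into a hash ref.
my $plain = XMLin($xml_text, KeyAttr => []);

# ForceArray => 1: every nested element becomes an array ref,
# including text-only ones such as <titulo>.
my $forced = XMLin($xml_text, KeyAttr => [], ForceArray => 1);

print "plain:  ", ref($plain->{noticia}),  "\n";
print "forced: ", ref($forced->{noticia}), "\n";
print "title:  ", $forced->{noticia}[0]{titulo}[0], "\n";
```

So the result is a hash of arrays of hashes rather than arrays of arrays: element names stay as hash keys, and only the repetition level turns into an index.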

      "We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.
        Hi jbrugger, does that also mean that it will return an array of arrays of arrays...?
        The downside of something like that is that you would have to use $xml->[3]->[3]->[1]->[2] structures to reference nodes from your XML file. So you have to use your knowledge about the structure of the XML file to be able to reference nodes. That feels uncomfortable to me at least.
        The nice thing about XML::XPath is that you can use full XPath expressions like /document//element_somewhere[@att="val"]/../next-sibling or anything like that.

        Before this starts sounding like a sales pitch for XML::XPath: I think users should just use what feels most comfortable.
        --
        Cheers,
        Rob
Re: XML Parse, Spanish Elements
by serf (Chaplain) on Feb 10, 2006 at 20:12 UTC
    Firstly I would recommend adding:
    use strict; use warnings;
    to the top of your code and then go through and fix the errors it identifies.

    You should normally use strict & warnings in all scripts unless they're one-liners or if there's a good reason not to.

    use strict would show you that these two scalars are not declared:

    Global symbol "$caption" requires explicit package name at ./xml_p.pl line 87.
    Global symbol "$photo" requires explicit package name at ./xml_p.pl line 88.
    (your line numbers may vary)
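For anyone who wants to see that message without breaking a real script, here is a small sketch. It uses a string eval so the compile error ends up in $@ instead of aborting the program; the variable name $caption is taken from the error above, and everything else is illustrative.

```perl
use strict;
use warnings;

# The string is compiled at run time under the enclosing "use strict",
# so the undeclared variable is reported in $@ rather than killing us.
my $ok = eval q{
    $caption = 'sin declarar';   # no "my": strict vars rejects this
    1;
};
my $error = $ok ? '' : $@;
print "strict says: $error";

# Declaring the variable with "my" is the fix.
my $fixed = eval q{
    my $caption = 'Pie de foto';
    $caption;
};
print "declared version returns: $fixed\n";
```

The usual fix in the real script is the same as in the second eval: declare each variable with my at the point where it is first assigned.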

    Update: wow! it's been stripped down! well done, much easier to debug for you!

Re: XML Parse, Spanish Elements
by planetscape (Chancellor) on Feb 11, 2006 at 07:50 UTC
      I will make sure I experiment with those other modules.

      For now I have fixed everything!
      I found the way.

      #!/usr/bin/perl -w
      use strict;
      use warnings;
      use XML::Simple;
      use LWP::Simple;

      # get RSS data
      my $raw = get('http://www.fremontvirtualoffice.com/rmmame-dep.xml');

      # create object
      my $xml = new XML::Simple;
      my $data = $xml->XMLin($raw);

      my @items = @{$data->{noticia}};
      for my $hash ( @items ) {
          my $title       = $hash->{titulo};
          my $article     = $hash->{texto};
          my $date        = $hash->{fecha};
          my $time        = $hash->{hora};
          my $caption     = $hash->{foto}->{pie};
          my $photo_big   = $hash->{foto}->{fotogrande};
          my $photo_small = $hash->{foto}->{fotomedia};
          print "TITLE: $title\n";
          print "DATE: $date\n";
          print "TIME: $time\n";
          print "ARTICLE: $article\n";
          print "PHOTO: $photo_big\n";
          print "CAPTION: $caption\n";
      }


      I get all the elements I need. Thank you all!
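One caveat with the working version: @{$data->{noticia}} assumes the feed always holds more than one noticia, and $hash->{foto}->{pie} assumes every item has a photo. A hedged sketch of a more defensive variant follows. The inline XML is a hypothetical one-item feed with no foto element, and the @messages array and the "(no photo)" / "(no caption)" placeholders are made up for illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical feed: a single item, and it has no <foto>.
my $xml_text = <<'XML';
<noticias>
  <noticia>
    <titulo>Solo una noticia</titulo>
    <texto>Texto de ejemplo</texto>
  </noticia>
</noticias>
XML

# Forcing only 'noticia' keeps that key an array ref even for one item,
# while the other elements stay plain scalars.
my $data = XMLin($xml_text, ForceArray => ['noticia'], KeyAttr => []);

my @messages;
for my $item (@{ $data->{noticia} || [] }) {
    # Fall back to an empty hash when <foto> is missing.
    my $foto = ref($item->{foto}) eq 'HASH' ? $item->{foto} : {};

    my $photo   = $foto->{fotogrande};
    my $caption = $foto->{pie};
    # Empty elements come back as hash refs, so treat refs as missing too.
    $photo   = '(no photo)'   unless defined $photo   && !ref $photo;
    $caption = '(no caption)' unless defined $caption && !ref $caption;

    push @messages, join '',
        "TITLE: $item->{titulo}\n",
        "ARTICLE: $item->{texto}\n",
        "PHOTO: $photo\n",
        "CAPTION: $caption\n";
}

print for @messages;
```

Each entry in @messages is one ready-made body, so the existing Net::SMTP code can loop over the array and send them one by one.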