
How to return two and more values by parsing XML with XML::Rules?

by vagabonding electron (Curate)
on Nov 06, 2012 at 09:45 UTC

vagabonding electron has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I use XML::Rules to parse a huge XML document. As a Perl amateur I'm generally happy with the module, since it does what I mean. Now, however, I'm stuck with the following problem.
Here is the XML chunk.
<Outpatient_Services>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM01</AM_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM01</AM_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM02</AM_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Capacities_Outpatient_Clinic>
        <Care_Point>
          <VA_VU_Key_Outpatient_Clinic>VA01</VA_VU_Key_Outpatient_Clinic>
        </Care_Point>
        <Care_Point>
          <Other>
            <VA_VU_Other_Key_Outpatient_Clinic>VA00</VA_VU_Other_Key_Outpatient_Clinic>
            <Description>Other Care Point of the Outpatient Clinic</Description>
          </Other>
        </Care_Point>
      </Capacities_Outpatient_Clinic>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic_Special>
      <AM_Special_Key>AM06</AM_Special_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Capacities_Outpatient_Clinic_Special>
        <Capacity>
          <LK_Key>LK01</LK_Key>
        </Capacity>
        <Capacity>
          <LK_Key>LK02</LK_Key>
        </Capacity>
      </Capacities_Outpatient_Clinic_Special>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic_Special>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM04</AM_Key>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <Other>
        <AM_Other_Key>AM00</AM_Other_Key>
        <Type>Type of the other Outpatient Clinic</Type>
      </Other>
      <Description>Description of the Outpatient Clinic</Description>
      <Capacities_Outpatient_Clinic>
        <Care_Point>
          <VA_VU_Key_Outpatient_Clinic>VA02</VA_VU_Key_Outpatient_Clinic>
        </Care_Point>
        <Care_Point>
          <Other>
            <VA_VU_Other_Key_Outpatient_Clinic>VA00</VA_VU_Other_Key_Outpatient_Clinic>
            <Description>Other Care Point of the Outpatient Clinic</Description>
          </Other>
        </Care_Point>
      </Capacities_Outpatient_Clinic>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic>
  </Outpatient_Service>
</Outpatient_Services>

Update: The desired output presentation was not correct in the original; you can see it in the spoiler below.
The output ought to be the following:
- Outpatient_Services: AM11: 1 AM07: 1 AM04: 1
However, I need this not only for the AM keys but also for the LK keys, and that is what I cannot achieve, since there are two of them, not one.
The following code chunk is a workaround, so that I at least get some info about the "Outpatient_Clinic_Special", but it is just the description, not the keys.
'Outpatient_Service' => sub {
    if ( exists $_[1]->{Outpatient_Clinic} ) {
        if ( exists $_[1]->{Outpatient_Clinic}->{Other} ) {
            return $_[1]->{Outpatient_Clinic}->{Description} => 1
        }
        else {
            return $_[1]->{Outpatient_Clinic}->{AM_Key} => 1
        }
    }
    elsif ( exists $_[1]->{Outpatient_Clinic_Special} ) {
        return $_[1]->{Outpatient_Clinic_Special}->{Description} => 1;    # as a way out.
    }
    else { }
},
Could you please give me a hint?
Thanks in advance!
VE

Replies are listed 'Best First'.
Re: How to return two and more values by parsing XML with XML::Rules?
by runrig (Abbot) on Nov 06, 2012 at 23:01 UTC
    Sometimes it's simpler to use variables in an outer scope than to go through contortions:
    my (@am_keys, @lk_keys);
    my @rules = (
        AM_Key   => sub { push @am_keys, $_[1]{_content}; return },
        LK_Key   => sub { push @lk_keys, $_[1]{_content}; return },
        _default => undef,
    );
    my $xr = XML::Rules->new( rules => \@rules );
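For anyone who wants to try the outer-scope approach end to end, here is a minimal sketch. The inline XML is a trimmed-down copy of the posted sample. Two things are my own assumptions, not part of the reply above: the rules are passed as a hash reference, and an explicit no-op handler stands in for `_default => undef`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Rules;

# Trimmed-down version of the XML from the question (assumption: enough
# to exercise both key types).
my $xml = <<'XML';
<Outpatient_Services>
  <Outpatient_Service>
    <Outpatient_Clinic><AM_Key>AM01</AM_Key></Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic_Special>
      <AM_Special_Key>AM06</AM_Special_Key>
      <Capacities_Outpatient_Clinic_Special>
        <Capacity><LK_Key>LK01</LK_Key></Capacity>
        <Capacity><LK_Key>LK02</LK_Key></Capacity>
      </Capacities_Outpatient_Clinic_Special>
    </Outpatient_Clinic_Special>
  </Outpatient_Service>
</Outpatient_Services>
XML

# The handlers close over these arrays and fill them as tags are closed.
my ( @am_keys, @lk_keys );

my $xr = XML::Rules->new(
    rules => {
        AM_Key   => sub { push @am_keys, $_[1]{_content}; return },
        LK_Key   => sub { push @lk_keys, $_[1]{_content}; return },
        # Explicit no-op handler: every other tag contributes nothing.
        _default => sub { return },
    },
);

$xr->parse($xml);

print "AM keys: @am_keys\n";
print "LK keys: @lk_keys\n";
```

Since the handlers return nothing, whatever `parse()` itself returns is ignored here; all the data ends up in the two arrays.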

      Or, if you really are against using globals, use the $parser->{pad} to hold your data:

      my @rules = (
          AM_Key => sub { push @{$_[4]->{pad}{am_keys}}, $_[1]{_content}; return },
          LK_Key => sub { push @{$_[4]->{pad}{lk_keys}}, $_[1]{_content}; return },
          Outpatient_Services => sub { LK_Keys => $_[4]->{pad}{lk_keys}, AM_Keys => $_[4]->{pad}{am_keys} },
          _default => undef,
      );
      my $xr = XML::Rules->new( rules => \@rules );

      With this, $xr->parse() returns a hash of arrays (HoA) containing the array of AM_Keys and the array of LK_Keys.

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

      Thank you very much for this!
Re: How to return two and more values by parsing XML with XML::Rules?
by Anonymous Monk on Nov 06, 2012 at 10:11 UTC
    Your sample xml doesn't match your wanted data -- irritating
      Sorry, I did not mention that I have run
      'Capacities_Outpatient_Clinic_Special' => 'pass',
      before the mentioned chunk of the script.
      Thank you for pointing that out.
      Apart from that, this is a fragment of a script that actually runs (only with the fields translated into English). The above rule should not affect the code fragment, however; perhaps I overlooked another mismatch?
      Update
      It must be the result presentation that has to be adjusted in my fragment - it should be:
      - Outpatient_Services: AM11: 1 AM07: 1 AM04: 1
        I was referring to the yaml, and FWIW, ysh doesn't like that YAML, which I assume is supposed to be

        This is what I came up with, which took waaay too long, the rules are hard to remember

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use XML::Rules;
        use Data::Dump qw/ dd /;

        my $ta = XML::Rules->new(
            qw/ stripspaces 8 /,
            rules => {
                'Outpatient_Services' => 'no content',
                'Outpatient_Service'  => 'as array no content',
                #~ 'Outpatient_Clinic' => 'content by AM_Key',
                'Outpatient_Clinic' => sub {
                    #~ $rule->( $tag_name, \%attrs, \@context, \@parent_data, $parser)
                    #~ my ($tagname, $attrHash, $contexArray, $parentDataArray, $parser) = @_;
                    my $amk = $_[1]->{AM_Key} ;
                    return unless $amk;
                    { $amk => 1 };
                },
                #~ _default => sub { $_[0] => $_[1]->{_content} },
                _default => 'content',
                'Outpatient_Clinic_Special' => undef,
            },
        );
        my $ref = $ta->parsefile( 'pm1002448.xml' );
        dd $ref;
        use YAML();
        print YAML::Dump( $ref);
        __END__
        {
          Outpatient_Services => {
            Outpatient_Service => [
              { AM01 => 1 },
              { AM01 => 1 },
              { AM02 => 1 },
              {},
              { AM04 => 1 },
              {},
            ],
          },
        }
        ---
        Outpatient_Services:
          Outpatient_Service:
            - AM01: 1
            - AM01: 1
            - AM02: 1
            - {}
            - AM04: 1
            - {}
Re: How to return two and more values by parsing XML with XML::Rules?
by Anonymous Monk on Nov 06, 2012 at 11:37 UTC

    XML::Twig is much easier on the noggin

    {
        use strict;
        use warnings;
        use Data::Dump qw/ dd /;
        use XML::Twig ;
        my( %os, @amk );
        XML::Twig->new(
            twig_handlers => {
                #~ '/Outpatient_Services/Outpatient_Service/Outpatient_Clinic/AM_Key' => sub {
                'AM_Key' => sub {
                    print $_->xpath, "\n";
                    push @amk, $_->trimmed_text;
                },
                'Outpatient_Service' => sub {
                    print $_->xpath, "\n";
                    $os{ shift @amk }++ while @amk;
                },
            },
        )->xparse( 'pm1002448.xml' );
        my $ref = { Outpatient_Services => \%os, };
        dd $ref;
        use YAML();
        print YAML::Dump( $ref);
    }
    __END__
    /Outpatient_Services/Outpatient_Service/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service
    /Outpatient_Services/Outpatient_Service[2]/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service[2]
    /Outpatient_Services/Outpatient_Service[3]/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service[3]
    /Outpatient_Services/Outpatient_Service[4]
    /Outpatient_Services/Outpatient_Service[5]/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service[5]
    /Outpatient_Services/Outpatient_Service[6]
    { Outpatient_Services => { AM01 => 2, AM02 => 1, AM04 => 1 } }
    ---
    Outpatient_Services:
      AM01: 2
      AM02: 1
      AM04: 1
      Wow thanks!
      It works with LK_Keys as well:
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dump qw/ dd /;
      use XML::Twig ;
      my( %os, @amk );
      XML::Twig->new(
          twig_handlers => {
              'AM_Key' => sub {
                  # print $_->xpath, "\n";
                  push @amk, $_->trimmed_text;
              },
              'LK_Key' => sub {
                  # print $_->xpath, "\n";
                  push @amk, $_->trimmed_text;
              },
              'Outpatient_Service' => sub {
                  # print $_->xpath, "\n";
                  $os{ shift @amk }++ while @amk;
              },
          },
      )->xparse( shift );
      my $ref = { Outpatient_Services => \%os, };
      # dd $ref;
      use YAML::XS();
      print YAML::XS::Dump( $ref);
      prints:
      Outpatient_Services:
        AM01: 2
        AM02: 1
        AM04: 1
        LK01: 1
        LK02: 1
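The tallying step in the Outpatient_Service handler can be tried on its own, without any XML module. This is only a sketch: the list of keys is hypothetical, written out by hand in the order the AM_Key and LK_Key handlers would presumably collect them from the posted sample.

```perl
use strict;
use warnings;

# Hypothetical queue of collected keys (stands in for @amk in the script above).
my @seen = qw(AM01 AM01 AM02 LK01 LK02 AM04);

my %count;

# Drain the queue, bumping the counter for each key taken off the front.
$count{ shift @seen }++ while @seen;

# Print the tally in sorted key order.
printf "%s: %d\n", $_, $count{$_} for sort keys %count;
```

Because the queue is emptied every time, a handler that fires once per Outpatient_Service never counts the same key twice.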
Re: How to return two and more values by parsing XML with XML::Rules?
by vagabonding electron (Curate) on Nov 06, 2012 at 14:31 UTC
    After some thoughts and readings I was finally able to produce the desired output with XML::Rules.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Rules;
    use YAML::XS;

    my $parser = XML::Rules->new(
        rules => {
            'Capacities_Outpatient_Clinic, Other, Outpatient_Clinic, Outpatient_Clinic_Special, Outpatient_Services' => 'no content',
            'Capacities_Outpatient_Clinic_Special' => 'pass',
            'AM_Key, AM_Other_Key, AM_Special_Key, Description, Explanations, LK_Key, Type, VA_VU_Key_Outpatient_Clinic, VA_VU_Other_Key_Outpatient_Clinic' => 'content',
            'Care_Point' => 'as array no content',
            # Turn each capacity into a "LKxx => 1" pair in the parent's hash.
            'Capacity' => sub { $_[1]->{LK_Key} => 1 },
            'Outpatient_Service' => sub {
                if ( exists $_[1]->{Outpatient_Clinic} ) {
                    if ( exists $_[1]->{Outpatient_Clinic}->{Other} ) {
                        return $_[1]->{Outpatient_Clinic}->{Description} => 1
                    }
                    else {
                        return $_[1]->{Outpatient_Clinic}->{AM_Key} => 1
                    }
                }
                elsif ( exists $_[1]->{Outpatient_Clinic_Special} ) {
                    # A plain hash instead of "my $h; ... return %$h;",
                    # which would die under strict when no LK key is present.
                    my %h;
                    for ( keys %{ $_[1]->{Outpatient_Clinic_Special} } ) {
                        $h{$_} = 1 if /LK\d*/;
                    }
                    return %h;
                }
                else { }
            },
        }
    );
    my $data = $parser->parsefile(shift);
    print Dump $data;
    which prints
    ---
    Outpatient_Services:
      AM01: 1
      AM02: 1
      AM04: 1
      Description of the Outpatient Clinic: 1
      LK01: 1
      LK02: 1
    from the posted xml fragment.
    What I still do not know is whether this is a proper use of the module or just a workaround.
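One detail of the key filter may be worth a second look: /LK\d*/ is unanchored, so it would match any key that merely contains "LK". Below is a small sketch of a stricter filter. The hash is hypothetical, shaped the way the Outpatient_Clinic_Special data presumably looks once the Capacity and 'pass' rules have run on the posted sample.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $_[1]->{Outpatient_Clinic_Special}.
my %special = (
    AM_Special_Key => 'AM06',
    Description    => 'Description of the Outpatient_Clinic',
    Explanations   => 'Explanations to the Outpatient_Clinic',
    LK01           => 1,
    LK02           => 1,
);

# Keep only keys that are exactly "LK" followed by one or more digits.
my %lk = map { $_ => 1 } grep { /^LK\d+$/ } keys %special;

print join( ', ', sort keys %lk ), "\n";    # prints "LK01, LK02"
```

With the sample data both patterns select the same keys; the anchored form only matters if a tag name containing "LK" ever shows up next to the capacities.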
