http://www.perlmonks.org?node_id=1002448

vagabonding electron has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I use XML::Rules to parse a huge XML document. As a Perl amateur, I'm generally happy with the module, since it does what I mean. Now, however, I'm stuck on the following problem.
Here is the XML chunk.
<Outpatient_Services>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM01</AM_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM01</AM_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM02</AM_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Capacities_Outpatient_Clinic>
        <Care_Point>
          <VA_VU_Key_Outpatient_Clinic>VA01</VA_VU_Key_Outpatient_Clinic>
        </Care_Point>
        <Care_Point>
          <Other>
            <VA_VU_Other_Key_Outpatient_Clinic>VA00</VA_VU_Other_Key_Outpatient_Clinic>
            <Description>Other Care Point of the Outpatient Clinic</Description>
          </Other>
        </Care_Point>
      </Capacities_Outpatient_Clinic>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic_Special>
      <AM_Special_Key>AM06</AM_Special_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Capacities_Outpatient_Clinic_Special>
        <Capacity>
          <LK_Key>LK01</LK_Key>
        </Capacity>
        <Capacity>
          <LK_Key>LK02</LK_Key>
        </Capacity>
      </Capacities_Outpatient_Clinic_Special>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic_Special>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM04</AM_Key>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <Other>
        <AM_Other_Key>AM00</AM_Other_Key>
        <Type>Type of the other Outpatient Clinic</Type>
      </Other>
      <Description>Description of the Outpatient Clinic</Description>
      <Capacities_Outpatient_Clinic>
        <Care_Point>
          <VA_VU_Key_Outpatient_Clinic>VA02</VA_VU_Key_Outpatient_Clinic>
        </Care_Point>
        <Care_Point>
          <Other>
            <VA_VU_Other_Key_Outpatient_Clinic>VA00</VA_VU_Other_Key_Outpatient_Clinic>
            <Description>Other Care Point of the Outpatient Clinic</Description>
          </Other>
        </Care_Point>
      </Capacities_Outpatient_Clinic>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic>
  </Outpatient_Service>
</Outpatient_Services>

Update: The desired output presentation was not correct in the original; you can see it in the spoiler below.
The output ought to be the following:
- Outpatient_Services: AM11: 1 AM07: 1 AM04: 1
However, I need this not only for the AM keys but also for the LK keys, and that is what I cannot achieve, since there are two of them, not one.
The following code chunk is a workaround so that I at least get some information about the "Outpatient_Clinic_Special", but it returns only the description, not the keys.
'Outpatient_Service' => sub {
    if ( exists $_[1]->{Outpatient_Clinic} ) {
        if ( exists $_[1]->{Outpatient_Clinic}->{Other} ) {
            return $_[1]->{Outpatient_Clinic}->{Description} => 1
        }
        else {
            return $_[1]->{Outpatient_Clinic}->{AM_Key} => 1
        }
    }
    elsif ( exists $_[1]->{Outpatient_Clinic_Special} ) {
        return $_[1]->{Outpatient_Clinic_Special}->{Description} => 1;    # as a way out.
    }
    else { }
},
Could you please give me a hint?
Thanks in advance!
VE

Replies are listed 'Best First'.
Re: How to return two and more values by parsing XML with XML::Rules?
by runrig (Abbot) on Nov 06, 2012 at 23:01 UTC
    Sometimes it's simpler to use variables in an outer scope than to go through contortions:
    my (@am_keys, @lk_keys);
    my @rules = (
        AM_Key   => sub { push @am_keys, $_[1]{_content}; return },
        LK_Key   => sub { push @lk_keys, $_[1]{_content}; return },
        _default => undef,
    );
    my $xr = XML::Rules->new( rules => \@rules );

      Or, if you really are against using globals, use the $parser->{pad} to hold your data:

      my @rules = (
          AM_Key => sub { push @{$_[4]->{pad}{am_keys}}, $_[1]{_content}; return },
          LK_Key => sub { push @{$_[4]->{pad}{lk_keys}}, $_[1]{_content}; return },
          Outpatient_Services => sub {
              LK_Keys => $_[4]->{pad}{lk_keys},
              AM_Keys => $_[4]->{pad}{am_keys}
          },
          _default => undef,
      );
      my $xr = XML::Rules->new( rules => \@rules );

      With this, $xr->parse() returns a hash of arrays (HoA) containing the array of AM_Keys and the array of LK_Keys.

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

      Thank you very much for this!
Re: How to return two and more values by parsing XML with XML::Rules?
by Anonymous Monk on Nov 06, 2012 at 10:11 UTC
    Your sample xml doesn't match your wanted data -- irritating
      Sorry, I did not mention that I have run
      'Capacities_Outpatient_Clinic_Special' => 'pass',
      before the mentioned chunk of the script.
      Thank you for pointing that out.
      Apart from that, this is a fragment of a script that actually runs (only with the field names translated into English). The above rule should not affect the code fragment, however; perhaps I overlooked another mismatch?
      Update
      It must be the result presentation that has to be adjusted in my fragment - it should be:
      - Outpatient_Services: AM11: 1 AM07: 1 AM04: 1
        I was referring to the yaml, and FWIW, ysh doesn't like that YAML, which I assume is supposed to be

        This is what I came up with, which took waaay too long, the rules are hard to remember

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use XML::Rules;
        use Data::Dump qw/ dd /;

        my $ta = XML::Rules->new(
            qw/ stripspaces 8 /,
            rules => {
                'Outpatient_Services' => 'no content',
                'Outpatient_Service'  => 'as array no content',
        #~         'Outpatient_Clinic' => 'content by AM_Key',
                'Outpatient_Clinic' => sub {
        #~ $rule->( $tag_name, \%attrs, \@context, \@parent_data, $parser)
        #~ my ($tagname, $attrHash, $contexArray, $parentDataArray, $parser) = @_;
                    my $amk = $_[1]->{AM_Key} ;
                    return unless $amk;
                    { $amk => 1 };
                },
        #~         _default => sub { $_[0] => $_[1]->{_content} },
                _default => 'content',
                'Outpatient_Clinic_Special' => undef,
            },
        );

        my $ref = $ta->parsefile( 'pm1002448.xml' );
        dd $ref;
        use YAML();
        print YAML::Dump( $ref);
        __END__
        {
          Outpatient_Services => {
            Outpatient_Service => [
              { AM01 => 1 },
              { AM01 => 1 },
              { AM02 => 1 },
              {},
              { AM04 => 1 },
              {},
            ],
          },
        }
        ---
        Outpatient_Services:
          Outpatient_Service:
            - AM01: 1
            - AM01: 1
            - AM02: 1
            - {}
            - AM04: 1
            - {}
Re: How to return two and more values by parsing XML with XML::Rules?
by Anonymous Monk on Nov 06, 2012 at 11:37 UTC

    XML::Twig is much easier on the noggin

    {
        use strict;
        use warnings;
        use Data::Dump qw/ dd /;
        use XML::Twig ;
        my( %os, @amk );
        XML::Twig->new(
            twig_handlers => {
    #~             '/Outpatient_Services/Outpatient_Service/Outpatient_Clinic/AM_Key' => sub {
                'AM_Key' => sub {
                    print $_->xpath, "\n";
                    push @amk, $_->trimmed_text;
                },
                'Outpatient_Service' => sub {
                    print $_->xpath, "\n";
                    $os{ shift @amk }++ while @amk;
                },
            },
        )->xparse( 'pm1002448.xml' );
        my $ref = { Outpatient_Services => \%os, };
        dd $ref;
        use YAML();
        print YAML::Dump( $ref);
    }
    __END__
    /Outpatient_Services/Outpatient_Service/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service
    /Outpatient_Services/Outpatient_Service[2]/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service[2]
    /Outpatient_Services/Outpatient_Service[3]/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service[3]
    /Outpatient_Services/Outpatient_Service[4]
    /Outpatient_Services/Outpatient_Service[5]/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service[5]
    /Outpatient_Services/Outpatient_Service[6]
    { Outpatient_Services => { AM01 => 2, AM02 => 1, AM04 => 1 } }
    ---
    Outpatient_Services:
      AM01: 2
      AM02: 1
      AM04: 1
      Wow thanks!
      It works with LK_Keys as well:
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dump qw/ dd /;
      use XML::Twig ;
      my( %os, @amk );
      XML::Twig->new(
          twig_handlers => {
              'AM_Key' => sub {
                  # print $_->xpath, "\n";
                  push @amk, $_->trimmed_text;
              },
              'LK_Key' => sub {
                  # print $_->xpath, "\n";
                  push @amk, $_->trimmed_text;
              },
              'Outpatient_Service' => sub {
                  # print $_->xpath, "\n";
                  $os{ shift @amk }++ while @amk;
              },
          },
      )->xparse( shift );
      my $ref = { Outpatient_Services => \%os, };
      # dd $ref;
      use YAML::XS();
      print YAML::XS::Dump( $ref);
      prints:
      Outpatient_Services:
        AM01: 2
        AM02: 1
        AM04: 1
        LK01: 1
        LK02: 1
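The buffer-and-count idiom used in the XML::Twig handlers above can be tried without any XML at all. The following is only a minimal sketch: the `@keys_per_service` list is hypothetical stand-in data (the AM_Key and LK_Key values of each Outpatient_Service in the sample document, in order), and the loop merely imitates the order in which the handlers would fire.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: the AM_Key / LK_Key values found inside each
# <Outpatient_Service> of the sample XML, in document order.
my @keys_per_service = (
    ['AM01'], ['AM01'], ['AM02'], [ 'LK01', 'LK02' ], ['AM04'], [],
);

my ( %os, @amk );
for my $service (@keys_per_service) {
    # What the AM_Key and LK_Key handlers do: buffer each key as it is seen.
    push @amk, @$service;

    # What the Outpatient_Service handler does: drain the buffer and count
    # every key, however many were collected for this service.
    $os{ shift @amk }++ while @amk;
}

print "$_: $os{$_}\n" for sort keys %os;
```

Because the buffer is emptied at the end of every service, it makes no difference whether a service contributed no key, one key, or several.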
Re: How to return two and more values by parsing XML with XML::Rules?
by vagabonding electron (Curate) on Nov 06, 2012 at 14:31 UTC
    After some thought and reading, I was finally able to produce the desired output with XML::Rules.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Rules;
    use YAML::XS;

    my $parser = XML::Rules->new(
        rules => {
            'Capacities_Outpatient_Clinic, Other, Outpatient_Clinic, Outpatient_Clinic_Special, Outpatient_Services' => 'no content',
            'Capacities_Outpatient_Clinic_Special' => 'pass',
            'AM_Key, AM_Other_Key, AM_Special_Key, Description, Explanations, LK_Key, Type, VA_VU_Key_Outpatient_Clinic, VA_VU_Other_Key_Outpatient_Clinic' => 'content',
            'Care_Point' => 'as array no content',
            'Capacity' => sub {$_[1]->{LK_Key} => 1},
            'Outpatient_Service' => sub {
                if (exists $_[1]->{Outpatient_Clinic}) {
                    if ( exists $_[1]->{Outpatient_Clinic}->{Other} ) {
                        return $_[1]->{Outpatient_Clinic}->{Description} => 1
                    }
                    else {
                        return $_[1]->{Outpatient_Clinic}->{AM_Key} => 1
                    }
                }
                elsif ( exists $_[1]->{Outpatient_Clinic_Special} ) {
                    my $h;
                    for ( keys %{ $_[1]->{Outpatient_Clinic_Special} } ) {
                        $h->{$_} = 1 if /LK\d*/;
                    }
                    return %$h;
                }
                else { }
            },
        }
    );
    my $data = $parser->parsefile(shift);
    print Dump $data;
    which prints
    ---
    Outpatient_Services:
      AM01: 1
      AM02: 1
      AM04: 1
      Description of the Outpatient Clinic: 1
      LK01: 1
      LK02: 1
    from the posted xml fragment.
    What I still do not know is whether this is a proper use of the module or a workaround.
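As a possible tidy-up, the body of the 'Outpatient_Service' rule can be pulled out into a plain function and exercised on its own. This is a sketch under assumptions, not the module's prescribed way: `service_keys` is a hypothetical helper name, the input is assumed to be the attribute hash the rule receives as `$_[1]` with the child rules configured as in the script above, and the anchored pattern `^LK\d+$` is a tightening of the original `/LK\d*/`. It also returns an empty list rather than dereferencing an undefined `$h` when no LK key is present.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Rules: takes the attribute hash that
# the 'Outpatient_Service' rule receives as $_[1] and returns key => 1 pairs.
sub service_keys {
    my ($attrs) = @_;

    if ( my $clinic = $attrs->{Outpatient_Clinic} ) {
        # "Other" clinics have no AM_Key, so fall back to the description.
        my $key = exists $clinic->{Other}
            ? $clinic->{Description}
            : $clinic->{AM_Key};
        return defined $key ? ( $key => 1 ) : ();
    }

    if ( my $special = $attrs->{Outpatient_Clinic_Special} ) {
        # Anchored pattern: only keys that look exactly like LK01, LK02, ...
        # Any number of matches yields that many key => 1 pairs.
        return map { $_ => 1 } grep { /^LK\d+$/ } sort keys %$special;
    }

    return;    # nothing to add to the parent
}

my %result = service_keys(
    {
        Outpatient_Clinic_Special => {
            AM_Special_Key => 'AM06',
            Description    => 'Description of the Outpatient_Clinic',
            LK01           => 1,
            LK02           => 1,
        },
    }
);
print join( ', ', map { "$_ => $result{$_}" } sort keys %result ), "\n";
```

Inside the rule itself this would be called as `'Outpatient_Service' => sub { service_keys( $_[1] ) }`, relying on the same list-of-pairs return that the script above already uses.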