PerlMonks  

How to return two and more values by parsing XML with XML::Rules?

by vagabonding electron (Hermit)
on Nov 06, 2012 at 09:45 UTC ( #1002448=perlquestion )
vagabonding electron has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I use XML::Rules to parse a huge XML document. As a Perl amateur I'm generally happy with the module since it does what I mean. Now however I'm stuck with the following problem.
Here is the XML chunk.
<Outpatient_Services>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM01</AM_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM01</AM_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM02</AM_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Capacities_Outpatient_Clinic>
        <Care_Point>
          <VA_VU_Key_Outpatient_Clinic>VA01</VA_VU_Key_Outpatient_Clinic>
        </Care_Point>
        <Care_Point>
          <Other>
            <VA_VU_Other_Key_Outpatient_Clinic>VA00</VA_VU_Other_Key_Outpatient_Clinic>
            <Description>Other Care Point of the Outpatient Clinic</Description>
          </Other>
        </Care_Point>
      </Capacities_Outpatient_Clinic>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic_Special>
      <AM_Special_Key>AM06</AM_Special_Key>
      <Description>Description of the Outpatient_Clinic</Description>
      <Capacities_Outpatient_Clinic_Special>
        <Capacity>
          <LK_Key>LK01</LK_Key>
        </Capacity>
        <Capacity>
          <LK_Key>LK02</LK_Key>
        </Capacity>
      </Capacities_Outpatient_Clinic_Special>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic_Special>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <AM_Key>AM04</AM_Key>
    </Outpatient_Clinic>
  </Outpatient_Service>
  <Outpatient_Service>
    <Outpatient_Clinic>
      <Other>
        <AM_Other_Key>AM00</AM_Other_Key>
        <Type>Type of the other Outpatient Clinic</Type>
      </Other>
      <Description>Description of the Outpatient Clinic</Description>
      <Capacities_Outpatient_Clinic>
        <Care_Point>
          <VA_VU_Key_Outpatient_Clinic>VA02</VA_VU_Key_Outpatient_Clinic>
        </Care_Point>
        <Care_Point>
          <Other>
            <VA_VU_Other_Key_Outpatient_Clinic>VA00</VA_VU_Other_Key_Outpatient_Clinic>
            <Description>Other Care Point of the Outpatient Clinic</Description>
          </Other>
        </Care_Point>
      </Capacities_Outpatient_Clinic>
      <Explanations>Explanations to the Outpatient_Clinic</Explanations>
    </Outpatient_Clinic>
  </Outpatient_Service>
</Outpatient_Services>

Update: The desired output presentation was not correct in the original, you can see it in the spoiler below.
The output ought to be the following:
- Outpatient_Services: AM11: 1 AM07: 1 AM04: 1
However, I need this not only for the AM keys but also for the LK keys, and that is what I cannot achieve, since an Outpatient_Clinic_Special contains two LK_Key elements, not one.
The following code chunk is a workaround so that I at least get some information about the "Outpatient_Clinic_Special", but it returns just the description, not the keys.
    'Outpatient_Service' => sub {
        if ( exists $_[1]->{Outpatient_Clinic} ) {
            if ( exists $_[1]->{Outpatient_Clinic}->{Other} ) {
                return $_[1]->{Outpatient_Clinic}->{Description} => 1
            }
            else {
                return $_[1]->{Outpatient_Clinic}->{AM_Key} => 1
            }
        }
        elsif ( exists $_[1]->{Outpatient_Clinic_Special} ) {
            return $_[1]->{Outpatient_Clinic_Special}->{Description} => 1;    # as a way out.
        }
        else { }
    },
Could you please give me a hint?
Thanks in advance!
VE

Re: How to return two and more values by parsing XML with XML::Rules?
by Anonymous Monk on Nov 06, 2012 at 10:11 UTC
    Your sample xml doesn't match your wanted data -- irritating
      Sorry, I did not mention that I have run
      'Capacities_Outpatient_Clinic_Special' => 'pass',
      before the mentioned chunk of the script.
      Thank you for pointing that out.
      Apart from that, this is a fragment of a script that actually runs (only with the fields translated into English). The above rule should not affect the code fragment, however; perhaps I overlooked another mismatch?
      Update
      It must be the result presentation that has to be adjusted in my fragment - it should be:
      - Outpatient_Services: AM11: 1 AM07: 1 AM04: 1
        I was referring to the yaml, and FWIW, ysh doesn't like that YAML, which I assume is supposed to be

        ysh > ---
        yaml> Outpatient_Services:
        yaml>   Outpatient_Service:
        yaml>     - AM11: 1
        yaml>     - AM07: 1
        yaml>     - AM04: 1
        yaml> ...
        $VAR1 = {
          'Outpatient_Services' => {
            'Outpatient_Service' => [
              { 'AM11' => '1' },
              { 'AM07' => '1' },
              { 'AM04' => '1' }
            ]
          }
        };

        This is what I came up with, which took waaay too long, the rules are hard to remember

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use XML::Rules;
        use Data::Dump qw/ dd /;

        my $ta = XML::Rules->new(
            qw/ stripspaces 8 /,
            rules => {
                'Outpatient_Services' => 'no content',
                'Outpatient_Service'  => 'as array no content',
                #~ 'Outpatient_Clinic' => 'content by AM_Key',
                'Outpatient_Clinic' => sub {
                    #~ $rule->( $tag_name, \%attrs, \@context, \@parent_data, $parser)
                    #~ my ($tagname, $attrHash, $contexArray, $parentDataArray, $parser) = @_;
                    my $amk = $_[1]->{AM_Key};
                    return unless $amk;
                    { $amk => 1 };
                },
                #~ _default => sub { $_[0] => $_[1]->{_content} },
                _default => 'content',
                'Outpatient_Clinic_Special' => undef,
            },
        );

        my $ref = $ta->parsefile( 'pm1002448.xml' );
        dd $ref;
        use YAML();
        print YAML::Dump( $ref );
        __END__
        {
          Outpatient_Services => {
            Outpatient_Service => [
              { AM01 => 1 },
              { AM01 => 1 },
              { AM02 => 1 },
              {},
              { AM04 => 1 },
              {},
            ],
          },
        }
        ---
        Outpatient_Services:
          Outpatient_Service:
            - AM01: 1
            - AM01: 1
            - AM02: 1
            - {}
            - AM04: 1
            - {}
Re: How to return two and more values by parsing XML with XML::Rules?
by Anonymous Monk on Nov 06, 2012 at 11:37 UTC

    XML::Twig is much easier on the noggin

    {
        use strict;
        use warnings;
        use Data::Dump qw/ dd /;
        use XML::Twig;

        my ( %os, @amk );
        XML::Twig->new(
            twig_handlers => {
                #~ '/Outpatient_Services/Outpatient_Service/Outpatient_Clinic/AM_Key' => sub {
                'AM_Key' => sub {
                    print $_->xpath, "\n";
                    push @amk, $_->trimmed_text;
                },
                'Outpatient_Service' => sub {
                    print $_->xpath, "\n";
                    $os{ shift @amk }++ while @amk;
                },
            },
        )->xparse( 'pm1002448.xml' );

        my $ref = { Outpatient_Services => \%os, };
        dd $ref;
        use YAML();
        print YAML::Dump( $ref );
    }
    __END__
    /Outpatient_Services/Outpatient_Service/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service
    /Outpatient_Services/Outpatient_Service[2]/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service[2]
    /Outpatient_Services/Outpatient_Service[3]/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service[3]
    /Outpatient_Services/Outpatient_Service[4]
    /Outpatient_Services/Outpatient_Service[5]/Outpatient_Clinic/AM_Key
    /Outpatient_Services/Outpatient_Service[5]
    /Outpatient_Services/Outpatient_Service[6]
    { Outpatient_Services => { AM01 => 2, AM02 => 1, AM04 => 1 } }
    ---
    Outpatient_Services:
      AM01: 2
      AM02: 1
      AM04: 1
      Wow thanks!
      It works with LK_Keys as well:
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dump qw/ dd /;
      use XML::Twig;

      my ( %os, @amk );
      XML::Twig->new(
          twig_handlers => {
              'AM_Key' => sub {
                  # print $_->xpath, "\n";
                  push @amk, $_->trimmed_text;
              },
              'LK_Key' => sub {
                  # print $_->xpath, "\n";
                  push @amk, $_->trimmed_text;
              },
              'Outpatient_Service' => sub {
                  # print $_->xpath, "\n";
                  $os{ shift @amk }++ while @amk;
              },
          },
      )->xparse( shift );

      my $ref = { Outpatient_Services => \%os, };
      # dd $ref;
      use YAML::XS();
      print YAML::XS::Dump( $ref );
      prints:
      Outpatient_Services:
        AM01: 2
        AM02: 1
        AM04: 1
        LK01: 1
        LK02: 1
Re: How to return two and more values by parsing XML with XML::Rules?
by vagabonding electron (Hermit) on Nov 06, 2012 at 14:31 UTC
    After some thoughts and readings I was finally able to produce the desired output with XML::Rules.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Rules;
    use YAML::XS;

    my $parser = XML::Rules->new(
        rules => {
            'Capacities_Outpatient_Clinic, Other, Outpatient_Clinic, Outpatient_Clinic_Special, Outpatient_Services' => 'no content',
            'Capacities_Outpatient_Clinic_Special' => 'pass',
            'AM_Key, AM_Other_Key, AM_Special_Key, Description, Explanations, LK_Key, Type, VA_VU_Key_Outpatient_Clinic, VA_VU_Other_Key_Outpatient_Clinic' => 'content',
            'Care_Point' => 'as array no content',
            'Capacity'   => sub { $_[1]->{LK_Key} => 1 },
            'Outpatient_Service' => sub {
                if ( exists $_[1]->{Outpatient_Clinic} ) {
                    if ( exists $_[1]->{Outpatient_Clinic}->{Other} ) {
                        return $_[1]->{Outpatient_Clinic}->{Description} => 1
                    }
                    else {
                        return $_[1]->{Outpatient_Clinic}->{AM_Key} => 1
                    }
                }
                elsif ( exists $_[1]->{Outpatient_Clinic_Special} ) {
                    my $h;
                    for ( keys %{ $_[1]->{Outpatient_Clinic_Special} } ) {
                        $h->{$_} = 1 if /LK\d*/;
                    }
                    return %$h;
                }
                else { }
            },
        }
    );

    my $data = $parser->parsefile(shift);
    print Dump $data;
    which prints
    ---
    Outpatient_Services:
      AM01: 1
      AM02: 1
      AM04: 1
      Description of the Outpatient Clinic: 1
      LK01: 1
      LK02: 1
    from the posted xml fragment.
    What I still do not know is whether this is a proper use of the module or just a workaround.
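
The step that lets one rule hand back "two and more values" is returning a list of key/value pairs rather than a single pair. A minimal sketch of just that step in plain Perl follows. It assumes the attribute hash for Outpatient_Clinic_Special has the shape implied by the rules above (the LK01/LK02 entries passed up from the Capacity rule); the hash is hard-coded here and the helper name lk_pairs is made up for illustration. Unlike the posted loop, it anchors the pattern (a deliberate change, so that only keys of the form LK plus digits match) and never dereferences an undefined $h when nothing matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: roughly what the Outpatient_Service rule sees under
# $_[1]->{Outpatient_Clinic_Special} for the posted XML (hard-coded here).
my %special = (
    AM_Special_Key => 'AM06',
    Description    => 'Description of the Outpatient_Clinic',
    Explanations   => 'Explanations to the Outpatient_Clinic',
    LK01           => 1,
    LK02           => 1,
);

# Hypothetical helper: return a flat list of (key => 1) pairs for every
# key that looks like an LK key. Returns an empty list if none match.
sub lk_pairs {
    my ($attrs) = @_;
    return map { ( $_ => 1 ) } grep { /^LK\d+$/ } sort keys %$attrs;
}

my %pairs = lk_pairs( \%special );
print join( ', ', map { "$_ => $pairs{$_}" } sort keys %pairs ), "\n";
```

Inside the rule, the elsif branch could then end with something like return lk_pairs( $_[1]->{Outpatient_Clinic_Special} ); this is untested against the real document.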
Re: How to return two and more values by parsing XML with XML::Rules?
by runrig (Abbot) on Nov 06, 2012 at 23:01 UTC
    Sometimes it's simpler to use variables in an outer scope than to go through contortions:
    my (@am_keys, @lk_keys);
    my @rules = (
        AM_Key   => sub { push @am_keys, $_[1]{_content}; return },
        LK_Key   => sub { push @lk_keys, $_[1]{_content}; return },
        _default => undef,
    );
    my $xr = XML::Rules->new( rules => \@rules );
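
Once the keys are collected in arrays like this, getting the per-key counts from the original question is a small extra step. A minimal sketch, assuming the two arrays have already been filled by the rules above; the values are hard-coded from the posted XML so the snippet runs without XML::Rules:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the values the AM_Key and LK_Key rules would have pushed
# while parsing the XML from the question (hard-coded for illustration).
my @am_keys = qw( AM01 AM01 AM02 AM04 );
my @lk_keys = qw( LK01 LK02 );

# Count how often each key occurs across both arrays.
my %count;
$count{$_}++ for @am_keys, @lk_keys;

# Same shape as the output asked for: one top-level key holding the counts.
my $result = { Outpatient_Services => \%count };

for my $key ( sort keys %{ $result->{Outpatient_Services} } ) {
    print "$key: $result->{Outpatient_Services}{$key}\n";
}
```

For the posted fragment this yields the same counts as the XML::Twig version earlier in the thread (AM01 twice, every other key once).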
      Thank you very much for this!

      Or, if you really are against using globals, use the $parser->{pad} to hold your data:

      my @rules = (
          AM_Key => sub { push @{$_[4]->{pad}{am_keys}}, $_[1]{_content}; return },
          LK_Key => sub { push @{$_[4]->{pad}{lk_keys}}, $_[1]{_content}; return },
          Outpatient_Services => sub { LK_Keys => $_[4]->{pad}{lk_keys}, AM_Keys => $_[4]->{pad}{am_keys} },
          _default => undef,
      );
      my $xr = XML::Rules->new( rules => \@rules );

      With this, $xr->parse() returns a HoA (hash of arrays) containing the array of the AM_Keys and the array of the LK_Keys.

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.
