PerlMonks  

Re: XML::Parser - Usage of &

by tobyink (Abbot)
on Feb 20, 2013 at 11:10 UTC


in reply to XML::Parser - Usage of &

XML::Parser shouldn't be ignoring the &amp;. I think what you'll find is that it treats the title as three pieces of character data:

  1. Company A
  2. &
  3. B Information

And it will treat these as three separate parse events. Quick demonstration:

    use 5.010;
    use strict;
    use warnings;
    use XML::Parser;

    my $in_title;
    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub { $in_title++ if $_[1] eq 'Title' },
            End   => sub { $in_title-- if $_[1] eq 'Title' },
            Char  => sub { say "CHAR: $_[1]" if $in_title },
        },
    );

    $parser->parse(<<'XML');
    <Document>
    <Title>Company A&amp;B Information</Title>
    <Abstract>Foo</Abstract>
    </Document>
    XML

XML::Parser is very bare-bones, and leaves translating those parse events into a useful data structure very much up to you.

Personally, I prefer DOM-based XML parsers such as XML::LibXML, which parse the entire file into a tree and let you navigate and manipulate that tree using the same DOM interface that web browsers expose to JavaScript.
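For comparison, here is a minimal sketch of the DOM approach with XML::LibXML, run against the same sample document as the demonstration above. The helper name titles_from_xml is made up for illustration and is not part of the module; load_xml, findnodes and textContent are the XML::LibXML calls it relies on.

```perl
use 5.010;
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): return the text of
# every <Title> element directly under <Document>.
sub titles_from_xml {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);

    # textContent returns the element's whole text in one string,
    # with entities such as &amp; already decoded.
    return map { $_->textContent } $doc->findnodes('/Document/Title');
}

my $xml = <<'XML';
<Document>
<Title>Company A&amp;B Information</Title>
<Abstract>Foo</Abstract>
</Document>
XML

say "GOT TITLE: $_" for titles_from_xml($xml);
```

Because the whole document is already a tree by the time you query it, there are no separate character-data events to stitch back together.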

package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name


Re^2: XML::Parser - Usage of &amp;
by sumeetgrover (Scribe) on Feb 20, 2013 at 11:19 UTC

    You are right, the parser is indeed treating the title as:

      1. Company A
      2. &
      3. B Information

    Does that mean our code needs to put these three pieces back together and save them as one single title?

    Many thanks for your help!

      I'm guessing that right now the code (you haven't posted any, so the best I can do is guess!) in the Char handler is saving a reference to the last bit of character data, and then when the End handler sees the end of the Title element, it does something with that. Maybe something like this:

      use 5.010;
      use strict;
      use warnings;
      use XML::Parser;

      my ($got_title, $in_title);
      my $parser = XML::Parser->new(
          Handlers => {
              Start => sub { $in_title++ if $_[1] eq 'Title' },
              End   => sub { $in_title--, say "GOT TITLE: $got_title" if $_[1] eq 'Title' },
              # Each Char event overwrites the previous one, so only
              # the last piece of the title survives.
              Char  => sub { $got_title = $_[1] if $in_title },
          },
      );

      $parser->parse(<<'XML');
      <Document>
      <Title>Company A&amp;B Information</Title>
      <Abstract>Foo</Abstract>
      <Title>Company X&amp;Y Information</Title>
      <Abstract>Bar</Abstract>
      </Document>
      XML

      Instead, you want the Char handler to accumulate the pieces of character data, either by appending to a string or by pushing onto an array/arrayref, and the Start and End handlers to signal when to start and stop accumulating. For example:

      use 5.010;
      use strict;
      use warnings;
      use XML::Parser;

      my (@got_title, $in_title);
      my $parser = XML::Parser->new(
          Handlers => {
              # Reset the accumulator at the start of each Title.
              Start => sub { $in_title++, @got_title = () if $_[1] eq 'Title' },
              # Join with '' so no spaces are inserted between the pieces.
              End   => sub { $in_title--, say "GOT TITLE: ", join('', @got_title) if $_[1] eq 'Title' },
              Char  => sub { push @got_title, $_[1] if $in_title },
          },
      );

      $parser->parse(<<'XML');
      <Document>
      <Title>Company A&amp;B Information</Title>
      <Abstract>Foo</Abstract>
      <Title>Company X&amp;Y Information</Title>
      <Abstract>Bar</Abstract>
      </Document>
      XML
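      The string-appending variant mentioned above might look something like this. It's only a sketch: collect_titles, $buffer and @titles are names made up for illustration, and it returns the titles rather than printing them so the result is easy to reuse.

```perl
use 5.010;
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper (not from the thread): collect the full text of
# every <Title> element by appending each Char event to a buffer.
sub collect_titles {
    my ($xml) = @_;
    my (@titles, $buffer, $in_title);

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                return unless $_[1] eq 'Title';
                $in_title = 1;
                $buffer   = '';          # start a fresh title
            },
            # Append every piece of character data seen inside a Title.
            Char => sub { $buffer .= $_[1] if $in_title },
            End  => sub {
                return unless $_[1] eq 'Title';
                $in_title = 0;
                push @titles, $buffer;   # the title is complete here
            },
        },
    );

    $parser->parse($xml);
    return @titles;
}

my $xml = <<'XML';
<Document>
<Title>Company A&amp;B Information</Title>
<Abstract>Foo</Abstract>
<Title>Company X&amp;Y Information</Title>
<Abstract>Bar</Abstract>
</Document>
XML

say "GOT TITLE: $_" for collect_titles($xml);
```

      Appending to a string avoids having to remember to join the pieces afterwards, while the array version keeps the individual pieces around if you ever need them.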

        Thank you! This is exactly the type of bug fix I will be implementing in my code. All makes sense now.
