
XML::Parser question.

by reyjrar (Hermit)
on Nov 04, 2002 at 18:55 UTC ( #210247=perlquestion )
reyjrar has asked for the wisdom of the Perl Monks concerning the following question:

OK, I've been doing network and database programming in Perl for years now. I had a brief run-in with HTML::Parser at some point in my career, and it took me a _long_ time to grasp it. Now I need to write a parser that follows XLinks and returns all the content of those documents to a parent document, essentially treating the XLinks as XIncludes. It seems simple, and I'm 100% sure it is; I'm just not getting the XML parsing modules. This is the 3rd different way I've tried this, and the 3rd time I've gotten the exact same error message.

First I tried to use XML::SAX and write filters, in a fashion similar to an article I read on following XIncludes. I get this error message:
syntax error at line 1, column 0, byte 0 at /usr/lib/perl5/vendor_perl/5.6.1/i386-linux/XML/ line 185

So then I figured, "Hey, I did it wrong, and I don't understand." I searched around some more and found another article about how to filter using XML::SAX::Machines. I rewrote my parser using XML::SAX::Machines, and alas, the SAME error message.

So I spent all day debugging and got nowhere. I admit that I'm making things more complicated in attempting to understand the parser routines, and I thought I had a grasp of how they at least functioned to gather information out of an XML document. I reread everything and attempted another implementation using XML::Parser.

The code follows ...
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;
use LWP::Simple;

our %XLINK;

my $parser = new XML::Parser(
    Handlers => {
        Start => \&handle_start,
        End   => \&handle_end
    }
);
$parser->parse('test.xml');

sub handle_start {
    my $expat = shift;
    my $element = shift;
    my %attrs = @_;
    foreach my $attr (keys %attrs) {
        my ($ns,$elm) = split /\:/, $attr, 2;
        next unless $ns =~ /xmlns/i;
        if($attrs{$attr} eq '') {
            $XLINK{label} = $elm;
            $XLINK{element} = $element;
            last;
        }
    }
    my %link = ();
    if($XLINK{label}) {
        foreach my $attr (grep /$XLINK{label}/, keys %attrs) {
            my ($ns,$a) = split /\:/,$attr,2;
            next unless $ns eq $XLINK{label};
            $link{lc $a} = $attrs{$a};
        }
        if(exists $link{href} && $link{type} eq 'simple') {
            print retrieve($link{href});
        }
    }
}

sub handle_end {
    my $expat = shift;
    my $element = shift;
    %XLINK = () if $element eq $XLINK{element};
}

sub retrieve {
    my $url = shift;
    return get($url);
}

Here is test.xml:
<?xml version='1.0'?>
<test>
  <remote xmlns:xlink="" xlink:type="simple" xlink:title="testing this thing" xlink:href="">
    Testing this thing
  </remote>
  <local>
    <cat name="chunky">
      <kitten>funky</kitten>
      <kitten>monkey</kitten>
    </cat>
  </local>
</test>

I get the same error as before. Could someone correct my thinking on this simple example, so that it might shed some light on my dismal XML::Parser comprehension?

much obliged,


Replies are listed 'Best First'.
Re: XML::Parser question.
by mirod (Canon) on Nov 04, 2002 at 19:06 UTC

    You should call $parser->parsefile('test.xml'); instead of parse. As written, the parser tries to parse the text 'test.xml' itself and does not find it well-formed.

Re: XML::Parser question.
by seattlejohn (Deacon) on Nov 04, 2002 at 19:07 UTC
    You're going to kick yourself ;-)

    You want to be calling the parsefile method, not the parse method. parse expects to see a string containing the XML, and of course 'test.xml' is not well-formed XML. (Or you can pass it an open IO::Handle, but not a bare filename.)
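    The difference between the two methods can be sketched as follows. This is only an illustration: the helper count_start_tags and the temporary file are made up for the example, and the sample XML is a cut-down version of the test file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;
use File::Temp qw(tempfile);

# Hypothetical helper: parse $source with the named method ('parse' or
# 'parsefile') and return how many start tags were seen.
sub count_start_tags {
    my ($method, $source) = @_;
    my $count  = 0;
    my $parser = XML::Parser->new( Handlers => { Start => sub { $count++ } } );
    $parser->$method($source);
    return $count;
}

my $xml = "<test><remote>Testing this thing</remote></test>";

# Write the sample document to a temporary file so there is a real filename.
my ($tmp_fh, $tmp_name) = tempfile( SUFFIX => '.xml', UNLINK => 1 );
print {$tmp_fh} $xml;
close $tmp_fh or die "cannot close $tmp_name: $!";

# parse() accepts a string holding the XML itself ...
my $from_string = count_start_tags( parse => $xml );

# ... or an already opened filehandle.
open my $in, '<', $tmp_name or die "cannot open $tmp_name: $!";
my $from_handle = count_start_tags( parse => $in );
close $in;

# parsefile() accepts a filename and opens the file for you.
my $from_file = count_start_tags( parsefile => $tmp_name );

# parse() with a bare filename treats the name itself as the document.
my $error = eval { count_start_tags( parse => 'test.xml' ); 1 } ? '' : $@;

print "string: $from_string, handle: $from_handle, file: $from_file\n";
print "bare filename: $error" if $error;
```

    Wrapping the call in eval is a design choice for the demo only; it lets the script report the failure of the last case instead of dying on it.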

    Also, if you have further problems it can be helpful to use the Style => 'Debug' setting when you instantiate your parser object.
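    A minimal sketch of that setting, assuming a small inline document rather than the real test file. With this style the parser prints an outline of what it sees to STDERR, so no handlers are needed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Style => 'Debug' makes the parser print an outline of each element
# and text chunk it encounters to STDERR.
my $parser = XML::Parser->new( Style => 'Debug' );

# Sample input, cut down from the test file in the question.
my $xml = '<local><cat name="chunky"><kitten>funky</kitten></cat></local>';

# eval so that a parse failure is reported instead of killing the script.
my $parsed_ok = eval { $parser->parse($xml); 1 };
warn "parse failed: $@" unless $parsed_ok;
```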

            $perlmonks{seattlejohn} = 'John Clyman';

Re: XML::Parser question.
by FamousLongAgo (Friar) on Nov 04, 2002 at 19:07 UTC
    Your program is trying to parse the string 'test.xml' as its XML input, and failing. Either read the test file into a string and pass that to the parser's parse method, or use parsefile instead: $parser->parsefile('test.xml');

Re: XML::Parser question.
by mirod (Canon) on Nov 04, 2002 at 19:32 UTC

    BTW, you don't need to do the namespace processing yourself; you can use the namespace-related methods on the XML::Parser::Expat object, described in the doc for that module (perldoc XML::Parser::Expat).

    For example:

               Return the URI of the namespace that the name belongs to. 
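    As a rough sketch of that approach, assuming the description above refers to the namespace method and that the parser is created with Namespaces => 1. The XLink namespace URI is the standard one; the href value, the xl prefix, and the id attribute are made up for the example, since the values in the posted test file are blank.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Standard XLink namespace URI.
my $XLINK_NS = 'http://www.w3.org/1999/xlink';

# Sample input; the href is a placeholder, not a value from the question.
my $xml = <<'XML';
<test>
  <remote xmlns:xl="http://www.w3.org/1999/xlink"
          xl:type="simple" xl:href="http://example.com/remote.xml" id="r1">
    Testing this thing
  </remote>
</test>
XML

my @hrefs;    # hrefs of all simple XLinks found

sub handle_start {
    my ($expat, $element, @attr_list) = @_;
    my %link;
    # Walk the attribute list pairwise. Copying it into a hash first would
    # turn the names into plain strings and lose their namespace information.
    while ( my ($name, $value) = splice @attr_list, 0, 2 ) {
        my $uri = $expat->namespace($name);
        next unless defined $uri && $uri eq $XLINK_NS;
        $link{$name} = $value;    # $name stringifies to the local name
    }
    push @hrefs, $link{href}
        if exists $link{href} && ( $link{type} || '' ) eq 'simple';
}

my $parser = XML::Parser->new(
    Namespaces => 1,
    Handlers   => { Start => \&handle_start },
);
$parser->parse($xml);

print "found XLink: $_\n" for @hrefs;
```

    Because the check is on the namespace URI, the document is free to use any prefix it likes for XLink attributes.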

    I would think using SAX is probably the best way to go, though. You probably had a similar error in your initial code: reading the file name instead of the file.

      Thanks to everyone who pointed that out. Yes, I had a few of those parse vs parsefile errors, but they weren't the only ones. I'm learning how to debug the code better now. I just need something good to happen with this code. Now that I have the XML::Parser stuff working, I'm going to debug the XML::SAX::Machines implementation I have sitting there looking pretty but not working.

      I apologize for the dumb question, and for future dumb questions that I'll likely ask.

Re: XML::Parser question.
by Arrowhead (Monk) on Nov 04, 2002 at 19:07 UTC

    The error
    syntax error at line 1, column 0, byte 0 at /usr/lib/perl5/vendor_perl/5.6.1/i386-linux/XML/ line 185
    looks like it is complaining about the first character of your XML file. My guess is that you have an empty line before the <?xml version='1.0'?> line, which should really be the very first thing in the file.

    That is, every XML file should begin with a < character.
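    A small sketch to check how the parser reacts to a leading empty line. The helper parse_error is made up for the example, and the exact error text may differ from the one quoted in the question, so the script only reports whether each case parses:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns '' on success, or the parser's error text.
sub parse_error {
    my ($xml) = @_;
    my $parser = XML::Parser->new;
    return eval { $parser->parse($xml); 1 } ? '' : $@;
}

my $doc = "<?xml version='1.0'?>\n<test/>\n";

my $clean         = parse_error($doc);            # declaration comes first
my $blank_first   = parse_error("\n" . $doc);     # empty line before the declaration
my $no_decl_blank = parse_error("\n<test/>\n");   # empty line, no declaration at all

for my $case (
    [ 'declaration first'              => $clean ],
    [ 'empty line, then declaration'   => $blank_first ],
    [ 'empty line, no declaration'     => $no_decl_blank ],
) {
    my ($label, $err) = @$case;
    print "$label: ", ( $err eq '' ? 'ok' : 'failed' ), "\n";
}
```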
