 
PerlMonks  

Re: XML::LibXML::Reader giving wrong matched element

by ww (Archbishop)
on Nov 24, 2011 at 14:59 UTC (id://939907)


in reply to XML::LibXML::Reader giving wrong matched element

You're going to have to help us (well, /me, anyway) understand what you mean by "@matchedNodes give me two elements" -- and that's another way of saying, "please tell us your output (and error messages, if any)."

And then too, though I don't know for sure, I think your XML is NOT valid. Don't you need </state> and </city> tags for each state and city entry?

Update: rephrased the paraphrase of the quote in the first paragraph, for clarity.

Re^2: XML::LibXML::Reader giving wrong matched element
by cavac (Parson) on Nov 24, 2011 at 23:41 UTC

    Don't you need </state> and </city> tags for each state and city entry?

    Actually, no, you don't. These are self-closing tags (I can't remember the technical term). You use them for tags that hold no other tags or values. I'm not sure how to put that into English, so let's try an example.

    Let's use some HTML tags for this XML example for simplicity. The classic open/close tag would be a link:

    <a href="/hello">Inner value</a>
    And then there is the classic image tag.
    <img src="monk.gif"/>
    The slash at the end closes the element right there in the opening tag.

    I'm not an expert in this, so I just hope I didn't mess this up. Because if I did, I have to rewrite about 50 XML files tomorrow....

    Don't use '#ff0000':
    use Acme::AutoColor; my $redcolor = RED();
    All colors subject to change without notice.
      What is it that distinguishes the <state...> and <city...> tags from the <country...> tags? Is it strictly that the OP's code provides the shortcut close, "/>" for state and city but not for country? If <country...> had a shortcut close would it not need a </country> tag? And if so, why not use a shortcut close globally -- that is, on <world> and <country>? I still don't "get it" in that regard.

      It seems to me that consistency would make parsing easier... and might even help explain why the OP (you?) is seeing unexpected numbers of elements.

      I'm quite curious, because a simpleminded search on "XML close tag" produced a selection of inconsistent assertions.

      On the first paw, your explanation doesn't seem consistent with beginner tuts like that at http://www.w3schools.com/xml/xml_syntax.asp nor with http://www.w3schools.com/xml/xml_dtd.asp nor http://www.xmlfiles.com/xml/xml_syntax.asp -- none of which are authoritative (but I'm too full of turkey to chase it down -- and while you may suspect a turkey byproduct, that's another discussion). All of those agree that the only or chief exception to a "must have a closing tag" rule is the <empty-element />.

      But on the hind paw, the XML validator at http://www.w3schools.com/xml/xml_validator.asp passes the OP's code as "well formed" once it is given a leading <?xml version...> header and the ellipsis is replaced with arbitrary sample data.

      Thus, while I'm still uncertain "why" and "how" your take on the matter can be true, I won't dispute it (at least for the moment).

      I will, however, quibble with your assertions about HTML. They're good examples of the point you're making... but they're NOT entirely correct. The standards for 4.01 transitional and 4.01 strict differ on what's required, where. Your link example is correct ("valid") in both; the shortcut close on image is NOT required by 4.01 transitional (aka "loose"). And HTML5 is a fish with different feathers.

      In any case, if you posted the OP as an AM and are now expanding on that post, please provide the sample output requested above... and, whether you are the OP or not, thank you for taking the time and effort to reply.

        What is it that distinguishes the <state...> and <city...> tags from the <country...> tags? Is it strictly that the OP's code provides the shortcut close, "/>" for state and city but not for country?

        Yes.

        If <country...> had a shortcut close would it not need a </country> tag?

        <foo x="y"/> and <foo x="y"></foo> are completely equivalent, so not only would it not need a </country> tag, it could not have a </country> tag. One can't close an element more than once.
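        The equivalence, and the impossibility of a second close, can be checked with a parser. Below is a minimal Perl sketch. It assumes the CPAN module XML::LibXML is installed. It parses both spellings of an empty country element and compares the resulting trees. It then tests whether a document that closes the element twice still parses. The helper names root_of and well_formed are hypothetical, written for this illustration only.

```perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module; assumed to be installed

# Hypothetical helper: parse a string and return its root element.
sub root_of {
    my ($xml) = @_;
    return XML::LibXML->load_xml(string => $xml)->documentElement;
}

# Hypothetical helper: true if the string is well-formed XML.
# load_xml() dies on a well-formedness error, so eval traps it.
sub well_formed {
    my ($xml) = @_;
    return eval { XML::LibXML->load_xml(string => $xml); 1 } ? 1 : 0;
}

# Two spellings of the same empty element.
my $short = root_of('<country short="usa"/>');
my $long  = root_of('<country short="usa"></country>');

# Both iterations should report the same name, child count and serialisation.
for my $el ($short, $long) {
    my @children = $el->childNodes;
    printf "%s, %d children, %s\n",
        $el->nodeName, scalar @children, $el->toString;
}

# A second end tag is an error, not a harmless no-op.
for my $xml ('<country short="usa"/>',
             '<country short="usa"></country>',
             '<country short="usa"/></country>') {
    printf "%-36s %s\n", $xml, well_formed($xml) ? 'well-formed' : 'malformed';
}
```

        Only the last sample is rejected. Once the self-closing tag has closed the element, the extra </country> is content after the end of the document.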

        your explanation doesn't seem consistent with beginner tuts

        I presume you are referring to "all XML elements must have a closing tag".

        That claim is true, but <foo/> serves as both the opening and closing tag of the element, so it satisfies the requirement of the presence of a closing tag.

        why not use a shortcut close globally -- that is, on <world> and <country>

        That would be impossible because the world and country elements have non-attribute children.

        In fact, I'd say the city elements are misplaced in the OP's XML. The indenting indicates the OP wants them to be children of states, but he made them children of countries.

        <country short="usa" name="united state of america">
            <state short="CA" name="california"/>
                <city short="SFO" name="San Franscisco"/>
                <city short="EM" name="Emeryville"/>
            <state short="FL" name="florida"/>
            ... More intermixed states and cities ...
        </country>

        means

        <country short="usa" name="united state of america">
            <state short="CA" name="california"></state>
            <city short="SFO" name="San Franscisco"></city>
            <city short="EM" name="Emeryville"></city>
            <state short="FL" name="florida"></state>
            ... More intermixed states and cities ...
        </country>

        but he surely wants

        <country short="usa" name="united state of america">
            <state short="CA" name="california">
                <city short="SFO" name="San Franscisco"/>
                <city short="EM" name="Emeryville"/>
            </state>
            <state short="FL" name="florida">
                ... More cities ...
            </state>
            ... More states ...
        </country>
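        The difference between the two layouts is structural, and an XPath query makes it visible. Below is a small Perl sketch, again assuming XML::LibXML is installed. It counts the city elements that sit directly under country and those that sit inside the California state element. It uses abridged copies of each layout, because the OP's full data is elided. The helper count_cities is hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module; assumed to be installed

# Abridged version of the OP's layout: cities are siblings of the states.
my $flat = <<'XML';
<country short="usa">
  <state short="CA"/>
  <city short="SFO"/>
  <city short="EM"/>
  <state short="FL"/>
</country>
XML

# The layout the indentation suggests: cities nested in their state.
my $nested = <<'XML';
<country short="usa">
  <state short="CA">
    <city short="SFO"/>
    <city short="EM"/>
  </state>
  <state short="FL"/>
</country>
XML

# Hypothetical helper: returns (cities directly under country,
# cities inside the CA state element). findnodes() in scalar
# context yields an XML::LibXML::NodeList, which has size().
sub count_cities {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    return (
        $doc->findnodes('/country/city')->size,
        $doc->findnodes('/country/state[@short="CA"]/city')->size,
    );
}

for my $layout ([flat => $flat], [nested => $nested]) {
    my ($name, $xml) = @$layout;
    my ($under_country, $under_ca) = count_cities($xml);
    print "$name: $under_country under country, $under_ca under CA\n";
}
```

        The flat layout reports 2 cities under country and 0 under CA. The nested layout reports 0 and 2. Any code that looks for cities beneath a state will therefore find none in the OP's layout.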

        the shortcut close on image is NOT required by 4.01 transitional (aka "loose").

        That's not right.

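        Tag         | HTML4 (SGML)            | Other SGML | HTML5 (HTML serialisation) | XML
        ------------+-------------------------+------------+----------------------------+----------------------
        <br>        | Well-formed and valid   | [Varies]   | Valid                      | Malformed
        <p>         | Well-formed and valid   | [Varies]   | Valid                      | Malformed
        <div>       | Well-formed but invalid | [Varies]   | Invalid                    | Malformed
        <br/>       | Malformed               | [Varies]   | Tolerated*                 | Well-formed and valid
        <p/>        | Malformed               | [Varies]   | Invalid                    | Well-formed and valid
        <div/>      | Malformed               | [Varies]   | Invalid                    | Well-formed and valid
        <br></br>   | Well-formed but invalid | [Varies]   | Invalid                    | Well-formed and valid
        <p></p>     | Well-formed and valid   | [Varies]   | Valid                      | Well-formed and valid
        <div></div> | Well-formed and valid   | [Varies]   | Valid                      | Well-formed and valid

        The HTML4 (SGML) column applies to both HTML4 strict and HTML4 transitional. The XML column applies to XHTML1 strict, XHTML1 transitional, the XML serialisation of HTML5, and any other XML schema.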

        Note that browsers are very forgiving and accept all kinds of malformed and invalid HTML.

        As an aside, the table clearly highlights XHTML's advantage over HTML: simplicity. The cost, of course, is that XHTML is more wordy. (Like Java vs Perl?)

        * The HTML serialisation of HTML5 accepts "/" on elements that cannot have a closing tag (area, base, br, col, command, embed, hr, img, input, keygen, link, meta, param, source, track, wbr). (ref)

        The standards for 4.01 transitional and 4.01 strict differ on what's required

        They differ on what constitutes a valid HTML or XHTML document (i.e. what elements and attributes are allowed), but they do not differ on what constitutes a well-formed HTML or XML document (i.e. on what is valid syntax).
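        The XML column of the table can be checked mechanically, because well-formedness needs no DTD or schema. Below is a minimal Perl sketch, assuming XML::LibXML is installed. It parses each tag spelling from the table as a tiny standalone document. It tests well-formedness only. Validating against the XHTML1 strict or transitional DTDs is a separate step that this sketch does not attempt.

```perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module; assumed to be installed

# Tag spellings from the table, each parsed as a tiny standalone document.
my @samples = ('<br>', '<p>', '<div>',
               '<br/>', '<p/>', '<div/>',
               '<br></br>', '<p></p>', '<div></div>');

my %verdict;
for my $xml (@samples) {
    # load_xml() dies on a well-formedness error; eval turns that into false.
    my $ok = eval { XML::LibXML->load_xml(string => $xml); 1 };
    $verdict{$xml} = $ok ? 'well-formed' : 'malformed';
    printf "%-12s %s\n", $xml, $verdict{$xml};
}
```

        Only the three unclosed start tags come out as malformed, which matches the XML column. Whether <p/> is also valid depends on the DTD or schema in use, and this check cannot tell you that.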
