PerlMonks  

Need help for Xpath patterns

by Anusha (Initiate)
on May 25, 2012 at 09:34 UTC ( [id://972388]=perlquestion )

Anusha has asked for the wisdom of the Perl Monks concerning the following question:

I have a scenario like...

<a>Test case 1 <b>Test case 2 <c>Test case 3 </c> </b> </a>

I need to fetch the text content of the <c> node where multiple spaces are found between words. The pattern I am using is

<code>//*[*contains(.," ")]</code>

But the pattern matches three different results:

1) Test case 1Test case 2Test case 3 (text content of "a tag""b tag""c tag" is appended)

2) Test case 2Test case 3 (text content of "b tag""c tag" is appended)

3) Test case 3 (text content of "c tag")

But I need only the text content of the <c> node. Please help.

Replies are listed 'Best First'.
Re: Need help for Xpath patterns
by roboticus (Chancellor) on May 25, 2012 at 10:35 UTC

    Anusha:

    You say you want the contents of the <c> node, but your code is asking for nodes containing a space. When writing code for XPath, Perl or any other language, you need to write code that asks for exactly what you want, rather than trying to hack together an unrelated request that may--by chance--give you the result you wanted.

    I haven't done Xpath in quite a while, so I don't recall the syntax, but google led me to a couple sites with Xpath examples. It appears that asking for a specific node type is more like: //nodetype, so in your case you'd use //c. If you want X but ask for Y, don't be surprised when you get useless results. If you don't know how to ask Xpath (or any other language) the right question, google is but a few keystrokes away.

    Update: After seeing AM's response, I realize your search criterion may be to find nodes containing multiple contiguous blanks. In that case, you could use //*[contains(.,"  ")] (Note: two blanks), and if you're using XPath 2.0, you could use a regex: //*[matches(.,'\s\s')]. (Note: I've not tested these, so they may need tweaking.)

    Update 2: choroba has an even better answer--I didn't even consider the "between words" clause.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Need help for Xpath patterns
by choroba (Cardinal) on May 25, 2012 at 12:02 UTC
    You probably mean a pattern like this
    //*[contains(., ' ')]
    But the result is correct: There is enough whitespace before the <b> tag, for example. You said you wanted multiple spaces between words, so you have to add the "between words" part. I think it is not possible in XPath 1, but as XML::XSH2 was mentioned, you can use it like this:
    ls //*[xsh:matches(text(),'\b +\b')]
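A minimal sketch of why all three elements match, and why testing only an element's own text narrows the result to <c>. It uses Python's standard library purely for illustration. The sample string, the doubled space inside <c>, and the helper names `string_value` and `own_text` are assumptions, since the whitespace of the original post did not survive. `own_text` is only a rough stand-in for what `text()` yields when it is converted to a string (the first text node of the element).

```python
import re
import xml.etree.ElementTree as ET

# Assumed sample: the doubled space inside <c> is a guess at the original input.
XML = "<a>Test case 1 <b>Test case 2 <c>Test  case 3</c> </b> </a>"

root = ET.fromstring(XML)

def string_value(elem):
    # XPath's "." on an element: the text of the element and of all
    # its descendants, concatenated in document order.
    return "".join(elem.itertext())

def own_text(elem):
    # Only the text directly inside the element, before its first child.
    return elem.text or ""

# Testing the string-value: every ancestor of <c> inherits its doubled space.
by_string_value = [e.tag for e in root.iter() if "  " in string_value(e)]

# Testing own text with a "between words" regex: only <c> qualifies.
by_own_text = [e.tag for e in root.iter()
               if re.search(r"\b {2,}\b", own_text(e))]

print(by_string_value)
print(by_own_text)
```

Under these assumptions the first list contains a, b and c, while the second contains only c, which mirrors the difference between `contains(., ...)` and a regex applied to `text()`.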
Re: Need help for Xpath patterns
by Anonymous Monk on May 25, 2012 at 10:30 UTC

    Why limited to xpath pattern?

    Using xsh2, which I hate :) the following works

    $ cat jonk.xml
    <a>
    <b> <c>not match this</c> </b>
    <b> <c>but match this</c> </b>
    </a>
    $ cat jonk.xsh
    open "jonk.xml";
    #~ foreach /a/b/c {
    foreach //c {
        if xsh:matches( text(), '\b\s{2,}\b' ) {
            pwd;
            echo text();
        }
    }
    $ xsh -q jonk.xsh
    /a/b[2]/c
    but match this

    See XSH2 Reference, Flow control

Re: Need help for Xpath patterns
by sundialsvc4 (Abbot) on May 25, 2012 at 15:50 UTC

    When writing XPath, specify what you want, not where it is.   If you are interested in the content of <c> nodes, i.e. regardless of where they might be found, then ask for that.   Don’t code the search “from the outside in,” based on your personal knowledge of the structure.

    You might find it simplest to let at least your first stab at the XPath expression locate all of the nodes, with your Perl code then looking for multiple spaces within the body.   Then, if necessary and warranted, you can refine the expression to look for the spaces.
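The two-step approach described above (select the nodes by name first, then test for multiple spaces in the host language) could be sketched roughly as follows. Python's standard library is used only for illustration. The input is modelled on the jonk.xml sample from this thread, the doubled space in the second <c> is an assumption, and `c_nodes_with_gaps` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input modelled on jonk.xml; the doubled space is assumed.
XML = """<a>
<b> <c>not match this</c> </b>
<b> <c>but  match this</c> </b>
</a>"""

# Two or more whitespace characters sitting between two words.
MULTI_SPACE = re.compile(r"\b\s{2,}\b")

def c_nodes_with_gaps(xml_text):
    root = ET.fromstring(xml_text)
    # Step 1: a simple expression that selects every <c> by name.
    candidates = root.findall(".//c")
    # Step 2: do the whitespace test in the host language.
    return [c.text for c in candidates
            if c.text and MULTI_SPACE.search(c.text)]

hits = c_nodes_with_gaps(XML)
print(hits)
```

Keeping the path expression simple and moving the whitespace test into ordinary code avoids depending on XPath 2.0 or extension functions, at the cost of fetching every <c> node first.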
