PerlMonks  

Re: Problem getting fields out of an XPath node list

by CountZero (Bishop)
on Mar 29, 2016 at 07:23 UTC


in reply to Problem getting fields out of an XPath node list

Your program never reaches the my @node_list = $node->findvalues('td/tr') due to the next on the preceding line, so @node_list never gets populated.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

Replies are listed 'Best First'.
Re^2: Problem getting fields out of an XPath node list
by ejc1 (Novice) on Mar 29, 2016 at 14:30 UTC
    I managed to get it out badly with
    my @nodes = $tree->findnodes('//tr');
    for my $node (@nodes) {
        my @text = $node->findvalues('td') or next;
        print Dumper \@text;
    }
    It is bad in that I still have no clue what XPath is doing, despite reading the documentation on it. It only works because, as far as I can see, there is only one table... I am trying to get the following data parsed:
    <ul><li>There was registered attempt to establish connection with the remote host. The connection details are:</li></ul>
    <p><table class="tbl" cellpadding="5" cellspacing="0">
    <tr><td class="cell_1_h">Remote Host</td><td class="cell_2_h">Port Number</td></tr>
    <tr><td class="cell_1">192.5.5.241<td class="cell_2">8091</td><tr>
    </table></p>
    but am having zero luck. I am trying:
    my @nodes = $tree->findnodes('//ul');
    for my $node (@nodes) {
        my $text2 = $node->findvalue('li') or next;
        if ($text2 =~ m/connection details are/) {
            print "$text2\n";
            my @text = $node->findvalues('/tr/td');
            print Dumper \@text;
        }
    }
    The problem is, it clearly finds the li node, matches it, and then tries to run findvalues against /tr/td. This totally doesn't work... I have tried '/tr/td', '//tr/td', and 'td', and can't get any of them to work at all. The overall format of the section, as pasted above, is:
    <ul><li>....connection...</li></ul>
    <p><table>
    <tr><td>stuff</td><td>stuff2</td></tr>
    . . .
    </table>
    </p>
    What the heck is the XPath of the items below that section? Is it even possible to match this? I totally don't understand XPath at all...

      Why do you keep using ->findvalues? Simply retrieve the nodes, or find the text within the nodes explicitly:

      /tr/td/text()

      Personally, I simply find the nodes and then use their ->as_text() method to get at their textual content.

        I've tried to look at the raw nodes with Dumper, and I can't make any sense of the output. The document is very complex (see http://www.threatexpert.com/report.aspx?md5=2aafcad88572d98c154ab7d80cbafc02) and, as I mentioned, I have zero understanding of XPath. I looked at as_text, but the problem is that I don't understand the XPath format well enough to even attempt to scope my node elements to just the one section I mentioned. If I understood how the nodes were built, I think I could be OK, but to be honest, I just totally don't get this at all. When I do '//tr/td', I get _all_ of the td elements in one giant array, instead of just narrowing the damn thing to the one section I tried to match against in my post. :(
      Finding the value of the list element is not really helping you as the table is not an element of the list. If you know there is only one table, this verbose example may help:
      # get all the tables
      my @tables = $tree->findnodes('//table');
      # get the first table
      my $table = $tables[0];
      # get all the rows of first table
      my @rows = $table->findnodes('tr');
      # loop through the rows
      for my $row ( @rows ) {
          # get all the cells
          my @cells = $row->findnodes('td');
          # loop through the cells
          for my $cell ( @cells ) {
              print $cell->as_text, "\n";
          }
      }

      Output:
      Remote Host
      Port Number
      192.5.5.241
      8091
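
A note on why the earlier attempts matched nothing or everything: in XPath, a path that starts with / or // is evaluated from the document root no matter which node you call it on, and a bare 'td' only looks at direct children of the context node (here the ul, which has none). To scope the search to the table that comes right after the matched list, one option is the following:: axis. The snippet below is only a sketch under assumptions: the sample HTML is a cleaned-up stand-in for the real report, the @wanted array is a hypothetical name, and it assumes the XPath engine behind HTML::TreeBuilder::XPath handles following::table[1] the way standard XPath 1.0 describes.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Cleaned-up stand-in for the report page (assumption: the real page has
# many tables, only one of which follows the "connection details" list).
my $html = <<'HTML';
<html><body>
<ul><li>Unrelated section</li></ul>
<table><tr><td>skip</td><td>me</td></tr></table>
<ul><li>There was registered attempt to establish connection with the remote host. The connection details are:</li></ul>
<p><table class="tbl">
<tr><td class="cell_1_h">Remote Host</td><td class="cell_2_h">Port Number</td></tr>
<tr><td class="cell_1">192.5.5.241</td><td class="cell_2">8091</td></tr>
</table></p>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

my @wanted;    # one array ref of cell texts per table row
for my $ul ( $tree->findnodes('//ul') ) {
    # Only the list that introduces the connection details is of interest.
    next unless $ul->as_text =~ /connection details are/;

    # Relative path: the first table after this <ul> in document order.
    my ($table) = $ul->findnodes('following::table[1]') or next;

    # './/tr' stays inside $table; a leading '//' would restart at the root.
    for my $row ( $table->findnodes('.//tr') ) {
        push @wanted, [ map { $_->as_text } $row->findnodes('td') ];
    }
}

print join( ' | ', @$_ ), "\n" for @wanted;
$tree->delete;
```

The point of the design is that every path after the first one is relative (no leading slash), so each step narrows the previous result instead of searching the whole document again.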
        So, I don't understand why $cell->as_text gives the data, when Dumper \@cells prints a giant ton of garbage. Also, even though I have specified the table element as
        my @tables = $tree->findnodes('//table');
        my $table = $tables[12];
        I can't reference this directly. Printing @cells[2]->as_text fails outright with "can't call method 'as_text' on an undefined value". It is clearly in there as
        my @cells = $row->findnodes('td')
        .... Anything I do to @cells flat out fails except for looking through with the mentioned
        for my $cell (@cells)... print $cell->as_text
        At this juncture, I am about to totally give up on this, since I do not understand this at all and have no other way to parse it. Since as_text dumps this one entry at a time, I was hoping to process the even elements of @cells as host/IP address and the odd ones as the previous element's port. But I just don't get this at all.
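
Two things seem to be going on in the post above. First, Dumper \@cells prints the element objects themselves (each one carries its attributes, children and parent links), while ->as_text returns only the text inside the element. Second, @cells is refilled for every row, and a row with two td cells only has indices 0 and 1, so element 2 is undefined; a single element is also written $cells[2], not @cells[2]. The sketch below uses plain strings as a stand-in for what map { $_->as_text } $row->findnodes('td') would return per row, so it runs without any modules; @rows, @connections and the header check are illustrative assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Stand-in for the per-row cell texts (assumption: each row has exactly
# two cells, as in the table shown earlier in the thread).
my @rows = (
    [ 'Remote Host', 'Port Number' ],
    [ '192.5.5.241', '8091' ],
);

my @connections;
for my $cells (@rows) {
    # Two cells per row means indices 0 and 1 only; $cells->[2] is undef.
    my ( $host, $port ) = @$cells;

    # Skip the header row rather than treating it as data.
    next if $host eq 'Remote Host';

    push @connections, { host => $host, port => $port };
}

printf "%s:%s\n", $_->{host}, $_->{port} for @connections;
```

Pairing host and port per row like this avoids the even/odd bookkeeping on one flat list, because each row already holds exactly one host and one port.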
