PerlMonks  

Extract hidden values from HTML

by kalyanrajsista (Scribe)
on Dec 14, 2009 at 07:37 UTC ( [id://812662] )

kalyanrajsista has asked for the wisdom of the Perl Monks concerning the following question:

I'm using the HTML::TableExtract module to extract HTML table rows. How can I extract hidden values from the table rows? My problem is that my HTML rows are not always unique, and I want to build a hash in which each key refers to an array. I've inserted a unique hidden value into each HTML row, and I want to use that as the key. I'm a bit concerned about how to implement this so that I don't miss any rows while building the hash.

Re: Extract hidden values from HTML
by Utilitarian (Vicar) on Dec 14, 2009 at 09:53 UTC
    It would be more appropriate to use the id attribute to identify individual elements within HTML.
    <table>
      <tr id="row_1"> <td id="row_1_col_1"> stuff </td> <td id="row_1_col_2"> stuff </td> </tr>
      <tr id="row_2"> ...
    </table>
    HTML::Parser can pass each start tag's attributes to your handler (for example as a hash of attribute names and values), which would include id="row_1_col_2"; you could then use those to build your own hash. As HTML::TableExtract is built on HTML::Parser, the attributes may be available there too.

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
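A minimal sketch of that idea, using HTML::Parser directly rather than HTML::TableExtract, and assuming hypothetical markup that follows the id scheme above. The start handler reads each <td>'s id from the attribute hash, and the text handler files the cell's text under that id:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical markup following the id scheme described above.
my $html = <<'HTML';
<table>
  <tr id="row_1"><td id="row_1_col_1">stuff</td><td id="row_1_col_2">more stuff</td></tr>
  <tr id="row_2"><td id="row_2_col_1">other</td></tr>
</table>
HTML

my %text_for_id;    # id of each <td> => the text inside it
my $current_id;     # id of the <td> being parsed, undef outside cells

my $p = HTML::Parser->new(
    api_version => 3,
    # 'tagname, attr' passes the lowercased tag name and a hash ref of attributes
    start_h => [
        sub {
            my ($tag, $attr) = @_;
            $current_id = $attr->{id} if $tag eq 'td';
        },
        'tagname, attr',
    ],
    # 'dtext' passes entity-decoded text; append in case it arrives in chunks
    text_h => [
        sub {
            my ($text) = @_;
            $text_for_id{$current_id} .= $text if defined $current_id;
        },
        'dtext',
    ],
    end_h => [
        sub {
            my ($tag) = @_;
            undef $current_id if $tag eq 'td';
        },
        'tagname',
    ],
);
$p->parse($html);
$p->eof;

print "$_ => $text_for_id{$_}\n" for sort keys %text_for_id;
```

A cell without an id resets $current_id to undef, so its text is simply ignored.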

      You might have meant this anyway, since you're giving a very general example, but in case kalyanrajsista doesn't: it'd be better to use something semantic for the ids or classes rather than literally "row_1_col_1".

      I haven't used the modules mentioned, but I'd imagine you could simply count rows as you go, so you'd know which row you're on without that information being in the HTML. What position alone can't tell you is what each cell means. For that, use class names and ids (remember that ids must be unique within the page) such as "last_name" or "total_price", or whatever else is appropriate.
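A rough sketch of both suggestions together, again with HTML::Parser and hypothetical markup: the row index comes from counting <tr> start tags, and each cell is labelled by its class name rather than its position:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical markup using semantic class names instead of positional ids.
my $html = <<'HTML';
<table>
  <tr><td class="last_name">Smith</td><td class="total_price">12.50</td></tr>
  <tr><td class="last_name">Jones</td><td class="total_price">7.00</td></tr>
</table>
HTML

my @rows;        # $rows[$n]{$class} = text of the cell with that class in row $n
my $row = -1;    # index of the current row, counted from each <tr> start tag
my $class;       # class of the cell being parsed, undef outside cells

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [
        sub {
            my ($tag, $attr) = @_;
            if    ($tag eq 'tr') { $row++ }
            elsif ($tag eq 'td') { $class = $attr->{class} }
        },
        'tagname, attr',
    ],
    text_h => [
        sub {
            my ($text) = @_;
            $rows[$row]{$class} .= $text if defined $class && $row >= 0;
        },
        'dtext',
    ],
    end_h => [
        sub {
            my ($tag) = @_;
            undef $class if $tag eq 'td';
        },
        'tagname',
    ],
);
$p->parse($html);
$p->eof;

for my $n (0 .. $#rows) {
    printf "row %d: last_name=%s total_price=%s\n",
        $n, $rows[$n]{last_name}, $rows[$n]{total_price};
}
```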

        I have a row defined as below, and I want to extract the hidden value from it.

        <td class="c"><input type="hidden" name="recid5" value="10293424">2009/08</td>
Re: Extract hidden values from HTML
by moritz (Cardinal) on Dec 14, 2009 at 08:08 UTC
    What do you mean by hidden values?

    A value either appears in the HTML or it doesn't. Maybe the browser hides it with CSS, but an HTML parser is fairly unimpressed by such styling information.

    So I don't see how values in HTML can be "hidden".

    If you have a specific problem, please post your code, example input and a description of what is not working.

      Maybe kalyanrajsista means that the "hidden value" is inserted as an HTML comment and thus not visible to the user. But it's hard to tell without seeing some of the relevant HTML.

      <html><head> <title>Data</title> </head>
      <form name="action">
      <table cellspacing="1" cellpadding="1">
      <tr> <td>File name</td> <td>From To</td> <td>Svc</td> <td>Period</td> <td>Seq#</td> <td>Reason for Rejection</td> <td>Next Action</td> </tr>
      <tr> <td>Sample.XML</td> <td><a href='dch_redirect?refer=10293377&ref=10293377'>USA-IND</a></td> <td>Voice</td> <td><input type="hidden" name="recid1" value="10293377">2009/08</td> <td>03386</td> <td>data already exists</td> <td><input type="checkbox" name="c_1"></td> </tr>
      </body></html>

      Above is the sample HTML file. I want to extract the hidden values named recid1. Alternatively, is there any way to get the 'href' values from the HTML content?

        One way
        #!/usr/bin/perl
        use warnings;
        use strict;
        use HTML::TreeBuilder;

        my $t = HTML::TreeBuilder->new_from_file(*DATA);

        my @hidden_inputs = $t->look_down(
            _tag => q{input},
            type => q{hidden},
            name => q{recid1},
        );

        for my $hidden_input (@hidden_inputs){
            printf qq{*%s*\n}, $hidden_input->attr(q{value});
        }

        __DATA__
        <html><head> <title>Data</title> </head>
        <form name="action">
        <table cellspacing="1" cellpadding="1">
        <tr> <td>File name</td> <td>From To</td> <td>Svc</td> <td>Period</td> <td>Seq#</td> <td>Reason for Rejection</td> <td>Next Action</td> </tr>
        <tr> <td>Sample.XML</td> <td><a href='dch_redirect?refer=10293377&ref=10293377'>USA-IND</a></td> <td>Voice</td> <td><input type="hidden" name="recid1" value="10293377">2009/08</td> <td>03386</td> <td>data already exists</td> <td><input type="checkbox" name="c_1"></td> </tr>
        </body></html>
        *10293377*
        Update: a similar look_down call would also work for hrefs.
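Extending that TreeBuilder approach toward the hash described in the original question, here is a minimal sketch (assuming the hidden inputs are always named recidN and hold a unique value per row): walk every <tr>, use the hidden value as the key, store the row's cell texts as an array ref, and grab the first href in the row. Rows without a key and duplicate keys are reported rather than silently dropped, which addresses the worry about missing rows:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Cut-down copy of the sample table posted in this thread.
my $html = <<'HTML';
<table>
<tr><td>File name</td><td>From To</td><td>Svc</td><td>Period</td></tr>
<tr><td>Sample.XML</td><td><a href='dch_redirect?refer=10293377&ref=10293377'>USA-IND</a></td>
<td>Voice</td><td><input type="hidden" name="recid1" value="10293377">2009/08</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my %rows;       # hidden recid value => array ref of that row's cell texts
my %href_for;   # hidden recid value => href of the first link in that row
my @skipped;    # text of rows with no hidden recidN input (e.g. the header row)

for my $tr ($tree->look_down(_tag => 'tr')) {
    # Scalar context: look_down returns the first matching element or undef.
    my $hidden = $tr->look_down(
        _tag => 'input',
        type => 'hidden',
        name => qr/^recid\d+$/,    # assumption: names follow the recid1, recid5 pattern
    );
    unless ($hidden) {
        push @skipped, $tr->as_trimmed_text;
        next;
    }
    my $key = $hidden->attr('value');
    warn "duplicate key $key; keeping the later row\n" if exists $rows{$key};
    $rows{$key} = [ map { $_->as_trimmed_text } $tr->look_down(_tag => 'td') ];
    if (my $link = $tr->look_down(_tag => 'a', href => qr/./)) {
        $href_for{$key} = $link->attr('href');
    }
}
$tree->delete;    # release the parse tree once we're done with it

for my $key (sort keys %rows) {
    print "$key: @{ $rows{$key} }\n";
    print "  link: $href_for{$key}\n" if exists $href_for{$key};
}
print scalar(@skipped), " row(s) without a hidden key\n";
```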
Re: Extract hidden values from HTML
by ww (Archbishop) on Dec 15, 2009 at 06:31 UTC

    With the clarification in "Re 3", this should give you an approach:

    #!/usr/bin/perl
    use strict;
    use warnings;
    # 812662

    my @data = ('<td class="c"><input type="hidden" name="recid5" value="10293424">2009/08</td>',
                '<td class="c">foobar - Item &quot;123&quot;: <b>8</b></td>'
               );

    for my $data(@data) {
        if ($data =~ /"hidden"/ && $data =~ /value="([^"]+)/ ) {
            print "Hidden value: $1 \n";
        }else{
            print "Nothing hidden in \$data: $data\n";
        }
    }

    Output:

    Hidden value: 10293424
    Nothing hidden in $data: <td class="c">foobar - Item &quot;123&quot;: <b>8</b></td>

    Update: else {...} added for clarity and output updated
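If the whole page is available as one string, the same regex idea can collect every hidden input at once with a global match. This is a sketch that assumes type, name and value always appear in that order and double-quoted, as in the posted HTML; unlike the parser-based replies, it will miss inputs written any other way:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two cells taken from the HTML posted in this thread, plus a checkbox cell.
my $html = '<td class="c"><input type="hidden" name="recid5" value="10293424">2009/08</td>'
         . '<td><input type="hidden" name="recid1" value="10293377">2009/08</td>'
         . '<td><input type="checkbox" name="c_1"></td>';

# Assumes the attribute order type/name/value with double quotes, as posted.
my %hidden;    # input name => hidden value
while ($html =~ /<input\s+type="hidden"\s+name="([^"]+)"\s+value="([^"]*)"/gi) {
    $hidden{$1} = $2;
}

print "$_ => $hidden{$_}\n" for sort keys %hidden;
```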
