Extract hidden values from HTML

kalyanrajsista has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Extract hidden values from HTML by Utilitarian (Vicar) on Dec 14, 2009 at 09:53 UTC
It would be more appropriate to use the id attribute to identify individual elements within the HTML:

    <table>
      <tr id="row_1">
        <td id="row_1_col_1"> stuff </td>
        <td id="row_1_col_2"> stuff </td>
      </tr>
      <tr id="row_2"> ...
    </table>

HTML::Parser would provide an attribute string that includes `id="row_1_col_2"`, which could in turn be used to build your own hash. As HTML::TableExtract is built on HTML::Parser, the attributes may be available there too.

`print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."`
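To make the "build your own hash" idea concrete, here is a minimal sketch using HTML::Parser's version 3 handler API. The helper name `cells_by_id` and the sample markup are assumptions for illustration only; the sketch assumes each `<td>` of interest carries an id and that cells are not nested.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: map each <td id="..."> to the text inside that cell.
sub cells_by_id {
    my ($html) = @_;
    my (%cell, $current_id);

    my $parser = HTML::Parser->new(
        api_version => 3,
        # "attr" hands the handler a hash ref of the tag's attributes.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            $current_id = $attr->{id} if $tag eq 'td' && defined $attr->{id};
        }, 'tagname, attr' ],
        # Collect text only while we are inside a <td> that has an id.
        text_h => [ sub {
            my ($text) = @_;
            $cell{$current_id} .= $text if defined $current_id;
        }, 'dtext' ],
        # Leaving the cell: stop collecting.
        end_h => [ sub {
            my ($tag) = @_;
            undef $current_id if $tag eq 'td';
        }, 'tagname' ],
    );
    $parser->parse($html);
    $parser->eof;

    # Trim surrounding whitespace from each collected value.
    s/^\s+|\s+$//g for values %cell;
    return \%cell;
}

my $cells = cells_by_id(<<'HTML');
<table>
  <tr id="row_1">
    <td id="row_1_col_1"> stuff </td>
    <td id="row_1_col_2"> more stuff </td>
  </tr>
</table>
HTML

print "$_ => $cells->{$_}\n" for sort keys %$cells;
```

The text handler appends rather than assigns because HTML::Parser may deliver one cell's text in several chunks.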
Re^2: Extract hidden values from HTML by Unforgiven (Hermit) on Dec 14, 2009 at 14:37 UTC
You might have meant this anyway, since you're giving a very general example, but in case kalyanrajsista doesn't realize it: it'd be better to use something semantic for the ids or classes rather than literally "row_1_col_1". I haven't used the modules mentioned, but I'd imagine you could simply count rows as you parse, so the row number doesn't need to be in the HTML. What you can't work out by counting is what a cell means. Use class names and ids (remember ids have to be unique within the page) like "last_name" or "total_price" or whatever else is appropriate.
Re^3: Extract hidden values from HTML by Anonymous Monk on Dec 15, 2009 at 06:07 UTC
I have a row defined as below and I want to extract the hidden value from it. `<td class="c"><input type="hidden" name="recid5" value="10293424">2009/08</td>`
Re: Extract hidden values from HTML by moritz (Cardinal) on Dec 14, 2009 at 08:08 UTC
What do you mean by hidden values? A value either appears in the HTML or it doesn't. Maybe the browser hides it with CSS, but an HTML parser is fairly unimpressed by such styling information. So I don't see how values in HTML can be "hidden". If you have a specific problem, please post your code, example input and a description of what is not working.
Re^2: Extract hidden values from HTML by Corion (Patriarch) on Dec 14, 2009 at 08:12 UTC
Maybe kalyanrajsista means that the "hidden value" is inserted as an HTML comment and thus not visible to the user. But it's hard to tell without seeing some of the relevant HTML.
Re^2: Extract hidden values from HTML by kalyanrajsista (Scribe) on Dec 15, 2009 at 11:12 UTC
    <html><head>
    <title>Data</title>
    </head>
    <form name="action">
    <table cellspacing="1" cellpadding="1">
    <tr>
    <td>File name</td>
    <td>From To</td>
    <td>Svc</td>
    <td>Period</td>
    <td>Seq#</td>
    <td>Reason for Rejection</td>
    <td>Next Action</td>
    </tr>
    <tr>
    <td>Sample.XML</td>
    <td><a href='dch_redirect?refer=10293377&ref=10293377'>USA-IND</a></td>
    <td>Voice</td>
    <td><input type="hidden" name="recid1" value="10293377">2009/08</td>
    <td>03386</td>
    <td>data already exists</td>
    <td><input type="checkbox" name="c_1"></td>
    </tr>
    </body></html>

Above is the sample HTML file, and I want to extract the hidden values from the inputs named recid1. Also, is there any way to get the 'href' values from the HTML content?
Re^3: Extract hidden values from HTML by wfsp (Abbot) on Dec 15, 2009 at 11:40 UTC
One way:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTML::TreeBuilder;

    # Parse the sample document from the __DATA__ section below.
    my $t = HTML::TreeBuilder->new_from_file(\*DATA);

    # Find every hidden <input> element named recid1.
    my @hidden_inputs = $t->look_down(
        _tag => q{input},
        type => q{hidden},
        name => q{recid1},
    );

    # Print the value attribute of each match.
    for my $hidden_input (@hidden_inputs){
        printf qq{%s\n}, $hidden_input->attr(q{value});
    }

    __DATA__
    <html><head>
    <title>Data</title>
    </head>
    <form name="action">
    <table cellspacing="1" cellpadding="1">
    <tr>
    <td>File name</td>
    <td>From To</td>
    <td>Svc</td>
    <td>Period</td>
    <td>Seq#</td>
    <td>Reason for Rejection</td>
    <td>Next Action</td>
    </tr>
    <tr>
    <td>Sample.XML</td>
    <td><a href='dch_redirect?refer=10293377&ref=10293377'>USA-IND</a></td>
    <td>Voice</td>
    <td><input type="hidden" name="recid1" value="10293377">2009/08</td>
    <td>03386</td>
    <td>data already exists</td>
    <td><input type="checkbox" name="c_1"></td>
    </tr>
    </body></html>

Output:

    10293377

Update: a similar approach would also work for hrefs.
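For the href half of the question, the same `look_down` technique can be pointed at `<a>` elements. This is only a sketch: the helper name `hrefs_in` is made up for illustration, and it assumes the page is already in a string rather than a file.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the href of every <a> element that has one.
sub hrefs_in {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # A regex criterion keeps only <a> tags whose href is non-empty.
    my @links = map { $_->attr('href') }
                $tree->look_down( _tag => 'a', href => qr/./ );

    $tree->delete;    # free the tree explicitly; older HTML::Tree releases need this
    return @links;
}

print "$_\n" for hrefs_in(
    q{<table><tr><td><a href='dch_redirect?refer=10293377&ref=10293377'>USA-IND</a></td></tr></table>}
);
```

Anchors without an href (for example `<a name="top">`) are skipped by the `href => qr/./` criterion.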
Re: Extract hidden values from HTML by ww (Archbishop) on Dec 15, 2009 at 06:31 UTC
With the clarification in "Re^3", this should give you an approach:

    #!/usr/bin/perl
    use strict;
    use warnings;
    # 812662

    my @data = ('<td class="c"><input type="hidden" name="recid5" value="10293424">2009/08</td>',
                '<td class="c">foobar - Item "123": <b>8</b></td>'
               );

    for my $data (@data) {
        if ($data =~ /"hidden"/ && $data =~ /value="([^"]+)/ ) {
            print "Hidden value: $1 \n";
        } else {
            print "Nothing hidden in \$data: $data\n";
        }
    }

Output:

    Hidden value: 10293424
    Nothing hidden in $data: <td class="c">foobar - Item "123": <b>8</b></td>

Update: `else {...}` added for clarity and output updated.
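One caveat with matching `"hidden"` and `value=` separately is that the two could come from different tags on the same line. A slightly tighter variant, still regex-based and so still fragile on messy markup, reads both attributes from the same `<input>` tag. The helper name `hidden_inputs` is hypothetical, and the sketch assumes double-quoted name and value attributes.

```perl
use strict;
use warnings;

# Hypothetical helper: return name => value pairs for hidden <input> tags,
# taking both attributes from the same tag.
sub hidden_inputs {
    my ($html) = @_;
    my %hidden;

    # Visit each <input ...> tag and inspect its attribute text in isolation.
    while ( $html =~ /<input\b([^>]*)>/gi ) {
        my $attrs = $1;
        next unless $attrs =~ /\btype\s*=\s*["']?hidden\b/i;
        my ($name)  = $attrs =~ /\bname\s*=\s*"([^"]*)"/i;
        my ($value) = $attrs =~ /\bvalue\s*=\s*"([^"]*)"/i;
        $hidden{$name} = $value if defined $name && defined $value;
    }
    return %hidden;
}

my %found = hidden_inputs(
    '<td><input type="hidden" name="recid5" value="10293424">2009/08</td>'
  . '<td><input type="checkbox" name="c_1" value="on"></td>'
);
print "$_ = $found{$_}\n" for sort keys %found;
```

For anything beyond a quick one-off, a real parser such as HTML::TreeBuilder is the safer choice.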

