Hi there, i have the below html code from a page and i would like to extract the title and the price, but it is taking me so long.
This is the sample of the html file i am parsing:
<td width="135px"> <a href="google.com" title="Please help " class="p
+roduct-image"><img src="google.com/blabla.jpg" width="135" height="18
+0" alt="Please help " /></a></td>
<td valign="top">
<div class="category-description">
<h2 class="product-name"><a href="google.com" titl
+e="Please help ">Please help </a></h2>
<strong>Doodle Thomson </strong> <br/>
At a time when many people are attempting to relat
+e current events and trends in the world to interpretations of the pr
+ophecies contained in the Book of Revelations, and the writings of No
+stradamus, and the predictions of fashionable clairvoyants, the autho
+r does much the same </div>
</td>
<td width="30%">
<div class="categoty_price">
<div class="price-box">
<span clas
+s="regular-price" id="product-price-1139">
<span class="price">$20.00
+</span> </span>
I need the title (Please help) and the price ($20.00). It is a long html file with many more of these.
Please help.
This is the code that i have so far but failing me...
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
if (@ARGV != 1 ) { die "./quick.book.parsing.pl <html file>\n";}
#open html file
open(HTML,$ARGV[0]) || die "Couldn't open file $ARGV[0]\n";
while (my $html = <HTML>)
{
next if $html =~ /<button/;
chomp $html;
if ($html =~m/title="(.*)" class/g)
{
my @columns =~ split (/\"/, $html);
print "$columns[5]";
}
}
#{
# next if $html =~ /<button/;
# chomp $html;
# if ($html=~ m/title="(.*)" class/g)
#{
# $columns[5] =~ s/^\s+|\s+$//g;
# print "$columns[5]---";
#}
#
# if ($html=~ m/<span class="price">(.*)<\/span>/g)
#{
# $html =~ s/^\s+|\s+$//g;
# print "$html\n";
#}
#}
close HTML;
exit;
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.