PerlMonks  

Re: my code error-ing out trying to grab html

by RayRay459 (Pilgrim)
on Mar 12, 2002 at 08:59 UTC (id://151069)


in reply to my code error-ing out trying to grab html

Per the advice given to me, I added $ua = LWP::UserAgent->new; after my variable declaration. That fixed the previous error (thanks so much). However, now my problem lies within my regex. Here's the updated code:

#!/usr/bin/perl
use HTTP::Request;
use HTTP::Headers;
use LWP::UserAgent;

my(@ListingsUrls, $req, $ua, $responsecode, $res, $row, $url);
$ua = LWP::UserAgent->new;
$url = "http://listings.test.com/aw/listings/list/category12";
$request = new HTTP::Request('GET',$url);
$ua->timeout(10);
$response = $ua->request($request);
my $responsecode = $response->code();
next if $responsecode != 200;
@ARRAY_OF_LINES = (split "\n", $ua->request($request)->as_string);
$request->as_string;
foreach $row (@ARRAY_OF_LINES) {
    chomp($row);
    print $row . "\n";
}
if ($row =~ (/Updated\s*:\s*\w+\s*-\s*\d{1,2}:\d{1,2}\d{1,2}\s* PST/)){
    print $1;
}else{
    print "html didn't contain Updated";
}

I added a print statement for $row and I can see what I am looking for (this is an excerpt from my console when the line print $row is executed):
<tr> <td> <p></p><br> <center> <font face="Arial, Helvetica" size="-1"> <b>Updated: Mar-11 23:05:50 PST</b>
Any advice would be greatly appreciated.
thnx, Ray

Replies are listed 'Best First'.
Re: Re: my code error-ing out trying to grab html
by davis (Vicar) on Mar 12, 2002 at 09:33 UTC
    Hi,
    You were correct - the regex isn't matching. Here's a fixed-up and /x modified version:

    #!/usr/bin/perl -w
    use strict;

    my $row = "Updated: Mar-11 23:05:50 PST";
    if($row =~ /Updated\s*:              #The String "Updated", followed by zero or more spaces, then a colon
                \s*\w+\s*-\s*\d{1,2}\s*  #This matches " Mar-11 "
                \d{1,2}:\d{1,2}:\d{1,2}  #The time
                \s*PST/x) {              #The timezone - do you really need to be this specific?
        print "Match\n";
    }
    else {
        print "No Match\n";
    }

    On the other hand: should you really be using a regex for this at all?
    If it's a date, then Date::Calc could do very well for you (check out Parse_Date).
    I'd also recommend thinking about whether you really need to have " PST" in your regex at all - do all the "Updated" strings contain that timezone information?
    hope this helps
    davis
    Is this going out live?
    No, Homer, very few cartoons are broadcast live - it's a terrible strain on the animator's wrist
    Update: On re-reading your code, it appears that you're using parentheses to try to capture the whole string - these need to go inside the regex delimiters.
    And I won't even mention that you should be "use"ing "strict" and "warnings" ;-)
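
To illustrate the update above, here is a minimal sketch of the capture with the parentheses placed inside the regex delimiters, so that a successful match sets $1. It also runs the test inside the loop: in the original script the match happens after the foreach has finished, and Perl restores the loop variable's previous value on exit, so $row no longer holds a line at that point. The sub name extract_updated and the sample @lines are assumptions made for illustration; the lines are copied from the console excerpt in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the "Updated" timestamp from one line of HTML, or undef.
# extract_updated is a hypothetical helper name, not from the original post.
# The capturing parentheses sit inside the regex delimiters, so a
# successful match sets $1.
sub extract_updated {
    my ($line) = @_;
    return $line =~ /Updated\s*:\s*(\w+\s*-\s*\d{1,2}\s+\d{1,2}:\d{2}:\d{2}\s*PST)/
        ? $1
        : undef;
}

# Sample lines taken from the console excerpt in the question.
my @lines = (
    '<center>',
    '<font face="Arial, Helvetica" size="-1">',
    '<b>Updated: Mar-11 23:05:50 PST</b>',
);

# Test each line inside the loop; $row is not visible once the loop ends.
my $found;
foreach my $row (@lines) {
    $found = extract_updated($row);
    last if defined $found;
}

print defined $found ? "$found\n" : "html didn't contain Updated\n";
```

In the real script, @lines would be replaced by the lines split out of the response body, and the pattern could be loosened (for example by dropping PST) if other pages use a different timezone.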

      Break out the bulky, slow Date::Manip.

      ...
      use Date::Manip;
      # The assignment needs its own parentheses: && binds tighter than =
      if (/>Updated: (.*?)</ && ($date = ParseDate($1))) {
          &do_something_with($date);
      }
      else {
          warn "no parsum date!\n";
          &fail_gracefully;
      }

      Date::Manip does have the advantage of parsing almost anything that can be a date.
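
If installing Date::Manip is not an option and the timestamp layout is known in advance, the core Time::Piece module can do the parsing instead. This is a swapped-in technique, not what either reply suggests, and it is far less forgiving than Date::Manip. The sketch below rests on several assumptions: the sub name parse_updated is hypothetical, the year 2002 is taken from the post date because the page prints none, and the PST suffix is ignored, so the result is treated as if it were UTC.

```perl
use strict;
use warnings;
use Time::Piece;

# Parse a timestamp such as "Mar-11 23:05:50" into a Time::Piece object.
# parse_updated is a hypothetical helper; $year is supplied by the caller
# because the page does not print one.
sub parse_updated {
    my ($stamp, $year) = @_;
    # strptime croaks on input that does not fit the format; eval turns
    # that into an undef return value.
    my $t = eval { Time::Piece->strptime("$year-$stamp", '%Y-%b-%d %H:%M:%S') };
    return $t;
}

my $html = '<b>Updated: Mar-11 23:05:50 PST</b>';
my $date;
if ($html =~ />Updated:\s*(\w+-\d{1,2}\s+\d{1,2}:\d{2}:\d{2})/
    and $date = parse_updated($1, 2002)) {
    # The timezone suffix is not captured, so this is a naive timestamp.
    print $date->strftime('%Y-%m-%d %H:%M:%S'), "\n";
}
else {
    warn "could not parse date\n";
}
```

The fixed format string is the trade-off here: it fails on any layout other than the one shown in the question, whereas Date::Manip would try to make sense of most variants.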
