PerlMonks  

Perl Get issue

by perlnoobster (Sexton)
on Nov 14, 2012 at 10:46 UTC
perlnoobster has asked for the wisdom of the Perl Monks concerning the following question:

Hi perl monks,

I'm currently trying to extract some information from a website. This is the HTML segment that I am working on:

<td class="ttl"><a href=# onClick="helpW('h_status.htm');">Status</a></td> <td class="nfo">Coming soon. Exp. release 2012, November 13th</td> </tr>

This is the code I am using to extract everything from Status onwards:

 my (@ASINS2)=$final_page=~m!Status</a></td>(.+?)/td>!g;

This yields nothing. However, if I change the code to the following:

 my (@ASINS2)=$final_page=~m!<td class="nfo">(.+?)</td>!g;  

It seems to work, which suggests that the regex has trouble matching HTML that is split across lines. Can someone please help? I'm sure it's a regex issue, but I can't figure it out.

(All I require is the "Coming soon..." segment.)

Thank you

Re: Perl Get issue
by choroba (Abbot) on Nov 14, 2012 at 10:59 UTC
    Without qr//s, a dot does not match a newline.

    Moreover, do not parse HTML with regular expressions. See Super Search.

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Hello, sorry, I do not understand. What do you mean by qr//s? Do I replace (.+?) with that? Thank you :)
        The qr part might be throwing you off. You don't need to change anything to qr; just add the s switch to the end of your regex:
        m!...!sg;
        (qr// is just the regex quote operator; in this case it was used as a generic example of how to make a . match newlines).
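
        To illustrate the s switch, here is a minimal sketch. It assumes the two cells in the page are separated by a newline (the sample string below is reconstructed from the question, not taken from the real page), and it uses </td> as the closing delimiter rather than the /td> in the original pattern. The variable names @without_s, @with_s and @status are made up for the example.

```perl
use strict;
use warnings;

# Assumed sample input: the HTML from the question, with a line break
# between the two cells (the real page layout may differ).
my $final_page = <<'HTML';
<td class="ttl"><a href=# onClick="helpW('h_status.htm');">Status</a></td>
<td class="nfo">Coming soon. Exp. release 2012, November 13th</td>
</tr>
HTML

# Without /s, "." cannot cross the newline after the first </td>,
# so the pattern finds nothing.
my @without_s = $final_page =~ m!Status</a></td>(.+?)</td>!g;

# With /s, "." also matches newlines, so the capture spans both lines
# (it still includes the opening <td class="nfo"> tag).
my @with_s = $final_page =~ m!Status</a></td>(.+?)</td>!sg;

# Narrowing the pattern captures only the text inside the nfo cell.
my @status = $final_page =~ m!Status</a></td>\s*<td class="nfo">(.+?)</td>!sg;

print scalar(@without_s), " match(es) without /s\n";
print scalar(@with_s), " match(es) with /s\n";
print "$_\n" for @status;
```

        The third pattern is only one way to trim the capture down to the "Coming soon..." text; it still depends on the exact markup, which is why an HTML parser is usually the safer choice.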


        When's the last time you used duct tape on a duct? --Larry Wall
Re: Perl Get issue
by space_monk (Chaplain) on Nov 14, 2012 at 14:05 UTC
    Consider using Web::Scraper; you should be able to pull out all the elements with class nfo from your page and live happily ever after.

    You might be able to do something similar with other libraries if that one doesn't work out.
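
    As a rough sketch of that suggestion, the following assumes Web::Scraper is installed from CPAN and that the cells of interest all carry class="nfo". The sample HTML, the key name nfo and the variable names are assumptions made for illustration; on the real page you would pass the fetched content (or a URI object) to scrape instead.

```perl
use strict;
use warnings;
use Web::Scraper;

# Assumed sample input, modelled on the fragment in the question.
my $html = <<'HTML';
<table><tr>
<td class="ttl"><a href="#">Status</a></td>
<td class="nfo">Coming soon. Exp. release 2012, November 13th</td>
</tr></table>
HTML

# Collect the text of every <td class="nfo"> into an array under the
# (arbitrarily named) key "nfo".
my $nfo_scraper = scraper {
    process 'td.nfo', 'nfo[]' => 'TEXT';
};

my $result = $nfo_scraper->scrape($html);

# Fall back to an empty list if no matching cell was found.
my @nfo = @{ $result->{nfo} || [] };
print "$_\n" for @nfo;
```

    Because the selector works on the parsed document, line breaks between the tags no longer matter.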

    A Monk aims to give answers to those who have none, and to learn from those who know more.
Re: Perl Get issue
by Anonymous Monk on Nov 14, 2012 at 18:21 UTC
