http://www.perlmonks.org?node_id=1003781

perlnoobster has asked for the wisdom of the Perl Monks concerning the following question:

Hi perl monks,

I'm currently trying to extract some information from a website. This is the HTML segment that I am working on:

<td class="ttl"><a href=# onClick="helpW('h_status.htm');">Status</a></td> <td class="nfo">Coming soon. Exp. release 2012, November 13th</td> </tr>

This is the code I am using to extract everything from Status onwards:

 my (@ASINS2)=$final_page=~m!Status</a></td>(.+?)/td>!g;

This yields nothing. However, if I change the code to the following:

 my (@ASINS2)=$final_page=~m!<td class="nfo">(.+?)</td>!g;

It seems to work, which suggests that the regex has trouble matching HTML that is split across new lines. Can someone please help? I'm sure it's a regex issue, but I can't figure it out!

 

  (All I require is the Coming soon..... segment.)

Thank you

Re: Perl Get issue
by choroba (Cardinal) on Nov 14, 2012 at 10:59 UTC
    Without qr//s, a dot does not match a newline.

    Moreover, do not parse HTML with regular expressions. See Super Search.

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Hello, sorry, I do not understand. What do you mean by qr//s? Do I replace (.+?) with that? Thank you :)
        The qr part might be throwing you off. You don't need to change anything to qr; just add the s switch to the end of your regex:
        m!...!sg;
        (qr// is just the regex quote operator; in this case it was used as a generic example of how to make a . match newlines).
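
To illustrate the s switch, here is a minimal sketch. The sample page is a hypothetical stand-in for $final_page, with the two cells on separate lines as the question implies. The second pattern also anchors on the nfo cell so that only its text is captured, which is an assumption beyond the original regex.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $final_page: the two cells sit on separate lines.
my $final_page = <<'HTML';
<tr>
<td class="ttl"><a href=# onClick="helpW('h_status.htm');">Status</a></td>
<td class="nfo">Coming soon. Exp. release 2012, November 13th</td>
</tr>
HTML

# Without /s the dot cannot cross the newline after </td>, so nothing matches.
my @without_s = $final_page =~ m!Status</a></td>(.+?)</td>!g;

# With /s the dot also matches "\n"; the capture is limited to the nfo cell.
my @with_s = $final_page =~ m!Status</a></td>.*?<td class="nfo">(.+?)</td>!sg;

print "matches without /s: ", scalar(@without_s), "\n";
print "capture with /s: $with_s[0]\n";
```

The non-greedy .*? keeps the match from running past the first nfo cell that follows the Status link.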


        When's the last time you used duct tape on a duct? --Larry Wall
Re: Perl Get issue
by space_monk (Chaplain) on Nov 14, 2012 at 14:05 UTC
    Consider using Web::Scraper; you should be able to pull out all the elements with class nfo from your page and live happily ever after.

    You might be able to do something similar with other libraries if that one doesn't work out.
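
As a rough sketch of the Web::Scraper approach, the following collects the text of every td element with class nfo. The sample HTML and the key name nfo are assumptions made for illustration; the real page would come from wherever $final_page is fetched.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical sample page; the real content would be the fetched HTML.
my $html = <<'HTML';
<html><body><table>
<tr>
<td class="ttl"><a href="#">Status</a></td>
<td class="nfo">Coming soon. Exp. release 2012, November 13th</td>
</tr>
</table></body></html>
HTML

# Collect the text of every <td class="nfo"> into an array under the key 'nfo'.
my $scraper = scraper {
    process 'td.nfo', 'nfo[]' => 'TEXT';
};

my $res = $scraper->scrape($html);

print "$_\n" for @{ $res->{nfo} || [] };
```

Because the selector works on the parsed document rather than the raw text, line breaks between the cells do not matter here.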

    A Monk aims to give answers to those who have none, and to learn from those who know more.
Re: Perl Get issue
by Anonymous Monk on Nov 14, 2012 at 18:21 UTC