Perl Get issue

perlnoobster has asked for the wisdom of the Perl Monks concerning the following question:

Hi perl monks,

I'm currently trying to extract some information from a website. This is the HTML segment that I am working on:

<td class="ttl"><a href=# onClick="helpW('h_status.htm');">Status</a></td>
<td class="nfo">Coming soon. Exp. release 2012, November 13th</td>
</tr>

This is the code I am using to extract everything from Status onwards:

my (@ASINS2)=$final_page=~m!Status</a></td>(.+?)/td>!g;

This yields no results. However, if I change the code to the following:

my (@ASINS2)=$final_page=~m!<td class="nfo">(.+?)</td>!g;

it seems to work, which suggests that the regex has trouble matching HTML that is split across several lines. Can someone please help? I'm sure it's a regex issue, but I can't figure it out.

(All I require is the "Coming soon..." segment.)

Thank you


Replies are listed 'Best First'.
Re: Perl Get issue by choroba (Cardinal) on Nov 14, 2012 at 10:59 UTC
Without the `s` modifier (as in `qr//s`), a dot does not match a newline. Moreover, do not parse HTML with regular expressions. See Super Search. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Perl Get issue by perlnoobster (Sexton) on Nov 14, 2012 at 11:05 UTC
Hello, sorry, I do not understand. What do you mean by qr//s? Do I replace (.+?) with that? Thank you :)
Re^3: Perl Get issue by ColonelPanic (Friar) on Nov 14, 2012 at 11:48 UTC
The qr part might be throwing you off. You don't need to change anything to qr; just add the s switch to the end of your regex: `m!...!sg;` (qr// is just the regex quote operator; in this case it was used as a generic example of how to make a . match newlines). When's the last time you used duct tape on a duct? --Larry Wall
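A minimal sketch of the suggestion above, assuming the page looks like the fragment in the question. The sample `$final_page` string, the `@raw` and `@status` names, and the tighter second pattern are illustrative additions, not code from the thread. It shows that adding `s` makes the original pattern match, but the capture then still contains the newline, the opening `<td class="nfo">` tag and a trailing `<`, so the tag is best matched outside the capture group.

```perl
use strict;
use warnings;

# Sample input modelled on the fragment in the question; note the newline
# between the "Status" cell and the "nfo" cell.
my $final_page = <<'HTML';
<tr>
<td class="ttl"><a href=# onClick="helpW('h_status.htm');">Status</a></td>
<td class="nfo">Coming soon. Exp. release 2012, November 13th</td>
</tr>
HTML

# Original pattern plus /s: "." now crosses the newline, so the pattern
# matches, but the capture includes the newline, the opening tag and a "<".
my @raw = $final_page =~ m!Status</a></td>(.+?)/td>!sg;

# Tighter pattern: whitespace and the opening tag sit outside the capture,
# so only the cell text is returned.
my @status = $final_page =~ m!Status</a></td>\s*<td class="nfo">(.+?)</td>!sg;

print "$_\n" for @status;
```

With the `\s*` in place the second pattern no longer depends on `.` crossing a newline at all; the `s` switch only matters if the cell text itself spans several lines.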
Re^4: Perl Get issue by perlnoobster (Sexton) on Nov 14, 2012 at 12:50 UTC
Re^5: Perl Get issue by Don Coyote (Hermit) on Nov 14, 2012 at 13:42 UTC
Some notes below your chosen depth have not been shown here
Re^3: Perl Get issue by choroba (Cardinal) on Nov 14, 2012 at 11:09 UTC
See the `s` switch to `qr//` in Perl regular expressions. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Perl Get issue by space_monk (Chaplain) on Nov 14, 2012 at 14:05 UTC
Consider using Web::Scraper; you should be able to pull out all the elements with class nfo from your page and live happily ever after. You might be able to do something similar with other libraries if that one doesn't work out. A Monk aims to give answers to those who have none, and to learn from those who know more.
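A rough sketch of what that could look like, assuming Web::Scraper is installed from CPAN and that the cells of interest carry `class="nfo"` as in the question. The sample HTML (wrapped in a `<table>` here so it parses as a complete fragment) and the names `$nfo_scraper` and `$result` are made up for illustration.

```perl
use strict;
use warnings;
use Web::Scraper;    # CPAN module, not part of core Perl

# Sample input modelled on the fragment in the question.
my $final_page = <<'HTML';
<table>
<tr>
<td class="ttl"><a href=# onClick="helpW('h_status.htm');">Status</a></td>
<td class="nfo">Coming soon. Exp. release 2012, November 13th</td>
</tr>
</table>
HTML

# Collect the text of every <td class="nfo"> into an array under the key "nfo".
my $nfo_scraper = scraper {
    process 'td.nfo', 'nfo[]' => 'TEXT';
};

# scrape() accepts the HTML as a plain string and returns a hash reference.
my $result = $nfo_scraper->scrape($final_page);

print "$_\n" for @{ $result->{nfo} || [] };
```

On a real page there will probably be many `nfo` cells, so the Status value would still have to be picked out of the list, for example by position or with a more specific selector.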
Re: Perl Get issue by Anonymous Monk on Nov 14, 2012 at 18:21 UTC
htmltreexpather.pl, Parsing HTML / Re^4: Parsing HTML, A regex question, NASA's Astronomy Picture of the Day / Re: NASA's Astronomy Picture of the Day, Re: Extracting HTML content between the h tags, ....
