PerlMonks  

Wrong regex?

by imrags (Monk)
on Feb 10, 2010 at 06:18 UTC (node #822352)
imrags has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I have an HTML page, and I want to match this pattern:
</HEAD><BODY><META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"><H2>Set Node to Monitored <BR> 10.10.10.10 : Minor </H2><FORM METHOD="POST" ENCTYPE="application/x-www-form-urlencoded">
I want to get the IP and the status (10.10.10.10 and Minor).
I wrote the following code:
if ($html =~ /Set Node to Monitored \<BR\>\s+(\w+)\s\:\s(\w+)\W/i) { print "$1 and $2 found" }
The IP and the status (Minor) keep changing.
The code I wrote doesn't seem to work. Any help? Raghu

Re: Wrong regex?
by ikegami (Pope) on Feb 10, 2010 at 06:31 UTC
    \w doesn't match punctuation other than underscores. Specifically, \w+ doesn't match 10.10.10.10

      \w does match the _ (underscore) punctuation character. But it doesn't match any of the others.

        I don't think of it that way, but yeah, I suppose it is a punctuation mark. Fixed.
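To make the point about \w concrete, here is a minimal sketch. It assumes the page text is held in $html as in the question, with the sample string shortened from the post. The character class [\d.]+ is just one possible replacement for \w+, not necessarily what the original poster ended up using.

```perl
use strict;
use warnings;

# Sample input, shortened from the question.
my $html = '<H2>Set Node to Monitored <BR> 10.10.10.10 : Minor </H2>';

# \w covers letters, digits and underscore, so \w+ stops at the first dot.
my ($word) = '10.10.10.10' =~ /^(\w+)/;
print "\\w+ captured: $word\n";

# [\d.]+ accepts digits and dots, so it spans the whole dotted address.
my ($ip, $status);
if ($html =~ /Set Node to Monitored <BR>\s+([\d.]+)\s*:\s*(\w+)/i) {
    ($ip, $status) = ($1, $2);
    print "$ip and $status found\n";
}
```

Note that [\d.]+ does not validate the address; it would also accept something like "1..2". If that matters, a stricter pattern is needed.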
Re: Wrong regex?
by biohisham (Priest) on Feb 10, 2010 at 08:32 UTC
    • Escaping "<" or ">" isn't necessary.
    • Match one or more digits with \d+ and one or more non-digits with \D+.
    • You can also match one or more digits with [0-9]+.
    • Read perlretut.
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    print "IP\t\tStatus\n";
    print "-" x 25;
    print "\n";
    while(<DATA>){
        chomp;
        next unless $_ =~ /^<\/HEAD>.*/i; #skip un-interesting lines.
        #my ($ip, $status) = $_=~ m/(\d+\.\d+\.\d+\.\d+)\s+:\s+(\w+)/;
        my ($ip, $status) = $_=~ m/([0-9\.]+)\s+:\s+(\w+)/;
        print "$ip\t $status\n";
    }
    __DATA__
    </HEAD><BODY><META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"><H2>Set Node to Monitored <BR> 10.10.10.10 : Minor </H2><FORM METHOD="POST" ENCTYPE="application/x-www-form-urlencoded">
    </HEAD><BODY><META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"><H2>Set Node to Monitored <BR> 10.10.10.9 : Major </H2><FORM METHOD="POST" ENCTYPE="application/x-www-form-urlencoded">
    </HEAD><BODY><META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"><H2>Set Node to Monitored <BR> 10.10.10.1 : Major </H2><FORM METHOD="POST" ENCTYPE="application/x-www-form-urlencoded">

    #OUTPUT:
    IP              Status
    -------------------------
    10.10.10.10      Minor
    10.10.10.9       Major
    10.10.10.1       Major



    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.
      Why would you assume that an input line would be "un-interesting" if it doesn't start with </HEAD>.* ? And why did you think you need ".*" in that regex?
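A minimal sketch of what this objection points at, under some assumptions: the second and third input lines are hypothetical (they do not come from the thread), and [\d.]+ is used as a loose stand-in for an IP pattern. It shows that a trailing .* does not change whether the match succeeds, and that the capturing match can act as the filter by itself.

```perl
use strict;
use warnings;

my @lines = (
    '</HEAD><BODY><H2>Set Node to Monitored <BR> 10.10.10.9 : Major </H2>',
    '  <H2>Set Node to Monitored <BR> 10.10.10.1 : Minor </H2>',  # hypothetical: no leading </HEAD>
    '<P>nothing to see here</P>',                                  # hypothetical: no IP at all
);

# A trailing .* can always match the empty string, so it never affects success.
my $with    = $lines[0] =~ /^<\/HEAD>.*/i ? 1 : 0;
my $without = $lines[0] =~ /^<\/HEAD>/i   ? 1 : 0;
print "with .*: $with, without .*: $without\n";

my @found;
for my $row (@lines) {
    # Let the capturing match decide: a failed match yields an empty list,
    # so the list assignment is false and the line is skipped.
    my ($ip, $status) = $row =~ /<BR>\s*([\d.]+)\s*:\s*(\w+)/i or next;
    push @found, "$ip $status";
}
print "$_\n" for @found;
```

With this approach the second line is still picked up even though it does not begin with </HEAD>, which the original filter would have skipped.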
