PerlMonks  

Re: Help with regular expression - real file

by hanenkamp (Pilgrim)
on Dec 13, 2003 at 17:08 UTC ( [id://314538]=note )


in reply to Help with regular expression - real file

I think merlyn is right: trying to scan HTML is difficult. On the other hand, for something as simple as what you are attempting, XML::LibXML may be overkill. In this case, assuming that the page doesn't change formatting frequently, you are really looking for a pattern like:

/(?<=>)([\w ]+?) PRIMARY SCHOOL/

This will non-greedily match any run of word characters and spaces that immediately follows the closing ">" of a tag and is itself followed by the words " PRIMARY SCHOOL". The overall match includes " PRIMARY SCHOOL" too, while the capture group ($1) holds only the name in front of it. This will fail if the line is broken in the middle of the name, but you can get around that by using "\s" instead of literal spaces, both between the words and inside the character class.
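To make the difference concrete, here is a minimal sketch that runs both the original pattern and a "\s" variant over some sample markup. The markup is invented for illustration (I have not seen the real file), and the "\s" variant is just one way to spell the idea, not the only one.

```perl
use strict;
use warnings;

# Hypothetical sample markup; the real page's layout is an assumption.
my $html = <<'HTML';
<tr><td><b>ST MARY PRIMARY SCHOOL</b></td></tr>
<tr><td>NORTH
HILL PRIMARY SCHOOL</td></tr>
HTML

# Original pattern: literal spaces only, so a name broken across
# lines is missed. In list context with /g this returns every $1.
my @strict = $html =~ /(?<=>)([\w ]+?) PRIMARY SCHOOL/g;

# Variant with \s in place of literal spaces, so line breaks are tolerated.
my @loose = $html =~ /(?<=>)([\w\s]+?)\s+PRIMARY\s+SCHOOL/g;

# Collapse embedded whitespace runs (including newlines) to single spaces.
s/\s+/ /g for @loose;

print "strict: $_\n" for @strict;
print "loose:  $_\n" for @loose;
```

Note that with "\s" inside the character class, any whitespace sitting between the ">" and the first word ends up in the capture as well, so you may want to trim the results if the page indents its text.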
