Re: Help with regular expression - real file

by hanenkamp (Pilgrim)
on Dec 13, 2003 at 17:08 UTC ( [id://314538]=note )

in reply to Help with regular expression - real file

I think merlyn is right: trying to scan HTML is difficult. On the other hand, for something as simple as what you are attempting, XML::LibXML may be overkill. In this case, assuming that the page doesn't change its formatting frequently, you are really looking for a pattern like:

/(?<=>)([\w ]+?) PRIMARY SCHOOL/

This non-greedily matches any run of word characters and spaces that immediately follows the closing ">" of a tag and is itself followed by the words " PRIMARY SCHOOL". The overall match includes " PRIMARY SCHOOL" too, but the capture group ($1) holds only the text before it. This will fail if the line is broken in the middle, but you can get around that by using "\s" instead of literal spaces between words and such.
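To make the "\s" suggestion concrete, here is a minimal sketch. The sample markup and the helper name school_names are made up for illustration, since I haven't seen the real file; the pattern is the one above with "\s" in place of the literal spaces.

```perl
use strict;
use warnings;

# Hypothetical sample markup; the structure of the real page is an assumption.
my $html = <<'HTML';
<td>ST MARY PRIMARY SCHOOL</td>
<td><b>OAK TREE
PRIMARY SCHOOL</b></td>
HTML

# Collect the text that sits between the closing ">" of a tag and the
# words "PRIMARY SCHOOL", allowing line breaks anywhere a space could be.
sub school_names {
    my ($text) = @_;
    my @names;
    while ($text =~ /(?<=>)([\w\s]+?)\s+PRIMARY\s+SCHOOL/g) {
        (my $name = $1) =~ s/\s+/ /g;    # collapse any line break into one space
        push @names, $name;
    }
    return @names;
}

print "$_\n" for school_names($html);
```

With the sample above this prints "ST MARY" and "OAK TREE", one per line. Note that $1 holds only the name; " PRIMARY SCHOOL" is part of the overall match ($&) but not of the capture.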
