PerlMonks  

Re^2: regex to extract text

by jonnyfolk (Vicar)
on Jan 18, 2009 at 21:22 UTC (#737180)


in reply to Re: regex to extract text
in thread regex to extract text

Thanks for the fix and the advice. I'll get straight onto the parser!


Re^3: regex to extract text
by graff (Chancellor) on Jan 19, 2009 at 07:46 UTC
    Note that CountZero's solution (your initial attempt with the necessary "s" modifier added, so that "." also matches newlines) does a greedy match with '(.*)'. This means that if two or more '</div>' tags follow the address section, the match will extend to the last one.

    Using '(.*?)' instead specifies a non-greedy match, which stops at the first '</div>' after the address section; that is what you really want. Still, as already pointed out, you should probably get acquainted with proper HTML parsing. It takes a bit of learning to catch on, but in the long run a parsing module will lead you to quicker and more robust solutions than regex matching can.
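    A minimal sketch of the difference, assuming a hypothetical page where the address sits in a <div class="address"> block and another </div> follows it later (the original HTML and pattern aren't shown in this thread, so the markup and class names here are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample HTML: the address block spans several lines,
# and a second </div> (the footer) follows it.
my $html = <<'HTML';
<div class="address">
  12 Example Street
  Sometown
</div>
<div class="footer">
  Contact us
</div>
HTML

# Without /s, "." does not match a newline, so the multi-line
# address block cannot be matched at all.
my $matched_without_s = $html =~ m{<div class="address">(.*?)</div>} ? 1 : 0;

# Greedy (.*) with /s: the capture runs to the LAST </div> in the string.
my ($greedy) = $html =~ m{<div class="address">(.*)</div>}s;

# Non-greedy (.*?) with /s: the capture stops at the FIRST </div>.
my ($lazy) = $html =~ m{<div class="address">(.*?)</div>}s;

print "Without /s: ", ($matched_without_s ? "match" : "no match"), "\n";
print "Greedy capture:\n$greedy\n";
print "Non-greedy capture:\n$lazy\n";
```

    The greedy capture swallows the closing tag of the address block and runs on into the footer div, while the non-greedy capture holds only the address lines. This is still fragile (nested divs would break it), which is why a parsing module is the better long-term route.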
