lenrobert has asked for the wisdom of the Perl Monks concerning the following question:
I am aware of OR ( | ), but is there a logical NOT in the Perl regex syntax?
The task is the following: to extract the relative links (i.e. the href attribute of the "a" element) from an HTML file, even when the value is not enclosed in quotation marks. This means I don't want to retrieve hyperlinks beginning with /, # or javascript:.
I would like to express the following pattern and capture (extract) the content of the second set of parentheses:
( <a href=" OR <a href=) THEN NOT(/ OR # OR javascript: OR \s OR " ) THEN ( \s OR " )
The best regexp I could come up with is this, but it does not exclude the cases of /, #, javascript: etc.
/(<a href="|<a href=)([^"]*?)(\s|")/gi
Does anyone know the answer and can help me? Thanks in advance,
Robert
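
A minimal sketch of how the pseudo-pattern above could be written, offered as one possible approach rather than a definitive answer. Perl regexes have no general NOT operator, but a negative lookahead, `(?! ... )`, asserts that what follows does not match a sub-pattern, which covers the "THEN NOT(...)" step. The sample HTML, the variable names `$html` and `@links`, the use of `\s+` between `a` and `href`, and the choice to also stop at `>` are assumptions made for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample input; the tags and URLs are made up for illustration.
my $html = <<'HTML';
<a href="docs/index.html">quoted relative</a>
<a href=images/logo.png>unquoted relative</a>
<a href="/absolute/path.html">starts with /, skipped</a>
<a href="#top">starts with #, skipped</a>
<a href="javascript:void(0)">javascript:, skipped</a>
<A HREF=#bottom>unquoted fragment, skipped</A>
HTML

my @links;
while (
    $html =~ m{
        <a\s+href=           # opening of the tag, matched case-insensitively
        "?                   # optional opening quote
        (?![/\#]|javascript:) # NOT: value must not start with /, # or javascript:
        ([^"\s>]+)           # capture up to a quote, whitespace or >
    }gix
) {
    push @links, $1;
}

print "$_\n" for @links;
```

The character class `[^"\s>]` also excludes the quote itself, so when the lookahead fails after an opening quote the engine cannot rescue the match by treating the quote as part of the value. This sketch only applies the exclusions listed in the question; absolute URLs such as `http://...` would still be captured.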
Replies are listed 'Best First'.
Re: Boolean operators in PERL regexp?
by friedo (Prior) on Feb 24, 2005 at 18:45 UTC
Re: Boolean operators in PERL regexp?
by Enlil (Parson) on Feb 24, 2005 at 18:55 UTC
Re: Boolean operators in PERL regexp?
by ikegami (Patriarch) on Feb 24, 2005 at 19:02 UTC
by lenrobert (Initiate) on Feb 25, 2005 at 16:25 UTC
by ikegami (Patriarch) on Feb 25, 2005 at 16:30 UTC
by lenrobert (Initiate) on Feb 25, 2005 at 17:41 UTC