http://www.perlmonks.org?node_id=434178

lenrobert has asked for the wisdom of the Perl Monks concerning the following question:

I am aware of OR ( | ), but is there a logical NOT in Perl regex syntax?

The task is the following: to extract the relative links (i.e. the href attribute of the "a" element) from an HTML file, even if the value is not enclosed in quotation marks. This means I don't want to retrieve hyperlinks beginning with /, # or javascript:

I would like to express the following pattern, and capture (extract) the content of the second set of parentheses.

(  <a href="   OR <a href=) THEN NOT(/  OR   #   OR   javascript:  OR  \s   OR  "  ) THEN ( \s   OR  "  )

The best regexp I could come up with is this, but it does not handle the case of /, #, javascript: etc.


/(<a href="|<a href=)([^"]*?)(\s|")/gi

Does anyone know the answer and can help me? Thanks in advance,

Robert
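
One way the NOT in the pattern above might be expressed is a negative lookahead, (?! ... ), which succeeds only when the alternatives inside it do not match at the current position. The following is a minimal sketch, not a complete solution: the sample HTML and the variable names ($html, $relative_link, @links) are made up for illustration, it assumes href is the first attribute of the tag, and it ignores single-quoted values. It also only excludes the three prefixes named in the question, so an absolute URL such as http://... would still be captured.

```perl
use strict;
use warnings;

# Hypothetical sample input; the markup and link targets are invented for this sketch.
my $html = <<'HTML';
<a href="page1.html">one</a>
<a href=/absolute/path.html>two</a>
<a href="#top">three</a>
<a href="javascript:void(0)">four</a>
<A HREF=sub/page2.html>five</A>
<a href="../up.html" class="x">six</a>
HTML

# (?! ... ) is a zero-width negative lookahead: it consumes nothing and
# succeeds only if none of its alternatives match at this position.
my $relative_link = qr{
    <a \s+ href \s* = \s*         # start of the tag, up to the equals sign
    "?                            # optional opening double quote
    (?! / | \# | javascript: )    # NOT followed by a slash, a hash or javascript:
    ( [^"\s>]+ )                  # capture up to a quote, whitespace or closing bracket
}xi;

# In list context with /g, the match returns every captured group.
my @links = $html =~ /$relative_link/g;
print "$_\n" for @links;
# prints:
# page1.html
# sub/page2.html
# ../up.html
```

The character class [^"\s>]+ requires at least one character that is not a quote, so the regex cannot skip the opening quote and then match the excluded prefix as if it were the link. For real-world HTML a parser module such as HTML::LinkExtor is likely to be more robust than any single regex.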