http://www.perlmonks.org?node_id=202093


in reply to Regex help

I see a few potential problems here.
  1. The regex you show won't work, because you didn't escape the / in </script>. With / as the delimiter, an unescaped / ends the pattern early.
  2. The dot (.) doesn't match _anything_; by default it doesn't match newlines. The /s modifier at the end of the regex changes that so the dot matches newlines too, which is what you want. (see perlre)
So  s/<script>(.*?)<\/script>//sg; should do what you want. I can't speak to the validity of what you're trying to do, but that should make the perl work :)
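A minimal sketch of both points, assuming a made-up $html string (the variable names are only for illustration). It keeps / as the delimiter with the slash escaped, and runs the same substitution without and with /s:

```perl
use strict;
use warnings;

# Hypothetical input: a script element that spans several lines.
my $html = "<p>before</p>\n<script>\nalert('hi');\n</script>\n<p>after</p>\n";

# Without /s the dot stops at the first newline, so nothing matches.
(my $without_s = $html) =~ s/<script>(.*?)<\/script>//g;

# With /s the dot also matches newlines, so the whole element is removed.
(my $with_s = $html) =~ s/<script>(.*?)<\/script>//sg;

print "unchanged without /s\n" if $without_s eq $html;
print $with_s;
```

If escaping the slash gets tedious, picking a different delimiter such as s{...}{} should avoid it.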

Update: Paren typo corrected per fglock below. (It was (.*)?, which is a greedy match: the ? acts on the whole *-modified group and is essentially pointless.) I left the parens in to show a capture, but fglock is completely correct that you don't need them.
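To see why the typo mattered, here is a small sketch on a made-up string with two script elements (the input and variable names are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical input with two script elements on one line.
my $html = "a<script>one</script>b<script>two</script>c";

# (.*)? is a greedy group made optional: it runs to the LAST </script>.
(my $greedy = $html) =~ s/<script>(.*)?<\/script>//sg;

# (.*?) is non-greedy: each match stops at the nearest </script>.
(my $lazy = $html) =~ s/<script>(.*?)<\/script>//sg;

print "$greedy\n";   # prints "ac"
print "$lazy\n";     # prints "abc"
```

The greedy version swallows the "b" between the two elements, which is probably not what you want when stripping tags.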

Re: Re: Regex help
by fglock (Vicar) on Oct 01, 2002 at 20:34 UTC

    You mean  (.*?)

    Actually you don't need parentheses:

    s/<script>.*?<\/script>//sg;
      Nope. The parentheses are optional, but can be VERY useful. Say you want to remove the <script> ... </script> block, but be able to give some sort of warning about it. For example, you may filter out:
      <script> malicious_code_to_do_something_nasty </script>
      If you write your regex as <script>(.*?)<\/script>, it captures the shortest possible run (the ?) of any characters (the .*) into a variable. The variable's name depends on the position of the group: the first set of parentheses is saved into $1, the second into $2, and so forth. You can use it for something like this:
      my $text = "my name is john q user";
      # Drop "my name is ", capture the next word into $1, drop the rest.
      $text =~ s/^my name is (.*?) .*$/$1/;
      print "hello, $text!\n";   # prints "hello, john!\n"
      This is VERY useful in extracting information from strings.
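Applied to the script-tag case, a rough sketch might look like this. The input string and the @removed array are made up for illustration, and collecting the bodies in a separate loop before stripping is just one way to do it:

```perl
use strict;
use warnings;

# Hypothetical input, reusing the example body from above.
my $html = "<p>hi</p><script> malicious_code_to_do_something_nasty </script><p>bye</p>";

my @removed;   # body of every script element found

# First pass: the parentheses capture each script body into $1.
while ($html =~ /<script>(.*?)<\/script>/sg) {
    push @removed, $1;
}

# Second pass: strip the elements; no capture is needed here.
$html =~ s/<script>.*?<\/script>//sg;

warn "removed script: $_\n" for @removed;
print "$html\n";   # prints "<p>hi</p><p>bye</p>"
```

The warning goes to STDERR, so the cleaned text on STDOUT stays free of it.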


      -dingoStick.com