in reply to *fixed*Problem with <> and regex
Closer (though still not correct) is:
$words =~ s/<.*>//g;
which means "get rid of '<' and '>' and anything between". It's still not correct because ".*" is greedy: it matches from the first "<" on the line to the last ">", so it deletes multiple <...> ... <...> in one go, including the text between them (try it and see). That is to say, it matches (and deletes) this entire line:
<span class="author-name" itemprop="author">Romaxton</span>
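To see the greedy behaviour for yourself, here is a minimal sketch using the sample line above. The variable name $greedy is just a placeholder for illustration; the substitution runs on a copy so the original string is left untouched.

```perl
use strict;
use warnings;

# Sample line from the post.
my $words = '<span class="author-name" itemprop="author">Romaxton</span>';

# Work on a copy. Greedy ".*" stretches from the first "<" to the last ">".
(my $greedy = $words) =~ s/<.*>//g;

# The whole line is gone, "Romaxton" included.
print "greedy result: '$greedy'\n";
```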
A real solution would be:
$words =~ s/<[^>]+>//g;
where the "[^>]+" part means "1 or more of any character except greater-than (>)". That regex should therefore get rid of all occurrences of <...> in the line, without removing non-tag text in between.
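As a quick check, here is the same sample line run through the negated character class. This is only a sketch; $stripped is a name I made up for the copy being modified.

```perl
use strict;
use warnings;

# Same sample line as before.
my $words = '<span class="author-name" itemprop="author">Romaxton</span>';

# "[^>]+" cannot run past a ">", so each tag is matched on its own.
(my $stripped = $words) =~ s/<[^>]+>//g;

# Only the text between the tags is left.
print "$stripped\n";
```

Note that because of the "+", a bare "<>" with nothing inside is not matched and stays in the string.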
Edit: it's worth pointing out that another solution would be to use the "non-greedy" modifier "?" in the "still not correct" example I gave above:
$words =~ s/<.*?>//g;
which would have the effect of matching the shortest possible "<...>" each time, and thus avoid getting multiple pairs.
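The two working regexes are not quite interchangeable, though. The sketch below uses two made-up test strings (not from the original question) to show where they differ: an empty "<>", and a tag that spans a newline. All variable names here are hypothetical.

```perl
use strict;
use warnings;

# Made-up input containing an empty "<>".
my $line = 'a <b>bold</b> and <> empty';

(my $lazy  = $line) =~ s/<.*?>//g;    # ".*?" may match zero characters
(my $class = $line) =~ s/<[^>]+>//g;  # "+" needs at least one character

print "non-greedy:    '$lazy'\n";
print "negated class: '$class'\n";

# Made-up input with a tag broken across two lines.
my $multi = "<a\nhref='x'>link</a>";

(my $lazy_ml  = $multi) =~ s/<.*?>//g;    # "." does not match "\n" without /s
(my $class_ml = $multi) =~ s/<[^>]+>//g;  # "[^>]" does match "\n"

print "non-greedy:    '$lazy_ml'\n";
print "negated class: '$class_ml'\n";
```

So the non-greedy form also removes an empty "<>", while the negated class also handles tags split over several lines. Adding the /s modifier to the non-greedy version would let "." match newlines as well.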
Edit 2: fixed misspelling of "$word" to "$words".
Re^2: Problem with <> and regex
by luxlunae (Novice) on Mar 11, 2014 at 15:20 UTC
by golux (Chaplain) on Mar 11, 2014 at 15:23 UTC