http://www.perlmonks.org?node_id=511884


in reply to regex to identify http:// in html

For a quick hack, s{(http://\S+?)(\s+)}{<a href="$1">$1</a>$2} will do what you ask. The important part is the \S+?, which makes the quantifier non-greedy: it matches the minimum needed for the overall regex to succeed, rather than the maximum, which is what \S+ or .* would do. I've also used \S (any non-whitespace character), since it's best to avoid . where you can: see death to dot star.
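
To make the greedy/non-greedy difference concrete, here is a minimal sketch. It deliberately uses . instead of \S so that the two quantifiers actually give different results; the sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text containing two URLs.
my $text = 'see http://example.com/a and http://example.com/b now';

# Greedy: .+ grabs as much as it can, then backtracks only as far as
# needed to find a whitespace character, so it runs to the LAST space.
my ($greedy) = $text =~ m{(http://.+)\s};

# Non-greedy: .+? grows one character at a time and stops at the
# FIRST point where the following \s can match.
my ($lazy) = $text =~ m{(http://.+?)\s};

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

The greedy capture spans both URLs, while the non-greedy one stops at the end of the first.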

Re^2: regex to identify http:// in html
by sauoq (Abbot) on Nov 26, 2005 at 14:52 UTC

    Your use of a non-greedy quantifier isn't the best choice here. You are already specifying \S, which can never run past whitespace, and your pattern requires whitespace right after it, so \S+ and \S+? end up matching exactly the same text. The non-greediness isn't buying you anything. (In fact, it's somewhat less efficient.) You can also skip capturing the space at the end. You are just re-adding it anyway, so leave it alone to begin with. Your regex would be better written as:

    s!(http://\S+)!<a href="$1">$1</a>!g;
    And you might as well catch https too:
    s!(https?://\S+)!<a href="$1">$1</a>!g;

    -sauoq
    "My two cents aren't worth a dime.";
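
A small side-by-side sketch of the two substitutions discussed in this thread. The sample text and variable names are assumptions made for illustration, and /g has been added to the first substitution so that both are applied to every match.

```perl
use strict;
use warnings;

# Hypothetical input: one http URL mid-string, one https URL at the very end.
my $text = 'docs at http://example.com/a and https://example.org/b';

# First version from the thread (with /g added): non-greedy \S+? plus a
# captured run of trailing whitespace that is put straight back.
(my $original = $text) =~ s{(http://\S+?)(\s+)}{<a href="$1">$1</a>$2}g;

# Revised version from the reply: greedy \S+, no whitespace capture,
# and an optional "s" so that https URLs are linked as well.
(my $revised = $text) =~ s!(https?://\S+)!<a href="$1">$1</a>!g;

print "$original\n";
print "$revised\n";
```

With this input the first version links only the http URL, since it neither allows https nor matches without trailing whitespace. The revised version links both, including the one at the end of the string.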