PerlMonks  

How to recognize url in text and convert to hyperlink, unless already in anchor

by Anonymous Monk
on Oct 11, 2004 at 22:18 UTC (#398316=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a form into which the user may type plain text and/or HTML. I need to recognize URLs in this text that are not already wrapped in an anchor (<A HREF>) tag and wrap them so that they become hyperlinks. URLs that are already wrapped should be left alone.

I have an expression that does this for the whole string:

$myformtext =~ s!(http://[^\s]+)!<a href="$1">$1</a>!gi;
But can I somehow apply this only to the parts of the text that are not inside an <A HREF> tag? Maybe split it or something? I'm not a Perl expert...
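
To make the problem concrete, here is a minimal sketch that runs the substitution from the question over input that already contains an anchor. The sample text and the variable name `$naive` are made up for illustration; only the regex comes from the post.

```perl
use strict;
use warnings;

# Hypothetical sample input: one URL already linked, one bare.
my $myformtext =
    'See <a href="http://example.com/">my site</a> and http://example.org/ too.';

# The substitution from the question, applied to a copy of the text.
(my $naive = $myformtext) =~ s!(http://[^\s]+)!<a href="$1">$1</a>!gi;

# The bare URL is linked correctly, but the URL inside the existing
# href attribute is matched too (together with everything up to the
# next whitespace), so the original anchor gets mangled.
print "$naive\n";
```

The existing anchor ends up with a second `<a href=` nested inside its own `href` attribute, which is why the substitution has to be kept away from text that is already linked.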

20041011 Edit by ysth: add p and code tags

Replies are listed 'Best First'.
Re: How to recognize url in text and convert to hyperlink, unless already in anchor
by pizza_milkshake (Monk) on Oct 11, 2004 at 23:19 UTC
    $ cat no_href.pl
    #!perl -l
    # ex: set ts=4:
    use strict;
    use warnings;
    use HTML::Parser;
    use URI;

    my (@tagstack, $BUF);

    sub start { # enter into tags
        my ($tag, $attr, $text) = @_;
        $tag .= " href" if ($tag eq "a" && defined $attr->{"href"});
        unshift @tagstack, $tag;    # innermost open tag lives at index 0
        output($text);
    }

    sub end { # escape out of tags
        my ($tag, $text) = @_;
        # discard unclosed inner tags; an "a href" entry counts as "a"
        shift @tagstack while (scalar @tagstack && $tagstack[0] !~ /^\Q$tag\E(?: href)?$/);
        shift @tagstack if scalar @tagstack; # actually nuke element we're looking for
        output($text);
    }

    sub text { # handle everything inside and around tags
        my ($text) = @_;
        if (unlinked()) {
            # replace URLs with their linked equivalent if we're not within a link
            $text =~ s{ \b(http://\S+) }{ "<a href=\"" . URI->new($1)->canonical . "\">$1</a>" }gex;
        }
        output($text);
    }

    # are we inside a link right now?
    sub unlinked {
        return not scalar grep { /^a href$/ } @tagstack;
    }

    # add to output buffer
    sub output {
        $BUF .= shift @_;
    }

    # start code
    my $p = HTML::Parser->new(
        "start_h" => [ \&start, "tagname, attr, text" ]
        ,"end_h"  => [ \&end, "tagname, text" ]
        ,"text_h" => [ \&text, "dtext" ]
    );
    $p->parse(do{ local $/; <DATA> });
    $p->eof;    # flush any text the parser is still buffering
    print $BUF;

    __DATA__
    <a href="">http://linked1.com</a>
    <a style="" href='bob'>http://linked2.com</a>
    <a href="whatever">http://linked3.com</a>
    <a nolink>http://linked4.com</a>
    http://unlinked1
    http://unlinked2.com
    $ perl no_href.pl
    <a href="">http://linked1.com</a>
    <a style="" href='bob'>http://linked2.com</a>
    <a href="whatever">http://linked3.com</a>
    <a nolink><a href="http://linked4.com/">http://linked4.com</a></a>
    <a href="http://unlinked1/">http://unlinked1</a>
    <a href="http://unlinked2.com/">http://unlinked2.com</a>

    perl -e"\$_=qq/nwdd\x7F^n\x7Flm{{llql0}qs\x14/;s/./chr(ord$&^30)/ge;print"

Re: How to recognize url in text and convert to hyperlink, unless already in anchor
by skx (Parson) on Oct 11, 2004 at 22:54 UTC

    All you need to do is look for the links as you are doing, but make sure that the links are preceded, and optionally followed, by whitespace.

    This will never be true for something inside an A tag.

    (Yes, the real solution is to use a package from CPAN for recognising URLs and for parsing the HTML, but this is a hack on your hack.)

    You could use the following:

    $myformtext =~ s!(\s)(http://\w.*?)(\s)!$1<a href="$2">$2</a>$3!gm;
    Steve
    ---
    steve.org.uk

      This will never be true for something inside an A tag.

      Except when it is, of course, like here: bamb
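
The objection is easy to reproduce. The sketch below wraps skx's substitution in a helper (the name `skx_wrap` and both sample strings are hypothetical) and feeds it an anchor whose link text is padded with spaces, which is one plausible case of a URL inside an A tag that is surrounded by whitespace.

```perl
use strict;
use warnings;

# skx's substitution, applied to a copy of the input.
sub skx_wrap {
    my ($text) = @_;
    $text =~ s!(\s)(http://\w.*?)(\s)!$1<a href="$2">$2</a>$3!gm;
    return $text;
}

# A bare URL with whitespace on both sides: handled as intended.
my $plain = "see http://example.org/ now";

# An existing anchor whose link text has spaces around the URL.
my $spaced = '<a href="http://example.com/"> http://example.com/ </a>';

print skx_wrap($plain),  "\n";
print skx_wrap($spaced), "\n";   # the URL is wrapped a second time
```

The URL in the `href` attribute is left alone because it follows a quote rather than whitespace, but the padded link text is wrapped again, leaving one anchor nested inside another. Since the pattern requires whitespace on both sides, a URL at the very start or end of the text is not matched at all.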

Re: How to recognize url in text and convert to hyperlink, unless already in anchor
by pingo (Hermit) on Oct 12, 2004 at 09:33 UTC
    Sounds like you need URI::Find, or am I missing something obvious?
Re: How to recognize url in text and convert to hyperlink, unless already in anchor
by DrHyde (Prior) on Oct 12, 2004 at 08:51 UTC
    Do it in two stages. First unwrap any URLs that the user has already wrapped in <a ...> ... </a> tags. Then wrap all URLs in <a ...> ... </a> tags.

    Alternatively, don't allow your users to put the tags in in the first place! This makes it easier to protect yourself and your users against craziness involving some of the other attributes of the <a> tag, like target, onclick and so on.
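
The two-stage idea can be sketched with core Perl only. This is a rough illustration, not a robust HTML rewriter: the helper name `relink` and both regexes are assumptions, it only recognises anchors whose quoted href starts with http://, and it throws away the user's own link text when unwrapping.

```perl
use strict;
use warnings;

sub relink {
    my ($text) = @_;

    # Stage 1: replace each existing anchor by the bare URL from its href.
    # Assumes a single- or double-quoted http:// href; link text is discarded.
    $text =~ s{<a\b[^>]*?\bhref\s*=\s*(["'])(http://[^"'\s>]+)\1[^>]*>.*?</a>}{$2}gis;

    # Stage 2: wrap every URL, whether it was linked before or not.
    $text =~ s{(http://[^\s<>"']+)}{<a href="$1">$1</a>}gi;

    return $text;
}

my $in  = '<a href="http://example.com/">http://example.com/</a> and http://example.org/';
my $out = relink($in);
print "$out\n";
```

For the sample input both URLs come out wrapped exactly once. Anchors with relative or unquoted hrefs are not unwrapped by stage 1, so URLs in their link text would still be double-wrapped, and stage 2 will swallow trailing punctuation such as a full stop directly after a URL.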

Re: How to recognize url in text and convert to hyperlink, unless already in anchor
by Anonymous Monk on Oct 13, 2004 at 11:07 UTC
    Use Regexp::Common:
    use Regexp::Common qw /URI/;

    # non-greedy body, so that each anchor becomes its own chunk
    my $re_a_tag = qr/<a\s+.*?>.*?<\/a>/si;

    my $html = q` some link: <a href="http://www.perl.com">www.perl.com</a> http://www.perlmonks.com by! `;

    # the capturing group makes split return the anchors as separate chunks
    my @chunks = split(/($re_a_tag)/si, $html);

    foreach my $chunks_i (@chunks) {
        next if $chunks_i =~ /$re_a_tag/;    # leave existing anchors alone
        $chunks_i =~ s/($RE{URI}{HTTP})/<a href="$1">$1<\/a>/gsi;
    }

    $html = join('', @chunks);
    print "$html\n";
    Output:
    some link: <a href="http://www.perl.com">www.perl.com</a> <a href="http://www.perlmonks.com">http://www.perlmonks.com</a> by!
    Enjoy!

    By gmpassos

Node Type: perlquestion [id://398316]
Approved by ysth
Front-paged by grinder