PerlMonks  

Reg expression to replace URLs with Anchor tags in Tweets

by halfbaked (Sexton)
on Apr 18, 2009 at 23:06 UTC ( #758524=perlquestion )
halfbaked has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a simple script to view Twitter tweets but I'm having trouble converting URLs (http://...) into HTML anchors.

I need a pure Perl solution, I can't use a package, it must be a reg exp.

Here's what I have so far, clearly not working.
#!/usr/bin/perl
$tweet = 'Mayor Nickels leads the pack as Seattle campaign contributions break $1 million mark http://tinyurl.com/ckpxmu #seattle #seattlenews';
$_ = $tweet;
s/(http.*)/<a href=\"\1\">\1<\/a>/;
print $_ . "\n";
I need the regular expression to stop when it hits white space, in this example not including the hashtags (#seattle #seattlenews).

I just don't know how to do that. Needless to say my regular expression knowledge is limited to only the most basic operations.

Thanks,
Keith

Re: Reg expression to replace URLs with Anchor tags in Tweets
by Bloodnok (Vicar) on Apr 18, 2009 at 23:55 UTC
    Your regexp looks like what you would write for sed(1). In Perl, captured strings are referenced as $n in the substitution (where n is 1, 2, and so on); \1 is meant for backreferences inside the pattern itself.

    Your regexp also captures everything from the 'http' string to the end of the line, because .* is greedy. Hence your regexp:

    s/(http.*)/<a href=\"\1\">\1<\/a>/;
    should read:
    s/(http\S+)/<a href="$1">$1<\/a>/;
    In that way, the regexp will capture everything from the 'http' string to (but not including) the next whitespace char.
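
    To see the difference between the two patterns side by side, here is a minimal sketch using the tweet from the original question. The variable names $greedy and $linked are only illustrative, and strict/warnings are added as a habit rather than a requirement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Single quotes keep the literal "$1" in the tweet text from being interpolated.
my $tweet = 'Mayor Nickels leads the pack as Seattle campaign contributions '
          . 'break $1 million mark http://tinyurl.com/ckpxmu #seattle #seattlenews';

# Greedy .* runs to the end of the string, so the hashtags land inside the link.
(my $greedy = $tweet) =~ s/(http.*)/<a href="$1">$1<\/a>/;

# \S+ matches only non-whitespace, so the capture stops at the first space.
(my $linked = $tweet) =~ s/(http\S+)/<a href="$1">$1<\/a>/;

print "$greedy\n";
print "$linked\n";
```

    With \S+ the hashtags stay outside the anchor, which is the behaviour asked for in the question.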

    See perlre &/or perlretut.

    A user level that continues to overstate my experience :-))
      Thanks Bloodnok, that's exactly what I was looking for.

      I want to avoid using a package to solve this problem because I have to do the same thing in some PHP code. I don't know of a PHP module that can do what I need, but preg_replace will work with Perl-style regular expressions.

      This is one of those things I've been wanting to solve for a while but kept putting off.

      Thanks again,
      Keith
Re: Reg expression to replace URLs with Anchor tags in Tweets
by kyle (Abbot) on Apr 18, 2009 at 23:58 UTC
Re: Reg expression to replace URLs with Anchor tags in Tweets
by merlyn (Sage) on Apr 18, 2009 at 23:58 UTC
    See URI::Find, as illustrated in my column on "poor man's web chat".

    And no, you can't say "can't use a package" here. There are plenty of nodes about every possible workaround for that.

    Unless... this isn't really a Perl question at all, in which case, shame on you.

    -- Randal L. Schwartz, Perl hacker

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Re: Reg expression to replace URLs with Anchor tags in Tweets
by GrandFather (Cardinal) on Apr 19, 2009 at 00:13 UTC

    Yes, even you can use CPAN. For tricky stuff, even if you don't use the code, it's well worth at least taking a look at the prior art!


    True laziness is hard work
      Not for this one, but thanks. See above reply. Code reuse between Perl and PHP. ;-)

        Oh, it wasn't clear that you were asking a php question so it wasn't clear that Perl answers were not appropriate - which is odd given that this is a Perl site. Perhaps you forgot to mark your question as off topic?


        True laziness is hard work
Re: Reg expression to replace URLs with Anchor tags in Tweets
by ikegami (Pope) on Apr 19, 2009 at 01:50 UTC
    There is no difference between a Perl script and a Perl module, so any answer we give you would violate your criteria.
Re: Reg expression to replace URLs with Anchor tags in Tweets
by ELISHEVA (Prior) on Apr 19, 2009 at 05:48 UTC

    Perl regular expressions and PHP's "perl compatible" regular expressions are not exactly alike and the differences may bite you if you aren't careful. For a list of differences, study Differences from Perl.

    Best, beth

Re: Reg expression to replace URLs with Anchor tags in Tweets
by CountZero (Bishop) on Apr 19, 2009 at 07:46 UTC
    Bloodnok's solution will work, but it will miss URLs written as 'HTTP' (all caps), and it will wrongly match the word 'http' when it appears on its own.

    To make it a little bit safer, perhaps the following can be considered:

    s{(https?://\S+)}{<a href="$1">$1</a>}i;
    However, if you really want to make sure that it is a well-crafted URL, you should explore Regexp::Common.
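
    As a rough illustration of the stricter substitution above, the sketch below wraps it in a small helper. The name linkify, the example.com URLs, and the added /g modifier (so that every URL in a tweet is replaced, not just the first) are assumptions for this example, not part of the original suggestion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the text with http/https URLs wrapped in anchors.
sub linkify {
    my ($text) = @_;
    # /i accepts HTTP:// as well as http://; /g replaces every URL in the text.
    $text =~ s{(https?://\S+)}{<a href="$1">$1</a>}gi;
    return $text;
}

# The first tweet has two URLs, one in capitals; the second only mentions the word "http".
print linkify('HTTP://example.com/a and http://example.com/b #tag'), "\n";
print linkify('the http protocol is everywhere'), "\n";
```

    Because the pattern requires '://' after the scheme, the second line is printed unchanged.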

    As a penance for your sin of asking a PHP question in our Perl Monastery, you are advised to implement Regexp::Common in PHP so the poor misguided souls using PHP will see the Glory of Perl!

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Reg expression to replace URLs with Anchor tags in Tweets
by halfbaked (Sexton) on Apr 19, 2009 at 19:17 UTC
    My Twitter page using the regex (albeit with PHP) is here.

    The Perl implementation is a backend process, so no UI to look at.

    Thanks again.

Node Type: perlquestion [id://758524]
Approved by almut