PerlMonks  

Regex Query

by DanielSpaniel (Scribe)
on Aug 26, 2013 at 20:41 UTC (#1051009=perlquestion)
DanielSpaniel has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to create what I thought would be a fairly simple regex, but I'm having all kinds of problems with it (possibly because I've been away from Perl for a while).

Anyway, I'm trying to identify, and then alter, URLs in given strings. The URLs and the strings vary daily, as does the quality of their formatting. The URLs could be anything at all, but they are always plain URLs (i.e. no HTML tags).

There may be more than one URL in a string, and a string may contain http URLs, https URLs, or both.

The URLs might be followed by any character, so it isn't easy to tell where a URL ends. The character following the URL could just as easily be a misplaced quotation mark that doesn't belong there at all, or a space, a newline character, etc.

For example, a string might look like any of these (among other possibilities):

a) black and white stuff http://test.com/testing" blah blah
b) rain in spain blah blah https://chewu.to/x7w
c) http://udsu.de/823; test this
d) Just testing ... http://go.to/xi8jwe #goodtimes
e) Super dooper. Looks nice! http://22.com/xx / http://p.de

I've played with numerous variations of this regex, but the latest incarnation, which doesn't really work very well, is below:

$string =~ s#http://(.*)(\s)#<a href="http://$1">http://$1</a>$2#g;

As can be seen, I'm trying to wrap each URL in the string in proper anchor tags to create a working link. The regex above works for very simple examples but nothing more complex: it would work on example (d) above, but on none of the others.
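A short demonstration of why the pattern above falls over on example (e): `(.*)` is greedy, so it runs to the end of the line and then backtracks only as far as the last whitespace it can find, swallowing the second URL. The tighter character class `[^\s"<>]+` below is an assumption about what can end a URL, not a full URL grammar, and it still keeps trailing punctuation such as the `;` in example (c).

```perl
use strict;
use warnings;

# Example (e) from the question: two URLs on one line.
my $line = q{Super dooper. Looks nice! http://22.com/xx / http://p.de};

# The original attempt: (.*) is greedy, so it grabs the rest of the line
# and backs off only until the following (\s) can match.
my @greedy = $line =~ m#http://(.*)(\s)#g;

# Assumed alternative: a URL ends at whitespace, a double quote,
# or an angle bracket, and the scheme may be http or https.
my @tight = $line =~ m#(https?://[^\s"<>]+)#g;

print "greedy: $greedy[0]\n";       # captures everything up to the last space
print "tight:  $_\n" for @tight;    # one entry per URL
```

Stopping at the first whitespace, quote or angle bracket keeps each match to a single URL, so both links on line (e) are found.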

Any assistance would be much appreciated!

Re: Regex Query
by rminner (Hermit) on Aug 26, 2013 at 23:43 UTC
    Perhaps Regexp::Common is what you are looking for:
    use strict;
    use warnings;
    use Regexp::Common qw /URI/;

    my $http_and_https = qr{$RE{URI}{HTTP}{-scheme=>'https?'}};

    while (my $line = <DATA>) {
        while ($line =~ m#($http_and_https)#gc) {
            print $1 , "\n";
        }
    }

    __DATA__
    a) black and white stuff http://test.com/testing" blah blah
    b) rain in spain blah blah https://chewu.to/x7w
    c) http://udsu.de/823; test this
    d) Just testing ... http://go.to/xi8jwe #goodtimes
    e) Super dooper. Looks nice! http://22.com/xx / http://p.de
    Output:
    http://test.com/testing
    https://chewu.to/x7w
    http://udsu.de/823;
    http://go.to/xi8jwe
    http://22.com/xx
    http://p.de
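Note that the output keeps the `;` from example (c), presumably because the URI grammar allows `;` inside a path. Since the original goal was to rewrite the URLs as links rather than print them, here is a minimal sketch of a substitution that takes any URL pattern, wraps each match in an anchor tag, and moves trailing punctuation outside the link. `linkify` is a hypothetical helper, the set of punctuation it trims is an assumption, and the stand-in pattern `qr{https?://\S+}` is only there so the sketch runs without Regexp::Common installed.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every match of $url_re in an anchor tag,
# leaving trailing punctuation (such as the ';' in example c) outside the link.
sub linkify {
    my ($text, $url_re) = @_;
    $text =~ s{($url_re)}{
        my $url  = $1;
        # Assumed list of characters that end a sentence rather than a URL.
        my $tail = ($url =~ s/([.,;:!?"')]+)\z//) ? $1 : '';
        qq{<a href="$url">$url</a>$tail};
    }ge;
    return $text;
}

# Stand-in pattern so the sketch runs without Regexp::Common installed;
# with the module you would pass qr{$RE{URI}{HTTP}{-scheme=>'https?'}} instead.
my $url_re = qr{https?://\S+};

my $out = linkify('c) http://udsu.de/823; test this', $url_re);
print "$out\n";
```

Trimming a closing `)` will break URLs that legitimately end in one, so the character class should be adjusted to suit the actual data.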

      Hey, thanks very much for the suggestion.

      I'd not really planned on using a module to help with it, but it makes no difference if I do, so I'll give that a shot as soon as I get a chance and report back!

      Thank you again; I'm sure I can work with your suggestion.

Node Type: perlquestion [id://1051009]
Front-paged by Corion