PerlMonks  

replace url in text with html link

by thealienz1 (Pilgrim)
on May 24, 2005 at 15:20 UTC (#460026, perlquestion)

thealienz1 has asked for the wisdom of the Perl Monks concerning the following question:

Before anyone says anything, I have looked through the monastery for code that already does what the title describes. I know about here, which has been useful, but does not work under my circumstances.

Please hear me out first.

I have been assigned to put together an HTML version of a PDF document. Why? I do not know, but the PDF document is too complex to just save directly as HTML. Anyway, I have put the document together as HTML, but a lot of the URLs written in the text are not clickable HTML links.

I have tried the solutions at here, but they yield questionable results. You see, the URLs in the document are not formatted to any specification -- meaning they are written directly into a sentence as if they were words.

For example:

Students can activate their UMSIS userids on-line by filling out and submitting the form at https://umsis.miami.edu/sign-up.
...which can be obtained via the web at http://www.miami.edu/it-forms/.

Did you notice the period at the end? All the solutions I have for converting http:// URLs into links do not account for this problem. I am trying to account for the grammatical punctuation that a URL might run into, but have not been able to come up with a solution.

I was wondering if the community had any insight? I have never been really good at regexps, so forgive my ignorance if this is really easy.

Update

I also just noticed that the documents contain email addresses. They are in the same situation as the http links: written into a sentence as a word rather than marked up as a link. If you can help with that... you would save me a lot of time, considering the document is a few hundred pages.

Thank you

Replies are listed 'Best First'.
Re: replace url in text with html link
by davidrw (Prior) on May 24, 2005 at 15:33 UTC
    Can you post some code for solutions you've attempted so we can work from there? My initial thoughts are that you can just tweak the regex to exclude an optional trailing period. maybe something like:
    while (<DATA>) {
        s#(https?://\S+?)(\.?\s)#<a href="$1">$1</a>$2#igs;
        print;
    }
    __DATA__
    Students can activate their UMSIS userids on-line by filling out and submitting the form at https://umsis.miami.edu/sign-up.
    ...which can be obtained via the web at http://www.miami.edu/it-forms/.
Re: replace url in text with html link
by ides (Deacon) on May 24, 2005 at 15:29 UTC

    Try this regex on your lines, whole file, etc.

    $string =~ s/(https?:\/\/.*)(?:\s|\.)/<a href="\1">\1<\/a>/oig;

    It may not be perfect depending on your data, but it should work in most cases.

    Frank Wiles <frank@wiles.org>
    http://www.wiles.org

Re: replace url in text with html link
by Roger (Parson) on May 24, 2005 at 15:40 UTC
    A very simple HTML link replacement regex that ignores the last period (.):

    my $html = do { local $/; <DATA> };
    $html =~ s!(https*:\S*?)(\.*?\s)!<a href="$1">$1</a>$2!gm;
    print $html;
    __DATA__
    Students can activate their UMSIS userids on-line by filling out and submitting the form at https://umsis.miami.edu/sign-up.
    ...which can be obtained via the web at http://www.miami.edu/it-forms/.


      This might be nitpicky, but perhaps (hopefully) educational for the OP. It caught my eye because we had almost identical responses... (mine is here)
      • I had https? and you had https* -- yours would match httpssss://blah.com, since * means zero or more whereas ? means zero or one
      • I have ://\S+? and you have :\S*? -- yours would match http:. or http: or http:/blah or http:blah.com
      • I have (\.?\s) and you have (\.*?\s) -- I think yours is better here, but I'm not sure how the two non-greedy quantifiers (one for \S and one for \.) work together in the case of http://blah.com...
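
      The differences in the bullets above can be checked directly. The following is a minimal sketch, not code from either reply: the helper name captured and the three test strings are assumptions made up for illustration, and each string ends in a space because both patterns require trailing whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: return what a pattern captures as the URL part,
# or undef when the pattern does not match at all.
sub captured {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

my $strict = qr{(https?://\S+?)(\.?\s)};   # the pattern with https? and ://\S+?
my $loose  = qr{(https*:\S*?)(\.*?\s)};    # the pattern with https* and :\S*?

for my $text ('httpssss://blah.com ', 'http:blah.com ', 'http://blah.com... ') {
    for my $pair ([strict => $strict], [loose => $loose]) {
        my ($name, $re) = @$pair;
        my $got = captured($re, $text);
        printf "%-24s %-6s => %s\n", "'$text'", $name,
            defined $got ? "'$got'" : 'no match';
    }
}
```

      Under these assumptions the stricter pattern rejects the first two strings while the looser one accepts them. For http://blah.com... the looser pattern captures http://blah.com and pushes all three dots into the second group, whereas the stricter one can only give up a single dot and so captures http://blah.com.. as the URL.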
Re: replace url in text with html link
by thealienz1 (Pilgrim) on May 24, 2005 at 16:03 UTC

    My final solution involved modifying yours to incorporate other punctuation.

    $output =~ s#(https?://\S+?)(\.?[\s\<,])#<a href="$1">$1</a>$2#igs;

    Thank you for your quick responses.
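
    One possible generalisation of the same idea, offered as a sketch rather than a tested production solution: instead of consuming the delimiter, a lookahead can leave trailing punctuation outside the link, which also covers a URL at the very end of the text. The helper name linkify_urls and the particular punctuation set are assumptions, and the URL is not HTML-escaped here (an & in a query string would need that).

```perl
use strict;
use warnings;

# Hypothetical helper: wrap http and https URLs in anchor tags while
# leaving trailing sentence punctuation outside the link.
sub linkify_urls {
    my ($text) = @_;
    $text =~ s{
        (https?://[^\s<>"]+?)                    # URL body, as short as possible
        (?= [.,;:!?)\]]* (?:[\s<>"]|\z) )        # followed by optional punctuation, then a delimiter or end of text
    }{<a href="$1">$1</a>}gix;
    return $text;
}

print linkify_urls("...submitting the form at https://umsis.miami.edu/sign-up."), "\n";
print linkify_urls("...via the web at http://www.miami.edu/it-forms/, for example."), "\n";
```

    Because the lookahead consumes nothing, the punctuation and whitespace stay in the text untouched, so no $2 is needed in the replacement.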

Re: replace url in text with html link
by TedPride (Priest) on May 24, 2005 at 16:30 UTC
    With a few hours, I could probably make the regex recognize URLs in any format, but absolute URLs will have to do. Try the following:
    use strict;
    use warnings;
    $_ = join '', <DATA>, ' ';
    s/(\w+:\/\/.*?)(\.?\s|\)|\])/<a href="$1">$1<\/a>$2/ig;
    chop;
    print;
    __DATA__
    Students can activate their UMSIS userids on-line by filling out and submitting the form at https://umsis.miami.edu/sign-up.
    ...which can be obtained via the web at http://www.miami.edu/it-forms/.

Re: replace url in text with html link
by PodMaster (Abbot) on May 25, 2005 at 11:26 UTC

      But I did. If you refer to my original message, I do reference the node that you specify. I tried all the solutions there, even URI::Find, and none worked correctly.

      So, (ID)RTFM!

      ID = I Did

        Gee, I wonder why it works for me (works extremely well). Must be lucky, I guess :-)

        update: for emails, try Email::Find (didn't see that coming)
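
        Where installing a module is not an option, a rough regex can cover the simple cases. This is a swapped-in technique, not the Email::Find API: the helper name linkify_emails and the deliberately loose address pattern are assumptions, and the pattern is far more permissive and less complete than the real address grammar.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap plain e-mail addresses in mailto links.
# The domain must contain at least one dot followed by more characters,
# so a sentence-ending period is left outside the link.
sub linkify_emails {
    my ($text) = @_;
    $text =~ s{
        \b( [\w.+-]+ \@ [A-Za-z0-9-]+ (?:\.[A-Za-z0-9-]+)+ )   # local part, at sign, dotted domain
    }{<a href="mailto:$1">$1</a>}gx;
    return $text;
}

print linkify_emails("Questions can be sent to frank\@wiles.org."), "\n";
```

        The sketch does not check whether an address already sits inside a link, so it should only be run on text that contains no mailto links yet.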

