
Re^2: Bolding search terms ... which might be URLs?

by Anonymous Monk
on Mar 11, 2011 at 08:33 UTC (#892619)

in reply to Re: Bolding search terms ... which might be URLs?
in thread Bolding search terms ... which might be URLs?

URI::Find, Regexp::Common::URI

Re^3: Bolding search terms ... which might be URLs?
by mr_mischief (Monsignor) on Mar 11, 2011 at 08:59 UTC

    Thanks for the pointers. Those still cannot deal with every case properly for arbitrary text. It's not a matter of getting the code right. It's a matter of there being too little information in the arbitrary text to be sure how to mark it up.

    A valid URI can easily be formed with a comma, semicolon, colon, question mark, or period at the end of it. They are often not the URI intended, though, as people use English punctuation around their URIs without separating them. There are important differences between the URI with and without those characters in some cases.
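    To make the ambiguity concrete, here is a minimal sketch (in Python, purely as an illustration; it is not how URI::Find or Regexp::Common::URI work). The names `URL_CANDIDATE`, `TRAILING_PUNCT`, and `candidate_readings` are made up for this example, and the pattern is deliberately loose rather than a real URI grammar. It lists every plausible reading of where a URL-like token ends, which is all the text alone can tell you.

```python
import re

# Hypothetical, deliberately loose pattern: "http(s)://" followed by non-whitespace.
# This is an assumption for illustration, not a full URI grammar.
URL_CANDIDATE = re.compile(r"https?://\S+")

# The characters named in the post: each is legal at the end of a URI,
# but each is also ordinary English punctuation.
TRAILING_PUNCT = ",;:?."

def candidate_readings(text):
    """For each URL-like token, return every plausible reading of where it ends."""
    results = []
    for match in URL_CANDIDATE.finditer(text):
        token = match.group(0)
        readings = [token]
        # Each trailing punctuation character may belong to the URI
        # or to the surrounding sentence, so keep both readings.
        while token and token[-1] in TRAILING_PUNCT:
            token = token[:-1]
            readings.append(token)
        results.append(readings)
    return results

print(candidate_readings("Did you try http://example.com/search?"))
```

    Both readings in the result are well-formed URIs (one with an empty query string, one without), so nothing in the text itself says which one the writer meant.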

    The manual for the first one you list punts on non-Latin characters, too. Regexp::Common::URI::ftp's docs state that there's no well-defined standard across the RFCs for an FTP URI. You can get closer and closer, but you're just not going to get 100%. The only way to be sure you've marked something up entirely properly with URIs is to visit the URI and make sure the expected content is delivered.

    According to the RFCs, a URI such as does not necessarily even need to redirect to the resource if the owner of the site doesn't wish it to. You just can't be sure with arbitrary text and no markup that you are introducing links correctly all the time.

      Thanks very much indeed for that work, Mr Mischief. I appreciate it hugely. Sorry I haven't been back to this thread for a while. You've been really helpful. For what it's worth, my users are very unlikely to post edge-case URLs like the ones discussed here, or non-ASCII domain names.
