PerlMonks  

Re^2: Bolding search terms ... which might be URLs?

by Anonymous Monk
on Mar 11, 2011 at 08:33 UTC (node #892619)

Re^3: Bolding search terms ... which might be URLs?
by mr_mischief (Monsignor) on Mar 11, 2011 at 08:59 UTC

    Thanks for the pointers. Those still cannot deal with every case properly for arbitrary text. It's not a matter of getting the code right. It's a matter of there being too little information in the arbitrary text to be sure how to mark it up.

    A valid URI can easily end in a comma, semicolon, colon, question mark, or period. Such a URI is often not the one the writer intended, though, because people put ordinary English punctuation right after a URI without separating it. In some cases the URI with the trailing character and the URI without it are genuinely different.
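A minimal sketch of the common heuristic, assuming a deliberately loose `https?://` pattern rather than the full RFC 3986 grammar: peel typical English punctuation off the end of each match and leave it outside the link. The names `URL_RE`, `split_trailing_punct`, and `linkify` are hypothetical. The sketch shows exactly the ambiguity described above: a URI that really does end in `?`, `.`, or `,` will be truncated.

```python
import html
import re

# Loose pattern (an assumption, not the RFC grammar): scheme, "://",
# then a run of characters that are not whitespace, angle brackets, or quotes.
URL_RE = re.compile(r"https?://[^\s<>\"]+")

# Characters that are legal at the end of a URI but, in running English text,
# are more often sentence punctuation than part of the link.
TRAILING_PUNCT = ".,;:?!"

def split_trailing_punct(candidate):
    """Return (url, trailing) with English punctuation peeled off the end.

    This is a guess: if the author meant the trailing '?' or '.', it is lost.
    """
    url = candidate.rstrip(TRAILING_PUNCT)
    return url, candidate[len(url):]

def linkify(text):
    """Wrap each apparent URL in an <a> tag; escape everything as HTML."""
    out = []
    pos = 0
    for m in URL_RE.finditer(text):
        url, trailing = split_trailing_punct(m.group(0))
        out.append(html.escape(text[pos:m.start()]))
        href = html.escape(url, quote=True)
        out.append(f'<a href="{href}">{href}</a>')
        # Punctuation we decided was English goes back outside the link.
        out.append(html.escape(trailing))
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)

if __name__ == "__main__":
    print(linkify("See http://example.com/a?b=1. Then reply."))
```

Whichever way the heuristic is tuned, some input will be marked up wrongly; it only moves the failures around.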

    The manual for the first one you list punts on non-Latin characters, too. Regexp::Common::URI::ftp's docs state that there's no well-defined standard across the RFCs for an FTP URI. You can get closer and closer, but you're just not going to get 100%. The only way to be sure you've marked something up entirely properly with URIs is to visit the URI and make sure the expected content is delivered.

    According to the RFCs, a URI such as http://foo.com does not necessarily even need to redirect to the resource http://foo.com/ if the owner of the site doesn't wish it to. With arbitrary text and no markup, you just can't be sure that you are always introducing links correctly.

      Thanks very much indeed for that work, Mr Mischief. I appreciate it hugely. Sorry I haven't been back to this thread for a while. You've been really helpful. For what it's worth, my users are very unlikely to post edge-case URLs like the ones discussed here, or non-ASCII domain names.
