PerlMonks  

Re: Imploding URLs

by tall_man (Parson)
on Jun 09, 2005 at 22:22 UTC (#465344)


in reply to Imploding URLs

You could use String::Ediff to find common substrings between pairs of URLs, then break those substrings into pieces of 31 characters or fewer and count the pieces with a hash.

It uses a suffix tree to find the substrings, so it should be fairly efficient. Out of the box it finds substrings of length >= 4, but that could probably be changed. Substrings of length one would not be compressed anyway.
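
A rough sketch of the pairwise idea, in core Perl only. This does not call String::Ediff: the helper longest_common_substring is a hypothetical stand-in that uses plain dynamic programming instead of a suffix tree, and it keeps only the single longest shared substring per pair, which is a simplification. The names count_common_pieces, $min_len and $max_len are likewise made up for illustration, and the URLs are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: longest common substring of two strings by simple
# dynamic programming (O(n*m) time; a stand-in for a suffix-tree search).
sub longest_common_substring {
    my ($s, $t) = @_;
    my ($best_len, $best_end) = (0, 0);
    my @prev = (0) x (length($t) + 1);
    for my $i (1 .. length $s) {
        my @cur = (0) x (length($t) + 1);
        for my $j (1 .. length $t) {
            next unless substr($s, $i - 1, 1) eq substr($t, $j - 1, 1);
            # Extend the common run that ended at the previous characters.
            $cur[$j] = $prev[$j - 1] + 1;
            ($best_len, $best_end) = ($cur[$j], $i) if $cur[$j] > $best_len;
        }
        @prev = @cur;
    }
    return substr($s, $best_end - $best_len, $best_len);
}

# For every pair of URLs, take the longest shared substring, skip it if it
# is shorter than $min_len, cut it into pieces of at most $max_len
# characters, and count each piece in a hash.
sub count_common_pieces {
    my ($urls, $min_len, $max_len) = @_;
    my %count;
    for my $i (0 .. $#{$urls} - 1) {
        for my $j ($i + 1 .. $#{$urls}) {
            my $common = longest_common_substring($urls->[$i], $urls->[$j]);
            next if length($common) < $min_len;
            $count{$_}++ for $common =~ /.{1,$max_len}/gs;
        }
    }
    return \%count;
}

# Made-up sample data.
my @urls = (
    'http://www.example.com/a/index.html',
    'http://www.example.com/b/index.html',
    'http://www.example.org/a/page.html',
);
my $counts = count_common_pieces(\@urls, 4, 31);
printf "%3d  %s\n", $counts->{$_}, $_
    for sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts;
```

Comparing every pair is quadratic in the number of URLs, so for a large list you would probably want to sample pairs or sort the URLs first and compare neighbours only.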

Update: You might prefer Algorithm::Diff, which has a nicer interface and more options.

Update2: The node Re: finding longest common substring also builds a suffix tree and it might be adaptable to your problem.
