PerlMonks  

Re: Imploding URLs

by tall_man (Parson)
on Jun 09, 2005 at 22:22 UTC (#465344)


in reply to Imploding URLs

You could use String::Ediff to find the common substrings between pairs of URLs, break those into pieces of 31 characters or fewer, and count the pieces with a hash.

It uses a suffix tree to find the substrings, so it should be fairly efficient. Out of the box it finds substrings of length >= 4, but that could probably be changed. Substrings of length one would not be compressed anyway.
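To make the idea concrete, here is a rough pure-Perl sketch of the pairwise counting step. It does not use String::Ediff: it swaps in a simple O(n*m) dynamic-programming search instead of a suffix tree, and it only looks at the single longest common substring of each pair rather than all of them. The helper names (longest_common_substring, count_common_pieces), the sample URLs, and the minimum length of 4 passed in as a parameter are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: longest common substring of two strings, found with
# a plain dynamic-programming table (NOT a suffix tree, so slower on long input).
sub longest_common_substring {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($best_len, $best_end) = (0, 0);
    my @prev = (0) x (scalar(@t) + 1);
    for my $i (1 .. scalar @s) {
        my @curr = (0) x (scalar(@t) + 1);
        for my $j (1 .. scalar @t) {
            next unless $s[$i - 1] eq $t[$j - 1];
            # Extend the common run that ended at the previous characters.
            $curr[$j] = $prev[$j - 1] + 1;
            ($best_len, $best_end) = ($curr[$j], $i) if $curr[$j] > $best_len;
        }
        @prev = @curr;
    }
    return substr($s, $best_end - $best_len, $best_len);
}

# Hypothetical helper: for every pair of URLs, take the longest common
# substring, cut it into pieces of at most $max_len characters, and count
# each piece in a hash.
sub count_common_pieces {
    my ($urls, $min_len, $max_len) = @_;
    my %count;
    for my $i (0 .. $#$urls - 1) {
        for my $j ($i + 1 .. $#$urls) {
            my $common = longest_common_substring($urls->[$i], $urls->[$j]);
            next if length($common) < $min_len;    # too short to be worth it
            for (my $pos = 0; $pos < length($common); $pos += $max_len) {
                $count{ substr($common, $pos, $max_len) }++;
            }
        }
    }
    return \%count;
}

# Made-up sample data.
my @urls = (
    'http://www.example.com/a/index.html',
    'http://www.example.com/b/index.html',
    'http://www.example.org/c/page.html',
);
my $count = count_common_pieces(\@urls, 4, 31);
for my $piece (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    printf "%d  %s\n", $count->{$piece}, $piece;
}
```

The pieces with the highest counts would be the first candidates for replacement. Comparing every pair is quadratic in the number of URLs, so for a large list you would probably want to sample pairs or move to a real suffix-tree implementation.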

Update: You might prefer Algorithm::Diff, which has a nicer interface and more options.

Update2: The node "Re: finding longest common substring" also builds a suffix tree, and it might be adaptable to your problem.

