PerlMonks
Re: Imploding URLs
by tall_man (Parson) on Jun 09, 2005 at 18:22 UTC ( [id://465344]=note )
You could use String::Ediff to find the common substrings between pairs of URLs, then break those substrings into pieces of 31 characters or fewer and count the pieces with a hash.

It uses a suffix tree to find the substrings, so it should be fairly efficient. Out of the box it finds substrings of length >= 4, but that could probably be changed. Substrings of length one would not be compressed anyway.

Update: You might prefer Algorithm::Diff, which has a nicer interface and more options.

Update2: The node "Re: finding longest common substring" also builds a suffix tree, and it might be adaptable to your problem.
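
To make the idea concrete, here is a rough core-Perl sketch of the pairwise approach. It does not use String::Ediff or Algorithm::Diff, and it swaps the suffix tree for a plain dynamic-programming search, which is slower (O(m*n) per pair) but short enough to read. It also keeps only the single longest common substring per pair, where a real diff module would report several. The function names, the sample URLs, and the parameter order are all made up for illustration.

```perl
use strict;
use warnings;

# Longest common substring of two strings, by dynamic programming.
# This is a simple stand-in for a suffix-tree search, not the
# method String::Ediff uses internally.
sub longest_common_substring {
    my ($s, $t) = @_;
    my ($best_len, $best_end) = (0, 0);
    my @prev = (0) x (length($t) + 1);
    for my $i (1 .. length $s) {
        my @curr = (0) x (length($t) + 1);
        for my $j (1 .. length $t) {
            if (substr($s, $i - 1, 1) eq substr($t, $j - 1, 1)) {
                # Extend the match that ended at the previous characters.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best_len) {
                    ($best_len, $best_end) = ($curr[$j], $i);
                }
            }
        }
        @prev = @curr;
    }
    return substr($s, $best_end - $best_len, $best_len);
}

# For every pair of URLs, take the longest common substring, skip it
# if it is shorter than $min_len, cut it into pieces of at most
# $max_len characters, and count each piece in a hash.
sub count_common_pieces {
    my ($urls, $min_len, $max_len) = @_;
    my %count;
    for my $i (0 .. $#$urls - 1) {
        for my $j ($i + 1 .. $#$urls) {
            my $common = longest_common_substring($urls->[$i], $urls->[$j]);
            next if length($common) < $min_len;
            $count{$_}++ for $common =~ /(.{1,$max_len})/gs;
        }
    }
    return \%count;
}

# Hypothetical sample data.
my @urls = (
    'http://www.example.com/docs/index.html',
    'http://www.example.com/docs/about.html',
    'http://www.example.com/img/logo.png',
);

my $count = count_common_pieces(\@urls, 4, 31);
for my $piece (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    print "$count->{$piece}\t$piece\n";
}
```

The pieces with the highest counts are the likeliest candidates for replacement. Comparing every pair is quadratic in the number of URLs, so for a large list you would probably want to sample pairs or switch to one of the suffix-tree approaches mentioned above.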