
Re^3: Generate unique ids of maximum length

by rubasov (Friar)
on Apr 13, 2010 at 22:22 UTC (#834589)


in reply to Re^2: Generate unique ids of maximum length
in thread Generate unique ids of maximum length

Thanks for the explanation, I've got it. (Then consider my node as an explanation to ikegami's node.)

It's funny how many different ways people interpret similarity/resemblance, because that was exactly the reason why I chose to keep all the optional characters if the id already fits within the char limit. That way my code always keeps the under-limit ids identical to the originals (== more similar). Of course in other cases that's not the optimal choice.

I also thought of (but did not implement) a more generic way to decide which characters to drop from the original id: give the user a filter callback in which s/he can rate the characters (or substrings) under consideration, then drop the ones with the lowest rating (still from right to left). For example: [_ ] => 3, [A-Z] => 2, [a-z] => 1, anything else => 0
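The rating idea could look roughly like the following. This is only a sketch in Python, not code from this thread: the names `default_rate` and `shorten` are made up for illustration, it rates single characters only (not substrings), and it ignores the uniqueness requirement entirely. It just shows the "drop the lowest-rated character, rightmost first" step with the example ratings above.

```python
import re

# Hypothetical rating table mirroring the example ratings in the post.
RATINGS = [
    (re.compile(r"[_ ]"), 3),
    (re.compile(r"[A-Z]"), 2),
    (re.compile(r"[a-z]"), 1),
]


def default_rate(ch):
    """Rate one character; anything not matched by the table rates 0."""
    for pattern, score in RATINGS:
        if pattern.fullmatch(ch):
            return score
    return 0


def shorten(ident, limit, rate=default_rate):
    """Drop lowest-rated characters, rightmost first, until ident fits."""
    if limit < 0:
        raise ValueError("limit must not be negative")
    chars = list(ident)
    while len(chars) > limit:
        lowest = min(rate(c) for c in chars)
        # Scan from right to left for the first character with that rating.
        for i in range(len(chars) - 1, -1, -1):
            if rate(chars[i]) == lowest:
                del chars[i]
                break
    return "".join(chars)
```

Passing a different `rate` callable changes which characters are sacrificed first; ids that already fit are returned unchanged.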

And this is why I've collapsed the char-level suffix tree to the substring-level: to ease the access to substrings for the purpose of rating. And also because the structure of the tree in the substring-level form cannot interfere with the selection of (non-)ambiguous characters (as in choroba's remark above if I get it right).
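To illustrate what collapsing from the char level to the substring level means, here is a small Python sketch. It is an assumption on my part that a plain character trie with single-child chains merged is close enough to the tree discussed here; the functions `build_trie` and `collapse` are hypothetical and are not the code from this thread.

```python
def build_trie(ids):
    """Build a char-level trie as nested dicts; the key "" marks the end of an id."""
    root = {}
    for ident in ids:
        node = root
        for ch in ident:
            node = node.setdefault(ch, {})
        node.setdefault("", {})  # end-of-id marker
    return root


def collapse(node):
    """Merge every chain of single-child nodes into one substring-labelled edge."""
    collapsed = {}
    for label, child in node.items():
        # Follow the chain while there is exactly one way to continue.
        while len(child) == 1:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        collapsed[label] = collapse(child)
    return collapsed
```

For the two ids `Lenoc3_duallayer_3` and `Lenoc5_carina_1`, the collapsed form has a single `Lenoc` edge that branches into `3_duallayer_3` and `5_carina_1`, so each edge is a whole substring that a rating callback could look at in one piece.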

Cheers


Re^4: Generate unique ids of maximum length
by ikegami (Pope) on Apr 13, 2010 at 22:28 UTC

    And this is why I've collapsed the char-level suffix tree to the substring-level:

    I've always done that too, for exactly the reason you mentioned. I just don't create a tree from the collapsed sequences. I just keep the currently relevant collapsed sequence in a scalar (was called $flux, now called $unsplit).

    I contemplated returning each item as an alternating list of required and optional components (as follows), but I wanted to keep the code as short as possible.

    ( ... [ 'Le', 'noc', '3', '_', 'd', 'uallayer_', '3' ], [ 'Le', 'noc', '5', '_', 'c', 'arina_', '1' ], ... )

    Update: Added last para and accompanying illustration.
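One way a caller might consume such an alternating list is sketched below in Python. This is purely illustrative and not ikegami's code: the function `fit` is hypothetical, and it assumes that even positions (0, 2, 4, ...) hold required components and odd positions hold optional ones, as in the illustration above. Optional components are trimmed from the right until the id fits the limit.

```python
def fit(components, limit):
    """Join components, trimming optional (odd-index) parts from right to left."""
    parts = list(components)
    excess = sum(len(p) for p in parts) - limit
    for i in range(len(parts) - 1, 0, -1):
        if excess <= 0:
            break
        if i % 2 == 1:  # optional component
            cut = min(excess, len(parts[i]))
            parts[i] = parts[i][:len(parts[i]) - cut]
            excess -= cut
    if excess > 0:
        # Even the required components alone exceed the limit.
        raise ValueError("required components do not fit within the limit")
    return "".join(parts)
```

With the first item from the illustration and a limit of 10, this keeps `Le`, `noc`, `3`, `_`, `d` and `3` whole and cuts `uallayer_` down to `u`.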
