
Re^3: Generate unique ids of maximum length

by rubasov (Friar)
on Apr 13, 2010 at 22:22 UTC ( #834589=note )

in reply to Re^2: Generate unique ids of maximum length
in thread Generate unique ids of maximum length

Thanks for the explanation, I've got it. (Then consider my node as an explanation to ikegami's node.)

It's funny how many different ways people interpret similarity/resemblance, because that was exactly the reason why I chose to keep all the optional characters if the id already fit in the char limit. That way my code always keeps the under-limit ids identical (== more similar). Of course in other cases that's not the optimal choice.

I also thought of (but did not implement) a more generic way to decide which character to drop from the original id: give the user a filter callback in which s?he can rate the characters (or substrings) considered, then drop the ones with the lowest rating (still from right to left). For example: [_ ] => 3, [A-Z] => 2, [a-z] => 1, anything else => 0
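A minimal sketch of how such a rating callback might look, since the idea was never implemented in this thread. The names rate_char and shorten_id are hypothetical, the ratings are the example ones above, and the sketch works on single characters only and makes no attempt to keep the shortened ids unique among each other.

```perl
use strict;
use warnings;

# Hypothetical rating callback: a higher rating means "more worth keeping".
sub rate_char {
    my ($ch) = @_;
    return 3 if $ch =~ /[_ ]/;
    return 2 if $ch =~ /[A-Z]/;
    return 1 if $ch =~ /[a-z]/;
    return 0;    # anything else
}

# Shorten $id to at most $limit characters by repeatedly dropping the
# lowest-rated character. Ties are broken from right to left, so the
# rightmost of the lowest-rated characters goes first.
sub shorten_id {
    my ($id, $limit, $rate) = @_;
    my @chars = split //, $id;
    while (@chars > $limit) {
        my $drop = $#chars;
        for my $i (reverse 0 .. $#chars) {
            # Strict comparison keeps the rightmost candidate on ties.
            $drop = $i if $rate->($chars[$i]) < $rate->($chars[$drop]);
        }
        splice @chars, $drop, 1;
    }
    return join '', @chars;
}

print shorten_id('Lenoc3_duallayer_3', 10, \&rate_char), "\n";   # Lenoc_dua_
```

Ids that already fit the limit pass through untouched, which matches the "keep under-limit ids identical" choice described above.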

And this is why I've collapsed the char-level suffix tree to the substring-level: to ease the access to substrings for the purpose of rating. And also because the structure of the tree in the substring-level form cannot interfere with the selection of (non-)ambiguous characters (as in choroba's remark above if I get it right).


Replies are listed 'Best First'.
Re^4: Generate unique ids of maximum length
by ikegami (Pope) on Apr 13, 2010 at 22:28 UTC

    And this is why I've collapsed the char-level suffix tree to the substring-level:

    I've always done that too, for exactly the reason you mentioned. I just don't create a tree from the collapsed sequences. I just keep the currently relevant collapsed sequence in a scalar (was called $flux, now called $unsplit).

    I contemplated returning each item as an alternating list of required and optional components (as follows), but I wanted to keep the code as short as possible.

    ( ... [ 'Le', 'noc', '3', '_', 'd', 'uallayer_', '3' ], [ 'Le', 'noc', '5', '_', 'c', 'arina_', '1' ], ... )

    Update: Added last para and accompanying illustration.
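A rough sketch of how a caller might consume such an alternating list. It assumes that even positions (0, 2, 4, ...) hold the required components and odd positions the optional ones, and that optional components are trimmed from the right first; assemble_id is a hypothetical name, not something from ikegami's code.

```perl
use strict;
use warnings;

# Build an id of at most $limit characters from an alternating list of
# components. Assumption: even indices are required, odd indices optional.
sub assemble_id {
    my ($limit, @parts) = @_;    # @parts is a copy, the caller's list is untouched
    my $len = 0;
    $len += length($_) for @parts;

    # Walk the optional components from right to left, cutting characters
    # off their right end until the whole id fits.
    for my $i (reverse grep { $_ % 2 } 0 .. $#parts) {
        last if $len <= $limit;
        my $excess = $len - $limit;
        my $have   = length($parts[$i]);
        my $cut    = $excess < $have ? $excess : $have;
        $parts[$i] = substr($parts[$i], 0, $have - $cut);
        $len -= $cut;
    }
    return join '', @parts;
}

my @item = ('Le', 'noc', '3', '_', 'd', 'uallayer_', '3');
print assemble_id(12, @item), "\n";   # Lenoc3_dual3
print assemble_id(8,  @item), "\n";   # Lenoc3d3
```

If the required components alone exceed the limit, this sketch returns them as they are and the result is longer than $limit; a real implementation would have to decide how to handle that case.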
