
Re^2: Generate unique ids of maximum length

by ikegami (Pope)
on Apr 13, 2010 at 20:52 UTC ( #834580=note )

in reply to Re: Generate unique ids of maximum length
in thread Generate unique ids of maximum length

This tree structure seems similar to ikegami's code, however I haven't understood that fully yet, so I'm not sure what's the difference.

My tree is the same as your $ctree:

    A -- ...
   /
   \         X -- P -- ...
    \       /
     L
      \
       e -- n -- o -- c -- ...

except numbers are considered atomic in mine.
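As a rough illustration of "numbers are atomic", here is a minimal sketch of a character-level tree stored as nested hashes, where a run of digits counts as a single token. The names tokens and build_trie and the sample ids are made up for this example; this is not the code from the earlier node.

```perl
use strict;
use warnings;

# Split an id into tokens: a run of digits is one token,
# every other character is a token of its own.
sub tokens {
    my ($id) = @_;
    return $id =~ /\d+|\D/g;
}

# Build a nested-hash tree keyed by token (needs perl 5.10+ for //=).
sub build_trie {
    my @ids = @_;
    my %root;
    for my $id (@ids) {
        my $node = \%root;
        $node = $node->{$_} //= {} for tokens($id);
    }
    return \%root;
}

# Hypothetical sample ids, only to exercise the code.
my $ctree = build_trie(qw( LXP1 Lenoc12_a Lenoc3_b ));

print join(' ', tokens('Lenoc12_a')), "\n";   # L e n o c 12 _ a
```

With this tokenizer, "12" hangs off "c" as one node, so an abbreviation can never keep the "1" and drop the "2".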

I don't bother collapsing into your $stree form:

    A -- ...
   /
   \         XP -- ...
    \       /
     L
      \
       enoc -- ...
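For anyone who does want the collapsed form, a minimal sketch of the merge step could look like this. It assumes a nested-hash tree and that no id is a prefix of another id; build_trie, collapse and the sample ids are illustrative names and data, not taken from either node.

```perl
use strict;
use warnings;

# Character-level tree as nested hashes; digit runs are kept as one token.
sub build_trie {
    my @ids = @_;
    my %root;
    for my $id (@ids) {
        my $node = \%root;
        $node = $node->{$_} //= {} for $id =~ /\d+|\D/g;
    }
    return \%root;
}

# Merge every chain of single-child nodes into one substring-level edge.
sub collapse {
    my ($node) = @_;
    my %out;
    for my $key (keys %$node) {
        my ($label, $child) = ($key, $node->{$key});
        # Keep walking while there is exactly one way forward.
        while (keys(%$child) == 1) {
            my ($next) = keys %$child;
            $label .= $next;
            $child  = $child->{$next};
        }
        $out{$label} = collapse($child);
    }
    return \%out;
}

# Hypothetical sample ids.
my $stree = collapse(build_trie(qw( A1 LXP1 LXP2 Lenoc3 Lenoc5 )));

print join(' ', sort keys %{ $stree->{L} }), "\n";   # XP enoc
```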

You must type the marked characters, but the others are optional

The difference with mine is that I made more characters mandatory. The rationale is that the OP wanted the result to resemble the original as much as possible.

Your mandatory characters:

  • Ambiguous characters.
Lenoc3_duallayer_1 -> Le3d1
^^   ^ ^         ^
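One way to read "ambiguous characters" is: a character is mandatory when the tree offers more than one way forward at that point. A minimal sketch under that reading follows. The id list is invented so that the example above comes out, and build_trie and shortest are hypothetical names.

```perl
use strict;
use warnings;

# Digit runs are one token, everything else is a single character.
sub tokens { my ($id) = @_; return $id =~ /\d+|\D/g; }

sub build_trie {
    my @ids = @_;
    my %root;
    for my $id (@ids) {
        my $node = \%root;
        $node = $node->{$_} //= {} for tokens($id);
    }
    return \%root;
}

# Keep only the tokens found at branch points of the tree.
# Assumes $id was one of the ids used to build $trie.
sub shortest {
    my ($trie, $id) = @_;
    my ($node, $out) = ($trie, '');
    for my $tok (tokens($id)) {
        $out .= $tok if keys(%$node) > 1;   # more than one choice here
        $node = $node->{$tok};
    }
    return $out;
}

# Hypothetical id list.
my $trie = build_trie(qw(
    Lenoc3_duallayer_1 Lenoc3_duallayer_3 Lenoc3_carina_1
    Lenoc5_carina_1 LXP_1 Alpha_1
));

print shortest($trie, 'Lenoc3_duallayer_1'), "\n";   # Le3d1
```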

My mandatory characters:

  • Ambiguous characters.
  • A sequence of 0+ uppercase letters followed by a sequence of 0+ lowercase letters preceding an ambiguous lowercase letter.
  • First and second unambiguous character of a text sequence
  • Nonletters (digits, underscores)
Lenoc3_duallayer_1 -> Len3_du_1
^^^  ^^^^       ^^

Re^3: Generate unique ids of maximum length
by rubasov (Friar) on Apr 13, 2010 at 22:22 UTC

    Thanks for the explanation, I've got it. (Then consider my node as an explanation to ikegami's node.)

    It's funny how many different ways people interpret similarity/resemblance, because that was exactly the reason why I chose to keep all the optional characters if the id already fit in the char limit. That way my code always keeps the under-limit ids identical (== more similar). Of course in other cases that's not the optimal choice.

    I also thought of (but did not implement) a more generic way to decide which characters to drop from the original id: give the user a filter callback in which s/he can rate the characters (or substrings) considered, then drop the ones with the lowest rating (still from right to left). For example: [_ ] => 3, [A-Z] => 2, [a-z] => 1, anything else => 0
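A minimal sketch of that rating idea, rating single characters only. It deliberately ignores the ambiguous (mandatory) characters, so it does not guarantee uniqueness by itself. The names rate and shorten are made up for this example.

```perl
use strict;
use warnings;

# The example ratings from the paragraph above.
sub rate {
    my ($ch) = @_;
    return 3 if $ch =~ /[_ ]/;
    return 2 if $ch =~ /[A-Z]/;
    return 1 if $ch =~ /[a-z]/;
    return 0;
}

# Drop the lowest-rated character, rightmost first, until the id fits.
sub shorten {
    my ($id, $limit, $rate) = @_;
    my @chars = split //, $id;
    while (@chars && @chars > $limit) {
        my $drop = $#chars;
        for my $i (reverse 0 .. $#chars) {
            # Strict "<" keeps the rightmost candidate among equal ratings.
            $drop = $i if $rate->($chars[$i]) < $rate->($chars[$drop]);
        }
        splice @chars, $drop, 1;
    }
    return join '', @chars;
}

print shorten('Lenoc3_duallayer_1', 10, \&rate), "\n";   # Lenoc_dua_
```

Note that with these ratings the digits go first, which is probably not what you want for ids that differ only in their numbers; a real callback would rate them higher.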

    And this is why I've collapsed the char-level suffix tree to the substring-level: to ease the access to substrings for the purpose of rating. And also because the structure of the tree in the substring-level form cannot interfere with the selection of (non-)ambiguous characters (as in choroba's remark above if I get it right).


      And this is why I've collapsed the char-level suffix tree to the substring-level:

      I've always done that too, for exactly the reason you mentioned. I just don't create a tree from the collapsed sequences. I just keep the currently relevant collapsed sequence in a scalar (was called $flux, now called $unsplit).

      I contemplated returning each item as an alternating list of required and optional components (as follows), but I wanted to keep the code as short as possible.

      (
         ...
         [ 'Le', 'noc', '3', '_', 'd', 'uallayer_', '3' ],
         [ 'Le', 'noc', '5', '_', 'c', 'arina_', '1' ],
         ...
      )
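To show how a caller might consume such a list, here is a minimal sketch. It assumes the even positions are the required components and the odd ones are optional, and it restores optional characters from left to right while room remains. That filling policy and the name render are my assumptions for this example, not something the code above does.

```perl
use strict;
use warnings;

# $parts: array ref alternating required, optional, required, ...
# Returns the id with as many optional characters as fit within $limit.
sub render {
    my ($parts, $limit) = @_;

    # Room left once every required component is accounted for.
    my $spare = $limit;
    $spare -= length $parts->[$_] for grep { $_ % 2 == 0 } 0 .. $#$parts;
    $spare = 0 if $spare < 0;   # required parts are never dropped

    my $out = '';
    for my $i (0 .. $#$parts) {
        my $part = $parts->[$i];
        if ($i % 2) {           # optional: keep only what still fits
            $part   = substr $part, 0, $spare;
            $spare -= length $part;
        }
        $out .= $part;
    }
    return $out;
}

my $parts = [ 'Le', 'noc', '3', '_', 'd', 'uallayer_', '3' ];

print render($parts, 5),  "\n";   # Le3d3
print render($parts, 9),  "\n";   # Lenoc3_d3
print render($parts, 18), "\n";   # Lenoc3_duallayer_3
```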

      Update: Added last paragraph and accompanying illustration.
