Welcome to the Monastery PerlMonks

### Re: Generate unique ids of maximum length

by ikegami (Pope)
 on Apr 12, 2010 at 18:53 UTC ( #834341=note: print w/replies, xml ) Need Help??

in reply to Generate unique ids of maximum length

```A2990_du_1
A2990_du_2
A2990_du_3
A2990_du_4
A2990_du_5
A2990_du_6
A2990_du_7
A2990_du_8
A2990_du_9
A2990_du_10
LXP_01
LXP_02
LXP_03
LXP_04
LXP_05
LXP_06
LXP_07
LXP_08
LXP_09
LXP_10
LXP_11
LXP_12
LXP_13
LXP_14
LXP_15
LXP_16
LXP_17
LXP_18
Len3_ca_A
Len3_ca_B
Len3_ca_C
Len3_du_1
Len3_du_2
Len3_du_3
Len5_ca_1
Len5_ca_2
Len5_ca_3
Len5_du_1
Len5_du_2
Len5_du_3
No_1
No_2
No_3
No_4
No_5
No_6

can be obtained using the following:

```use strict;
use warnings;

my \$p = \shift;
\$p = \( \$\$p->{\$_} ) for @_;
\$\$p->{''} = 1;
}

sub shorten_unsplit {
our \$fixed;   local *fixed   = \\$_[0];
our \$unsplit; local *unsplit = \\$_[1];
for (\$unsplit) {
if ( s/^([^A-Za-z]+[A-Za-z]?)// ) { \$fixed .= \$1; redo; }
if ( s/^(?=(.))[A-Z]*[a-z]*//s  ) { \$fixed .= \$1; redo; }
}
}

sub shorten {
my @results;
local *helper = sub {
my (\$trie, \$fixed, \$unsplit) = @_;
my \$single = ( keys(%\$trie) == 1 );
shorten_unsplit(\$fixed, \$unsplit)
if !\$single || exists(\$trie->{''});
for my \$key (
sort {; no warnings 'numeric'; \$a <=> \$b || \$a cmp \$b }
keys(%\$trie)
) {
if (\$key eq '') {
push @results, \$fixed;
} elsif (\$single) {
helper(\$trie->{\$key}, \$fixed, "\$unsplit\$key");
} else {
helper(\$trie->{\$key}, "\$fixed\$key", '');
}
}
};

my \$trie;
return if !\$trie;
helper(\$trie, '', '');
return @results;
}

{
chomp( my @data = <DATA> );
print("\$_\n") for shorten(@data);
}

__DATA__
...

The code can manipulated to remove less when desired.

Update: Fixed a bug.
Update: And another.
Update: Improved sorting (1,2,...,9,10 vs 1,10,2,...,9).
Update: Input strings can now be substrings of other input strings. Simplified code at the same time.

Replies are listed 'Best First'.
Re^2: Generate unique ids of maximum length
by lima1 (Curate) on Apr 13, 2010 at 12:00 UTC
Wow, nice code. Took me a while to get it, though!
Re^2: Generate unique ids of maximum length
by choroba (Chancellor) on Apr 13, 2010 at 12:24 UTC
However, A2990_duallayerA_1 is not shortened in any way... Nevertheless, nice code. Some comments would still be helpful for faster understanding.

To shorten "A2990_duallayerA_1" when "A2990_duallayer_1" is also present would require removing from the middle of the word, and that goes against your examples. You didn't specify your spec, so I had to guess a lot.

To handle this case, you could consider a lowercase followed by an uppercase to be a word break. Change

```            if (\$key =~ /^[a-zA-Z]\z/) {

to

```            if (  \$key =~ /^[a-z]\z/
|| \$key =~ /^[A-Z]\z/ && \$flux !~ /[a-z]\z/
) {

You get:

```A2990_duA_1
A2990_du_1
A2990_du_2
A2990_du_3
A2990_du_4
A2990_du_5
A2990_du_6
A2990_du_7
A2990_du_8
A2990_du_9
A2990_du_10
LXP_01
LXP_02
...

Fixed in original code.

Create A New User
Node Status?
node history
Node Type: note [id://834341]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2017-08-22 05:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Who is your favorite scientist and why?

Results (329 votes). Check out past polls.

Notices?