Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Generate unique ids of maximum length

by ikegami (Pope)
on Apr 12, 2010 at 18:53 UTC ( #834341=note: print w/ replies, xml ) Need Help??


in reply to Generate unique ids of maximum length

A2990_du_1 A2990_du_2 A2990_du_3 A2990_du_4 A2990_du_5 A2990_du_6 A2990_du_7 A2990_du_8 A2990_du_9 A2990_du_10 LXP_01 LXP_02 LXP_03 LXP_04 LXP_05 LXP_06 LXP_07 LXP_08 LXP_09 LXP_10 LXP_11 LXP_12 LXP_13 LXP_14 LXP_15 LXP_16 LXP_17 LXP_18 Len3_ca_A Len3_ca_B Len3_ca_C Len3_du_1 Len3_du_2 Len3_du_3 Len5_ca_1 Len5_ca_2 Len5_ca_3 Len5_du_1 Len5_du_2 Len5_du_3 No_1 No_2 No_3 No_4 No_5 No_6

can be obtained using the following:

use strict; use warnings; sub add { my $p = \shift; $p = \( $$p->{$_} ) for @_; $$p->{''} = 1; } sub shorten_unsplit { our $fixed; local *fixed = \$_[0]; our $unsplit; local *unsplit = \$_[1]; for ($unsplit) { if ( s/^([^A-Za-z]+[A-Za-z]?)// ) { $fixed .= $1; redo; } if ( s/^(?=(.))[A-Z]*[a-z]*//s ) { $fixed .= $1; redo; } } } sub shorten { my @results; local *helper = sub { my ($trie, $fixed, $unsplit) = @_; my $single = ( keys(%$trie) == 1 ); shorten_unsplit($fixed, $unsplit) if !$single || exists($trie->{''}); for my $key ( sort {; no warnings 'numeric'; $a <=> $b || $a cmp $b } keys(%$trie) ) { if ($key eq '') { push @results, $fixed; } elsif ($single) { helper($trie->{$key}, $fixed, "$unsplit$key"); } else { helper($trie->{$key}, "$fixed$key", ''); } } }; my $trie; add($trie, /\d+|./sg) for @_; return if !$trie; helper($trie, '', ''); return @results; } { chomp( my @data = <DATA> ); print("$_\n") for shorten(@data); } __DATA__ ...

Useful test case: Add A2990_dualplayer_10.

The code can manipulated to remove less when desired.

Update: Fixed a bug.
Update: And another.
Update: Improved sorting (1,2,...,9,10 vs 1,10,2,...,9).
Update: Input strings can now be substrings of other input strings. Simplified code at the same time.


Comment on Re: Generate unique ids of maximum length
Select or Download Code
Re^2: Generate unique ids of maximum length
by lima1 (Curate) on Apr 13, 2010 at 12:00 UTC
    Wow, nice code. Took me a while to get it, though!
Re^2: Generate unique ids of maximum length
by choroba (Abbot) on Apr 13, 2010 at 12:24 UTC
    However, A2990_duallayerA_1 is not shortened in any way... Nevertheless, nice code. Some comments would still be helpful for faster understanding.

      To shorten "A2990_duallayerA_1" when "A2990_duallayer_1" is also present would require removing from the middle of the word, and that goes against your examples. You didn't specify your spec, so I had to guess a lot.

      To handle this case, you could consider a lowercase followed by an uppercase to be a word break. Change

      if ($key =~ /^[a-zA-Z]\z/) {

      to

      if ( $key =~ /^[a-z]\z/ || $key =~ /^[A-Z]\z/ && $flux !~ /[a-z]\z/ ) {

      You get:

      A2990_duA_1 A2990_du_1 A2990_du_2 A2990_du_3 A2990_du_4 A2990_du_5 A2990_du_6 A2990_du_7 A2990_du_8 A2990_du_9 A2990_du_10 LXP_01 LXP_02 ...

      Fixed in original code.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://834341]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (10)
As of 2014-09-16 17:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (39 votes), past polls