Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Generate unique ids of maximum length

by ikegami (Pope)
on Apr 12, 2010 at 18:53 UTC ( #834341=note: print w/ replies, xml ) Need Help??


in reply to Generate unique ids of maximum length

A2990_du_1 A2990_du_2 A2990_du_3 A2990_du_4 A2990_du_5 A2990_du_6 A2990_du_7 A2990_du_8 A2990_du_9 A2990_du_10 LXP_01 LXP_02 LXP_03 LXP_04 LXP_05 LXP_06 LXP_07 LXP_08 LXP_09 LXP_10 LXP_11 LXP_12 LXP_13 LXP_14 LXP_15 LXP_16 LXP_17 LXP_18 Len3_ca_A Len3_ca_B Len3_ca_C Len3_du_1 Len3_du_2 Len3_du_3 Len5_ca_1 Len5_ca_2 Len5_ca_3 Len5_du_1 Len5_du_2 Len5_du_3 No_1 No_2 No_3 No_4 No_5 No_6

can be obtained using the following:

use strict; use warnings; sub add { my $p = \shift; $p = \( $$p->{$_} ) for @_; $$p->{''} = 1; } sub shorten_unsplit { our $fixed; local *fixed = \$_[0]; our $unsplit; local *unsplit = \$_[1]; for ($unsplit) { if ( s/^([^A-Za-z]+[A-Za-z]?)// ) { $fixed .= $1; redo; } if ( s/^(?=(.))[A-Z]*[a-z]*//s ) { $fixed .= $1; redo; } } } sub shorten { my @results; local *helper = sub { my ($trie, $fixed, $unsplit) = @_; my $single = ( keys(%$trie) == 1 ); shorten_unsplit($fixed, $unsplit) if !$single || exists($trie->{''}); for my $key ( sort {; no warnings 'numeric'; $a <=> $b || $a cmp $b } keys(%$trie) ) { if ($key eq '') { push @results, $fixed; } elsif ($single) { helper($trie->{$key}, $fixed, "$unsplit$key"); } else { helper($trie->{$key}, "$fixed$key", ''); } } }; my $trie; add($trie, /\d+|./sg) for @_; return if !$trie; helper($trie, '', ''); return @results; } { chomp( my @data = <DATA> ); print("$_\n") for shorten(@data); } __DATA__ ...

Useful test case: Add A2990_dualplayer_10.

The code can manipulated to remove less when desired.

Update: Fixed a bug.
Update: And another.
Update: Improved sorting (1,2,...,9,10 vs 1,10,2,...,9).
Update: Input strings can now be substrings of other input strings. Simplified code at the same time.


Comment on Re: Generate unique ids of maximum length
Select or Download Code
Replies are listed 'Best First'.
Re^2: Generate unique ids of maximum length
by lima1 (Curate) on Apr 13, 2010 at 12:00 UTC
    Wow, nice code. Took me a while to get it, though!
Re^2: Generate unique ids of maximum length
by choroba (Canon) on Apr 13, 2010 at 12:24 UTC
    However, A2990_duallayerA_1 is not shortened in any way... Nevertheless, nice code. Some comments would still be helpful for faster understanding.

      To shorten "A2990_duallayerA_1" when "A2990_duallayer_1" is also present would require removing from the middle of the word, and that goes against your examples. You didn't specify your spec, so I had to guess a lot.

      To handle this case, you could consider a lowercase followed by an uppercase to be a word break. Change

      if ($key =~ /^[a-zA-Z]\z/) {

      to

      if ( $key =~ /^[a-z]\z/ || $key =~ /^[A-Z]\z/ && $flux !~ /[a-z]\z/ ) {

      You get:

      A2990_duA_1 A2990_du_1 A2990_du_2 A2990_du_3 A2990_du_4 A2990_du_5 A2990_du_6 A2990_du_7 A2990_du_8 A2990_du_9 A2990_du_10 LXP_01 LXP_02 ...

      Fixed in original code.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://834341]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (14)
As of 2015-07-31 10:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (276 votes), past polls