Re^2: Collapsing a list of files...

Ha! No, I don't want a single * and, yes, I could have done a better job of articulating the constraints.

Here's the deal. Each of the files in that original list of 5,000 will have a prefix of 3 characters, a middle pattern of [\w+\d]*[l|n]n, and a suffix of n digits. Now that I think about it, you can ignore the suffix altogether and just replace the n digits with *. So all we need to focus on is the prefix and middle pattern. Now I need to do this:-

1. Fracture up the start list into n new lists, where n is the number of unique prefixes. So if n = 3, the sum total of the file names in the 3 lists will equal 5,000.

2. Now take each list from #1 and reduce each one down to the fewest number of unique patterns that end in *. The prefix must be preserved, as well as, at a minimum, the first character of the middle string.

3. There can only be one *, and it has to be at the end of each pattern. The original example should make this clear. (Again, it closely resembles the actual file names and the outcome I want.)

Hopefully this helps!

Thanks,

fiddler42

Comment on Re^2: Collapsing a list of files... Download Code

Replies are listed 'Best First'.
Re^3: Collapsing a list of files... by Anonymous Monk on Sep 17, 2009 at 00:35 UTC
Maybe you can adopt this, I call it floyd #!/usr/bin/perl -- use strict; use warnings; my(@list ) = sort qw[ abcpwbln08 abcpwbln07 abcpwbnn10 abcpwbnn06 adesicln09 adesicln08 adesicln01 adesicnn05 ]; use Data::Dumper; my $len = length 'abcpwbnn'; my $ret = floyd( \@list, $len, ) ; print Dumper($ret),"\n"; for my $ar( @$ret ){ if($#{$ar} > 0){ print substr $ar->[0], 0, $len; print "\n"; } else { print $ar->[0],"\n"; } } sub floyd { my ( $input, $len ) = @_; my %output; keys %output = scalar @$input; #presize %output, max possible my @keys; my $num_keys_l = 0; for my $i( @$input ){ my $ikey = substr $i, 0, $len; push @{$output{$ikey}}, $i; my $nkeys = 0+ keys %output; if( $nkeys > $num_keys_l ){ $num_keys_l = $nkeys; push @keys, $ikey; } } return [ @output{@keys} ]; } __END__ $VAR1 = [ [ 'abcpwbln07', 'abcpwbln08' ], [ 'abcpwbnn06', 'abcpwbnn10' ], [ 'adesicln01', 'adesicln08', 'adesicln09' ], [ 'adesicnn05' ] ]; abcpwbln abcpwbnn* adesicln* adesicnn05 [download]	[reply] [d/l]
Re^3: Collapsing a list of files... by graff (Chancellor) on Sep 17, 2009 at 03:53 UTC
Hopefully this helps! Not entirely, perhaps... ... a suffix of n digits. Now that I think about it, you can ignore the suffix altogether and just replace the n digits with .* Well, I'm assuming there must be some other constraint (which you haven't described yet) that would make this an inadequate solution: `my %collapsed; opendir( D, "/whatever/path" ) or die "whatever: $!\n"; for my $name ( grep /\d+$/, readdir D ) { $name =~ s/\d+$//; $collapsed{$name}++; } for my $name ( sort keys %collapsed ) { print "$name\n"; }` [download] If you're trying to do something else besides ignoring the final digits -- if there's supposed to be some actual structure to the file names and you need to do something with that besides adding "" at the end -- let us in on it. From what you've described so far, it's either too simple or too vague.	[reply] [d/l]
Re^3: Collapsing a list of files... by kennethk (Abbot) on Sep 17, 2009 at 15:07 UTC
I think you are still a little caught up in the source of your data, and not considering it purely from a string processing perspective. From my reading, your goals are: Determine the number of unique 4-character prefixes (n + 1, n == 3) within a list of strings. For each prefix group, determine the longest possible string anchored at the string start common between all members of the group If the second goal is incorrect (you've never stated it but it seems to pervade your solutions), your goal can be achieved very simply with: `#!/usr/bin/perl use strict; use warnings; my %files = (); while (<DATA>) { if (/(.{4})/) { $files{$1} = 1; } } foreach my $prefix (keys %files) { print "$prefix\n"; } __DATA__ abcpwbln08 abcpwbln07 abcpwbnn10 abcpwbnn06 adesicln09 adesicln08 adesicln01 adesicnn05` [download] If you want to keep the second goal, one way to accomplish your goal (TIMTOWTDI) would be to compare the new postfix with the existing solution for that prefix, and shorten that guess until it matches the to criterion: `!/usr/bin/perl use strict; use warnings; my %files = (); while (<DATA>) { if (my ($prefix,$postfix) = /(.{4})(.)/) { if (exists $files{$prefix}) { my $pattern = $files{$prefix}; while (not $postfix =~ /^$pattern/) { $pattern =~ s/.$//; } $files{$prefix} = $pattern; } else { $files{$prefix} = $postfix; } } } foreach my $prefix (keys %files) { print "$prefix$files{$prefix}*\n"; } __DATA__ abcpwbln08 abcpwbln07 abcpwbnn10 abcpwbnn06 adesicln09 adesicln08 adesicln01 adesicnn05` [download]	[reply] [d/l] [select]


Welcome to the Monastery
	PerlMonks