Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: Collapsing a list of files...

by kennethk (Monsignor)
on Sep 16, 2009 at 22:25 UTC ( #795722=note: print w/ replies, xml ) Need Help??


in reply to Collapsing a list of files...

You need to consider your spec a little more, since the trivial answer for your question is:

*

which I assume is not what you want. Once you've considered what you want in terms of your constraints, modify the spec and I'm sure you'll get some more helpful suggestions.

This also sounds like a case where some Dynamic_programming would get you a long way.


Comment on Re: Collapsing a list of files...
Download Code
Re^2: Collapsing a list of files...
by fiddler42 (Beadle) on Sep 17, 2009 at 00:22 UTC
    Ha! No, I don't want a single * and, yes, I could have done a better job of articulating the constraints.

    Here's the deal. Each of the files in that original list of 5,000 will have a prefix of 3 characters, a middle pattern of [\w+\d]*[l|n]n, and a suffix of n digits. Now that I think about it, you can ignore the suffix altogether and just replace the n digits with *. So all we need to focus on is the prefix and middle pattern. Now I need to do this:-

    1. Fracture up the start list into n new lists, where n is the number of unique prefixes. So if n = 3, the sum total of the file names in the 3 lists will equal 5,000.

    2. Now take each list from #1 and reduce each one down to the fewest number of unique patterns that end in *. The prefix must be preserved, as well as, at a minimum, the first character of the middle string.

    3. There can only be one *, and it has to be at the end of each pattern. The original example should make this clear. (Again, it closely resembles the actual file names and the outcome I want.)

    Hopefully this helps!

    Thanks,

    fiddler42

      Maybe you can adopt this, I call it floyd
      #!/usr/bin/perl -- use strict; use warnings; my(@list ) = sort qw[ abcpwbln08 abcpwbln07 abcpwbnn10 abcpwbnn06 adesicln09 adesicln08 adesicln01 adesicnn05 ]; use Data::Dumper; my $len = length 'abcpwbnn'; my $ret = floyd( \@list, $len, ) ; print Dumper($ret),"\n"; for my $ar( @$ret ){ if($#{$ar} > 0){ print substr $ar->[0], 0, $len; print "*\n"; } else { print $ar->[0],"\n"; } } sub floyd { my ( $input, $len ) = @_; my %output; keys %output = scalar @$input; #presize %output, max possible my @keys; my $num_keys_l = 0; for my $i( @$input ){ my $ikey = substr $i, 0, $len; push @{$output{$ikey}}, $i; my $nkeys = 0+ keys %output; if( $nkeys > $num_keys_l ){ $num_keys_l = $nkeys; push @keys, $ikey; } } return [ @output{@keys} ]; } __END__ $VAR1 = [ [ 'abcpwbln07', 'abcpwbln08' ], [ 'abcpwbnn06', 'abcpwbnn10' ], [ 'adesicln01', 'adesicln08', 'adesicln09' ], [ 'adesicnn05' ] ]; abcpwbln* abcpwbnn* adesicln* adesicnn05
      Hopefully this helps!

      Not entirely, perhaps...

      ... a suffix of n digits. Now that I think about it, you can ignore the suffix altogether and just replace the n digits with *.

      Well, I'm assuming there must be some other constraint (which you haven't described yet) that would make this an inadequate solution:

      my %collapsed; opendir( D, "/whatever/path" ) or die "whatever: $!\n"; for my $name ( grep /\d+$/, readdir D ) { $name =~ s/\d+$/*/; $collapsed{$name}++; } for my $name ( sort keys %collapsed ) { print "$name\n"; }
      If you're trying to do something else besides ignoring the final digits -- if there's supposed to be some actual structure to the file names and you need to do something with that besides adding "*" at the end -- let us in on it. From what you've described so far, it's either too simple or too vague.
      I think you are still a little caught up in the source of your data, and not considering it purely from a string processing perspective. From my reading, your goals are:

      1. Determine the number of unique 4-character prefixes (n + 1, n == 3) within a list of strings.
      2. For each prefix group, determine the longest possible string anchored at the string start common between all members of the group

      If the second goal is incorrect (you've never stated it but it seems to pervade your solutions), your goal can be achieved very simply with:

      #!/usr/bin/perl use strict; use warnings; my %files = (); while (<DATA>) { if (/(.{4})/) { $files{$1} = 1; } } foreach my $prefix (keys %files) { print "$prefix*\n"; } __DATA__ abcpwbln08 abcpwbln07 abcpwbnn10 abcpwbnn06 adesicln09 adesicln08 adesicln01 adesicnn05

      If you want to keep the second goal, one way to accomplish your goal (TIMTOWTDI) would be to compare the new postfix with the existing solution for that prefix, and shorten that guess until it matches the to criterion:

      !/usr/bin/perl use strict; use warnings; my %files = (); while (<DATA>) { if (my ($prefix,$postfix) = /(.{4})(.*)/) { if (exists $files{$prefix}) { my $pattern = $files{$prefix}; while (not $postfix =~ /^$pattern/) { $pattern =~ s/.$//; } $files{$prefix} = $pattern; } else { $files{$prefix} = $postfix; } } } foreach my $prefix (keys %files) { print "$prefix$files{$prefix}*\n"; } __DATA__ abcpwbln08 abcpwbln07 abcpwbnn10 abcpwbnn06 adesicln09 adesicln08 adesicln01 adesicnn05

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://795722]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (15)
As of 2014-12-26 18:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (174 votes), past polls