http://www.perlmonks.org?node_id=1014360

aes1972 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am currently getting a list of files from a directory hierarchy with this script, which works great (the user name and full name get appended to the end of each file path):
use File::Find;
use File::stat;

my $dir = "path/to/directory";

find(
    {
        'wanted' => sub {
            my $file = $File::Find::name;
            ($n, $p, $uid, $gid, $dq, $c, $fn, $d, $s) = getpwuid(stat($file)->uid);
            if (-f && (/^[^.]/)) {
                print $file . ":" . $n . ":" . $fn;
            }
        },
        'preprocess' => sub {
            @_ = map  { $_->[0] }
                 sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] }
                 map  { m/(\d+)(\.[^.]+$)/ ? [$_, $2 . $`, int($1)] : [$_, "", ""] } @_;
            @_ = grep (/^[^\.]/, @_)
        }
    },
    $dir
);

The thing is that most of the files are file sequences like this:

image_sequenceA.1.tif
image_sequenceA.2.tif
image_sequenceA.3.tif
image_sequenceB.40.tif
image_sequenceB.41.tif
image_sequenceB.42.tif


So I would love to be able to filter the results to only get one entry per sequence like this:

image_sequenceA.[1-3].tif
image_sequenceB.[40-42].tif

Please note that the sequences don't have padded numbers, but I am already solving that with the map and sort above, so now I just need to print only the unique sequences.

Actually, since I am getting the full path and also attaching the user, the output should ultimately look like this:

path/to/directory/image_sequenceA.[1-3].tif:bob:Bob User
path/to/directory/image_sequenceB.[40-42].tif:frank:Frank User


Thanks.

Replies are listed 'Best First'.
Re: Detect file sequences in File:Find results
by choroba (Cardinal) on Jan 21, 2013 at 01:01 UTC
    What does the preprocess sub do? This is my solution without it:
    #!/usr/bin/perl
    use warnings;
    use strict;

    use File::Find;
    use File::stat;

    my $dir = shift;
    my %result;

    find(
        {
            'wanted' => sub {
                my $file = $File::Find::name;
                my ($n, $fn) = (getpwuid(stat($file)->uid))[0, 6];
                if (-f && (/^[^.]/)) {
                    if (my ($pre, $num, $suff) = $file =~ /(.*)\.([0-9]+)\.(.*)/) {
                        push @{ $result{$pre}{$suff}{"$n:$fn"} }, $num;
                    }
                }
            },
        },
        $dir
    );

    for my $pre (keys %result) {
        for my $suff (keys %{ $result{$pre} }) {
            for my $user (keys %{ $result{$pre}{$suff} }) {
                my @nums  = sort { $a <=> $b } @{ $result{$pre}{$suff}{$user} };
                my $first = shift @nums;
                my ($from, $to, @ranges) = ($first, $first);
                for (@nums) {
                    if ($_ == $to + 1) {
                        $to = $_;
                    } else {
                        push @ranges, [$from, $to];
                        ($from, $to) = ($_, $_);
                    }
                }
                push @ranges, [$from, $to];
                for my $r (@ranges) {
                    print "$pre."
                        . ($r->[0] == $r->[1] ? $r->[0] : "[$r->[0]-$r->[1]]")
                        . ".$suff:$user\n";
                }
            }
        }
    }
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      --->What does the preprocess sub do?

      It sorts the directory results so that the unpadded sequences come back in the correct order. Without it, the sequence comes back looking like this:
      image_sequenceA.1.tif
      image_sequenceA.11.tif
      image_sequenceA.12.tif
      image_sequenceA.13.tif
      image_sequenceA.14.tif
      image_sequenceA.15.tif
      image_sequenceA.16.tif
      image_sequenceA.17.tif
      image_sequenceA.18.tif
      image_sequenceA.19.tif
      image_sequenceA.2.tif
      image_sequenceA.20.tif
      image_sequenceA.3.tif
      That would make the sequence detection incorrect. It also handles the cases where there are files with the same name but different extensions, like this:

      image_sequenceA.1.tif
      image_sequenceA.2.tif
      image_sequenceA.3.tif
      image_sequenceA.1.exr
      image_sequenceA.2.exr
      image_sequenceA.3.exr
      So the preprocess sub is necessary in my case, unless I can do the same sort somewhere else.
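One way the sort could be done somewhere else, as a minimal sketch: collect the frame numbers per prefix and extension first, then sort each group numerically before collapsing it. The sub name `collapse_sequences` is hypothetical, the `prefix.NUMBER.ext` naming is assumed from the examples in this thread, and the owner lookup is left out to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse names of the form prefix.NUMBER.ext into
# "prefix.[from-to].ext" entries. Input order does not matter, because each
# group is sorted numerically here. Names that do not match are returned
# unchanged after the sequences.
sub collapse_sequences {
    my (%frames, @singles);
    for my $name (@_) {
        if (my ($pre, $num, $ext) = $name =~ /^(.*)\.(\d+)\.([^.]+)$/) {
            # One group per prefix AND extension, so .tif and .exr stay apart.
            push @{ $frames{"$pre\0$ext"} }, $num;
        }
        else {
            push @singles, $name;
        }
    }

    my @out;
    for my $key (sort keys %frames) {
        my ($pre, $ext) = split /\0/, $key, 2;
        my @nums = sort { $a <=> $b } @{ $frames{$key} };   # numeric, not lexical
        my ($from, $to) = ($nums[0], $nums[0]);
        my @ranges;
        for my $n (@nums[1 .. $#nums]) {
            if ($n == $to + 1) {
                $to = $n;                       # still inside the current run
            }
            else {
                push @ranges, [$from, $to];     # close the run, start a new one
                ($from, $to) = ($n, $n);
            }
        }
        push @ranges, [$from, $to];
        for my $r (@ranges) {
            my ($f, $t) = @$r;
            push @out, "$pre." . ($f == $t ? $f : "[$f-$t]") . ".$ext";
        }
    }
    return (@out, @singles);
}

print "$_\n" for collapse_sequences(qw(
    image_sequenceA.1.tif image_sequenceA.11.tif image_sequenceA.2.tif
    image_sequenceA.3.tif image_sequenceA.1.exr  image_sequenceA.2.exr
));
```

With this grouping the order in which File::Find hands back the directory entries should no longer matter for the detection itself.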
        so that the unpadded sequences come back in the correct order

        You may have a reason for the order you chose, but I would have thought it made more sense to order things so that the number part of the filename is sorted numerically, not lexically. That way you get two runs rather than one run and four singletons from the example you posted.

        image_sequenceA.1.tif
        image_sequenceA.2.tif
        image_sequenceA.3.tif
        image_sequenceA.11.tif
        image_sequenceA.12.tif
        image_sequenceA.13.tif
        image_sequenceA.14.tif
        image_sequenceA.15.tif
        image_sequenceA.16.tif
        image_sequenceA.17.tif
        image_sequenceA.18.tif
        image_sequenceA.19.tif
        image_sequenceA.20.tif

        I hope this is of interest.

        Cheers,

        JohnGG
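The effect of the two orderings on run detection can be checked with a small sketch. It assumes the frame numbers 1 to 3 and 11 to 20 from the listings in this thread; `runs_in_order` is a hypothetical helper that deliberately does no sorting of its own, so the result depends only on the order it is given.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse consecutive numbers into "from-to" runs,
# taking the list exactly in the order given.
sub runs_in_order {
    my @nums = @_;
    return unless @nums;
    my @runs = ([ $nums[0], $nums[0] ]);
    for my $n (@nums[1 .. $#nums]) {
        if ($n == $runs[-1][1] + 1) {
            $runs[-1][1] = $n;          # extend the current run
        }
        else {
            push @runs, [$n, $n];       # start a new run
        }
    }
    my @out;
    for my $r (@runs) {
        my ($from, $to) = @$r;
        push @out, $from == $to ? $from : "$from-$to";
    }
    return @out;
}

my @frames  = (1 .. 3, 11 .. 20);
my @lexical = sort { $a cmp $b } @frames;   # 1 11 12 ... 19 2 20 3
my @numeric = sort { $a <=> $b } @frames;   # 1 2 3 11 12 ... 20

print join(', ', runs_in_order(@lexical)), "\n";   # 1, 11-19, 2, 20, 3
print join(', ', runs_in_order(@numeric)), "\n";   # 1-3, 11-20
```

The lexical order yields one run and four singletons, while the numeric order yields the two runs.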

Re: Detect file sequences in File:Find results
by Anonymous Monk on Jan 21, 2013 at 08:00 UTC