Detect file sequences in File:Find results

by aes1972 (Initiate)
on Jan 21, 2013 at 00:09 UTC (#1014360=perlquestion)
aes1972 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

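I am currently getting a list of files from a directory hierarchy with this script, which works great (the user name and full name get appended to the end of each file path):

    use File::Find;
    use File::stat;

    # Use an absolute path here: find() chdirs into each directory, so a
    # relative $File::Find::name would no longer resolve inside wanted().
    my $dir = "path/to/directory";

    find(
        {
            wanted => sub {
                my $file = $File::Find::name;
                # getpwuid returns (name, passwd, uid, gid, quota, comment, gcos, dir, shell);
                # the gcos field normally holds the full name.
                my ($n, $p, $uid, $gid, $dq, $c, $fn, $d, $s) = getpwuid(stat($file)->uid);
                # Regular files only, skipping dotfiles.
                if (-f && (/^[^.]/)) {
                    print $file . ":" . $n . ":" . $fn . "\n";
                }
            },
            preprocess => sub {
                # Sort by extension and prefix, then numerically by frame number.
                @_ = map  { $_->[0] }
                     sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] }
                     map  { m/(\d+)(\.[^.]+$)/ ? [$_, $2 . $`, int($1)] : [$_, "", ""] } @_;
                # Drop entries whose names start with a dot.
                @_ = grep (/^[^\.]/, @_);
            },
        },
        $dir
    );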

The thing is that most of the files are file sequences like this:

    image_sequenceA.1.tif
    image_sequenceA.2.tif
    image_sequenceA.3.tif
    image_sequenceB.40.tif
    image_sequenceB.41.tif
    image_sequenceB.42.tif


So I would love to be able to filter the results to only get one entry per sequence like this:

    image_sequenceA.[1-3].tif
    image_sequenceB.[40-42].tif

Please note that the sequences don't have padded numbers, but I am already handling that with the map and sort above, so now I just need to print each unique sequence only once.

Actually, since I am getting the full path and also attaching the user, the output should ultimately look like this:

    path/to/directory/image_sequenceA.[1-3].tif:bob:Bob User
    path/to/directory/image_sequenceB.[40-42].tif:frank:Frank User


Thanks.

Re: Detect file sequences in File:Find results
by choroba (Abbot) on Jan 21, 2013 at 01:01 UTC
    What does the preprocess sub do? This is my solution without it:
    #!/usr/bin/perl
    use warnings;
    use strict;

    use File::Find;
    use File::stat;

    my $dir = shift;
    my %result;

    find ( {'wanted' => sub {
                my $file = $File::Find::name;
                my ($n, $fn) = (getpwuid (stat($file)-> uid))[0, 6];
                if (-f && (/^[^.]/) ) {
                    if( my ($pre, $num, $suff) = $file =~ /(.*)\.([0-9]+)\.(.*)/ ) {
                        push @{ $result{$pre}{$suff}{"$n:$fn"} }, $num;
                    }
                }
            },
           }, $dir);

    for my $pre (keys %result) {
        for my $suff (keys %{ $result{$pre} }) {
            for my $user (keys %{ $result{$pre}{$suff} }) {
                my @nums = sort { $a <=> $b } @{ $result{$pre}{$suff}{$user} };
                my $first = shift @nums;
                my ($from, $to, @ranges) = ($first, $first);
                for (@nums) {
                    if ($_ == $to + 1) {
                        $to = $_;
                    } else {
                        push @ranges, [$from, $to];
                        ($from, $to) = ($_, $_);
                    }
                }
                push @ranges, [$from, $to];
                for my $r (@ranges) {
                    print "$pre."
                          . ($r->[0] == $r->[1] ? $r->[0] : "[$r->[0]-$r->[1]]")
                          . ".$suff:$user\n";
                }
            }
        }
    }
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      --->What does the preprocess sub do?

      It sorts the directory results so that the unpadded sequences come back in the correct order. Without it, the sequence comes back looking like this:

          image_sequenceA.1.tif
          image_sequenceA.11.tif
          image_sequenceA.12.tif
          image_sequenceA.13.tif
          image_sequenceA.14.tif
          image_sequenceA.15.tif
          image_sequenceA.16.tif
          image_sequenceA.17.tif
          image_sequenceA.18.tif
          image_sequenceA.19.tif
          image_sequenceA.2.tif
          image_sequenceA.20.tif
          image_sequenceA.3.tif

      That order would make the sequence detection incorrect. The sub also handles the case where files share a name but have different extensions, like this:

          image_sequenceA.1.tif
          image_sequenceA.2.tif
          image_sequenceA.3.tif
          image_sequenceA.1.exr
          image_sequenceA.2.exr
          image_sequenceA.3.exr

      So the preprocess sub is necessary in my case, unless I can do the same sort somewhere else.
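For readers who want to try that ordering outside of File::Find, here is a minimal sketch of the same sort applied to a plain list of names. The helper name `sort_sequence_names` is made up for illustration, and the regex uses explicit captures instead of the prematch variable from the original sub, so treat it as an approximation of the posted code rather than a drop-in copy.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): order names by
# extension and prefix first, then numerically by frame number.
sub sort_sequence_names {
    my @names = @_;
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] }
           map  {
               # Capture prefix, frame number and extension; names that do
               # not look like a sequence get an empty key and number 0.
               /^(.*?)(\d+)(\.[^.]+)$/ ? [ $_, "$3$1", $2 ] : [ $_, '', 0 ]
           } @names;
}

my @sorted = sort_sequence_names(qw(
    image_sequenceA.2.tif
    image_sequenceA.11.tif
    image_sequenceA.1.exr
    image_sequenceA.1.tif
    image_sequenceA.2.exr
));
print "$_\n" for @sorted;
```

The string key keeps the .exr and .tif files apart, and the numeric comparison puts frame 11 after frame 2 within each group.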
        so that the unpadded sequences comes back in the correct order

        You may have a reason for the order you chose, but I would have thought it made more sense to order things so that the number part of the filename was sorted numerically, not lexically. That way you get two runs rather than one run and four singletons from the example you posted.

            image_sequenceA.1.tif
            image_sequenceA.2.tif
            image_sequenceA.3.tif
            image_sequenceA.11.tif
            image_sequenceA.12.tif
            image_sequenceA.13.tif
            image_sequenceA.14.tif
            image_sequenceA.15.tif
            image_sequenceA.16.tif
            image_sequenceA.17.tif
            image_sequenceA.18.tif
            image_sequenceA.19.tif
            image_sequenceA.20.tif
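To make the difference concrete, the following sketch groups the frame numbers from that example into runs, once in lexical order and once in numeric order. The helpers `runs_in_order` and `format_runs` are hypothetical names introduced here, and the frame list is typed in by hand rather than read from disk.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the numbers in the order given and merge
# neighbours that differ by exactly one into [from, to] pairs.
sub runs_in_order {
    my @nums = @_;
    my @runs;
    for my $n (@nums) {
        if (@runs && $n == $runs[-1][1] + 1) {
            $runs[-1][1] = $n;          # extend the current run
        } else {
            push @runs, [ $n, $n ];     # start a new run
        }
    }
    return @runs;
}

# Hypothetical helper: print singletons as "N" and runs as "[from-to]".
sub format_runs {
    return join ' ',
        map { $_->[0] == $_->[1] ? $_->[0] : "[$_->[0]-$_->[1]]" } @_;
}

my @frames  = (1 .. 3, 11 .. 20);
my @lexical = sort @frames;                 # default string comparison
my @numeric = sort { $a <=> $b } @frames;   # numeric comparison

print "lexical: ", format_runs(runs_in_order(@lexical)), "\n";
print "numeric: ", format_runs(runs_in_order(@numeric)), "\n";
```

With the lexical order the scan finds one run (11 to 19) and four singletons, while the numeric order yields the two runs 1 to 3 and 11 to 20.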

        I hope this is of interest.

        Cheers,

        JohnGG

Re: Detect file sequences in File:Find results
by Anonymous Monk on Jan 21, 2013 at 08:00 UTC
