Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Group file per time interval

by raiten (Acolyte)
on Feb 07, 2013 at 22:37 UTC ( [id://1017732]=perlquestion: print w/replies, xml ) Need Help??

raiten has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I'm looking for a way to parse a directory and group files which have a create date (or exif create date) at less than n seconds interval between each other (consecutive files in time) I'm trying to preprocess files to generate a list for hugin panoramic generation.

date is in filename (yyyymmdd_hhMM--name) or in exif create date (exiftool). I can extract a primary list because I tagged my files with a keyword but it's not sufficient to get a good output by panostart. So in this list of filenames, I want to find a way to generate sublist of files which have a maximum time interval of n seconds between their datetime (typically 5-10 sec)

Any ideas how to do that ?

Thanks a lot. Cheers

Replies are listed 'Best First'.
Re: Group file per time interval
by roboticus (Chancellor) on Feb 08, 2013 at 00:57 UTC

    raiten:

    I'd convert the times to seconds, and then do something like this:

    $ cat t.pl #!/usr/bin/perl use strict; use warnings; use Data::Dumper; ### Get a list of picture times, in seconds my @times; push @times, int 150*rand for 0 .. 30; ### Sort them @times = sort { $a<=>$b } @times; ### Group together the pix whose time is less than MAX_TIME_DIFF secon +ds ### apart my $MAX_TIME_DIFF=10; # Minimum time between photos my $MIN_GRP_SIZE=3; # Minimum "interesting" group size my @groups; my $cur_group = [ shift @times ]; while (@times) { if ($$cur_group[-1]+$MAX_TIME_DIFF >= $times[0]) { # small interval, add to current group push @$cur_group, shift @times; } else { # store last group (if interesting) and start # a new one. push @groups, $cur_group if @$cur_group >= $MIN_GRP_SIZE; $cur_group = [ shift @times ]; } } print Dumper(\@groups); $ perl t.pl $VAR1 = [ [ 32, 36, 39, 39, 40, 42, 48, 53, 55, 56, 57, 58, 59 ], [ 72, 73, 77, 82 ], [ 93, 103, 104, 105, 108, 111, 113, 113, 114, 116 ] ];

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Group file per time interval
by Kenosis (Priest) on Feb 08, 2013 at 04:47 UTC

    This literally builds upon the excellent solution offered by roboticus. The next step is to get the EXIF date/time info from the image files, and convert that into seconds (the directory read is non-recursive):

    use strict; use warnings; use Image::ExifTool qw(:Public); use Date::Calc qw/Date_to_Time/; my $imageDir = './images'; my %timeFiles; ### Read each file's EXIF info and convert to seconds ### A Hash of Arrays (HoA) is used in case images have the same time for my $image (<$imageDir/*>) { # Skip image if no DateTimeOriginal info (highly unlikely) next unless my $dateTime = ImageInfo($image)->{DateTimeOriginal}; # Convert 'YYYY:MM:DD HH:MM:SS' to seconds my $time = Date_to_Time( split /[:\s]/, $dateTime ); push @{ $timeFiles{$time} }, $image; } ### <roboticus code> ### Sort the times from the hash my @times = sort { $a<=>$b } keys %timeFiles; ### Group together the pix whose time is less than MAX_TIME_DIFF secon +ds ### apart my $MAX_TIME_DIFF=10; # Minimum time between photos my $MIN_GRP_SIZE=3; # Minimum "interesting" group size my @groups; my $cur_group = [ shift @times ]; while (@times) { if ($$cur_group[-1]+$MAX_TIME_DIFF >= $times[0]) { # small interval, add to current group push @$cur_group, shift @times; } else { # store last group (if interesting) and start # a new one. push @groups, $cur_group if @$cur_group >= $MIN_GRP_SIZE; $cur_group = [ shift @times ]; } } ### </roboticus code> ### Get each group for my $group(@groups){ # The time in each group for my $time(@$group){ # The hash value is an array ref # $_ takes the value of each array element, viz., a file name print "$_: $time\n" for @{ $timeFiles{$time} }; } # Print blank lines between groups print "\n\n"; }

    Partial output:

    ./images/IMG_7139.CR2: 1336823075 ./images/IMG_7140.CR2: 1336823077 ./images/IMG_7141.CR2: 1336823079 ./images/IMG_7142.CR2: 1336823082 ./images/IMG_7143.CR2: 1336823084 ./images/IMG_7144.CR2: 1336823086 ./images/IMG_7145.CR2: 1336823088 ./images/IMG_7152.CR2: 1336823437 ./images/IMG_7153.CR2: 1336823442 ./images/IMG_7154.CR2: 1336823446 ./images/IMG_7155.CR2: 1336823451 ./images/IMG_7156.CR2: 1336823455 ./images/IMG_7157.CR2: 1336823467 ./images/IMG_7158.CR2: 1336823470 ./images/IMG_7159.CR2: 1336823473 ./images/IMG_7160.CR2: 1336823653 ./images/IMG_7161.CR2: 1336823657 ./images/IMG_7162.CR2: 1336823661 ./images/IMG_7163.CR2: 1336823664 ./images/IMG_7164.CR2: 1336823667

    I actually use Hugin (excellent pano app!) and this directory read shows names of images that Hugin eventually stitched into a wonderful pano.

      Kenosis:

      Nice! I was a bit surprised to see my little bit untouched though. I was expecting someone to have to expand the data structure into something like [ time, 'Filename' ] and tweak the sort and such. It hadn't occurred to me to stuff the filenames into a lookup table by time.

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

      Hello,

      sorry for the long-delay answer. Pretty simple but powerful code and a lot of shortcuts I didn't know. I try to make it more advanced w exif analysis. It's still a work in progress.

      Any help appreciated :-)

      #!/usr/bin/perl -w ## $Id: group_panofiles.pl 1702 2013-02-28 18:34:47Z <ME> $ ## script to extract from a group of image list of images which cou +ld be pano ## ## https://groups.google.com/forum/?fromgroups=#!topic/hugin-ptx/hnPFG +y7EyQI ## http://perlmonks.org/?node_id=1017732 ## http://search.cpan.org/~exiftool/Image-ExifTool/ ## http://owl.phy.queensu.ca/~phil/exiftool/ExifTool.html ## need libimage-exiftool-perl libdate-calc-perl/libdatetime-format-st +rptime-perl ## FIXME! how to detect when taking 3 same images (eg: took 3 pictures + believing w bracketing but not): similarity coefficient > 95% ? ## http://www.perlmonks.org/?node_id=907337 ## http://www.simon-cozens.org/content/seeking-images-perl ## http://search.cpan.org/~xern/Image-Signature-0.01/Signature.pm ## ImageMagick compare: http://www.imagemagick.org/Usage/compare/#comp +are ## tag/rename dupe file before w 'findimagedupes' ## FIXME! many false positive if in rafale shooting (animal, road, ... +) ## test on 20130109, 20130209, 20130220 use strict; use warnings; use Image::ExifTool qw(:Public); use POSIX qw(strftime); use Getopt::Long; use Data::Dumper; ## Can either use one of those libraries (strptime seems a little fast +er) use Date::Calc qw/Date_to_Time/; #use DateTime::Format::Strptime; #my $analyzer = DateTime::Format::Strptime->new( pattern => '%Y:%m:%d +%H:%M:%S', on_error => 'croak' ) or die $DateTime::Format::Strptime: +:errmsg; use vars qw($opt_timediff $opt_groupsize $opt_tagsinclude $opt_tagsexc +lude $opt_extension $opt_autotagged $opt_recursive $opt_extended $opt +_ignorebracketing $opt_help $opt_verbose $opt_debug $opt_path); Getopt::Long::Configure("bundling"); GetOptions ("t=i" => \$opt_timediff, "timediff=i" => \$opt_timediff, "m=i" => \$opt_groupsize, "groupsize=i" => \$opt_groupsize, "p=s" => \$opt_path, "path=s" => \$opt_path, "I=s@" => \$opt_tagsinclude, "tagsinclude=s@" => \$opt_tagsinclud +e, "E=s@" => \$opt_tagsexclude, "tagsexclude=s@" => \$opt_tagsexclud +e, "r=s@" => \$opt_extension, "extension=s@" => \$opt_extension, "a=s@" => \$opt_autotagged, "autotagged=s@" => \$opt_autotagg +ed, "R" => \$opt_recursive, "recursive" => \$opt_recursive, "e" => \$opt_extended, "extended" => \$opt_extended, "i" => \$opt_ignorebracketing, "ignorebracketing" => \$opt_ig +norebracketing, "h" => \$opt_help, "help" => \$opt_help, "v" => \$opt_verbose, "verbose" => \$opt_verbose, "d" => \$opt_debug, "debug" => \$opt_debug, ); sub print_help() { print " This script allows you to parse a directory of pictures and try to gro +up files which could be used to form a panoramic picture (with Hugin for example). It is mainly using time difference between picture (normal m +ode) and exif metadata (extended check mode).\n\n"; print_usage(); } sub print_usage () { print "Required arguments: -p directory path to analyze Optional arguments: -t=n Maximum time difference allowed between pictures of a group [d +efault=10] -m=n Minimum group size [default=3] -R Analyze directory recursively [default=no] -e Extended check [default=no] -h Help -v verbose mode -d debug mode If extended check is enabled, -I=tag1,tag2 Include keyword tags: those picture are included (supe +rseded exclusion) -E=tag1,tag2 Exclude keyword tags: those picture are excluded -r=extension File type/extension to parse [default=jpg] -a=autotagged Files which are selected in panoramic group will be a +dded those tags -i Ignore bracketing images [default=select only pictures w exp +osure compensation 0>=n>-1]\n"; } ## FIXME! error 'Unable to read image data from file './20130209_11361 +1--DSC_0828.JPG': 'format 'jpeg' not supported - formats bmp, ico, pn +g, pnm, raw, sgi, tga available for reading - Can't locate Imager/Fil +e/JPEG.pm' at /usr/local/share/perl/5.14.2/Image/Compare.pm line 162. +' sub CompareImages0 { ## from http://www.perlmonks.org/?node_id=907337 ## http://search.cpan.org/~avif/Image-Compare-0.3/Compare.pm use Image::Compare; my $file1 = shift; #some jpegs, or png my $file2 = shift; my $filetype = lc($opt_extension); $filetype =~ s/jpg/jpeg/; my ( $cmp ) = Image::Compare->new(); $cmp->set_image1( img => $file1, type => $filetype ); $cmp->set_image2( img => $file2, type => $filetype ); $cmp->set_method( method => &Image::Compare::THRESHOLD, args => 25, ); #$cmp->set_method( # method => &Image::Compare::EXACT, # ); if ( $cmp->compare() ) { # The images are the same, within the threshold print "same\n"; return 1; } else { # The images differ beyond the threshold print "not same\n"; return 0; } } ## using ImageMagick compare sub CompareImages { my $file1 = shift; #some jpegs, or png my $file2 = shift; if (-f $file1 && -f $file2) { return `compare -metric AE -fuzz 10% $file1 $file2 null 2>&1`; } } my %timeFiles; my $count = 0; if (!$opt_timediff) { $opt_timediff=10; # Minimum time between photos } if (!$opt_groupsize) { $opt_groupsize=3; # Minimum "interesting" group size } if (!$opt_extended) { $opt_extended=0; } if (!$opt_extension) { $opt_extension='JPG'; } if (!$opt_ignorebracketing) { $opt_ignorebracketing=0; } if (!$opt_verbose) { $opt_verbose=0; } if (!$opt_debug) { $opt_debug=0; } if ( $opt_help || (!$opt_help && !$opt_path)) { print "FATAL! missing required arguments.\n"; print_help(); exit 1; } ## TEST #$opt_extended = 1; $opt_verbose = 1; $opt_ignorebracketing= 1; # exclude pictures w ExpComp not in 0<>-1 + to avoid conflict w bracketed images #$opt_path= '.'; ## TEST $opt_tagsinclude = ''; # include only pictures w those tags (supers +eded exclude) ## OK #$opt_tagsinclude = 'pano'; # include only pictures w those tags (s +uperseded exclude) ## NOK #$opt_tagsexclude = 'Cuisine'; # exclude pictures w those tags $opt_tagsexclude = 'plan-explications,Cuisine,ticket,Logement,flore'; + # exclude pictures w those tags $opt_autotagged = '_autotagged, _pano'; ## transform list into regexp, probably not good w very long list ... if ($opt_tagsinclude) { $opt_tagsinclude =~ s/([,\s*]+)/|/g; $opt_tagsinclude =~ s/^(.*)$/($1)/g; } if ($opt_tagsexclude) { $opt_tagsexclude =~ s/([,\s*]+)/|/g; $opt_tagsexclude =~ s/^(.*)$/($1)/g; } my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(t +ime); my ($start, $end); $start = time(); my $perflog = "Time: start at $hour:$min:$sec"; ### Read each file's EXIF info and convert to seconds ### A Hash of Arrays (HoA) is used in case images have the same time ## Note: in perl, exiftool relies on classical perl functions to parse + files/directories print "Building file list" if ($opt_verbose == 1); for my $image (<$opt_path/*$opt_extension>) { ## alternate file listing, faster? ## http://perlmeme.org/faqs/file_io/directory_listing.html ## http://stackoverflow.com/questions/5241692/perl-script-to-recurs +ively-list-all-filename-in-directroy #foreach my $image(glob("*$opt_extension)) { #File::Find->find({ \&process_file, no_chdir => 1 }, @dir); # Skip exiftool backup files next if ($image =~ m/_original$/i); # Skip image if no DateTimeOriginal info (highly unlikely) next unless my $dateTime = ImageInfo($image)->{DateTimeOriginal}; # Convert 'YYYY:MM:DD HH:MM:SS' to seconds my $time = Date_to_Time( split /[:\s]/, $dateTime ); ## another date conversion # my $dt = $analyzer->parse_datetime($dateTime); # my $time = $dt->epoch; if ($opt_extended == 0) { push @{ $timeFiles{$time} }, $image; } else { ## check $tags and IL constraint my ($cur_expcomp,$cur_keywords) = (0, ''); $cur_expcomp = ImageInfo($image)->{ExposureCompensation}; ## IL $cur_keywords = ImageInfo($image)->{Keywords}; if (!$cur_keywords) { $cur_keywords = ''; } ## how to check opt_tagsinclude/opt_tagsexclude in/out cur_keyword +s if ($cur_expcomp<=0 && $cur_expcomp>-1) { # print "DEBUG: current $cur_expcomp, $cur_keywords vs $opt_tag +sinclude / $opt_tagsexclude.\n" if ($opt_debug==1); if ($opt_tagsinclude) { if ($cur_keywords =~ m/$opt_tagsinclude/i) { print "DEBUG: include picture '$image' cf opt_tagsincl +ude '$opt_tagsinclude'\n" if ($opt_debug==1); push @{ $timeFiles{$time} }, $image; } else { print "DEBUG: exclude picture '$image' cf opt_tagsincl +ude\n" if ($opt_debug==1); } } elsif ($opt_tagsexclude) { if (!$cur_keywords || ($cur_keywords && $cur_keywords !~ m +/$opt_tagsexclude/i)) { print "DEBUG: include picture '$image' cf opt_tagsexcl +ude '$opt_tagsexclude'\n" if ($opt_debug==1); push @{ $timeFiles{$time} }, $image; } } else { print "DEBUG: include picture '$image' cf expcomp\n" if ($ +opt_debug==1); push @{ $timeFiles{$time} }, $image; } } else { print "DEBUG: excluding picture '$image' cf '$cur_expcomp', '$ +cur_keywords'\n" if ($opt_debug==1); } } print "." if ($opt_verbose == 1); $count++; } print " End.\n" if ($opt_verbose == 1); $perflog .= ", list files+".(time()-$start); ### <roboticus code> ### Sort the times from the hash my @times = sort { $a<=>$b } keys %timeFiles; print Dumper(\@times) if ($opt_debug==1); ### Group together the pix whose time is less than opt_timediff second +s ### apart my @groups; ## FIXME! losing first file (test 20130209 w or w/o extended check) => + why this shift ??? disabled #my $cur_group = [ shift @times ]; my $cur_group = [ @times ]; my (@cur_ref, @cur_file); sub set_cur_exif { my $image = shift; if ($image && -f $image) { my @cur = [ $image, ImageInfo($image)->{ExposureCompensation}, + ## IL ImageInfo($image)->{ExposureProgram}, ImageInfo($image)->{FocusMode}, ImageInfo($image)->{Lens}, ImageInfo($image)->{FocalLength}, ImageInfo($image)->{ISO}, ImageInfo($image)->{Orientation}, ImageInfo($image)->{Quality} ]; $cur[0][5] =~ s/ mm//; return @cur; } else { print "DEBUG: Warning! no image filename '$image'.\n"; } } sub log_exif { my @cur = shift; my $opt_debug = shift; my $txt = shift; my $extra_txt = shift; if (@cur) { print "DEBUG: setting $txt exif settings w '$cur[0][0]'.\n" i +f ($opt_debug==1); print " (".$cur_ref[0][0].', '.$cur_ref[0][1].', '.$cur_ref[0] +[2].', '. $cur_ref[0][3].', '.$cur_ref[0][4].', '.$cur_ref[0][5].', +'. $cur_ref[0][6].', '.$cur_ref[0][7].', '.$cur_ref[0][8].")$ +extra_txt.\n" if ($opt_debug==1); #print Dumper(\@cur_ref) if ($opt_debug==1); } else { print "DEBUG: Warning! no exif data.\n"; } } print "Group per time" if ($opt_verbose == 1); while (@times) { print "---> $times[0]\n" if ($opt_debug==1); if ($$cur_group[-1]+$opt_timediff >= $times[0]) { print "--> $times[0]\n" if ($opt_debug==1); if ($opt_extended == 0) { # small interval, add to current group, no extended check push @$cur_group, shift @times; } else { my $image = ${ $timeFiles{$times[0]} }[0]; print "-> $times[0], $image\n" if ($opt_debug==1); print "DEBUG: '$image' image file\n" if ($opt_debug==1); if (!@cur_ref) { @cur_ref = set_cur_exif($image); push @$cur_group, shift @times; log_exif(@cur_ref, $opt_debug, 'cur_ref', ''); } @cur_file = set_cur_exif($image); #log_exif(@cur_file, $opt_debug, 'cur_file', " & distance=".ab +s(1-$cur_ref[0][5]/$cur_file[0][5].' & similar='.CompareImages($cur_r +ef[0][0], $cur_file[0][0]))); log_exif(@cur_file, $opt_debug, 'cur_file', " & distance=".abs +(1-$cur_ref[0][5]/$cur_file[0][5])); if ( $cur_ref[0][0] eq $cur_file[0][0] ) { print "DEBUG: same ref & current_file. skipping.\n" if ($o +pt_debug==1); ## Note: we don't add to current group but no shifting els +e loose valid group. ##shift @times; } elsif ( ## meta should be same ## Note: NOK comparing full @ NOK => switch to separate # @cur_ref == @cur_file ## Note: ?OK comparing each value separately NOK (ex: 4, 6 +) ## FIXME! still some matching and things not equal. ex: 20 +130109_083736--DSC_0908.JPG/ExpProgram or FocalLength distance or Ori +entation $cur_ref[0][1] == $cur_file[0][1] && ## Exposure Compen +sation $cur_ref[0][2] eq $cur_file[0][2] && ## Exposure Progra +m $cur_ref[0][3] eq $cur_file[0][3] && ## FocusMode $cur_ref[0][4] eq $cur_file[0][4] && ## Lens abs(1-$cur_ref[0][5]/$cur_file[0][5]) < 0.11 && # same +focal length at 10% margin $cur_ref[0][7] eq $cur_file[0][7] && ## Orientation $cur_ref[0][8] eq $cur_file[0][8] ## Quality ## meta could be different: aperture, shutterspeed, ISO (auto +mode) ) { push @$cur_group, shift @times; print "DEBUG: image '$image' matchs same exif settings tha +n '$cur_ref[0][0]'.\n" if ($opt_debug==1); } else { print "DEBUG: image '$image' does not match same exif sett +ings than '$cur_ref[0][0]'.\n" if ($opt_debug==1); ## new group print "DEBUG: -> new group w '$image'.\n" if ($opt_debug== +1); push @groups, $cur_group if @$cur_group >= $opt_groups +ize; $cur_group = [ shift @times ]; if ($opt_extended == 1) { undef @cur_ref; } @cur_ref = set_cur_exif($image); } undef @cur_file; } } else { # store last group (if interesting) and start # a new one. print "DEBUG: -> save group, start a new one as time difference (" +.($$cur_group[-1]+$opt_timediff)." < $times[0]).\n" if ($opt_debug==1 +); push @groups, $cur_group if @$cur_group >= $opt_groupsize; $cur_group = [ shift @times ]; if ($opt_extended == 1) { undef @cur_ref; } } print "." if ($opt_verbose == 1); } print " End.\n" if ($opt_verbose == 1); $perflog .= ", grouping files+".(time()-$start); ### </roboticus code> ### Get each group my $gcount = 0; my ($filefirst,$filelast,$groupname); for my $group(@groups){ # print Dumper(\$group); $filefirst = ${ $timeFiles{ @{$group}[0] } }[0]; $filefirst =~ s/.$opt_extension$//i; $filefirst =~ s/^.*\///i; $filelast = ${ $timeFiles{ @{$group}[ (@$group-1) ] } }[0]; $filelast =~ s/.$opt_extension$//i; $filelast =~ s/^.*\///i; $groupname = "$filefirst--$filelast-pano.panolist"; print "FILENAME: $groupname, ".(@{$group})."\n" if ($opt_verbose== +1); open (FOUTPUT, ">$groupname") or die "ERROR: Can't open output fil +e '$!'\n"; # The time in each group for my $time(@$group){ # The hash value is an array ref # $_ takes the value of each array element, viz., a file name for my $image (@{ $timeFiles{$time} }) { my $exifdata = "Exif: ".ImageInfo($image)->{ExposureCompensati +on}.', '. ImageInfo($image)->{ExposureProgram}.', '. ImageInfo($image)->{FocusMode}.', '. (ImageInfo($image)->{Lens}?ImageInfo($image)->{Lens}:' +undef').', '. ## not always initialized? ImageInfo($image)->{FocalLength}.', '. ImageInfo($image)->{ISO}.', '. ImageInfo($image)->{Orientation}.', '. ImageInfo($image)->{Quality}; if (ImageInfo($image)->{Keywords}) { $exifdata .= ', '.ImageInfo($image)->{Keywords}; } print "$image: $time. $exifdata\n" if ($opt_verbose==1); ## FIXME/TEST! add autotagged pano to files consider as panora +mic source ## add a keyword without replacing existing keywords in the fi +le ## http://www.sno.phy.queensu.ca/~phil/exiftool/ExifTool.html# +SetNewValue #($success, $errStr) = $exifTool->SetNewValue(Keywords => $opt +_autotagged, AddValue => 1); } print FOUTPUT "$_\n" for @{ $timeFiles{$time} }; } # Print blank lines between groups print "\n\n" if ($opt_verbose==1); close(FOUTPUT); $gcount++; } $end = time(); $perflog .= ", output+".(time()-$start); print "Image count: $count, Group count: $gcount.\n" if ($opt_verbose= +=1); print "$perflog.\n" if ($opt_verbose==1);

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1017732]
Approved by davido
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (5)
As of 2024-03-19 10:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found