PerlMonks  

Re: Find the last item in a series of files

by tybalt89 (Monsignor)
on Jun 16, 2017 at 15:27 UTC


in reply to Find the last item in a series of files

#!/usr/bin/perl
# http://perlmonks.org/?node_id=1192943

use strict;
use warnings;

my %names;
/(.*)\.(.*)/ and $names{$1}[$2] = $_ while <DATA>;
print $names{$_}[-1] for sort keys %names;

__DATA__
file.001
file.003
file.002
one.004
two.001
two.003
one.002
one.001
two.002

Re^2: Find the last item in a series of files
by CountZero (Bishop) on Jun 16, 2017 at 16:30 UTC
    Very nice solution!

    A small improvement makes the regex a bit more specific and have it reject filenames that do not match the expected file name template.

use strict;
use warnings;

my %names;
/^([^.]+)\.(\d{3})$/ and $names{$1}[$2] = $_ while <DATA>;
print $names{$_}[-1] for sort keys %names;

__DATA__
file.001
file.003
not.good.001
file.002
file.10
one.004
two.001
two.003
two.five
one.002
one.001
one.0039
two.002
.005
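To make the effect of the stricter pattern concrete, here is a small sketch that splits a few of the sample names into accepted and rejected. The helper name classify is made up for this illustration and is not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: separate names that match the "root.NNN"
# template (root without dots, exactly three digits) from the rest.
sub classify {
    my @names = @_;
    my (@accepted, @rejected);
    for my $name (@names) {
        if ($name =~ /^([^.]+)\.(\d{3})$/) {
            push @accepted, $name;
        }
        else {
            push @rejected, $name;
        }
    }
    return (\@accepted, \@rejected);
}

my ($ok, $bad) = classify(qw(file.001 not.good.001 file.10 two.five one.0039 .005));
print "accepted: @$ok\n";
print "rejected: @$bad\n";
```

Only file.001 gets through: the other names have a dot in the root, the wrong number of digits, a non-numeric suffix, or an empty root.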

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re^2: Find the last item in a series of files
by fredho (Novice) on Jun 16, 2017 at 16:48 UTC
    Here is my piece of code. I'm not sure this is the best way to do it.
    chdir ($folder);
    while ($file = <*>){
        my ($ext) = $file =~ /(\.[^.]+)$/;
        #Check file extension
        if (($ext =~ m/00./) and ($ext ne ".001")){
            next;
        }
        elsif ($ext eq ".001") { # first file is required
            my $filenameroot = $file;
            $filenameroot =~ s/(.+)\.[^.]+$/$1/; # File name root
            my @list = glob("$folder$filenameroot*");
            print "Last element : $list[(scalar @list-1)]\n"; # Last file of the series
        }
    }
      I see your thought process, this is very close. Since you were nice enough to comply with the "hey, show us what you got" request, I'll make a few comments which I hope will be helpful for you in writing future code...

      • if (($ext =~ m/00./) and ($ext ne ".001")){ The first conditional as part of the "and" is not needed.
        ($ext ne ".001") says it all.
      • elsif ($ext eq ".001") { This "if" is not needed either. $ext has to be equal to ".001" if you get to this point. The previous lines have rejected any value that wasn't equal to ".001".
      • Adding comment about:
        my $filenameroot = $file; $filenameroot =~ s/(.+)\.[^.]+$/$1/; # File name root

        This is fine: you make a copy of "$file" by assigning it to a new variable, "$filenameroot". Then you use a substitute operation to modify $filenameroot. This works. However, consider:
        (my $filenameroot) = $file =~ m/(.+)\.[^.]+$/;
        In general a substitute operation is more "expensive" than a simple "return a value" operation, because the input string must be modified instead of selected parts just being copied. If you put the LHS (left-hand side) into list context, you can assign $1, and even $2, $3, ... from a match. Here $1 gets assigned to $filenameroot, with no substitution operation required. This also avoids having $filenameroot temporarily hold a value that is "not quite correct" yet.
      • my @list = glob("$folder$filenameroot*"); I am not sure whether glob() returns a sorted list. Even if it does, the order would be by character string and not numeric. This can make a big difference, as "13" sorts lower than "3" when compared as strings. Keep this difference between character and numeric sorting in mind whenever you have numeric values. I don't know for sure whether this is a problem here, but always include some double-digit numbers in your test cases.
      • The big issue with the glob() is that you are re-reading the directory multiple times. File system operations are "expensive" in terms of CPU. Get in the habit of trying to do a directory read "only once". Store it if you have to in your own data structure. Of course in your application, I don't expect any performance issue, but this is something to be aware of in the future.
      • print "Last element : $list[(scalar @list-1)]\n"; That does indeed get the last element of @list. However, that last element might not be the file with the largest extension number, because of the potential sorting issues mentioned above. Note that this is better written as $list[-1]. In Perl the -1 index is the last item, -2 is next to last, etc. A very handy concept. Your code is correct; I am just mentioning that there is a better syntax for this.
      • I direct your attention to the code by BillKSmith, tybalt89 and CountZero. This is clever in how it works. I think some further explanation may be helpful to you.

        This builds a HoA (Hash of Array) called %names. What is special is that the array @{$names{"name"}} is what is called a "sparse array" - not every element of the array has an assigned value. Perl allows this. If say @array only has 3 things in it, you can still assign $array[14]="Something";. A bunch of values will wind up being "undef" or undefined, but that is just fine. A numeric sort to get the "largest suffix number" is unnecessary, just using the [-1] index is enough. The sort of keys %names just puts the root names in alphabetical order. This has nothing to do with determining the highest numbered suffix. Added: look at Laurent_R's code also.

        I recommend that you use some adaptation of the HoA code or Laurent_R's code. Both look great to me.

        Welcome to the group! You will get a lot of help here. In general more help is forthcoming when you demonstrate some effort on your part (which you did).
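Several of the points above (one directory read, a list-context match instead of a substitution, and the sparse-array trick with [-1]) can be combined into one short sketch. This is only an illustration under assumptions: the helper name last_in_series and the use of the current directory are invented here, and it accepts any all-digit suffix rather than exactly three digits.

```perl
use strict;
use warnings;

# Hypothetical helper: given a list of file names, return a hash that
# maps each root name to the file with the highest numeric suffix.
sub last_in_series {
    my @files = @_;
    my %names;    # hash of arrays: root name => sparse array of files
    for my $file (@files) {
        # List-context match: the captures go straight into the
        # variables, no substitution needed. Skip non-matching names.
        my ($root, $num) = $file =~ /^(.+)\.(\d+)$/ or next;
        $names{$root}[$num] = $file;    # "010" is used as index 10
    }
    # The highest index ever assigned is always the last element,
    # so no sort is needed to find it.
    return map { $_ => $names{$_}[-1] } keys %names;
}

# Read the directory exactly once (assumed location: current directory).
my $folder = '.';
opendir my $dh, $folder or die "Cannot open $folder: $!";
my @files = grep { -f "$folder/$_" } readdir $dh;
closedir $dh;

my %last = last_in_series(@files);
print "$_ => $last{$_}\n" for sort keys %last;
```

One design caveat: the array grows to the size of the largest suffix, so this approach suits small suffixes like 001 to 999, not arbitrary large numbers.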

Re^2: Find the last item in a series of files
by jamroll (Beadle) on Jun 20, 2017 at 17:13 UTC
    okay. i haven't tested this code...but, here's what i got...
    # firstly, i'm gonna use the working directory, for laziness' sake! lol
    # secondly, i haven't thoroughly tested this. 'sub external_files($$)' is tested, and does work according to my tests
    # i'm working in a windows 10 environment, apache24 and activestate's perl 5.020002 (i think that version # is right)
    #
    # thirdly, this script assumes all the files in the folder are named with .xxx where each x is a digit 0..9
    # fourth, this will do no error checking! it will work perfect, so long as you adhere to the file extension convention
    # fifth, and finally, i have not tested this code
    ##############################
    # i copied this from a project i'm working on
    # yes. i use prototypes. SUE me!
    sub external_files($;$) {
      #*
      # lists files within a specified folder (eg: config, txt)
      # folders will not be included in this list - just the filenames only
      # if no type is provided, *.* is assumed
      # type should be just "png" or "txt", no need to include a leading dot
      #*
      my ($folder, $type) = @_; # a location (eg: users), relative to web root && a file type
      if ($type) {
        # the following is just in case the user of this
        # subroutine ignores instructions (mainly me lol)
        $type =~ s/(\*)*//g; # remove stars
        $type =~ s/(\.)*//; # remove dots
        $type =~ s/\///g; # remove forward slashes
        if ($type) { $type = ".$type"; }
      }
      if ($folder) {
        # same idea here as for $type
        # this one, however, may seem weird, but i've
        # found it better to account for all possibilities
        # rather than leave it up to the user of this
        # code to ensure correct params are given
        #
        # besides, i tend to forget to follow my own
        # instructions, so this saves me tons of head
        # scratching, see?
        $folder =~ s/(\/)*$//; # remove trailing /'s
        $folder =~ s/^(\/)*//; # remove leading /'s
        $folder =~ s/\/\//\//g; # convert //'s to /
        $folder .= "/"; # attach trailing /*
      }
      my @fixed;
      my $filespec = $folder . "*" . $type;
      my @dirs = glob($filespec);
      $folder =~ s/\./\\./g;
      $folder =~ s/\//\\\//g;
      foreach my $dir (@dirs) {
        if (-f $dir) {
          $dir =~ s/$folder//;
          push (@fixed, $dir);
        }
      }
      return @fixed; # an array
      #usage: my @fileList = external_files("D:/", "txt");
    } # end of sub external_files($$);

    #sub get_last($) { # you could uncomment this line...and turn the following into a sub!
    #my ($folder) = @_; # and yes, i do this, too! again, sue me (i believe wholeheartedly, and pedantically so, in the K.I.S.S concept)
    # my @files = external_files($folder); # i'll leave it up to you to make sure $folder is a valid location, but give it whatever you like, really
    my @files = external_files("d:/myNumberedFiles");
    # @files should now contain all yer files stored in d:/myNumberedFiles/
    # now, you want the file with an extension that works out to being the highest #?
    # easy!
    # first, i'm gonna rip through the list, and build a new one.
    # the new one will contain just the extension with no dots.
    # leading zeros will be removed from the extension. this should
    # result in a list with elements that are just numbers.
    # then, i'm gonna sort the bugger, and pit out the last element.
    my @exts = ();
    foreach my $file (@files) {
      $file =~ s/^(.)*\.(0)*//; # remove everything before and including the dot and any leading zeros after the dot
      # now, pop that into your list
      push @exts, $file;
    }
    # now sort the list!
    sort @exts;
    print $exts[$#exts];
    #return $exts[$#exts];
    #}
    # and you have yer answer...
    #you could drop the above "main" code into a sub of its own, too, of course.
    #just uncomment the #sub... line and the line after it, and the #return and #} lines at the bottom

    i hope this one works, and doesn't get too butchered by the rest of the monks here :D i like to think i'm pretty decent at this coding thing, so, go easy on me. i'm 100% self taught, and i have no personal group of PERL programmers in my midst - i'm alone, and i'm a one man band.

    sincerely,

    jamroll
      i haven't tested this code...

      Having a variety of test cases is important. I admit I haven't tested your code myself, but if you had tested it with multiple cases, you might have found that, for example, sort @exts; isn't doing what you want. Also, I can warmly recommend one of the filename manipulation modules like Path::Class, or perhaps File::Spec (a core module) - if you use the former you can even use its methods to list files in the directory (->children). A few more suggestions: Be careful with if ($folder), since that will test negative when $folder happens to be "0" (Truth and Falsehood), you probably want to use length or defined tests instead (same goes for if ($type), of course). Also, I think you might have missed a /g on your "remove dots" regex?

      Update 2019-08-17: Updated the link to "Truth and Falsehood".
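The remarks about sort, the truth test on "0", and the missing /g can be shown in a few lines. This is a minimal sketch with made-up sample values, and has_value is a hypothetical helper name, not something from the code under discussion.

```perl
use strict;
use warnings;

# sort returns a new list and leaves @exts untouched, so the result
# must be assigned. The default comparison is also by string.
my @exts      = (3, 13, 2);
my @by_string = sort @exts;                  # string order: 13 2 3
my @by_number = sort { $a <=> $b } @exts;    # numeric order: 2 3 13
print "string order:  @by_string\n";
print "numeric order: @by_number\n";
print "highest: $by_number[-1]\n";

# A value of "0" is false in boolean context, so test for
# definedness and length instead of plain truth.
sub has_value {
    my ($value) = @_;
    return (defined($value) && length($value)) ? 1 : 0;
}
print has_value("0") ? "kept\n" : "skipped\n";

# Removing every star, dot and slash in one pass; tr/// with the d
# flag deletes all occurrences, so there is no /g to forget.
(my $clean = "*.txt") =~ tr{*./}{}d;
print "type: $clean\n";
```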
