PerlMonks  

Re^2: Find images regardless of filetype extension.

by zzspectrez (Hermit)
on Aug 04, 2005 at 04:21 UTC ( [id://480692]=note )


in reply to Re: Find images regardless of filetype extension.
in thread Find images regardless of filetype extension.

may have been slightly better. Also either a pre-mapping to "$dir/$_" or a chdir into $dir may have been advantageous. Incidentally, as far as the former option is concerned, you know that to do it in a really portable way you should have used File::Spec, don't you? (This is just FYI, I use stuff like "$dir/$_" myself all the time.)

Actually, no. I know there are modules/methods that should be used to clean up paths, but it isn't something I have really done as of yet. Exactly what should I be doing? If you don't mind, please explain the whats and whys. Should I be using the canonpath or catfile methods? The manpage says that canonpath does a logical cleanup. What exactly is that?


All in all, and maintaining an approach similar to yours, I may have done:

my (@files,@dirs);
for (readdir DIR) {
    next if $_ eq '.' or $_ eq '..';
    $_="$dir/$_";
    push @files, $_ if -f;
    push @dirs, $_ if -d _;
}

Thanks for your variation. The string comparison would be a better way to go in this situation. Regexes are probably my weakest point with Perl, so they are something I have been overusing while trying to get used to them. The performance hit is a good thing to mention. Thanks.


Now most file and directory opening functions support lexical handles. Well, that's not strictly true (from the technical point of view), but that's a good approximation to the truth. And with a lexical dirhandle you wouldn't need an explicit close.

I am almost positive I have seen it mentioned here or in one of the Perl books that you should close filehandles and check for an error code ( oops ) just as you do with calls to open... And that it is a common mistake that people only check for success on open and not on close.
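For reference, here is a minimal sketch of the lexical dirhandle idea applied to the loop quoted earlier. The helper name list_dir is made up for illustration and is not from the original script; the point is only that $dh is closed automatically when it goes out of scope.

```perl
use strict;
use warnings;

# list_dir is a hypothetical helper name, not something from the thread.
sub list_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory '$dir': $!\n";
    my (@files, @dirs);
    for my $entry (readdir $dh) {
        next if $entry eq '.' or $entry eq '..';
        my $path = "$dir/$entry";
        if    (-f $path) { push @files, $path }
        elsif (-d _)     { push @dirs,  $path }   # reuses the stat done by -f
    }
    return (\@files, \@dirs);   # $dh is closed here, when it goes out of scope
}

my ($files, $dirs) = list_dir('.');
printf "%d files, %d directories\n", scalar @$files, scalar @$dirs;
```

An explicit closedir $dh is still allowed, and its return value can be checked by anyone who prefers to do so.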


Well, I can't comment on your whole logic. But rather than having a $recurse flag I would try to arrange things in order to have at some point a recursion for @dirs (this hypothetical @dirs not being yours), so that when @dirs is empty simply no recursion happens...

The point of the flag is that if you do not want recursion at all, the main $recursion flag can be set to 0 and then it will only return images in the top directory.


        if (@$files) {    
            push @files, @$files;
        }    

Why not

        push @files, @$files;
instead? (I don't know which is faster, but I suspect they execute at much the same speed, and the latter is more clear, IMHO.)

Actually I added that test trying to search out a bug and left it in. The error was that I made the mistake of doing an explicit return undef; instead of just return; when being called in a list context. So I was returning a list of one value "undef" and then later trying to dereference the undef value... ooops.
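A small sketch of the difference, using two made-up subs as stand-ins for a "nothing found" exit path (neither name comes from the original script):

```perl
use strict;
use warnings;

# Both subs are hypothetical stand-ins for a "nothing found" exit path.
sub explicit_undef { return undef }   # list context: a one-element list (undef)
sub bare_return    { return }         # list context: the empty list

my @with_undef = explicit_undef();
my @empty      = bare_return();

printf "return undef: %d element(s)\n", scalar @with_undef;   # 1
printf "bare return:  %d element(s)\n", scalar @empty;        # 0
```

With the bare return, an unconditional push @files, @$files style of caller never sees a stray undef to dereference.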


    foreach $file (@files) {
        open FH, $file or
            die "Error opening $file: $!\n";
Ditto as above wrt using lexical filehandles. Also, some disagree, but I always recommend using the three args form of open.

I obviously do not understand your reference to lexical filehandles. Are you suggesting I don't check for errors? HuH?? I must be having a brain fart because I am not following you.

I have heard mention that calling the three arg method is better, but have not heard good reasoning for this.


        if ( $data =~ /^BM/ ) {
            $type = 'BMP';
        }elsif ( $data =~ /^GIF8[79]a/) {
            $type = 'GIF';
        }elsif ( $data =~ /^\xFF\xD8/ ) {
            $type = 'JPG';
        }else {
            $type = undef;
        }

Ouch! It doesn't match my PNG images, not to mention a few dozen other popular formats, let alone more exotic ones... Seriously, this may be a good reason for using a dedicated module...

Here....

        elsif ( $data =~ /^\x89PNG\x0d\x0a\x1a\x0a/ ) { $type = 'PNG'; }
Now it checks your pngs... Happy? :) Actually, this wasn't written just for the goal of seeing how easy it would be to identify a couple of common formats. Also, if you are looking just for jpegs and gifs, why search for pngs? I was not trying to identify every obscure image format, or even some of the more popular formats in this case.
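One way to keep such checks manageable is a small signature table. This is only a sketch: the helper name image_type and the table layout are assumptions for illustration, and it covers just the four formats named in this thread.

```perl
use strict;
use warnings;

# Signature table; extend it only with the formats you actually need.
my @signatures = (
    [ BMP => qr/\ABM/ ],
    [ GIF => qr/\AGIF8[79]a/ ],
    [ JPG => qr/\A\xFF\xD8/ ],
    [ PNG => qr/\A\x89PNG\x0D\x0A\x1A\x0A/ ],
);

# image_type is a hypothetical helper: it returns a type name, or nothing.
sub image_type {
    my ($file) = @_;
    open my $fh, '<', $file or die "Error opening '$file': $!\n";
    binmode $fh;
    my $got = read $fh, my $data, 8;   # 8 bytes cover the longest signature
    return unless $got;                # read error or empty file
    for my $sig (@signatures) {
        my ($type, $re) = @$sig;
        return $type if $data =~ $re;
    }
    return;                            # bare return: nothing recognised
}

for my $file (@ARGV) {
    my $type = image_type($file);
    print "$file: ", (defined $type ? $type : 'unknown'), "\n";
}
```

Note that a two-byte signature such as BM will also match any text file that happens to start with those letters, which is one more argument for a dedicated module when accuracy matters.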


No need for scalar. No need for the outer parentheses. No need for the inner ones either. No need for the if modifier, and no need for the second statement:
return @images;
is just as fine!

That is definitely cleaner. Perl's flexibility in giving different values depending on how something is called can sometimes be confusing. So I have developed a defense mechanism: whenever any errors are encountered or I am unsure of the precedence order, I add parentheses or force scalar context on the return value. Although not necessary, I think that, if not taken too far overboard, it isn't a bad habit. What do you think, given that reasoning?
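To illustrate what a plain return @images gives in each calling context, here is a sketch with a made-up sub and made-up file names (the original sub is not shown in this node):

```perl
use strict;
use warnings;

# find_images is a hypothetical stand-in that ends with a plain "return @images".
sub find_images {
    my @images = ('a.gif', 'b.jpg', 'c.png');
    return @images;
}

my @list  = find_images();   # list context: the elements themselves
my $count = find_images();   # scalar context: the number of elements

print "list:  @list\n";      # list:  a.gif b.jpg c.png
print "count: $count\n";     # count: 3
```

The caller picks the context, so the sub does not need scalar() or a second return statement to serve both uses.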


Here you're returning a possibly huge list of files to scan them subsequently for images. I would check them on the fly, that is what I would do in the File::Find 'wanted' subroutine.

Ultimately, this is what should be done. However, it was a fun adventure and I learned a few new things... Namely the use of \z instead of $ in regexes when you don't want a match before a trailing newline. Also, not to do an explicit return undef; because you may get unexpected results when called in list context. As well as the many things you have pointed out..
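A quick sketch of the $ versus \z point, using a made-up file name that carries a trailing newline:

```perl
use strict;
use warnings;

my $name = "photo.jpg\n";   # note the trailing newline

# $ may match just before a final newline; \z only matches at the very end.
print '$  matches: ', ($name =~ /\.jpg$/  ? 'yes' : 'no'), "\n";   # yes
print '\z matches: ', ($name =~ /\.jpg\z/ ? 'yes' : 'no'), "\n";   # no
```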

Thanks.....

zzSPECTREz

Replies are listed 'Best First'.
Re^3: Find images regardless of filetype extension.
by blazar (Canon) on Aug 04, 2005 at 11:02 UTC
    Actually, no. I know there are modules/methods that should be used to clean up paths, but it isn't something I have really done as of yet. Exactly what should I be doing? If you don't mind, please explain the whats and whys. Should I be using the canonpath or catfile methods? The manpage says that canonpath does a logical cleanup. What exactly is that?
    I meant that, to be absolutely sure about portability, instead of "$dir/$_" one {c,sh}ould use File::Spec's catfile method.
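    A minimal sketch of both methods, with made-up directory and file names. catfile builds the path, while canonpath only tidies the string: it never touches the filesystem, and (at least on Unix-like systems) it leaves ".." components and symlinks alone.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical directory and entry names, for illustration only.
my $dir   = 'images';
my $entry = 'photo.jpg';

# catfile joins the pieces with the separator of the current platform.
my $path = File::Spec->catfile($dir, $entry);

# canonpath is a purely textual cleanup: it collapses doubled separators
# and drops "." components without looking at the filesystem.
my $clean = File::Spec->canonpath("$dir//./$entry");

print "$path\n$clean\n";
```

    On a Unix-like system both lines should read images/photo.jpg; elsewhere the separator differs, which is the whole point of using the module.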
    Thanks for your variation. The string comparison would be a better way to go in this situation. Regexes are probably my weakest point with Perl, so they are something I have been overusing while trying to get used to them. The performance hit is a good thing to mention. Thanks.
    Regexen rock! However, people often tend to overuse them. Including me, sometimes...
    I am almost positive I have seen it mentioned here or in one of the Perl books that you should close filehandles and check for an error code ( oops ) just as you do with calls to open... And that it is a common mistake that people only check for success on open and not on close.
    Well, indeed some people recommend checking the return value of close calls too. To me, that's a bit of an exaggeration. As far as regular files are concerned, that is. (More info about this topic below.)
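    For anyone who does want the check, here is a sketch of where it can matter, namely on a handle opened for writing. The helper name write_report and the file name are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# write_report is a hypothetical helper name.
sub write_report {
    my ($file, @lines) = @_;
    open my $out, '>', $file or die "Error opening '$file': $!\n";
    print {$out} "$_\n" for @lines;
    # Buffered data may only reach the disk here, so a full disk can
    # surface as a failed close rather than as a failed print.
    close $out or die "Error closing '$file': $!\n";
    return scalar @lines;
}

my $dir = tempdir(CLEANUP => 1);
my $n   = write_report("$dir/report.txt", 'first line', 'second line');
print "wrote $n lines\n";
```

    For a handle that is only read from, as in the image scanner, there is much less that a failed close could tell you.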
    Well, I can't comment on your whole logic. But rather than having a $recurse flag I would try to arrange things in order to have at some point a recursion for @dirs (this hypothetical @dirs not being yours), so that when @dirs is empty simply no recursion happens...
    The point of the flag is that if you do not want recursion at all, the main $recursion flag can be set to 0 and then it will only return images in the top directory.
    So far, so fine. I hadn't thought of that. As I said I've not practiced much the sport of reimplementing File::Find. If it were me I would just use the latter if I wanted recursion (and I could also control the depth of the search if needed) and perhaps a simple glob if I didn't.
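    A sketch of the File::Find route, checking each file on the fly inside the wanted callback instead of collecting every name first. The sub name find_gifs is made up, and the GIF signature test merely stands in for whatever check is wanted.

```perl
use strict;
use warnings;
use File::Find;

# find_gifs is a hypothetical name; the GIF check stands in for any test.
sub find_gifs {
    my ($top) = @_;
    my @found;
    find(
        {
            no_chdir => 1,   # keep $_ equal to the full path
            wanted   => sub {
                return unless -f $_;
                open my $fh, '<', $_ or return;   # skip unreadable files
                binmode $fh;
                read $fh, my $head, 6 or return;  # skip empty files
                push @found, $_ if $head =~ /\AGIF8[79]a/;
            },
        },
        $top,
    );
    return @found;
}

# Only scan when a directory is given on the command line.
if (@ARGV) {
    print "$_\n" for find_gifs($ARGV[0]);
}
```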

    Now, glob is, for some reason unknown to me, an oft underused and underestimated function. I frequently find myself advocating its use when I see people explicitly using opendir and readdir and grepping on filenames, whereas it would take care of doing all of this for them. Granted, behind the curtain it does use a module, but I hope that doesn't overwhelmingly bother you:

    $ perl -MO=Deparse -e '<*>'
    use File::Glob ();
    glob('*');
    -e syntax OK
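    As a sketch of the non-recursive case, glob plus a file test replaces the opendir/readdir/grep combination. The helper name plain_files_in is made up. Two caveats: the built-in glob splits its pattern on whitespace, so a directory name containing spaces needs extra care, and * does not match dotfiles.

```perl
use strict;
use warnings;

# plain_files_in is a hypothetical helper name.
sub plain_files_in {
    my ($dir) = @_;
    # glob already returns "$dir/name" paths, so no pre-mapping is needed.
    return grep { -f } glob("$dir/*");
}

print "$_\n" for plain_files_in('.');
```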
    foreach $file (@files) {
        open FH, $file or
            die "Error opening $file: $!\n";
    Ditto as above wrt using lexical filehandles. Also, some disagree, but I always recommend using the three args form of open.
    I obviously do not understand your reference to lexical filehandles. Are you suggesting I don't check for errors? HuH?? I must be having a brain fart because I am not following you.
    I meant something like this:
    for my $file (@files) {
        open my $fh, '<', $file or
            die "Error opening `$file': $!\n";
        # do something with $fh
        # ...
        # no need for an explicit close()
    }
    I have heard mention that calling the three arg method is better, but have not heard good reasoning for this.
    Well, for one thing it clearly stresses at a glance what the file is being opened for. And as such is IMHO more elegant and terse. Also, consider this oversimplified example:
    $ cat foo.pl
    #!/usr/bin/perl

    use strict;
    use warnings;

    my $file=shift;
    open my $fh, $file or die $!;
    print "The contents of `$file' are:\n", <$fh>;

    __END__
    $ ./foo.pl aaa
    The contents of `aaa' are:
    asdfdaf
    sfdfdd
    sffgsdd
    $ ./foo.pl '|echo "Gotcha!">foo.pl'
    The contents of `|echo "Gotcha!">foo.pl' are:
    $ cat foo.pl
    Gotcha!
    This is a security hole that has actually been exploited, e.g. in poorly written CGI scripts. That is not to say that people cannot write secure programs with the two-arg form of open, nor that the three-arg form is bulletproof. Indeed, in any situation where it may matter, one must validate one's input to protect against malicious users, but the three-arg form provides a good first layer of protection at a small expense.
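    For comparison, a sketch of the same hostile argument handed to the three-arg form. The sub name open_for_reading is made up. With an explicit mode the string is taken purely as a file name, so no command is run and the open simply fails unless a file with that literal name exists.

```perl
use strict;
use warnings;

# open_for_reading is a hypothetical helper: it returns a handle, or nothing.
sub open_for_reading {
    my ($file) = @_;
    # The explicit '<' mode means $file is never parsed for pipes or modes.
    open my $fh, '<', $file or return;
    return $fh;
}

# The same hostile "file name" as in the transcript above.
my $fh = open_for_reading('|echo "Gotcha!">foo.pl');
print defined $fh
    ? "opened a file with that literal name\n"
    : "open failed: $!\n";
```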
    Ouch! It doesn't match my PNG images, not to mention a few dozen other popular formats, let alone more exotic ones... Seriously, this may be a good reason for using a dedicated module...
    Here....
    elsif ( $data =~ /^\x89PNG\x0d\x0a\x1a\x0a/ ) { $type = 'PNG'; }
    Now it checks your pngs... Happy? :)
    It doesn't check my TIFF and PNM images!! :-)
    That is definitely cleaner. Perl's flexibility in giving different values depending on how something is called can sometimes be confusing. So I have developed a defense mechanism: whenever any errors are encountered or I am unsure of the precedence order, I add parentheses or force scalar context on the return value. Although not necessary, I think that, if not taken too far overboard, it isn't a bad habit. What do you think, given that reasoning?
    I think that concise code is generally clearer to read and understand. Well-written Perl code tends to be concise. Of course I'm not talking about extreme conciseness like the kind people try to achieve e.g. when golfing, as that leads to obfuscation instead. I'm talking about the Right(TM) amount of conciseness...
