http://www.perlmonks.org?node_id=262996

Hi everyone. Recently, I've been playing around with an opendir/readdir wrapper. (Many thanks to Abigail-II and VSarkiss.) My old way of running through files in a directory was:
    #!/usr/sbin/perl -w
    use strict;
    #...
    while ($file = <*.dem>) {
        open(FILE, "$file") || warn "Warning: can't open $file, skipping.\n";
        ($outfile, $right) = split(/\./, $file, 2);
        open(FILEOUT, ">$outfile.xyz") || die "\nERROR: Can't open $outfile, exiting.\n";
        while (<FILE>) {
            chomp;
            #...
Using a wrapper, it now looks like this:
    #!/usr/sbin/perl -w
    use strict;
    use Cwd;
    my $the_directory = cwd;
    #...
    chdir $the_directory or die "Couldn't chdir to $the_directory; $!";
    opendir(D, ".") or die "Couldnt open . ($the_directory): $!";
    while ($file = readdir D) {
        next unless $file =~ /\.las$/i;
        open(FILE, "$file") || warn "Warning: can't open $file, skipping.\n";
        while (<FILE>) {
            chomp;
            #...
Since I was working with around 4000+ files, the change in the amount of time it took for my scripts to run was impressive to say the least! So then I started thinking that this wrapper was a trick that most, if not every, perl programmer would probably find handy someday. Then I started wondering if there were any other little bits of code that are used often, or bits of code that should be used but aren't widely known. What are some tricks that every perl programmer should have in his/her bag? Thanks!

licking9Volts

Update: I think I was a little too vague in my intentions here. :) I wasn't looking for general tips for programming in perl. I was looking for tricks and shortcuts many of you use to improve upon more commonly used coding solutions. In my above example, I meant to show how even though my original solution worked fine, by using the opendir/readdir wrapper it was improved upon greatly. Sorry for the mix-up. :))

2003-06-03 edit ybiC: balanced <readmore> tags around <code> such that meditative text viewable from The Monastery Gates and Meditations.

Re: Things every perl programmer should know?
by Mr. Muskrat (Canon) on Jun 04, 2003 at 14:25 UTC

    Every perl programmer should understand the conditional operator, or the ternary "?:". It works much like an if-then-else statement. I'll give two examples of usage.

    if ($condition) { $foo = $bar; } else { $foo = $baz; }
    becomes:
    $foo = $condition ? $bar : $baz;

    if ($condition) { $baz = $foo; } else { $bar = $foo; }
    becomes:
    ($condition ? $bar : $baz) = $foo;

    Be careful with the second one: both $bar and $baz must be valid lvalues, meaning that you must be able to assign values to them.
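Here is a minimal sketch of both forms that can be run as-is. The variable values are made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only for illustration.
my $condition = 0;
my ($bar, $baz) = ('bar', 'baz');

# As an rvalue: pick one of two values.
my $foo = $condition ? $bar : $baz;   # $condition is false, so $foo gets $baz

# As an lvalue: pick which variable receives the assignment.
($condition ? $bar : $baz) = 'new';   # $condition is false, so $baz is assigned

print "foo=$foo bar=$bar baz=$baz\n";
```

With $condition false, $foo ends up holding the old value of $baz, and the second statement overwrites $baz while leaving $bar alone.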

    Updated to meet the expectations of the updated OP.

Re: Things every perl programmer should know?
by Aragorn (Curate) on Jun 04, 2003 at 14:53 UTC

      You "forgot" Effective Perl Programming by Joseph N. Hall and Randal L. Schwartz. :) The more I read this book, the less I am plagued by inefficiencies and my inability to express myself idiomatically in Perl. It's like programming experience in a tin can.

      Update: Missed kodo's post at the bottom.

      --
      Allolex

Re: Things every perl programmer should know?
by hardburn (Abbot) on Jun 04, 2003 at 14:34 UTC

    My CGIs have a set_params() subroutine which returns a hash of all the parameters I want (I used to set global variables, but I kicked that habit). Originally, I did this one param at a time:

    sub set_params {
        use CGI qw(:standard);
        my %params;
        $params{param1} = param('param1') || '';
        $params{param2} = param('param2') || '';
        $params{param3} = param('param3') || '';
        # Dozen more lines of that
        return %params;
    }

    Now I keep a constant global array named @FIELDS which contains a list of all the fields I want and put them together with map:

    my @FIELDS = qw(param1 param2 param3 . . . );
    sub get_params {
        use CGI qw(:standard);
        return map { $_ => param("$_") || '' } @FIELDS;
    }

    You can remove the return in the above, but I kept it in this example for clarity.
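For anyone who wants to try the map idiom without a CGI environment, here is a rough standalone sketch. The fake_param() sub and the %submitted hash are hypothetical stand-ins for CGI's param(), and the field names are invented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI's param(): looks a field up in a fixed hash.
my %submitted = (name => 'alice', email => '');
sub fake_param { my ($field) = @_; return $submitted{$field} }

my @FIELDS = qw(name email phone);

# One (field => value) pair per field; "|| ''" turns undef into an empty string.
sub get_params {
    return map { $_ => fake_param($_) || '' } @FIELDS;
}

my %params = get_params();
print "$_ = '$params{$_}'\n" for sort keys %params;
```

Every field in @FIELDS ends up as a key, including 'phone', which was never submitted and so gets the empty-string default. Note that "|| ''" also replaces a legitimate value of 0 with an empty string.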

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

Re: Things every perl programmer should know?
by neilwatson (Priest) on Jun 04, 2003 at 16:17 UTC
    Although it seems obvious to most perlmonks,
    use strict; use warnings;
    are often missing from programs that I see. Perl newbies need to know that using these modules will help them create clean, portable code faster than they could without them.

    In the bigger picture I think perl users (indeed all programmers) need to better educate themselves on network and system security. Learn how the code you write can affect others. The server is not an island. What you do can cause others to have a very bad day.

    Neil Watson
    watson-wilson.ca

      use strict; use warnings;
      are often missing from programs that I see.

      Careful, you can get shot down in flames for that sort of talk.

      Understanding why warnings and strict are good, and why they shouldn't be necessary in deployed code is something all new Perl programmers should learn. Unfortunately not all trainers and/or Perl gurus agree with that statement.

      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Barbie
      Birmingham Perl Mongers
      Web Site: http://birmingham.pm.org/
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

        Actually ... chanting:
        use strict; use warnings;
        in this community earns elevation via XP. But you do raise very good points, and seeing that you have been programming longer than myself, i realize that you are speaking straight from the "real world" gut. However, i would change "something all new Perl programmers should learn" to "something all new Perl programmers will eventually have to deal with". (as i didn't with one nameless company - instead i left.)

        The difference is that i would rather see advice to use strict and warnings than not. The fact that this doesn't happen as often as it should in the real world (let's face it - software design is hard!) doesn't mean that a new Perl programmer shouldn't exercise good practices now before they become ... tainted. Teach 'em the rules now, let 'em break the rules after they learn them. And Perl is all about breaking rules. ;)

        Besides, not all companies suffer from the "don't look back" syndrome that seems to accompany leaving strict and warnings out of production code ... i happened to work for two that used strict and warnings in production, as well as CVS religiously and Use Cases regularly. Maybe this wasn't necessary, but we didn't waste time tracking down and fixing obscure bugs either. ;)

        UPDATE:
        I should mention that the one place i don't use strict and warnings is when i write one-liners. My logic is that one-liners are true throw away scripts, and if you need strict and warnings, then it probably shouldn't be a one-liner.
        perl -MCGI=foo -le"print foo{bar=>baz}=>qux"

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
        You make a good point. However, when I see code without a single my statement I know strict and warnings have not been used.

        Neil Watson
        watson-wilson.ca

        Removing strict/warnings from deployed code ... I'm of two minds on this one. On the one hand, if it's deployed, you should be able to trust it. (I am, of course, speaking mostly tongue-in-cheek.)

        But, warnings and strict, coupled with good logging, will catch a ton of run-time errors, like undefined objects being used and the like.

        (Though, good programming practices and code reviews will catch 99% of those, anyways. I can't believe the amount of code that's just slapped together, especially in Perl. At my current position, I'm having to rewrite my customers' code cause my systems can't depend on their dreck.)

        ------
        We are the carpenters and bricklayers of the Information Age.

        Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

        Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

      I'd use
      use strict; use warnings; no warnings 'uninitialized';
      myself. I seriously dislike having to write code like
      if (defined $foo and $foo) { ...

      Update on Jun 05, 2003: I do not remember the last time I got a "Variable is used once" warning. Maybe I was just lucky :-)

      Jenda
      Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
         -- Rick Osborne

      Edit by castaway: Closed small tag in signature

        For me, it's:
        use strict; use warnings; no warnings 'once';
        because that sodding warning has never caught anything for me that strict wouldn't have trapped sooner. The kicker is that fixing it requires jumping through very silly hoops to make it go away. ($foo = $foo = 1; anyone?) As for uninitialized - well, I don't get a warning on
        if ($foo) { ... }
        I admit it's sometimes a little awkward to code around the uninitialized warning, but usually something like adding a || 0 or || '' or such does the trick just fine. And that kind of thing isn't silly, in fact I think it's clearer in intent in that I document via code that I explicitly expect undefs there.
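A small sketch of the "|| ''" approach follows. The %record hash is hypothetical, and the warning handler is only there to make the difference countable.

```perl
use strict;
use warnings;

# Hypothetical record in which one field is legitimately undefined.
my %record = (name => 'widget', note => undef);

# Collect any warnings so the difference is visible.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $plain   = "note: " . ($record{note});          # warns: uninitialized value
my $guarded = "note: " . ($record{note} || '');    # no warning; undef is expected here

print scalar(@warnings), " warning(s)\n";
```

Only the unguarded concatenation triggers the warning. One caveat: "|| ''" also replaces a real 0 or '0' with the empty string; Perl 5.10 and later offer the defined-or operator "//" for cases where that matters.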

        Makeshifts last the longest.

Re: Things every perl programmer should know?
by Juerd (Abbot) on Jun 04, 2003 at 15:43 UTC

    Every Perl programmer should know how to reach

    Those two sites have links to all other relevant information. I advise you to find those links and follow them. Have a look at the Tutorials section if you haven't already done so. If you have done so, then why did you post? :)

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      Every Perl programmer should know how to reach

      s/reach/teach/; for my suggestion :).

Re: Things every perl programmer should know?
by bobdeath (Scribe) on Jun 04, 2003 at 14:47 UTC
    I think every perl programmer should invest some time in design before coding is started. Too often I see people come up with a really good idea, go rush off to code it, and then realize that half the code they have written needs to be redone, or worse, isn't needed in any way. Perl allows you to do many things quickly, but don't let that lead you into just jumping straight into coding.
Re: Things every perl programmer should know?
by naChoZ (Curate) on Jun 04, 2003 at 17:18 UTC
    How to install modules with the CPAN prompt:

    # perl -MCPAN -e shell
    cpan> install Module::Name

    ~~
    naChoZ

      How to install modules when you don't have access to CPAN
      #cp /mnt/cdrom/Finance-Loan-0.03.tar.gz .
      #tar -xvzf Finance-Loan-0.03.tar.gz
      #cd Finance-Loan-0.03
      #perl Makefile.PL
      #make
      #make test
      #sudo make install


      Update: D'oh! Added perl Makefile.PL per naChoZ's suggestion. Must... have... more ... coffee.... :)

      ----
      Zak
      Pluralitas non est ponenda sine neccesitate - mysql's philosphy
      # perl -MCPAN -e install Module::Name
      :-)

      Makeshifts last the longest.

        ppm install Module::Name

        ;-)

        --
        Regards,
        Helgi Briem
        helgi DOT briem AT decode DOT is

Re: Things every perl programmer should know?
by thelenm (Vicar) on Jun 04, 2003 at 15:51 UTC

    I use Perl for a lot of quick file-munging (and golf :-), so I've found it very useful to be familiar with the command line options as described in perlrun. The -n, -p, and -i options can be particularly handy if you're just running through files doing something to each line. For example, you can replace "foo" with "bar" in every .txt file in a directory with this one-liner:

    perl -pi -e 's/foo/bar/g' dir/*.txt

    Very simple, but very powerful. Of course you can do more complex things with this as well.
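If it helps to see what -p and -i are doing, the one-liner can be approximated inside an ordinary script. This is only a sketch: the scratch file and its contents are invented, and an empty $^I (no backup copy) is assumed to be acceptable on your platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Set up a scratch file so the sketch is self-contained (hypothetical content).
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/sample.txt";
open my $out, '>', $path or die "Can't write $path: $!";
print {$out} "foo one\nfoo two foo\n";
close $out;

# Roughly what "perl -pi -e 's/foo/bar/g' FILE" does:
{
    local $^I   = '';        # -i with no backup extension: edit in place
    local @ARGV = ($path);   # the files named on the command line
    while (<>) {             # -p: read each line into $_ ...
        s/foo/bar/g;         # ... run the -e code ...
        print;               # ... and print $_ (into the file, because of $^I)
    }
}

# Read the file back to show the result.
open my $in, '<', $path or die "Can't read $path: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```

Giving $^I a value such as '.bak' instead of the empty string keeps a backup of each original file, which is the same as running with -i.bak.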

    -- Mike

    --
    just,my${.02}

Re: Things every perl programmer should know?
by halley (Prior) on Jun 04, 2003 at 15:00 UTC

    After several such questions in the past couple of days, I'd say that the crucial thing that all Perl programmers need to know is that hashes associate strings to scalars, and that's all. Learn how to deal with references to fit within that constraint. Learn what tricky module hacks can make hashes "appear" to break the rules.
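To make that constraint concrete, here is a short sketch with made-up data. It shows keys being stringified and a list being stored as a value through a reference.

```perl
use strict;
use warnings;

# Keys are always strings: a number used as a key is stringified.
my %h;
$h{1.0} = 'one';              # stored under the string "1"
print join(',', keys %h), "\n";

# A reference used as a key is stringified too, and cannot be dereferenced later.
my $aref = [1, 2, 3];
$h{$aref} = 'array';
my ($key) = grep { /^ARRAY/ } keys %h;
print ref($key) ? "still a ref\n" : "just a string: $key\n";

# Values are always scalars, so a "hash of lists" stores array references.
my %langs = (compiled => ['C', 'Rust'], scripted => ['Perl']);
push @{ $langs{scripted} }, 'Python';
print "scripted: @{ $langs{scripted} }\n";
```

The key that came from $aref is only a string that looks like a reference, which is why people who need real references as keys end up reaching for modules that work around the rule.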

    --
    [ e d @ h a l l e y . c c ]

Re: Things every perl programmer should know?
by gjb (Vicar) on Jun 04, 2003 at 15:31 UTC

    Perl regular expressions, regexes, and more regexes... the finesses, what they can and can't do.

    Just my 2 cents, -gjb-

(jeffa) Re: Things every perl programmer should know?
by jeffa (Bishop) on Jun 04, 2003 at 19:40 UTC
    A great skill to have as a Perl programmer is CPAN search-fu, the ability to find the right (or sometimes "right enough") module to fit the problem at hand. This is a skill that is not gained overnight. I have no idea how long it takes one to become a master librarian of CPAN, but i am sure that it takes practice, application, trial and error, making the wrong choices, and most importantly, asking advice to become one.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Things every perl programmer should know? (faster?)
by tye (Sage) on Jun 04, 2003 at 21:52 UTC

    I'd like to know why globbing is so much slower than readdir. Sure, that makes sense for really old versions of Perl (where the slowness of glob is insignificant compared to its security problems).

    I could even see glob being slightly slower than readdir, such that you likely wouldn't notice the difference, especially in a script (like your example) that does something other than list file names.

    But you say using glob makes your whole script a lot slower ("impressive"). What I, as a Perl programmer, would like to know, is some estimate of how much slower glob is than readdir. How about some numbers based on what you saw as one source of data? You said over 4000 files and show the contents of the files being read. How long did the different versions take? About how large were the files you were reading?

    I often prefer glob as it often makes for less code (and has fewer 'gotcha's). But if it is hugely slower, then I'll prefer it less often.

    Thanks.

                    - tye
      What I, as a Perl programmer, would like to know, is some estimate of how much slower glob is than readdir. How about some numbers based on what you saw as one source of data? You said over 4000 files and show the contents of the files being read. How long did the different versions take? About how large were the files you were reading?
      I was curious myself, so I created a directory with 10000 files (zero size), half of them with the extension ".tmp". I wrote equivalent readdir and glob statements to get all of the *.tmp files, and the glob is noticeably slower. It takes about 3 seconds while the readdir with grep seems instantaneous. This is just a rough eyeball benchmark, but still, the glob is definitely slower. I even switched the order of the directory reads, and got the same result. Here is the code (this was run on an old AIX system):
      $|++;
      print "Glob\n";
      my @file2 = glob("tmp/*.tmp");
      my $count2 = @file2;
      print "$count2\n";
      print "Readdir\n";
      opendir(DIR, "tmp") or die "Acck: $!";
      my @file1 = grep /\.tmp$/, readdir DIR;
      my $count1 = @file1;
      print "$count1\n";
      closedir DIR;
      print "Done\n";
      I'd say if the 3 seconds doesn't matter, I'd do it with glob to save myself the coding. If this were embedded in a library that I expected other people to use, I'd do it with readdir 'just in case' speed matters.
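For anyone who wants actual numbers rather than an eyeball estimate, something along these lines could be used. It is only a sketch: the 2000-file scratch directory is an arbitrary size, the timings will vary with Perl version, OS and filesystem, and it assumes the temporary directory path contains no whitespace (glob splits its pattern on whitespace).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Time::HiRes qw(time);

# Build a scratch directory; 2000 files is an arbitrary, hypothetical size.
my $dir = tempdir(CLEANUP => 1);
for my $n (1 .. 2000) {
    my $ext = $n % 2 ? 'tmp' : 'dat';
    open my $fh, '>', "$dir/file$n.$ext" or die "Can't create file: $!";
    close $fh;
}

# Time the glob version.
my $t0 = time;
my @by_glob = glob("$dir/*.tmp");
my $glob_secs = time - $t0;

# Time the readdir + grep version.
$t0 = time;
opendir my $dh, $dir or die "Can't opendir $dir: $!";
my @by_readdir = grep { /\.tmp$/ } readdir $dh;
closedir $dh;
my $readdir_secs = time - $t0;

printf "glob:    %d files in %.4f s\n", scalar @by_glob,    $glob_secs;
printf "readdir: %d files in %.4f s\n", scalar @by_readdir, $readdir_secs;
```

Both approaches should report the same number of files; only the elapsed times are expected to differ. A single run like this is still a rough measurement, so repeating it a few times (or swapping the order, as described above) is worthwhile.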

      Update: This was perl 5.6.1 (and I received the same results with File::Glob::bsd_glob and the GLOB_NOSORT option).

      Another update: I vaguely remember glob being painfully slow on Windows at one time, which has since been fixed (I think). And it may have been only with the angle bracket syntax...can anyone confirm?

        You don't mention which version you ran your test on, but if it was 5.8, part of the reason could be that glob now returns the files pre-sorted (according to perldelta).

        However, I think tye was alluding to the fact that the OP's code does not just read a list of files, but loops over them, opening and reading each one in turn. Assuming that the opening and reading part stayed the same, 2 or 3 seconds spent generating the list is unlikely to make a significant difference overall.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


        Thanks. It is nice to have some numbers and some corroboration. 3 seconds is almost shockingly slow to just read 10000 filenames and compare to a glob string. Especially since I thought Perl is using C code for this (taken from some well-respected free shell implementation, I think).

        I also appreciate the "I'd do it with glob [... unless] embedded in a library", which matches my thinking on the subject (except that I think glob can probably be "fixed" and so I might even use glob in a library based on that expectation).

        So I'm still curious why glob is so slow. I'm not sure when I'll find the time/motivation to investigate further (probably comparing built-in glob vs. File::Glob, File::DosGlob, and my own File::KGlob and verifying some of my own assumptions).

        I'm also still interested in seeing numbers from the original poster. It sounds like you would see less than 2 seconds difference in run time in a similar situation. I have a hard time characterizing 2 seconds as an "impressive" difference, but I've seen such characterizations made based on the ratio of run times so that might be the whole story. But it could also be that licking9Volts was seeing a much bigger difference.

                        - tye
      The files I work with are actually logs that record information at set intervals down an oil well's borehole. So, depending on the depths of these logs, I have files that range in size from around 100 KB to 50 MB. My program only needs to read from a block of information at the top, and extract certain bits to a text file. Once it hits a specific delimiter, it closes the file and moves on to the next file. The file name is a 10-14 digit identification number plus an optional version number. When I was globbing the files, it was taking around 5 minutes to get started. With the readdir, it starts immediately.

      I am running perl version 5.6.0 on Win2k. Our machines here at work are managed by off-site personnel and have the registries and "Program Files" folders locked down so I can't upgrade the perl version or make any modifications to it. Any software has to be tested and packaged for company-wide distribution via intranet. It really sucks. So anyway, that's my situation. Here's the node where the readdir was suggested: File glob question. Let me know if you have any suggestions. Thanks!

      licking9Volts

        So readdir isn't necessarily significantly faster, as far as finishing time is concerned---the important factor is the impression of responsiveness, seeing the first results after a short delay instead of after all the file names have been globbed.
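The pattern described in this subthread (take file names one at a time from readdir, read only the header block, stop at a delimiter) might look roughly like the sketch below. The "~END" delimiter, the file names and the file layout are all invented for illustration; the real logs will differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch data: two small hypothetical "logs" with a header, a delimiter, and a body.
# The "~END" delimiter and the layout here are invented for illustration.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(1001.las 1002.las notes.txt)) {
    open my $out, '>', "$dir/$name" or die "Can't write $name: $!";
    print {$out} "WELL: $name\n~END\n", "data line\n" x 1000;
    close $out;
}

my %header;    # file name => header lines seen before the delimiter

opendir my $dh, $dir or die "Can't opendir $dir: $!";
while (defined(my $file = readdir $dh)) {   # one name per call, no up-front list
    next unless $file =~ /\.las$/i;
    my $in;
    unless (open $in, '<', "$dir/$file") {
        warn "Warning: can't open $file, skipping: $!\n";
        next;
    }
    while (my $line = <$in>) {
        chomp $line;
        last if $line eq '~END';            # stop reading; the body is never touched
        push @{ $header{$file} }, $line;
    }
    close $in;
}
closedir $dh;

print "$_: @{ $header{$_} }\n" for sort keys %header;
```

Because each name is handled as soon as readdir returns it, the first file is processed right away, which matches the "starts immediately" behaviour reported above. Unlike the original snippet, a failed open here skips the file instead of falling through to the read loop.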

Re: Things every perl programmer should know?
by l2kashe (Deacon) on Jun 04, 2003 at 17:47 UTC
    Things every Perl programmer should know...

    How to manipulate the variable types, take and use references to them, and probably by extension how they are stored in the symbol table. What the performance differences between them are, and how to determine whether the speed gained from using, say, an array instead of a hash is really worth it if extra cycles are then spent working out which index holds the data. Also, the difference between passing around an array and passing around a reference to an array.

    slices, map, and grep should be tossed out there as well I suppose ;).. Most notably slices, it's amazing how a loop can be inlined via map/grep and an array/hash slice.
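A few quick illustrations of slices, map and grep replacing explicit loops. The data is made up, and this is only a sketch of the idioms.

```perl
use strict;
use warnings;

# Hypothetical data for illustration.
my %price = (apple => 3, pear => 5, plum => 2, fig => 9);

# Hash slice: fetch several values at once instead of looping over keys.
my @wanted = @price{qw(apple plum)};                 # (3, 2)

# Hash slice as an lvalue: set several keys in one assignment.
my %stock;
@stock{qw(apple pear)} = (10, 20);

# grep filters a list; map transforms one.
my @cheap   = grep { $price{$_} < 5 } sort keys %price;   # names priced under 5
my @doubled = map  { $_ * 2 } @wanted;                     # (6, 4)

# Array slice: pick elements by index.
my @letters = ('a' .. 'f');
my @picked  = @letters[0, 2, 4];                     # ('a', 'c', 'e')

print "wanted=@wanted cheap=@cheap doubled=@doubled picked=@picked\n";
```

Each of these lines would otherwise be a small for loop with a temporary variable.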

    Update: everybody's favorite friend Data::Dumper should probably be tossed out there too ;)

    MMMMM... Chocolaty Perl Goodness.....
Re: Things every perl programmer should know?
by artist (Parson) on Jun 04, 2003 at 15:42 UTC
    First there should be some basic learning of Perl and generally available related things. You should be able to relate a specific skill to a specific purpose. Once you have practiced these correlations, you won't need to recall them consciously every time. The sense of purpose will make the learning clear, and you will continue finding better ways to do things. I would call that building a personal expert system.

    In this specific example, you were looking for speed. Many other criteria can be applied for better performance at various stages.

    artist

Re: Things every perl programmer should know?
by kodo (Hermit) on Jun 05, 2003 at 06:58 UTC
    While most things have already been mentioned here, one book wasn't: Effective Perl Programming by Joseph N. Hall and merlyn. I really like that book...if you are looking for perl tricks and want to know how to code perlish perl, this one is for you :-)

    kodo
Re: Things every perl programmer should know?
by zarniwoop (Initiate) on Jun 08, 2003 at 17:09 UTC
    Read and digest one entry from the Perl FAQ manpages every day, sections 4 thru 9.

    man perlfaq4

    You'll be amazed at what you didn't know.

    Zarni-bit-of-a-late-reply-woop