PerlMonks  

Simple Regex Question / Code Review

by marquezc329 (Scribe)
on Oct 11, 2012 at 08:17 UTC (#998387=perlquestion)
marquezc329 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everybody,

I've been working through Kernighan and Pike's The Unix Programming Environment as well as Kernighan and Ritchie's The C Programming Language, and basically just rewriting C and sh scripts in Perl for practice (and because sometimes I just enjoy finding short Perl scripts to replace more drawn-out examples of C).

I wrote the following to take a list of files from the command line and run them through gcc if their filenames qualify. Nothing difficult, but it's nice to finally write a functional piece of code that I can use while I continue learning Perl/C/*nix. I'm hoping to receive input on style (in particular my use of control structures/loops and regexes) and any general coding conventions I may be missing. I'm still a beginner, so I fear that in reviewing my own code I may be missing important details.

    #!/usr/bin/perl
    use strict;
    use warnings;

    sub getFiles {
        my @files;
        my $ex;
        foreach (@ARGV) {
            push @files, $_ if (-e $_ && m/.+\.[cC]$/);
            print "Couldn't Find $_\n" unless (-e $_);
            print "Invalid file: $_\n" unless (m/^.+\.[cC]$/);
        }
        return \@files;
    }

    sub compile {
        my $files = shift;
        my %table;
        foreach (@$files) {
            $table{$_} = substr $_, 0, -2;
        }
        `gcc $_ -o $table{$_}` foreach (keys %table);
    }

    compile(getFiles);

Is there a "better" way than my hash to handle the filenames, and the modified names passed to gcc for each file? I tried to modify the filenames inline, with various combinations of things like:

`gcc $_ -o substr $_, 0, -2` foreach (@$files);

I couldn't get it to work syntactically, and I understand that clarity is sacrificed this way, but I suppose curiosity got the best of me. Is something like this even possible? I couldn't find anything on Google about calling a Perl function inside a backticked shell command within Perl.

My other question is about my use of substr() to remove the .c from the filenames. I tried to use a regex first instead of substr(), but I don't think I completely understand the documentation I've read on using regexes for substitution. If a monk could provide an example for this particular situation to help me learn, it would be greatly appreciated.

Thanks again for any input, and I apologize if this post is unwittingly simplistic.

Re: Simple Regex Question / Code Review
by Anonymous Monk on Oct 11, 2012 at 09:21 UTC

    my $blah = 'foo.Q';
    print qq{gcc $blah -o @{[ substr $blah, 0, -2 ]} and many more };
    __END__
    gcc foo.Q -o foo and many more

    See Re^2: How can we interpolate an expression??

    See perlintro, perlrequick for the regex portion, and for the template, see (tye)Re: Stupid question (and one discussion of that template Re^2: RFC: Creating unicursal stars)

    #!/usr/bin/perl --
    #~ 2012-10-11-01:58:20PDT by Anonymous Monk
    #~ perltidy -csc -otr -opr -ce -nibc -i=4
    use strict;
    use warnings;
    use Data::Dump qw/ dd /;

    Main( @ARGV );
    exit( 0 );

    sub getFiles {
        my @files;
        foreach (@_) {
            if( /\.[cC]$/ ){
                if( -e $_ ){
                    push @files, $_;
                } else {
                    printf "Unreadable file: %s : (%d) %s\n", $_, $!, $!;
                }
            } else {
                print "Invalid file: $_\n";
            }
        }
        return \@files;
    }

    sub compile {
        my $files = shift;
        for( @$files ){
            ## all of these do the same thing, see perlintro / perlrequick for explanation
            #~ my( @cmd ) = ( 'gcc', $_, '-o', /^(.+?)\.[^\.]+$/ );
            #~ my( @cmd ) = ( 'gcc', $_, '-o', $_ =~ /^(.+?)\.[^\.]+$/ );
            my( @cmd ) = ( 'gcc', $_, '-o' );
            #~ push @cmd, $1 if /^(.+?)\.[^\.]+$/ ;
            push @cmd, $1 if $_ =~ /^(.+?)\.[^\.]+$/ ;
            dd \@cmd;
            system @cmd;
        }
    }

    sub Main {
        return print Usage() if not @_;
        my $files = getFiles( @_ );
        dd $files;
        compile( $files );
    } ## end sub Main

    sub Usage {
        <<"__USAGE__";
    $0
    $0 list of files to process
    __USAGE__
    } ## end sub Usage

    __END__

    If you're going to be using qx, you might want to quote/escape variables (like filenames) with String::ShellQuote, so that spaces or shell metacharacters in them are not interpreted by the shell.
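    To illustrate what such quoting does, here is a minimal hand-rolled sketch, assuming a POSIX-style /bin/sh: wrap the value in single quotes and turn each embedded ' into '\''. The helper name sh_quote and the sample file name are hypothetical; String::ShellQuote is the better-tested choice, and the list form of system used in the script above avoids the shell entirely.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of String::ShellQuote: quote one word
# for a POSIX sh by wrapping it in single quotes and replacing each
# embedded single quote with '\'' (close quote, escaped quote, reopen).
sub sh_quote {
    my ($word) = @_;
    ( my $escaped = $word ) =~ s/'/'\\''/g;
    return "'$escaped'";
}

# Example: a file name containing a space and a single quote.
my $src = "my file's.c";
( my $bin = $src ) =~ s/\.c\z//;
my $cmd = join ' ', 'gcc', sh_quote($src), '-o', sh_quote($bin);
print "$cmd\n";    # the string you would hand to qx{...}
```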

Re: Simple Regex Question / Code Review
by AnomalousMonk (Abbot) on Oct 11, 2012 at 09:48 UTC

    Rather than a detailed review, just one thing that caught my eye...

    foreach (@$files) { $table{$_} = substr $_, 0, -2; }

    In this loop, the
        $table{$_} = substr $_, 0, -2;
    statement seems intended to snip the extension off the path/filename of a C source file and associate the file name so truncated with the original, full name. It assumes the extension is always two characters, e.g., '.c'. It's not a bad assumption in this case, but of course immediately fails in the face of extensions like '.cc', '.cpp', etc., leaving one with a perhaps very puzzling bug — but of course, this will never happen! Whenever I encounter an assumption like this, old scars begin to throb and I feel a strong urge to program defensively.
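    A quick illustration of that failure mode, using made-up file names: substr with a length of -2 always drops exactly the last two characters, whatever they are.

```perl
use strict;
use warnings;

# substr with length -2 keeps everything except the last two characters.
for my $name (qw(foo.c bar.cc baz.cpp)) {
    my $base = substr $name, 0, -2;
    print "$name -> $base\n";
}
# Prints:
#   foo.c -> foo
#   bar.cc -> bar.
#   baz.cpp -> baz.c
```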

    Assuming one does not want to use well-tested and platform independent CPAN modules for manipulating file names, one might write something like (untested)
        for my $file_name (@$files) {
            (my $base_name = $file_name) =~ s{ [.] [^.]* \z }{}xms;
            $table{$file_name} = $base_name;
            }
    which snips off the last '.whatever' extension regardless of its length, and, even though more verbose, has, I feel, a certain 'self-documentation' quality.
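    One caveat: because [^.]* also matches '/', the substitution above strips everything after a dot in a directory name, so foo.d/bar would become foo. Here is a minimal sketch of the module route mentioned above, using the core File::Basename module, whose fileparse matches suffix patterns against the last path component only. The helper name strip_ext is hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: remove the last '.whatever' extension, looking
# only at the final path component, and keep the directory part intact.
sub strip_ext {
    my ($path) = @_;
    # fileparse returns (name, directory, matched suffix); the suffix
    # pattern is tried against the end of the last component only.
    my ( undef, undef, $suffix ) = fileparse( $path, qr/\.[^.]*/ );
    return substr $path, 0, length($path) - length($suffix);
}

for my $file (qw(foo.c foo/bar.cc foo/bar/baz.cpp foo.d/bar)) {
    print "$file -> ", strip_ext($file), "\n";
}
```

    Trimming the original string by the length of the matched suffix, rather than gluing the directory and name back together, avoids the './' that fileparse reports for names without a directory.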

    Update:

        `gcc $_ -o substr $_, 0, -2` foreach (@$files);
    I couldn't get it to work syntactically... clarity is sacrificed... Is something like this even possible?

    I agree with your concerns about clarity, but in any event, one way might be something like this (of course, you would use qx{...} instead of print qq{...}):

    >perl -wMstrict -le "print qq{gcc $_->[0] -o $_->[1]} for map [ $_, s{ [.] [^.]* \z }{}xmsr ], qw(see cee. foo.c foo/bar.cc foo/bar/baz.cpp) ; "
    gcc see -o see
    gcc cee. -o cee
    gcc foo.c -o foo
    gcc foo/bar.cc -o foo/bar
    gcc foo/bar/baz.cpp -o foo/bar/baz

    (Prior to Perl 5.14, which introduced the /r substitution modifier, you can use
        map { (my $o = $_) =~ s{ [.] [^.]* \z }{}xms;  [ $_, $o ] }
    instead.)

    Further Update: Or even:

    >perl -wMstrict -le "print qq{gcc $_ -o @{[ s{ [.] [^.]* \z }{}xmsr ]}} for qw(see cee. foo.c foo/bar.cc foo/bar/baz.cpp); "
    gcc see -o see
    gcc cee. -o cee
    gcc foo.c -o foo
    gcc foo/bar.cc -o foo/bar
    gcc foo/bar/baz.cpp -o foo/bar/baz

      Thank you. Your well-thought-out examples of print qq{...} / qx{...}, together with the above reference to Re^2: How can we interpolate an expression??, provide an excellent foothold for understanding a concept that had originally seemed far-fetched to me. I think my bias towards clarity will prevail in this particular situation, and I might stick with assigning pairs to a hash for the associated files. Is this exceedingly Novice?

      "Whenever I encounter an assumption like this, old scars begin to throb and I feel a strong urge to program defensively."

      In hindsight, I can definitely understand the ramifications of this assumption, and I plan on expanding the original script to accept a larger, more realistic range of file extensions, as well as focusing more on defensive coding in my practice and learning.

      You, sir, have hit the nail on the head with regard to the kind of input I was hoping to receive by posting this for review. This is a pristine example of the quality of this community and why I prefer it to the plethora of other forums. Your response has opened several paths of inquiry in my mind that I intend to follow to fruition. Thanks again!

        ... my bias towards clarity ... Is this exceedingly Novice?

        In a professional/production environment, clarity is a jewel above price. The person who maintains your code months or years hence (and it may even be you; remember: the sanity you save may be your own) will sing your praises to the heavens if you give him or her a clear piece of code to work with.

Re: Simple Regex Question / Code Review
by choroba (Abbot) on Oct 11, 2012 at 10:44 UTC
    Do not use @ARGV in a subroutine. Use @_ and pass @ARGV as arguments. It makes the subroutine reusable.
    لսႽć ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
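    Applied to the question's getFiles, that advice might look like the minimal sketch below. The extension check and messages come from the original post; the rest (the if/elsif layout and the final print) is illustrative. The subroutine reads only its own arguments, so it can be called with @ARGV from the top level or with any other list.

```perl
use strict;
use warnings;

# Takes the candidate file names as arguments instead of reading @ARGV,
# so it can be reused (and tested) with any list of names.
sub getFiles {
    my @files;
    for my $file (@_) {
        if ( $file !~ /\.[cC]\z/ ) {
            print "Invalid file: $file\n";
        }
        elsif ( !-e $file ) {
            print "Couldn't Find $file\n";
        }
        else {
            push @files, $file;
        }
    }
    return \@files;
}

# Top-level code is the only place that touches @ARGV.
my $files = getFiles(@ARGV);
print "Would compile: @$files\n";
```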
      I think I went one-track minded while writing this and hadn't even thought about expanding it to the point where the subs would need to be reused. I can see why reading @ARGV from inside a subroutine is a horrible habit to develop. Thank you for helping me catch it early in my learning journey.
Re: Simple Regex Question / Code Review
by Anonymous Monk on Oct 11, 2012 at 15:54 UTC

    Why not use make(1)? Here's a trivial makefile (save as Makefile):

    foo: foo.o
    bar: bar.o
    baz: baz.o

    After that, run make foo bar baz to build all three binaries; make rebuilds only those that are out of date.

    You can even generate the matching dependency lines (of the form foo.o: foo.c plus any headers it includes) programmatically with gcc -MM *.c

    (The makefile I gave is not really complete, but might be enough. For further information, I recommend the document titled PMake -- a tutorial)
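    If you would rather produce the foo: foo.o lines of the makefile above from Perl itself, a small sketch follows. The helper name makefile_lines is hypothetical, and it assumes make's built-in rules (GNU make's, for instance) supply the compile and link commands.

```perl
use strict;
use warnings;

# Hypothetical helper: one "target: target.o" line per C source name.
# Names that do not end in .c are skipped.
sub makefile_lines {
    my @lines;
    for my $src (@_) {
        ( my $target = $src ) =~ s/\.c\z// or next;
        push @lines, "$target: $target.o\n";
    }
    return join '', @lines;
}

# e.g. redirect this script's output into a file named Makefile
print makefile_lines( glob '*.c' );
```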

      Thanks! While I do understand that there is always going to be more than one or two ways of accomplishing a task in Perl, as well as in *nix systems in general, I was attempting this particular task primarily as an exercise in Perl scripting. Thank you for the reference, though. Googled and bookmarked; I've never quite understood makefiles, and now you've inspired me to investigate.

        Make felt pretty impenetrable to me, too, as every example makefile I saw was usually a minimum of 20-30 lines.

        The tutorial I linked you to is quite heavy. I would read up to chapter 3.1 (skip 2.7) -- that covers the essentials.

        I must warn you, though, that PMake ("BSD Make") is not GNU Make. The latter can be found on pretty much every Linux system, and the most noticeable difference between them is the variable names. I find PMake's variables much more nicely named ($(.TARGET) vs $@, or $(.IMPSRC) vs $<).

        Anyway, you can manage without touching those variables, and I recommended that tutorial because it does a good job in explaining the basics.

Node Type: perlquestion [id://998387]
Approved by moritz