Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

clearer code

by Simplicus (Monk)
on Apr 17, 2000 at 23:07 UTC ( [id://7872]=perlquestion: print w/replies, xml ) Need Help??

Simplicus has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that goes thru an input file (always named cpuf.rpt) and does some formatting on it. The problem is that the file is "formatted" already by some wannabe's old utility, so I have to unformat it first, then put it into the format I need for (my) generic reporting utility.
#!/usr/perl/bin -w my @coswork_test; my @func_test; my $count = 0; open (REPORT, "cpuf.rpt") || die "Can't open CPUF.RPT: $!\n"; open (OUTFILE, ">cpuf.med") || die "Can't open CPUF.OUT: $!\n"; while (<REPORT>) { s/^ +//; #remove the leading whitespace; s/\r//; #remove all carriage returns; s/ +$//; #remove the trailing whitespace; s/.+ : //; #remove the header information; if ($_ =~ /\*+/) { $_ =~ s/\*+//; #remove asterisk only rows; $count++; #the asterisk line only happens once per recor +d, } #so we can use it to count records. s/\n/³/; if (($_ =~ /Cosmetics/)||($_ =~ /Function/)) { #look for _X_Pass or + _X_Fail $_ =~ s/.+_X_(.{4}).*/$1/; #make it Pass or Fail } print OUTFILE "$_"; #print the line to the intermediate file. } close OUTFILE; open (INFILE, "cpuf.med") || die "Can't open CPUF.MED $!\n"; open (OUTFILE, ">cpuf.out") || die "Can't open CPUF.OUT $!\n"; #now, print the headers: print OUTFILE "DATE³QC SPEC.³REPAIR NUM³TECHNICIAN³CPU MODEL³CPU ASSET + NUM³COSMETICS ". "WORKMANSHIP TEST³FUNCTIONALITY TEST³FAILURE DETAIL\n"; #and the information: while (<INFILE>) { s/.{129}//; s/³{2}/\n/g; #replace ³PassFail³ with ³Pass³Fail³, and Fail with Fail³³ s/³Fail³/³Fail³³/g; s/³PassFail³/³Pass³Fail³/g; s/Failure Detail - //g; print OUTFILE "$_"; }
As you can see, it's a veritable feast of regex. What I would like to know is how I can combine some of these substitutions and create code that is (maybe) a bit more self documenting.
Two goals here - I'm a lazy typist and I don't want to forget what it was supposed to do when I come back to it in a year....
Simplicus

Replies are listed 'Best First'.
Re: clearer code
by stephen (Priest) on Apr 18, 2000 at 04:43 UTC

    I was dreaming when I wrote this, forgive me if it goes astray...

    All in all, it's a pretty clear piece of code as is. If I were a contractor assigned to it as legacy code, I would be very thankful and think nice thoughts about the coder that wrote it.

    Couple of things you could do:

    • You could put the regexps in a subroutine called 'clean_line', and have them return the line without all the "formatting". Lose a bit of efficiency, gain a bit of clarity and modularity. Pulling stuff out into subroutines frequently makes them clearer.
    • You could combine the first two '\s' lines into a single 's/^\s*(.*?)\s*$/$1/;' expression. Don't know if that would slow things down; might.
    • At the risk of kicking off a religious war, you might want to put your comments on the line above the code, instead of to the right. Easier to maintain.
    • 'use strict'. I don't think it would change anything, just good to have in there.
    • The superscript-three up there would look like line noise to me, and I'd probably be tempted to delete it. Replace it with a backslash code.

    So, here's your first input loop, rewritten as I've suggested:

    while (<REPORT>) { $_ = clean_line($_); if ($_ =~ /\*+/) { $_ =~ s/\*+//; #remove asterisk only rows; $count++; #the asterisk line only happens once per reco +rd, } #so we can use it to count records. if (($_ =~ /Cosmetics/)||($_ =~ /Function/)) { #look for _X_Pass o +r _X_Fail $_ =~ s/.+_X_(.{4}).*/$1/; #make it Pass or Fail } print OUTFILE "$_"; #print the line to the intermediate file. } ## ## clean_line() ## ## Argument: $line (string) - Line from the file that needs cleaning ## ## Returns: string - cleaned-up line ## ## Clean_line() returns the $line minus leading and ## trailing whitespace, headers, etc. ## sub clean_line { my $line = shift; #remove the leading and trailing whitespace $line =~ s/^\s*(.*?)\s*$/$1/; tr/\n\r³//; #remove all carriage returns; s/.+ : //; #remove the header information; }

    Something like that... my brain isn't running. Also, you could consider reading the input file record-at-a-time by setting the input record separator to '*' x (whatever number of stars they put in the input file) . "\n". Then you wouldn't need to watch for the star-lines and count them separately. You might need to modify your regexps to deal with that though.

    stephen

      Thanks. I appreciate the help, and the opportunity to learn more.

      Simplicus.
RE: clearer code
by chromatic (Archbishop) on Apr 18, 2000 at 01:40 UTC
    Hmm, it's not bad as it is. Here are some things I would do differently:
    while (<REPORT>) { s/^\s+//; # get rid of leading whitespace s/\s+$//; # get rid of trailing whitespace tr/\r//d; # get rid of carriage returns the quick way tr/\n/³/; # superscript 3? I don't get it. s/.+ : //; # get rid of headers if (s/^\*+$//) { $count++; } # get rid of all asterisk rows, one for each record if (/(Cosmetics|Function)/) { s/.+_X_(Pass|Fail).*/$1/; # remove all but Pass or Fail }
    and the other section looks about as compact as possible. I don't really see the point of s/.{129}//, though (it's not anchored, and 129 is an odd number). Perhaps substr would be more efficient. On second thought, you could do something like this:
    s/Failure Detail - //g; # must come before next regexp s/³(Pass|Fail)/³$1³/; # Pass -> Pass³, Fail³ -> Fail³³ (implied :)
    That depends on some assumptions about your dataset, notably that there are no instances of "Fail" which don't meet the search criteria.
      the s/.{129}// was to remove the first 129 characters of the file. This was header info only, and confused the issue. Maybe I should have used ^, but since there is no g at the end of the line, doesn't it amount to the same thing? I mean the regex is going to be searched from the top each time, isn't it?
      Seems to work, but am I being sloppy?
      Simplicus.
        Anchoring a regex (with ^ or $) will nearly always make it more efficient, as it tells the engine that the expression can only match starting from (or ending with) one position.

        In this case, though, when you know you want to get rid of 129 characters, substr will definitely be faster. Something like: substr $text, 0, 129, ''; (oops, it *should* be length and not offset. Thanks btrott!)

Re: clearer code
by pschoonveld (Pilgrim) on Apr 18, 2000 at 00:52 UTC
    Regex's are difficult to explain w/o extensive comments. If you are worried about forgetting regex syntax details (as I often am), I would comment the hell out of it. But, I think you have done a lot of that already. I would suck it up and leave it as it is. Or, rewrite is using C-style string manipulation functions (yuck).
      Thanks
      I have spent enough time in my life writing str.... and mem.... function calls, no thank you. S

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://7872]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (3)
As of 2025-07-10 16:36 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.