Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Perl File Parsing - My Code Works, but it's Ugly!

by Nico (Novice)
on May 31, 2015 at 14:39 UTC ( [id://1128487]=perlquestion: print w/replies, xml ) Need Help??

Nico has asked for the wisdom of the Perl Monks concerning the following question:

Hey all! New to PerlMonks and Perl in general. I have been trying to create a script as a side project (personal knowledge, no monetary gain) that goes through about 30 files and pulls out lines that match a specific set of search criteria.

My code works, but I wanted to ask if there is a more "clean" way to do it.

1) I set some variables for the HTML report and the log file. Currently I don't have any HTML syntax being written, that will come later once I get everything nailed down.

2) I create the two files that I am going to be writing to.

3) I check to see if the file exists. If it does, I open the HTML file and the log file for appending. If it doesn't, I write to the log file

4) I pull out only the lines I want, and then write them to the HTML file.

5) I close the HTML file and the log file.

#!/usr/bin/perl use warnings; use strict; use 5.010; use Time::Local; my $file_01_txt = "file_01.txt"; ($sec, $min, $hour, $mday, $mon, $year, $j1,$j2, $j3) =localtime(time) +; $year = $year+1900; $mon = $mon+1; $outfolder = "REPORT_output\_$year\_$mon\_$mday\_\_$hour\_$min\_$sec"; mkdir $outfolder; mkdir "$outfolder/log"; $html_file = <$outfolder/REPORT.html>; $log_file = <$outfolder/log/log.txt>; open($html_report, '>', $html_file) or die "Cannot open file $html_fil +e"; open($log_report, '>', $log_file) or die "Cannot open file $log_file"; main(); sub main { if (-e $file_01_txt) { # We are going to open the log file for appending. open($log_report, '>>', $log_file) or die "Cannot open $log_fi +le."; # We are going to open the HTML file for appending. open($html_report, '>>', $html_file) or die "Cannot open file +$html_file"; # If the file exists, we are going to open it, so we can read +from it. print $log_report "$file_01_txt has been found.\n"; open (my $fh, "<", $file_01_txt) or die "Cannot open file $fil +e_01_txt"; print $html_report "\n"; print $html_report "#### Now Analyzing '$file_01_txt'\n"; print $html_report "\n"; while (my $line = <$fh>) { chomp $line; if ($line =~ /search criteria 01/i) { print $html_report "$line\n"; } if ($line =~ /search criteria 02/i) { print $html_report "$line\n"; } if ($line =~ /search criteria 03/i) { print $html_report "$line\n"; print $html_report "\n"; } } close $log_report; close $html_report; } else { # We are going to open the log file for appending. open($log_report, '>>', $log_file) or die "Cannot open $log_fi +le."; # We are going to log that the file has not been found. print $log_report "$file_01_txt has not been found.\n"; close $log_report; } }

Is there a more clean way of opening, writing to, and closing the HTML and log files?

Right now, I have one "main" sub that holds all of the "if" statements for each file. Should I break those up into different subs?"

Thanks for any help you might be able to provide!

Nico

Replies are listed 'Best First'.
Re: Perl File Parsing - My Code Works, but it's Ugly!
by aaron_baugher (Curate) on May 31, 2015 at 18:02 UTC

    Nothing huge, but some cleanup things:

    • You don't ever use Time::Local. localtime is a built-in function.
    • You need my in front of your date/time variable list, to run under strict.
    • You don't need 'junk' variables ($j1, $j2, $j3) for the extra values from localtime. If your list on the left has fewer variables than are passed from the right, the extra ones will be discarded.
    • Your day and month adjustments can be done shorter and more cleanly with increment operators: $year += 1900; $mon++;
    • You should probably do error checking on your mkdir statements.
    • It's strange that you're opening your output files once, and then opening them again right away inside your subroutine. In fact, I'd almost expect that to cause an error or warning, but maybe Perl silently closes and re-opens it. Are you doing that so the first open (with '>') will start the files empty, and then switch to appending? It's probably unnecessary, but maybe you could explain your thinking there.
    • When you have multiple print statements in a row, you can often clean things up with a here-document:
      print $html_report <<END; #### Now Analyzing '$file_01_txt' END
    • When you have "if(this){single_statement}" constructs, you can often eliminate much of the clutter of parens and brackets with postfix syntax:
      print $html_report "$line\n" if $line =~ /search criteria 01/i;

    Aaron B.
    Available for small or large Perl jobs and *nix system administration; see my home node.

      Thank you for your response! I have no idea what I was thinking with those opens and closes. I only need to specify them once at the beginning of the script, and I can use them throughout the script.

      I removed the junk variables and removed Time::Local. It works!

      I will need to figure out how to error check my mkdir statements. But luckily I only have those two statements there so shouldn't be too hard

        You can error-check mkdir much like you do open:

        mkdir $newdir or die "Unable to mkdir $newdir : $!";

        That's true of most commands in Perl; they return true if successful and false on failure, so you can use "do_this() or show_error()" logic. (One exception is system, which returns the return value of the underlying command, which (on Unix at least) is zero on success, so you have to watch out for that.)

        Aaron B.
        Available for small or large Perl jobs and *nix system administration; see my home node.

Re: Perl File Parsing - My Code Works, but it's Ugly!
by stevieb (Canon) on May 31, 2015 at 16:43 UTC

    Here is some example code that depicts one way you can eliminate the if() statements. Put your regexes into an array, and loop over them all at once per line. The $i variable is simply there to show that all regexes are being searched, and performs no functional use.

    #!/usr/bin/perl use warnings; use strict; my @regexes = qw(one two three); my $line = "this is thREE"; my $i = 1; for my $regex (@regexes){ if ($line =~ /$regex/i){ print "$i\n"; } $i++; }

    Also, although just preference, I like to write my opens like the following. I feel it's cleaner. Note the use of $! in the die() statement; it explains *why* the file couldn't be opened. Also note the use of 'my'... you should be using this to define each variable. You'll have to do this when you implement use strict; as I stated in an earlier post.

    open my $log_report, '>>', $log_file or die "Can't open the file $log_file: $!";

    -stevieb

      Thank you for the suggestion on how to write my opens. Your way looks MUCH more clean.

Re: Perl File Parsing - My Code Works, but it's Ugly!
by golux (Chaplain) on May 31, 2015 at 19:58 UTC
    Hi Nico,

    One other suggestion concerning your escaping of underscores "_":

    $outfolder = "REPORT_output\_$year\_$mon\_$mday\_\_$hour\_$min +\_$sec";

    The underscore character doesn't need to be escaped like this everywhere in a string. If you were doing so to separate it from the preceding variable (ie. "$year\_$mon" in order to interpolate $year instead of $year_), there's another, more common way to do so which is arguably a little easier to read:

    $outfolder = "REPORT_output_${year}_${mon}_${mday}__${hour}_${ +min}_$sec";

    This syntax is also often used to separate variables from other chars which could be part of the variable name; eg.:

    my $msg = "Hello world"; my $color = 101; # Red background print "\e[${color}m $msg \e[m\n"; # Embed $color in escape sequence
    say  substr+lc crypt(qw $i3 SI$),4,5

      Another option would be something like this, though whether it's more readable might be a matter of personal preference:

      my $outfolder = join '_', 'REPORT', 'output', $year, $mon, $mday, '', +$hour, $min, $sec;

      Aaron B.
      Available for small or large Perl jobs and *nix system administration; see my home node.

      TIMTOWTDI:

      $outfolder = sprintf "REPORT_output_%d_%d_%d__%d_%d_%d", $year, $mon, $mday, $hour, $min, $sec;

      With the advantage that you could do something like:

      $outfolder = sprintf "REPORT_output_%04d_%02d_%02d__%02d_%02d_%02d", $year, $mon, $mday, $hour, $min, $sec;

      giving

      REPORT_output_2015_06_01__00_19_42
        I prefer the function strftime in POSIX for formatting dates. The only downside is that documentation for the format codes is "borrowed" from "C" and not included in perl's documentation.
        use strict; use warnings; use POSIX 'strftime'; my $outfolder = strftime "Report_Output_%Y_%m_%d__%H_%M_%S", localtime(); print $outfolder;
        Bill
Re: Perl File Parsing - My Code Works, but it's Ugly!
by pme (Monsignor) on May 31, 2015 at 15:44 UTC
    Hi Nico,

    Welcome to the monastery!

    Your code does not compile. Could you share your working code?

      Sorry about that, let me see if I can paste a working portion in here

      #!/usr/bin/perl use 5.010; use Time::Local; my $file_01_txt = "file_01.txt"; ($sec, $min, $hour, $mday, $mon, $year, $j1,$j2, $j3) =localtime(time) +; $year = $year+1900; $mon = $mon+1; $outfolder = "REPORT_output\_$year\_$mon\_$mday\_\_$hour\_$min\_$sec"; mkdir $outfolder; mkdir "$outfolder/log"; $html_file = <$outfolder/REPORT.html>; $log_file = <$outfolder/log/log.txt>; open($html_report, '>', $html_file) or die "Cannot open file $html_fil +e"; open($log_report, '>', $log_file) or die "Cannot open file $log_file"; main(); sub main { if (-e $file_01_txt) { # We are going to open the log file for appending. open($log_report, '>>', $log_file) or die "Cannot open $log_fi +le."; # We are going to open the HTML file for appending. open($html_report, '>>', $html_file) or die "Cannot open file +$html_file"; # If the file exists, we are going to open it, so we can read +from it. print $log_report "$file_01_txt has been found.\n"; open (my $fh, "<", $file_01_txt) or die "Cannot open file $fil +e_01_txt"; print $html_report "\n"; print $html_report "#### Now Analyzing '$file_01_txt'\n"; print $html_report "\n"; while (my $line = <$fh>) { chomp $line; if ($line =~ /search criteria 01/i) { print $html_report "$line\n"; } if ($line =~ /search criteria 02/i) { print $html_report "$line\n"; } if ($line =~ /search criteria 03/i) { print $html_report "$line\n"; print $html_report "\n"; } } close $log_report; close $html_report; } else { # We are going to open the log file for appending. open($log_report, '>>', $log_file) or die "Cannot open $log_fi +le."; # We are going to log that the file has not been found. print $log_report "$file_01_txt has not been found.\n"; close $log_report; } }

      That should work. I just ran it and everything came back as it should.

      Please let me know if I need to give you anything else!

Re: Perl File Parsing - My Code Works, but it's Ugly!
by stevieb (Canon) on May 31, 2015 at 16:31 UTC

    This would be pretty trivial to clean up, but please replace your code in the original post with the code you posted in a below post.

    Also, append in a different code tag section a snippet of the input file, and specify exactly what your 'search criteria' actually is, so we can test it for real.

    One more thing, put:

    use warnings; use strict;

    at the top of your script (under the shebang line), rerun it and fix any errors that are shown.

    -stevieb

      Hi again!

      I just wanted to let you know I reran the script after adding the following lines:

      use warnings; use strict;

      and received a bunch of error messages in regards to "explicit package name" requirements. For example:

      Global symbol "$sec" requires explicit package name

      Time to figure out what that means!

        I figured it out! It was because I wasn't declaring my variables with "my". Will have to research why that is

      Thank you for your reply! I have updated the code in the original post, and added what you suggested. I am going to re-run the script now and see what I get.

        Given the advice you took, no downvotes, handslaps or similar....
              BUT FOR FUTURE REFERENCE:

        Please, do NOT ever remove anything you've posted*1* because doing so can take away the context of replies you may have received (or, perhaps, are even being written as you make you change). Instead, use <strike> content to be deleted before closing the tag </strike> to disavow, show that you've changed something, etc. AND add a warning about your changes -- something like "NOTE: updated (did such and such" -- in a way that makes clear what you did and why.

        And, of course, WELCOME to PM!

        *1* There are exceptions but they are beyond the scope of this advice.


        As a newcomer, you'll find these worth reading.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1128487]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (7)
As of 2024-04-19 10:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found