Perl File Parsing - My Code Works, but it's Ugly!

Nico has asked for the wisdom of the Perl Monks concerning the following question:

Hey all! New to PerlMonks and Perl in general. I have been trying to create a script as a side project (personal knowledge, no monetary gain) that goes through about 30 files and pulls out lines that match a specific set of search criteria.

My code works, but I wanted to ask if there is a more "clean" way to do it.

1) I set some variables for the HTML report and the log file. Currently I don't have any HTML syntax being written, that will come later once I get everything nailed down.

2) I create the two files that I am going to be writing to.

3) I check to see if the file exists. If it does, I open the HTML file and the log file for appending. If it doesn't, I write to the log file

4) I pull out only the lines I want, and then write them to the HTML file.

5) I close the HTML file and the log file.


#!/usr/bin/perl
use warnings;
use strict;
use 5.010;
use Time::Local;

my $file_01_txt = "file_01.txt";

($sec, $min, $hour, $mday, $mon, $year, $j1,$j2, $j3) =localtime(time)
+;

$year = $year+1900;
$mon = $mon+1;
$outfolder = "REPORT_output\_$year\_$mon\_$mday\_\_$hour\_$min\_$sec";

mkdir $outfolder;
mkdir "$outfolder/log";

$html_file = <$outfolder/REPORT.html>;
$log_file = <$outfolder/log/log.txt>;

open($html_report, '>', $html_file) or die "Cannot open file $html_fil
+e";
open($log_report, '>', $log_file) or die "Cannot open file $log_file";

main();

sub main {
    if (-e $file_01_txt) {
        # We are going to open the log file for appending.
        open($log_report, '>>', $log_file) or die "Cannot open $log_fi
+le.";
    
        # We are going to open the HTML file for appending.
        open($html_report, '>>', $html_file) or die "Cannot open file 
+$html_file";    
    
        # If the file exists, we are going to open it, so we can read 
+from it.        
        print $log_report "$file_01_txt has been found.\n";
        open (my $fh, "<", $file_01_txt) or die "Cannot open file $fil
+e_01_txt";
        
        print $html_report "\n";
        print $html_report "#### Now Analyzing '$file_01_txt'\n";
        print $html_report "\n";
        while (my $line = <$fh>) {
            chomp $line;
            if ($line =~ /search criteria 01/i) {
                print $html_report "$line\n";
            }
            if ($line =~ /search criteria 02/i) {
                print $html_report "$line\n";
            }
            if ($line =~ /search criteria 03/i) {
                print $html_report "$line\n";
                print $html_report "\n";
            }
        }
        close $log_report;
        close $html_report;
    } else {
        # We are going to open the log file for appending.
        open($log_report, '>>', $log_file) or die "Cannot open $log_fi
+le.";
        
        # We are going to log that the file has not been found.
        print $log_report "$file_01_txt has not been found.\n";
        
        close $log_report;
    }
}
[download]

Is there a more clean way of opening, writing to, and closing the HTML and log files?

Right now, I have one "main" sub that holds all of the "if" statements for each file. Should I break those up into different subs?"

Thanks for any help you might be able to provide!

Nico

Comment on Perl File Parsing - My Code Works, but it's Ugly! Download Code

Replies are listed 'Best First'.
Re: Perl File Parsing - My Code Works, but it's Ugly! by aaron_baugher (Curate) on May 31, 2015 at 18:02 UTC
Nothing huge, but some cleanup things: You don't ever use Time::Local. localtime is a built-in function. You need my in front of your date/time variable list, to run under strict. You don't need 'junk' variables ($j1, $j2, $j3) for the extra values from localtime. If your list on the left has fewer variables than are passed from the right, the extra ones will be discarded. Your day and month adjustments can be done shorter and more cleanly with increment operators: `$year += 1900; $mon++;` You should probably do error checking on your mkdir statements. It's strange that you're opening your output files once, and then opening them again right away inside your subroutine. In fact, I'd almost expect that to cause an error or warning, but maybe Perl silently closes and re-opens it. Are you doing that so the first open (with '>') will start the files empty, and then switch to appending? It's probably unnecessary, but maybe you could explain your thinking there. When you have multiple print statements in a row, you can often clean things up with a here-document: `print $html_report <<END; #### Now Analyzing '$file_01_txt' END` [download] When you have "if(this){single_statement}" constructs, you can often eliminate much of the clutter of parens and brackets with postfix syntax: `print $html_report "$line\n" if $line =~ /search criteria 01/i;` [download] Aaron B. Available for small or large Perl jobs and *nix system administration; see my home node.	[reply] [d/l] [select]
Re^2: Perl File Parsing - My Code Works, but it's Ugly! by Nico (Novice) on May 31, 2015 at 20:55 UTC
Thank you for your response! I have no idea what I was thinking with those opens and closes. I only need to specify them once at the beginning of the script, and I can use them throughout the script. I removed the junk variables and removed Time::Local. It works! I will need to figure out how to error check my mkdir statements. But luckily I only have those two statements there so shouldn't be too hard	[reply]
Re^3: Perl File Parsing - My Code Works, but it's Ugly! by aaron_baugher (Curate) on May 31, 2015 at 21:29 UTC
You can error-check mkdir much like you do open: `mkdir $newdir or die "Unable to mkdir $newdir : $!";` [download] That's true of most commands in Perl; they return true if successful and false on failure, so you can use "do_this() or show_error()" logic. (One exception is system, which returns the return value of the underlying command, which (on Unix at least) is zero on success, so you have to watch out for that.) Aaron B. Available for small or large Perl jobs and *nix system administration; see my home node.	[reply] [d/l]
Re: Perl File Parsing - My Code Works, but it's Ugly! by stevieb (Canon) on May 31, 2015 at 16:43 UTC
Here is some example code that depicts one way you can eliminate the if() statements. Put your regexes into an array, and loop over them all at once per line. The `$i` variable is simply there to show that all regexes are being searched, and performs no functional use. `#!/usr/bin/perl use warnings; use strict; my @regexes = qw(one two three); my $line = "this is thREE"; my $i = 1; for my $regex (@regexes){ if ($line =~ /$regex/i){ print "$i\n"; } $i++; }` [download] Also, although just preference, I like to write my opens like the following. I feel it's cleaner. Note the use of `$!` in the die() statement; it explains why the file couldn't be opened. Also note the use of 'my'... you should be using this to define each variable. You'll have to do this when you implement `use strict;` as I stated in an earlier post. `open my $log_report, '>>', $log_file or die "Can't open the file $log_file: $!";` [download] -stevieb	[reply] [d/l] [select]
Re^2: Perl File Parsing - My Code Works, but it's Ugly! by Nico (Novice) on May 31, 2015 at 17:41 UTC
Thank you for the suggestion on how to write my opens. Your way looks MUCH more clean.	[reply]
Re: Perl File Parsing - My Code Works, but it's Ugly! by golux (Chaplain) on May 31, 2015 at 19:58 UTC
Hi Nico, One other suggestion concerning your escaping of underscores "_": `$outfolder = "REPORT_output\_$year\_$mon\_$mday\_\_$hour\_$min +\_$sec";` [download] The underscore character doesn't need to be escaped like this everywhere in a string. If you were doing so to separate it from the preceding variable (ie. "`$year\_$mon`" in order to interpolate `$year` instead of `$year_`), there's another, more common way to do so which is arguably a little easier to read: `$outfolder = "REPORT_output_${year}_${mon}_${mday}__${hour}_${ +min}_$sec";` [download] This syntax is also often used to separate variables from other chars which could be part of the variable name; eg.: `my $msg = "Hello world"; my $color = 101; # Red background print "\e[${color}m $msg \e[m\n"; # Embed $color in escape sequence` [download] say substr+lc crypt(qw $i3 SI$),4,5	[reply] [d/l] [select]
Re^2: Perl File Parsing - My Code Works, but it's Ugly! by aaron_baugher (Curate) on May 31, 2015 at 22:03 UTC
Another option would be something like this, though whether it's more readable might be a matter of personal preference: `my $outfolder = join '_', 'REPORT', 'output', $year, $mon, $mday, '', +$hour, $min, $sec;` [download] Aaron B. Available for small or large Perl jobs and *nix system administration; see my home node.	[reply] [d/l]
Re^2: Perl File Parsing - My Code Works, but it's Ugly! by Anonymous Monk on Jun 01, 2015 at 10:22 UTC
TIMTOWTDI: `$outfolder = sprintf "REPORT_output_%d_%d_%d__%d_%d_%d", $year, $mon, $mday, $hour, $min, $sec;` [download] With the advantage that you could do something like: `$outfolder = sprintf "REPORT_output_%04d_%02d_%02d__%02d_%02d_%02d", $year, $mon, $mday, $hour, $min, $sec;` [download] giving `REPORT_output_2015_06_01__00_19_42` [download]	[reply] [d/l] [select]
Re^3: Perl File Parsing - My Code Works, but it's Ugly! by BillKSmith (Monsignor) on Jun 01, 2015 at 14:08 UTC
I prefer the function strftime in POSIX for formatting dates. The only downside is that documentation for the format codes is "borrowed" from "C" and not included in perl's documentation. `use strict; use warnings; use POSIX 'strftime'; my $outfolder = strftime "Report_Output_%Y_%m_%d__%H_%M_%S", localtime(); print $outfolder;` [download] Bill	[reply] [d/l]
Re: Perl File Parsing - My Code Works, but it's Ugly! by pme (Monsignor) on May 31, 2015 at 15:44 UTC
Hi Nico, Welcome to the monastery! Your code does not compile. Could you share your working code?	[reply]
Re^2: Perl File Parsing - My Code Works, but it's Ugly! by Nico (Novice) on May 31, 2015 at 16:08 UTC
Sorry about that, let me see if I can paste a working portion in here #!/usr/bin/perl use 5.010; use Time::Local; my $file_01_txt = "file_01.txt"; ($sec, $min, $hour, $mday, $mon, $year, $j1,$j2, $j3) =localtime(time) +; $year = $year+1900; $mon = $mon+1; $outfolder = "REPORT_output\_$year\_$mon\_$mday\_\_$hour\_$min\_$sec"; mkdir $outfolder; mkdir "$outfolder/log"; $html_file = <$outfolder/REPORT.html>; $log_file = <$outfolder/log/log.txt>; open($html_report, '>', $html_file) or die "Cannot open file $html_fil +e"; open($log_report, '>', $log_file) or die "Cannot open file $log_file"; main(); sub main { if (-e $file_01_txt) { # We are going to open the log file for appending. open($log_report, '>>', $log_file) or die "Cannot open $log_fi +le."; # We are going to open the HTML file for appending. open($html_report, '>>', $html_file) or die "Cannot open file +$html_file"; # If the file exists, we are going to open it, so we can read +from it. print $log_report "$file_01_txt has been found.\n"; open (my $fh, "<", $file_01_txt) or die "Cannot open file $fil +e_01_txt"; print $html_report "\n"; print $html_report "#### Now Analyzing '$file_01_txt'\n"; print $html_report "\n"; while (my $line = <$fh>) { chomp $line; if ($line =~ /search criteria 01/i) { print $html_report "$line\n"; } if ($line =~ /search criteria 02/i) { print $html_report "$line\n"; } if ($line =~ /search criteria 03/i) { print $html_report "$line\n"; print $html_report "\n"; } } close $log_report; close $html_report; } else { # We are going to open the log file for appending. open($log_report, '>>', $log_file) or die "Cannot open $log_fi +le."; # We are going to log that the file has not been found. print $log_report "$file_01_txt has not been found.\n"; close $log_report; } } [download] That should work. I just ran it and everything came back as it should. Please let me know if I need to give you anything else!	[reply] [d/l]
Re: Perl File Parsing - My Code Works, but it's Ugly! by stevieb (Canon) on May 31, 2015 at 16:31 UTC
This would be pretty trivial to clean up, but please replace your code in the original post with the code you posted in a below post. Also, append in a different code tag section a snippet of the input file, and specify exactly what your 'search criteria' actually is, so we can test it for real. One more thing, put: `use warnings; use strict;` [download] at the top of your script (under the shebang line), rerun it and fix any errors that are shown. -stevieb	[reply] [d/l]
Re^2: Perl File Parsing - My Code Works, but it's Ugly! by Nico (Novice) on May 31, 2015 at 17:51 UTC
Hi again! I just wanted to let you know I reran the script after adding the following lines: `use warnings; use strict;` [download] and received a bunch of error messages in regards to "explicit package name" requirements. For example: Global symbol "$sec" requires explicit package name Time to figure out what that means!	[reply] [d/l]
Re^3: Perl File Parsing - My Code Works, but it's Ugly! by Nico (Novice) on May 31, 2015 at 18:00 UTC
I figured it out! It was because I wasn't declaring my variables with "my". Will have to research why that is	[reply]
Re^4: Perl File Parsing - My Code Works, but it's Ugly! by Athanasius (Archbishop) on Jun 01, 2015 at 05:34 UTC
Re^2: Perl File Parsing - My Code Works, but it's Ugly! by Nico (Novice) on May 31, 2015 at 17:07 UTC
Thank you for your reply! I have updated the code in the original post, and added what you suggested. I am going to re-run the script now and see what I get.	[reply]
Re^3: Perl File Parsing - My Code Works, but it's Ugly! by ww (Archbishop) on May 31, 2015 at 20:38 UTC
Given the advice you took, no downvotes, handslaps or similar.... BUT FOR FUTURE REFERENCE: Please, do NOT ever remove anything you've posted^1 because doing so can take away the context of replies you may have received (or, perhaps, are even being written as you make you change). Instead, use <strike> content to be deleted before closing the tag </strike> to disavow, show that you've changed something, etc. AND add a warning about your changes -- something like "NOTE: updated (did such and such" -- in a way that makes clear what you did and why. And, of course, WELCOME to PM! ^1 There are exceptions but they are beyond the scope of this advice. As a newcomer, you'll find these worth reading.	[reply]