Re: Output problems

Welcome to the Monastery.

[While it's fine to write different versions of statements, commenting and uncommenting them to see different effects (I do this myself on occasion), please don't leave all the commented-out lines in the code you post. It looks like over 50% of your code has been commented out: that's just a lot of noise for us to wade through.]

As you say you're "new to perl", I'll attempt to step through the uncommented code and highlight what's good, bad and ugly. :-)

Firstly, you've used strict and warnings. Excellent start! Use them in all your code: turn parts of them off, in limited scope, only when necessary.
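
As a minimal sketch of what "limited scope" means here (the variable names are made up for illustration, and the warning handler exists only so the effect can be counted), one warning category is switched off inside a bare block and is back in force as soon as the block ends:

```perl
use strict;
use warnings;

# Collect warnings so we can count them (for demonstration only).
my @seen;
local $SIG{__WARN__} = sub { push @seen, $_[0] };

my $undefined;           # hypothetical variable, deliberately left undef
my ($quiet, $noisy);

{
    no warnings 'uninitialized';     # relaxed only inside this block
    $quiet = "[$undefined]";
}
my $inside_count = @seen;

$noisy = "[$undefined]";             # warnings are back in force here
my $outside_count = @seen - $inside_count;

print "warnings inside block: $inside_count, outside: $outside_count\n";
```

The same pattern applies to strict: something like "no strict 'refs';" belongs inside the smallest block that needs it.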

Next, you've used the 3-argument form of open with lexical filehandles: also very good - keep doing that. However, you've given the filehandles global scope: that's less good. Also, your naming could be improved: $keyword near the end of your code doesn't immediately make me think it's a filehandle (less so when it's an argument to return); $fh near the end does make me think it's a filehandle but which one (we're dealing with three files here).
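
Here's a rough sketch of a narrowly scoped, descriptively named filehandle. The temporary file and the names $keys_fh and @keywords are assumptions for illustration only; File::Temp is used just to make the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the sketch is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "abc\ndef\n";
close $tmp_fh or die "Can't close '$tmp_name': $!";

my @keywords;
{
    # The handle's name says what it reads; it exists only in this block.
    open my $keys_fh, '<', $tmp_name
        or die "Can't open '$tmp_name' for reading: $!";
    chomp(@keywords = <$keys_fh>);
    # $keys_fh goes out of scope here and Perl closes it automatically.
}

print "keywords: @keywords\n";
```

Only the data you need (@keywords) survives the block; the filehandle can't be misused later because it no longer exists.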

You've made some effort to check I/O by adding 'or die "..."' to your open statements. While it's important to check I/O, doing so this way is tedious and error-prone: as evidenced by the fact that you forgot it in one place and typed the wrong filename in the other two. Consider letting Perl do this for you with the autodie pragma.
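
A small sketch of what autodie does for you, assuming a filename that doesn't exist (the name is made up). The eval is there only to capture the exception for display; in a normal script you'd simply let it die:

```perl
use strict;
use warnings;
use autodie;

# Hypothetical filename that is assumed not to exist.
my $missing = 'no_such_file_for_this_demo.txt';

my $opened = eval {
    open my $fh, '<', $missing;   # no 'or die' needed: autodie throws on failure
    1;
};
my $error = $@;

print $opened ? "opened\n" : "open failed: $error\n";
```

The message autodie generates names the file and the reason for the failure, so there's no hand-typed filename to get wrong.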

On to regex creation. Firstly, the general method you used to create the alternation is good. Two minor points: the '\E' is unnecessary because it's implicit at the end of the string (one possible use might be something like "\Q$quote_this\E$no_quoting_here"); the use of 'qr' here is questionable (perhaps useful if you wanted to make your keyword search case-insensitive: 'qr/(?i:\Q$_)/').
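
To illustrate both points, here's a sketch using a made-up keyword list: the alternation is built with '\Q' and no '\E', and 'qr' is used only because a case-insensitive group is wanted:

```perl
use strict;
use warnings;

my @keywords = ('abc', 'a.c', '$@%');   # hypothetical sample keywords

# \Q quotes everything up to the end of the string, so no \E is needed.
my $alternation = join '|', map { "\Q$_" } @keywords;

# qr// earns its keep once modifiers are involved, e.g. case-insensitivity.
my $keyword_re = qr/(?i:$alternation)/;

for my $text ('ABC', 'abc', 'axc', 'A.C', 'x $@% y') {
    printf "%-8s %s\n", $text, $text =~ $keyword_re ? 'match' : 'no match';
}
```

Because the dot in 'a.c' is quoted, 'axc' is not matched; only a literal dot will do.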

Your use of the '\b' assertion is highly problematic in this context. It matches a boundary between word (\w) and non-word (\W) characters, with the start and end of the string counting as non-word. You're quoting the keywords because they might contain non-word characters. The two of these won't play nicely together: a keyword that begins or ends with a non-word character only matches when a word character happens to sit right next to it. In one of your comments you state: "matches even those words with dots"; a more accurate comment would be: "sometimes matches words with dots depending on where the dots are". A couple of examples to demonstrate this:

$ perl -we 'q{xxx.xxx} =~ /\b(xxx\.xxx)\b/; print "|$1|\n"'
|xxx.xxx|
$ perl -we 'q{.xxx} =~ /\b(\.xxx)\b/; print "|$1|\n"'
Use of uninitialized value $1 in concatenation (.) or string at -e line 1.
||

This is explained in more detail in "perlrebackslash: Assertions". A better option might be to replace the first '\b' with a negative lookbehind assertion and the second '\b' with a negative lookahead assertion. See "perlre: Lookaround Assertions" for details. Rewriting the two previous examples:

$ perl -we 'q{xxx.xxx} =~ /(?<!\w)(xxx\.xxx)(?!\w)/; print "|$1|\n"'
|xxx.xxx|
$ perl -we 'q{.xxx} =~ /(?<!\w)(\.xxx)(?!\w)/; print "|$1|\n"'
|.xxx|

Finally, we get to the actual processing. You appear to have become totally lost here so, beyond saying that pretty much everything here is wrong, I won't comment further. All you need is a while loop that iterates over your input file and prints the lines that match your regex to your output file.

I created dummy log and key files:

$ cat log_to_read.txt
1 abc
2 def
3 ghi
4 jkl
5 def
6 ghi
7 abd
8 abcdefghi
9 abc def ghi
10 $@%
11 \$\@\%

$ cat keys_to_find.txt
abc
def
ghi
$@%

I then wrote this short script, incorporating all the points raised above:

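#!/usr/bin/env perl

use strict;
use warnings;
use autodie;    # open() and friends now die with a useful message on failure

my $log_search_re;

# Build one regex from all keywords; the bare block limits $keys_fh's scope.
{
    open my $keys_fh, '<', 'keys_to_find.txt';
    $log_search_re = qr/(?x: (?<!\w) (?:
        @{[ join '|', map { chomp; "\Q$_" } <$keys_fh> ]}
    ) (?!\w) )/;
}

# Copy every log line that contains a keyword to the output file.
{
    open my $log_fh, '<', 'log_to_read.txt';
    open my $out_fh, '>', 'log_lines_matched.txt';
    /$log_search_re/ && print $out_fh $_ while <$log_fh>;
}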

Here's the output file after running that script:

$ cat log_lines_matched.txt
1 abc
2 def
3 ghi
5 def
6 ghi
9 abc def ghi
10 $@%

I've added quite a lot of documentation links throughout but feel free to ask if anything needs further explanation.

— Ken
