PerlMonks  

Re: Output problems

by kcott (Archbishop)
on Dec 08, 2016 at 03:38 UTC (id://1177473)


in reply to Output problems

G'day shingster08,

Welcome to the Monastery.

[While it's fine to write different versions of statements — commenting and uncommenting them to see different effects (I do this myself on occasions) — please don't leave all the commented out lines in the code you post. It looks like over 50% of your code has been commented out: that's just a lot of noise for us to wade through.]

As you say you're "new to perl", I'll attempt to step through the uncommented code and highlight what's good, bad and ugly. :-)

Firstly, you've used strict and warnings. Excellent start! Use them in all your code: turn parts of them off, in limited scope, only when necessary.
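Here's a minimal sketch of turning one warning category off in a limited scope. The variable names and the choice of the 'uninitialized' category are illustrative assumptions. Warnings are collected through `$SIG{__WARN__}` so you can see that only the `join` outside the block warns:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collect warnings instead of printing them, so we can count them.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my @values = (1, undef, 3);

my $quiet;
{
    # Only inside this block: no "Use of uninitialized value" warnings.
    no warnings 'uninitialized';
    $quiet = join ',', @values;
}

# Outside the block the category is active again, so this join warns.
my $noisy = join ',', @values;

print "quiet: $quiet\n";
print "noisy: $noisy\n";
print "warnings caught: ", scalar(@warnings), "\n";
```

Because `no warnings` is lexically scoped, the category is switched back on as soon as the enclosing block ends.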

Next, you've used the 3-argument form of open with lexical filehandles: also very good - keep doing that. However, you've given the filehandles global scope: that's less good. Also, your naming could be improved: $keyword near the end of your code doesn't immediately make me think it's a filehandle (less so when it's an argument to return); $fh near the end does make me think it's a filehandle but which one (we're dealing with three files here).
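Here's a minimal sketch of a lexical filehandle with a limited scope and a descriptive name. The temporary file, created with the core File::Temp module, only stands in for a real keys file. `$keys_path` and the key values are hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical keys file, created here so the sketch is self-contained.
my ($tmp_fh, $keys_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "abc\ndef\n";
close $tmp_fh;

my @keys;
{
    # $keys_fh exists only inside this block. The file is closed
    # automatically when the block ends and the handle goes out of scope.
    open my $keys_fh, '<', $keys_path or die "Can't open '$keys_path': $!";
    chomp(@keys = <$keys_fh>);
}
# Out here, any use of $keys_fh would be rejected by 'use strict'.

print "keys: @keys\n";
```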

You've made some effort to check I/O by adding 'or die "..."' to your open statements. While it's important to check I/O, doing so this way is tedious and error-prone: as evidenced by the fact that you forgot it in one place and typed the wrong filename in the other two. Consider letting Perl do this for you with the autodie pragma.
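As a minimal sketch of what autodie does on failure: `open` below has no `or die`, yet a failure still throws an exception whose message names the file. The filename `no_such_file_here.txt` is hypothetical and assumed not to exist; `eval` is used only so the sketch can report the error and carry on:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

# Hypothetical file, assumed not to exist, to force a failure.
my $missing = 'no_such_file_here.txt';

my $ok = eval {
    open my $fh, '<', $missing;   # no "or die": autodie throws on failure
    1;
};

if (!$ok) {
    # $@ holds an autodie::exception object; its string form includes
    # the filename that was actually passed to open.
    print "Caught: $@";
}
```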

On to regex creation. Firstly, the general method you used to create the alternation is good. Two minor points: the '\E' is unnecessary because it's implicit at the end of the string (one possible use might be something like "\Q$quote_this\E$no_quoting_here"); the use of 'qr' here is questionable (perhaps useful if you wanted to make your keyword search case-insensitive: 'qr/(?i:\Q$_\E)/', where the '\E' is needed so that the closing parenthesis isn't quoted as well).
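Here's a minimal sketch of both points. `$quote_this`, `$no_quoting_here` and the keyword list are hypothetical values. The first pattern shows '\E' ending the quoting early; the second builds a case-insensitive alternation with one 'qr' per keyword:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $quote_this      = 'a.b';   # its "." should be literal
my $no_quoting_here = '.+';    # this should stay regex syntax

# \E ends the quoting, so $no_quoting_here keeps its regex meaning.
my $partial = qr/^\Q$quote_this\E$no_quoting_here\z/;

# Each keyword is quoted and made case-insensitive individually.
# The explicit \E stops \Q from also escaping the closing ")".
my @keywords = ('abc', 'x.y');
my $alt = join '|', map { qr/(?i:\Q$_\E)/ } @keywords;
my $re  = qr/^(?:$alt)\z/;

for my $str ('a.bZZ', 'axbZZ') {
    print "$str: ", ($str =~ $partial ? 'match' : 'no match'), "\n";
}
for my $str ('ABC', 'X.Y', 'XzY') {
    print "$str: ", ($str =~ $re ? 'match' : 'no match'), "\n";
}
```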

Your use of the '\b' assertion is highly problematic in this context. It matches a boundary between word (\w) and non-word (\W) characters. You're quoting the keywords because they might contain non-word characters. The two of these won't play nicely together! In one of your comments you state: "matches even those words with dots"; a more accurate comment would be: "sometimes matches words with dots, depending on where the dots are". A couple of examples to demonstrate this:

```
$ perl -we 'q{xxx.xxx} =~ /\b(xxx\.xxx)\b/; print "|$1|\n"'
|xxx.xxx|
$ perl -we 'q{.xxx} =~ /\b(\.xxx)\b/; print "|$1|\n"'
Use of uninitialized value $1 in concatenation (.) or string at -e line 1.
||
```

This is explained in more detail in "perlrebackslash: Assertions". A better option might be to replace the first '\b' with a negative lookbehind assertion and the second '\b' with a negative lookahead assertion. See "perlre: Lookaround Assertions" for details. Rewriting the two previous examples:

```
$ perl -we 'q{xxx.xxx} =~ /(?<!\w)(xxx\.xxx)(?!\w)/; print "|$1|\n"'
|xxx.xxx|
$ perl -we 'q{.xxx} =~ /(?<!\w)(\.xxx)(?!\w)/; print "|$1|\n"'
|.xxx|
```
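To see the difference on keywords that start or end with non-word characters, here's a small comparison sketch. It uses two keys and three lines borrowed from the dummy files used later in this post. For '10 $@%', '\b' can't match because '$' and '%' have non-word neighbours on both sides, but the lookaround version can:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Keys and log lines borrowed from the dummy files in this post.
my @keys = ('abc', '$@%');
my $alt  = join '|', map { quotemeta } @keys;

my $with_b      = qr/\b(?:$alt)\b/;
my $with_around = qr/(?<!\w)(?:$alt)(?!\w)/;

for my $line ('1 abc', '8 abcdefghi', '10 $@%') {
    printf "%-12s \\b: %-3s  lookaround: %s\n",
        $line,
        ($line =~ $with_b      ? 'yes' : 'no'),
        ($line =~ $with_around ? 'yes' : 'no');
}
```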

Finally, we get to the actual processing. You appear to have become totally lost here so, beyond saying that pretty much everything here is wrong, I won't comment further. All you need is a while loop to iterate over your input file, printing those lines that match your regex to your output file.
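Here's a minimal sketch of that loop written out in full. In-memory filehandles stand in for the real input and output files, and the regex is an assumed placeholder rather than one built from a keys file:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-ins for the real files: in-memory filehandles.
my $log_text = "1 abc\n4 jkl\n9 abc def ghi\n";
my $matched  = '';
open my $log_fh, '<', \$log_text or die "Can't open in-memory log: $!";
open my $out_fh, '>', \$matched  or die "Can't open in-memory output: $!";

# Assumed placeholder regex; normally built from the keys file.
my $log_search_re = qr/(?<!\w)(?:abc|def)(?!\w)/;

# Read one line at a time; each $line keeps its trailing newline.
while (my $line = <$log_fh>) {
    print {$out_fh} $line if $line =~ $log_search_re;
}
close $out_fh;

print $matched;
```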

I created dummy log and key files:

```
$ cat log_to_read.txt
1 abc
2 def
3 ghi
4 jkl
5 def
6 ghi
7 abd
8 abcdefghi
9 abc def ghi
10 $@%
11 \$\@\%
$ cat keys_to_find.txt
abc
def
ghi
$@%
```

I then wrote this short script, incorporating all the points raised above:

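```perl
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my $log_search_re;

{
    # Build one regex from all the keys: each key is quoted with \Q and
    # must not be preceded or followed by a word character.
    open my $keys_fh, '<', 'keys_to_find.txt';
    $log_search_re = qr/(?x:
        (?<!\w)
        (?:
            @{[ join '|', map { chomp; "\Q$_" } <$keys_fh> ]}
        )
        (?!\w)
    )/;
}

{
    # Copy every log line that matches the regex to the output file.
    open my $log_fh, '<', 'log_to_read.txt';
    open my $out_fh, '>', 'log_lines_matched.txt';
    /$log_search_re/ && print $out_fh $_ while <$log_fh>;
}
```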

Here's the output file after running that script:

```
$ cat log_lines_matched.txt
1 abc
2 def
3 ghi
5 def
6 ghi
9 abc def ghi
10 $@%
```
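The `@{[ ... ]}` construct in the script above packs several steps into one line. Here's a sketch that builds the same kind of regex in separate steps, with the keys hard-coded as they would arrive from `<$keys_fh>`, newlines included. It also checks that `"@{[ EXPR ]}"` simply interpolates the result of EXPR:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Keys as they would be read from keys_to_find.txt, newlines included.
my @lines = ("abc\n", "def\n", "ghi\n", "\$\@%\n");

# Step 1: chomp and quote each key (what the map in the script does).
my @quoted = map { my $key = $_; chomp $key; quotemeta $key } @lines;

# Step 2: join the quoted keys into one alternation string.
my $alternation = join '|', @quoted;

# Step 3: interpolate the plain scalar into the final regex.
my $log_search_re = qr/(?<!\w)(?:$alternation)(?!\w)/;

# @{[ EXPR ]} evaluates EXPR and interpolates the result into the string.
my $inline = "@{[ join '|', @quoted ]}";
print $inline eq $alternation ? "same\n" : "different\n";

for my $line ('10 $@%', '8 abcdefghi') {
    print "$line: ", ($line =~ $log_search_re ? 'match' : 'no match'), "\n";
}
```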

I've added quite a lot of documentation links throughout but feel free to ask if anything needs further explanation.

— Ken
