

by perlmonknoob (Initiate)
on Jan 24, 2014 at 12:27 UTC (#1071936, perlquestion)
perlmonknoob has asked for the wisdom of the Perl Monks concerning the following question:


Replies are listed 'Best First'.
Re: Arrays and grep problems
by 2teez (Vicar) on Jan 24, 2014 at 12:45 UTC

    Hi perlmonknoob
    "..Any help would be greatly appreciated"
    What about the suggestions given by kennethk in Re: Help from the Perliest monks, which you have not followed at all in your present code? Also, use perltidy or lay out your code properly yourself.

    If you tell me, I'll forget.
    If you show me, I'll remember.
    If you involve me, I'll understand.
    --- Author unknown to me
Re: Arrays and grep problems
by Kenosis (Priest) on Jan 24, 2014 at 19:43 UTC

    You've been given excellent tips for improving your script. In an earlier response to you, Laurent_R suggested two methods for accomplishing your task. The following addresses one of those, as I think it's a good option.

    When you're looking for keywords in each line, you're effectively joining them with OR. For example: print the current file's line if it contains 'Hello' OR 'world' OR 'today'. In a regex, the OR function is accomplished using alternation:
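
    A minimal sketch of such an alternation, using the three example keywords from the paragraph above. The sub name and the sample lines are made up for illustration:

```perl
use strict;
use warnings;

# Return the lines that contain 'Hello' OR 'world' OR 'today'.
# lines_with_keyword and the sample lines are hypothetical.
sub lines_with_keyword {
    my @lines = @_;
    return grep { /(?:Hello|world|today)/ } @lines;
}

print "$_\n" for lines_with_keyword(
    'Hello there',
    'nothing to see here',
    'what a wonderful world',
);
```

    Only the first and third sample lines are printed, since the second contains none of the keywords.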


    The (?:) notation forms a non-capturing group. It's not strictly necessary here, but is used to just cluster the set of disjuncts.

    There are a few issues with the above regex which need to be addressed. One is that, as it is, it's case sensitive. That is, 'today' would match but 'Today' wouldn't. To create a case-insensitive match, use the i modifier:
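
    As a small sketch, here is the same alternation with the i modifier added. The test words are invented for the demonstration:

```perl
use strict;
use warnings;

# The trailing i makes the whole alternation case-insensitive.
my $regex = qr/(?:Hello|world|today)/i;

for my $word (qw(today Today TODAY tomorrow)) {
    my $result = $word =~ $regex ? 'matches' : 'does not match';
    print "$word $result\n";
}
```

    All three spellings of 'today' match; 'tomorrow' does not.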


    The next issue is that the above regex would match 'worldly'--and you may not want in-string matching. To prevent in-string matching, you need to match words enclosed by word boundaries:
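
    A sketch of the word-boundary version, with \b placed on either side of the group. The sample strings are made up:

```perl
use strict;
use warnings;

# \b on both sides means each keyword must appear as a whole word.
my $regex = qr/\b(?:Hello|world|today)\b/i;

for my $text ( 'hello world', 'a worldly view', 'TODAY!' ) {
    my $result = $text =~ $regex ? 'matches' : 'does not match';
    print "'$text' $result\n";
}
```

    Here 'a worldly view' no longer matches, because 'world' is followed by another word character rather than a word boundary.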


    The last item to consider comes when you may want to match a phrase like 'Mr. Smith'. The problem is that the period in the string is a regex meta-character used to match one character. This period, and other meta-characters, must be escaped for a literal match: 'Mr\. Smith'.
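
    To see the difference escaping makes, here is a sketch that compares the phrase interpolated as-is with the same phrase quoted via \Q...\E (which does the same job as quotemeta). The variable names and the 'Mrs Smith' test string are my own:

```perl
use strict;
use warnings;

my $phrase = 'Mr. Smith';

my $unescaped = qr/$phrase/;        # the period matches any one character
my $escaped   = qr/\Q$phrase\E/;    # the period must be a literal '.'

for my $text ( 'Mr. Smith', 'Mrs Smith' ) {
    print "'$text' unescaped: ", ( $text =~ $unescaped ? 'yes' : 'no' ),
          ", escaped: ",         ( $text =~ $escaped   ? 'yes' : 'no' ), "\n";
}
```

    The unescaped pattern also matches 'Mrs Smith', because the period happily matches the 's'. The escaped pattern matches only the literal phrase.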

    Given the above considerations, the script below implements them:

    use strict;
    use warnings;
    use autodie;

    my @arr;
    my $logz = '/var/log/syslog';
    my $file = '/home/user/Desktop/keywords';

    open my $keysFH, '<', $file;
    while (<$keysFH>) {
        chomp;
        push @arr, "\\b\Q$_\E\\b";
    }
    close $keysFH;

    my $words = '(?:' . ( join '|', @arr ) . ')';
    my $regex = qr/$words/i;

    open my $logFH, '<', $logz;
    while (<$logFH>) {
        print if /$regex/;
    }
    close $logFH;

    The "\\b\Q$_\E\\b" notation first quotes all meta-characters in the word (\Q...\E) and the \\b means word boundary. There are two "\", because the first escapes the second, leaving the literal "\b" in the string.

    Print both $words and $regex to see what's constructed.
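
    If you'd rather try that without touching your files, here is a sketch that builds the pattern from a hard-coded list. The two keywords are hypothetical stand-ins for the lines of your keywords file:

```perl
use strict;
use warnings;

# Hypothetical keywords standing in for the contents of the keywords file.
my @keywords = ( 'Hello', 'Mr. Smith' );

my @arr;
for my $keyword (@keywords) {
    # \Q...\E escapes meta-characters; "\\b" leaves a literal \b in the string.
    push @arr, "\\b\Q$keyword\E\\b";
}

my $words = '(?:' . ( join '|', @arr ) . ')';
my $regex = qr/$words/i;

print "$words\n";    # (?:\bHello\b|\bMr\.\ Smith\b)
print "$regex\n";    # the same pattern, wrapped in a group carrying the i flag
```

    Note that \Q...\E escapes the space as well as the period. The exact wrapper printed for $regex depends on your Perl version, so I haven't spelled it out.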

    Usage: perl [>outFile]

    The last, optional parameter directs output to a file.

    Hope this helps!

Re: Arrays and grep problems
by hippo (Abbot) on Jan 24, 2014 at 13:11 UTC
Re: Arrays and grep problems
by toolic (Bishop) on Jan 24, 2014 at 13:22 UTC
Re: Arrays and grep problems
by Random_Walk (Prior) on Jan 24, 2014 at 15:49 UTC

    Untested code, but I hope it will give you a hint or two...

    use strict;   # Please
    use warnings; # Please

    # $n = 0; # There may be a better way

    # 'my' added to localise the scope of these variables
    # It won't matter much now, but when you get to large programs it's a life saver
    my $logz = '/var/log/syslog';
    my $file = '/home/user/Desktop/keywords';

    # Nowadays we use the three parameter form of open.
    # It's good to check the results too.
    # open(LOGFILE, $logz);
    # open(KEYWORD, $file);
    open my $log,  '<', $logz or die "Can't read $logz: $!\n";
    open my $keys, '<', $file or die "Can't read $file: $!\n";

    # I guess you want to check the keys against each logline
    # here I am going to take a different approach
    # Don't read the log all in one go, we will just check it line
    # at a time
    # @keyWord = <$keys>;
    # @logFile = <LOGFILE>;

    # precompile the regex for the keys
    my %re_key; # regex compiled keys
    while ( my $pattern = <$keys> ) {     # check each key
        chomp $pattern;                   # remove line ending
        $re_key{$pattern} = qr/$pattern/; # store in a hash
    }

    # $keyz = @keyWord[$n]; # this was only looking at one key.
    while ( my $line = <$log> ) {
        for my $pattern ( keys %re_key ) {
            # $line still carries its own newline
            print "$pattern matched in $logz: $line" if $line =~ /$re_key{$pattern}/;
        }
    }


    Pereant, qui ante nos nostra dixerunt!
Re: Arrays and grep problems
by hippo (Abbot) on Jan 25, 2014 at 10:20 UTC

    You have changed the code in your original post. Please don't do that, as it renders some of the previous answers nonsensical and prevents future readers of the whole thread from understanding it. Have a read of How do I change/delete my post?, particularly the section headed It is uncool to update a node in a way that renders replies confusing or meaningless.

    Your newer code works as it stands (albeit that one might have written it somewhat differently). If it is printing every line from the log, that will be because every line in the log matches. You can confirm this yourself using the system grep:

    grep -f /home/user/Desktop/keywords /var/log/syslog

    If you have a colourising system grep this will make it very clear where the matches are occurring.

Node Type: perlquestion [id://1071936]
Front-paged by Corion