PerlMonks  

Output problems

by shingster08 (Initiate)
on Dec 07, 2016 at 14:51 UTC ( [id://1177387]=perlquestion )

shingster08 has asked for the wisdom of the Perl Monks concerning the following question:

I recently moved to the Ubuntu operating system to see what it was like, and I came across the syslog, which is proving to be quite interesting. So I decided to write a script that runs through the syslog, searches for key terms with a regex, and writes the matches to another text file. This is what I have so far:

    #!/usr/bin/perl
    use strict;
    use warnings;
    my @array=();
    open(my $keyword,'<', "keyword.txt") or die "Couldn't open file file.txt, $!";
    open(my $sys,'<', "syslog") or die "Couldn't open file file.txt, $!";
    #open($keyword,'>' "keyword.txt") || die "Couldn't open file file.txt, $!";
    #open my $keyword, '>' , $file_location3 or die "can't open Keywords:" $!; # gives keywords.txt the file handle keyword and shows
    #error message if it fails to open
    #open my $sys, '>' , $file_location2 or die $!; # same as above
    open(my $fh, '>', 'output.txt');
    #my $file_location2 = "syslog";
    #my $file_location3 = "keyword.txt";
    #arraylisy goes here
    my $Keyword_or = join '|' , map {chomp;qr/\Q$_\E/} <$keyword>; # lists all lines in indicated file and joins the list in 1 string
    #regex here removes new line from each line and uses literal regex
    #which matches even those words with dots
    my $regex = qr|\b($Keyword_or)\b|; # this regex will match
    #all the keywords stored in keywords txt file
    #@array = $Keyword_or;
    foreach $regex(@array)
    {
    #while (/regex/g)
    #{
    #print $NEW "$.: $1";
    print $fh $regex;
    #}
    #return $keyword;
    #return $sys;
    #return $NEW;
    #print $fh $NEW;
    close $fh;
    }

While this code compiles with no errors, it doesn't actually output anything, and I also realized that my array is empty. As I'm still new to Perl, can anyone tell me how I would push the results of the regex to an array and output them?

Replies are listed 'Best First'.
Re: Output problems
by 1nickt (Canon) on Dec 07, 2016 at 15:06 UTC

    One of the reasons to use Perl is that other people have done a lot of the work for you (note I am not speaking about getting your script fixed here or on SO!)

    So don't reinvent the wheel: there are lots of modules on CPAN for interacting with the syslog.

    You might like to start with Parse-Syslog-Line.

    Hope this helps


    The way forward always starts with a minimal test.
Re: Output problems
by SuicideJunkie (Vicar) on Dec 07, 2016 at 15:14 UTC

    The large amount of commented-out code may be hiding something... I see you have declared @array, but I can't spot where you populate it.

    It also seems to me that you probably don't want @array anyways, since that implies reading the entire file into memory instead of processing it a line at a time. Replacing @array with my $line = <$ifh> is usually the better approach.
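
    As a rough sketch of the line-at-a-time approach: the sample log text, the matching_lines helper, and the pattern below are all made up for illustration, and an in-memory filehandle stands in for the real syslog. It also shows one way to push matching lines onto an array, if an array of results is really what is wanted.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a syslog file.
my $sample = "Dec  7 14:00:01 host cron[1]: job started\n"
           . "Dec  7 14:00:02 host kernel: error reading disk\n"
           . "Dec  7 14:00:03 host sshd[2]: session opened\n";

# Hypothetical helper: returns the lines read from $fh that match $regex.
# Only one line of input is held in memory at a time; only matches are kept.
sub matching_lines {
    my ($fh, $regex) = @_;
    my @matches;
    while (my $line = <$fh>) {
        push @matches, $line if $line =~ $regex;
    }
    return @matches;
}

# Open the string as if it were a file (in-memory filehandle).
open my $ifh, '<', \$sample or die "Can't open in-memory file: $!";
my @found = matching_lines($ifh, qr/error|sshd/);
close $ifh;

print @found;
```

    With a real file, only the open line would change, for example to open my $ifh, '<', 'syslog'.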

Re: Output problems
by kcott (Archbishop) on Dec 08, 2016 at 03:38 UTC

    G'day shingster08,

    Welcome to the Monastery.

    [While it's fine to write different versions of statements — commenting and uncommenting them to see different effects (I do this myself on occasions) — please don't leave all the commented out lines in the code you post. It looks like over 50% of your code has been commented out: that's just a lot of noise for us to wade through.]

    As you say you're "new to perl", I'll attempt to step through the uncommented code and highlight what's good, bad and ugly. :-)

    Firstly, you've used strict and warnings. Excellent start! Use them in all your code: turn parts of them off, in limited scope, only when necessary.

    Next, you've used the 3-argument form of open with lexical filehandles: also very good - keep doing that. However, you've given the filehandles global scope: that's less good. Also, your naming could be improved: $keyword near the end of your code doesn't immediately make me think it's a filehandle (less so when it's an argument to return); $fh near the end does make me think it's a filehandle but which one (we're dealing with three files here).

    You've made some effort to check I/O by adding 'or die "..."' to your open statements. While it's important to check I/O, doing so this way is tedious and error-prone: as evidenced by the fact that you forgot it in one place and typed the wrong filename in the other two. Consider letting Perl do this for you with the autodie pragma.
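
    A minimal sketch of what autodie does for a failed open, assuming a deliberately nonexistent path (hypothetical, chosen only so that the failure is visible). The eval is there just to capture the message for display; in a normal script you would simply let the program die.

```perl
use strict;
use warnings;
use autodie;    # open, close, etc. now throw an exception on failure

# Hypothetical path chosen so that the open is expected to fail.
my $missing = '/no/such/dir/keyword.txt';

# No "or die" is needed: autodie raises an exception describing the failure.
my $ok = eval {
    open my $keys_fh, '<', $missing;
    close $keys_fh;
    1;
};

# Stringify the exception so the message can be inspected or printed.
my $error = $ok ? '' : "$@";
chomp $error;

print $ok ? "opened\n" : "open failed: $error\n";
```

    The message is generated from the arguments actually passed to open, so it cannot drift out of step with the filename the way a hand-typed die string can.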

    On to regex creation. Firstly, the general method you used to create the alternation is good. Two minor points: the '\E' is unnecessary because it's implicit at the end of the string (one possible use might be something like "\Q$quote_this\E$no_quoting_here"); the use of 'qr' here is questionable (perhaps useful if you wanted to make your keyword search case-insensitive: 'qr/(?i:\Q$_)/').
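
    To illustrate the quoting point, here is a small sketch using a hypothetical keyword list (in the real script the keywords would come from the file). It builds the alternation from plain quoted strings and compiles it once, in a case-sensitive and a case-insensitive variant.

```perl
use strict;
use warnings;

# Hypothetical keywords; two of them contain regex metacharacters.
my @keywords = ('abc', 'a.c', '$@%');

# \Q escapes metacharacters; no trailing \E is needed at the end of the string.
my $alternation = join '|', map { "\Q$_" } @keywords;
print "$alternation\n";    # abc|a\.c|\$\@\%

# Compile once, after joining, rather than using qr on every keyword.
my $regex    = qr/(?:$alternation)/;
my $regex_ci = qr/(?i:$alternation)/;    # case-insensitive variant

for my $text ('xx a.c xx', 'xx aXc xx', 'xx ABC xx', 'cost: $@%') {
    printf "%-10s exact=%-3s ci=%s\n", $text,
        ($text =~ $regex    ? 'yes' : 'no'),
        ($text =~ $regex_ci ? 'yes' : 'no');
}
```

    Because the dot is escaped, 'a.c' only matches a literal dot, so 'aXc' is rejected by both patterns.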

    Your use of the '\b' assertion is highly problematic in this context. It matches a boundary between word (\w) and non-word (\W) characters. You're quoting the keywords because they might contain non-word characters. The two of these won't play nicely together! In one of your comments you state: "matches even those words with dots"; a more accurate comment would be: "sometimes matches words with dots depending on where the dots are". A couple of examples to demonstrate this:

        $ perl -we 'q{xxx.xxx} =~ /\b(xxx\.xxx)\b/; print "|$1|\n"'
        |xxx.xxx|
        $ perl -we 'q{.xxx} =~ /\b(\.xxx)\b/; print "|$1|\n"'
        Use of uninitialized value $1 in concatenation (.) or string at -e line 1.
        ||

    This is explained in more detail in "perlrebackslash: Assertions". A better option might be to replace the first '\b' with a negative lookbehind assertion and the second '\b' with a negative lookahead assertion. See "perlre: Lookaround Assertions" for details. Rewriting the two previous examples:

        $ perl -we 'q{xxx.xxx} =~ /(?<!\w)(xxx\.xxx)(?!\w)/; print "|$1|\n"'
        |xxx.xxx|
        $ perl -we 'q{.xxx} =~ /(?<!\w)(\.xxx)(?!\w)/; print "|$1|\n"'
        |.xxx|

    Finally, we get to the actual processing. You appear to have become totally lost here so, beyond saying that pretty much everything here is wrong, I won't comment further. All you need is a while loop to iterate your input file; printing those lines that match your regex to your output file.

    I created dummy log and key files:

        $ cat log_to_read.txt
        1 abc
        2 def
        3 ghi
        4 jkl
        5 def
        6 ghi
        7 abd
        8 abcdefghi
        9 abc def ghi
        10 $@%
        11 \$\@\%
        $ cat keys_to_find.txt
        abc
        def
        ghi
        $@%

    I then wrote this short script, incorporating all the points raised above:

        #!/usr/bin/env perl

        use strict;
        use warnings;
        use autodie;

        my $log_search_re;

        {
            open my $keys_fh, '<', 'keys_to_find.txt';
            $log_search_re = qr/(?x:
                (?<!\w)
                (?: @{[ join '|', map { chomp; "\Q$_" } <$keys_fh> ]} )
                (?!\w)
            )/;
        }

        {
            open my $log_fh, '<', 'log_to_read.txt';
            open my $out_fh, '>', 'log_lines_matched.txt';
            /$log_search_re/ && print $out_fh $_ while <$log_fh>;
        }

    Here's the output file after running that script:

        $ cat log_lines_matched.txt
        1 abc
        2 def
        3 ghi
        5 def
        6 ghi
        9 abc def ghi
        10 $@%

    I've added quite a lot of documentation links throughout but feel free to ask if anything needs further explanation.

    — Ken

Re: Output problems
by stevieb (Canon) on Dec 07, 2016 at 14:56 UTC

    Welcome to the Monastery, shingster08!

    When you cross-post a question, please inform the audiences of all locations that you've done so, to prevent wasted duplicate efforts.

    You could start by implementing the recommendations by Sobrique in the Stack Overflow post. This is what I mean by wasted duplicate efforts. We're going to say many of the same things he did over there, so please fix those things up, and if you still have issues, we can go from there.

      Sorry, I'm still new to posting questions, so I'll take care of that now.
