PerlMonks  

Parsing out uniques

by diamondsandperls (Beadle)
on Aug 24, 2012 at 23:04 UTC

diamondsandperls has asked for the wisdom of the Perl Monks concerning the following question:

The unless statement toward the bottom of my code is not working. Any help would be appreciated.

My goal is to process a given src address only once: after the first line for an address has been printed, I do not want to print another line for that address.

use strict;
use warnings;
use Cwd;
use LWP::UserAgent;
use DateTime;
use File::Slurp;

print "Type the malicious IP: ";
my $IP = <>;
chomp $IP;

#calculating times and dates
my $dt_now = DateTime->now;
$dt_now->subtract( hours => 5 );
my $now_Hour = sprintf("%02d",$dt_now->hour_12());
my $now_Year = $dt_now->year();
my $now_Month = sprintf("%02d",$dt_now->month());
my $now_Day = sprintf("%02d",$dt_now->day());
my $now_Min = sprintf("%02d",$dt_now->minute());
my $am_pm = $dt_now->am_or_pm();

my $oldSSO = qx{whoami};
chomp $oldSSO;
my ($sso) = $oldSSO =~ /.*(\w{2}\d{5})/;

my $ua = new LWP::UserAgent;
my $response = $ua->get("http://referencedatasite/$sso");
my $content = $response->content;
my ($newcontent) = $content =~ /<geid>(\d+)/;

my @textfiles = <*.txt *.log>;
my $input_file;
my $input_fh;
my $src;
my $dst;
my @srcs;
my %seen;

my $output_file = "simon.csv";
open(my $output_fh, '>', $output_file) or die "Failed to open $output_file - $!";
print {$output_fh} "uploadfiles,submitter,description,SIP,DIP,Date_occurred_detected,Time_occurred_detected,Report_Severity,Incident_Type_Details\n";
close $output_fh;

foreach my $textfile (@textfiles) {
    if ($textfile =~ /(\d+.\d+.\d+.\d+)/) {
        my ($ipaddy) = $textfile =~ /(\d+.\d+.\d+.\d+)/;
        print "Processing $textfile\n";
        my @lines = read_file( $textfile ) ;
        open($output_fh, '>>', $output_file) or die "Failed to open $output_file - $!";
        foreach my $line (@lines) {
            ($src) = $line =~ /\d{4}-\d+-\d+\s\d{2}:\d{2}:\d{2}\s\d+\s(\d+.\d+.\d+.\d+)/;
            %seen = ();
            unless ($seen{$src}++) {
                if ($line =~ $IP) {
                    print {$output_fh} "$src.zip,$newcontent,Malicious activity found when mining proxylog data,$src,";
                    ($dst) = $line =~ /SG-HTTP-Service (\d+.\d+.\d+.\d+)/g;
                    print {$output_fh} "$dst,$now_Month/$now_Day/$now_Year,$now_Hour:$now_Min $am_pm,3,24\n";
                }
            }
        }
    }
}

Replies are listed 'Best First'.
Re: Parsing out uniques
by GrandFather (Saint) on Aug 25, 2012 at 00:21 UTC

    As a general rule, declare variables where they are first needed so that their scope is clear. In your code you declare a bunch of variables outside the outer for loop, almost none of which are used anywhere except in the innermost for loop.
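To illustrate the scoping advice, here is a minimal sketch rather than a rewrite of the posted script. The sub name first_fields and the sample input are made up for the example. Only the accumulator that must survive across iterations is declared outside the loop; the per-line variable is declared where it is used.

```perl
use strict;
use warnings;

# Collect the first whitespace-separated field of each line.
# (Hypothetical helper, not part of the original script.)
sub first_fields {
    my @lines = @_;
    my @fields;    # needed across iterations, so declared outside the loop
    for my $line (@lines) {
        # $first is only meaningful for the current line, so it is declared here.
        my ($first) = split ' ', $line;
        push @fields, $first if defined $first;
    }
    return @fields;
}

print join(',', first_fields('a b', 'c d')), "\n";    # prints "a,c"
```

With this layout a reader can see at a glance which values carry over from one line to the next and which are recomputed every time.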

    Your problem, however, is with the one variable that does need to be global to the outer for loop. Even though %seen is declared there, you reset it immediately before you test it, so $seen{$src}++ always starts from an empty hash and the unless block runs for every line. The immediate fix is simply to remove %seen = ();.
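The effect of leaving the hash alone can be sketched as follows. This is an illustration under assumptions, not the original script: the sub name first_line_per_source and the simplified "src=<ip>" line format are invented for the example, and the real proxy log regex would take the place of the one shown.

```perl
use strict;
use warnings;

# Return the first line seen for each source address, in input order.
# Assumes a simplified, hypothetical log format containing "src=<ip>".
sub first_line_per_source {
    my @lines = @_;
    my %seen;    # declared once, outside the loop, and never reset
    my @kept;
    for my $line (@lines) {
        my ($src) = $line =~ /src=(\d+\.\d+\.\d+\.\d+)/;
        next unless defined $src;    # skip lines that do not match
        next if $seen{$src}++;       # true on the second and later sightings
        push @kept, $line;
    }
    return @kept;
}

my @sample = (
    'src=10.0.0.1 GET /a',
    'src=10.0.0.2 GET /b',
    'src=10.0.0.1 GET /c',
);
print "$_\n" for first_line_per_source(@sample);
```

Only the /a and /b lines are printed, because the third line repeats an address that is already in %seen. One thing to watch in the posted loop: the increment happens before the $line =~ $IP test, so an address is marked as seen even when its first line does not mention the IP. Whether that is what you want depends on the goal.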

    True laziness is hard work
