PerlMonks  

Parsing out uniques

by diamondsandperls (Beadle)
on Aug 24, 2012 at 23:04 UTC (#989655=perlquestion)
diamondsandperls has asked for the wisdom of the Perl Monks concerning the following question:

The unless statement toward the bottom of my code is not working. Any help would be appreciated.

My goal is to process each src address only once. After a line for a given address has been printed, I do not want to print another line for that address.

use strict;
use warnings;
use Cwd;
use LWP::UserAgent;
use DateTime;
use File::Slurp;

print "Type the malicious IP: ";
my $IP = <>;
chomp $IP;

#calculating times and dates
my $dt_now = DateTime->now;
$dt_now->subtract( hours => 5 );
my $now_Hour  = sprintf("%02d",$dt_now->hour_12());
my $now_Year  = $dt_now->year();
my $now_Month = sprintf("%02d",$dt_now->month());
my $now_Day   = sprintf("%02d",$dt_now->day());
my $now_Min   = sprintf("%02d",$dt_now->minute());
my $am_pm     = $dt_now->am_or_pm();

my $oldSSO = qx{whoami};
chomp $oldSSO;
my ($sso) = $oldSSO =~ /.*(\w{2}\d{5})/;

my $ua = new LWP::UserAgent;
my $response = $ua->get("http://referencedatasite/$sso");
my $content = $response->content;
my ($newcontent) = $content =~ /<geid>(\d+)/;

my @textfiles = <*.txt *.log>;
my $input_file;
my $input_fh;
my $src;
my $dst;
my @srcs;
my %seen;

my $output_file = "simon.csv";
open(my $output_fh, '>', $output_file) or die "Failed to open $output_file - $!";
print {$output_fh} "uploadfiles,submitter,description,SIP,DIP,Date_occurred_detected,Time_occurred_detected,Report_Severity,Incident_Type_Details\n";
close $output_fh;

foreach my $textfile (@textfiles) {
    if ($textfile =~ /(\d+.\d+.\d+.\d+)/) {
        my ($ipaddy) = $textfile =~ /(\d+.\d+.\d+.\d+)/;
        print "Processing $textfile\n";
        my @lines = read_file( $textfile ) ;
        open($output_fh, '>>', $output_file) or die "Failed to open $output_file - $!";
        foreach my $line (@lines) {
            ($src) = $line =~ /\d{4}-\d+-\d+\s\d{2}:\d{2}:\d{2}\s\d+\s(\d+.\d+.\d+.\d+)/;
            %seen = ();
            unless ($seen{$src}++) {
                if ($line =~ $IP) {
                    print {$output_fh} "$src.zip,$newcontent,Malicious activity found when mining proxylog data,$src,";
                    ($dst) = $line =~ /SG-HTTP-Service (\d+.\d+.\d+.\d+)/g;
                    print {$output_fh} "$dst,$now_Month/$now_Day/$now_Year,$now_Hour:$now_Min $am_pm,3,24\n";
                }
            }
        }
    }
}

Re: Parsing out uniques
by GrandFather (Cardinal) on Aug 25, 2012 at 00:21 UTC

    As a general rule, declare variables where they are first needed so that their scope is clear. Your code declares a bunch of variables outside the outer for loop, almost none of which are used anywhere except in the innermost for loop.

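    Your problem, however, is with the one variable that does need to be global to the outer for loop: %seen. Even though it is declared there, you reset it immediately before you test it! Because the hash is always empty at the test, $seen{$src}++ is always false and the unless block runs for every line. The immediate fix is simply to remove %seen = ();.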

    True laziness is hard work
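
The advice in the reply above can be sketched as a small, self-contained example. This is only an illustration under stated assumptions, not the original poster's script: the helper name first_line_per_src, the sample log lines, and the addresses are all hypothetical, and the log layout (date, time, a number, then the source address) is inferred from the regex in the question. The point it shows is that %seen is declared once, outside the loop over lines, and is never cleared inside it.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first matching line per source address.
sub first_line_per_src {
    my ($ip, @lines) = @_;
    my %seen;    # declared once, before the loop, and never reset inside it
    my @kept;
    for my $line (@lines) {
        # Assumed layout: date, time, a number, then the source address.
        # Variables are declared where they are first needed.
        my ($src) = $line =~ /^\d{4}-\d+-\d+\s\d{2}:\d{2}:\d{2}\s\d+\s(\d+\.\d+\.\d+\.\d+)/
            or next;                          # skip lines that do not parse
        next unless index($line, $ip) >= 0;   # literal match on the IP
        next if $seen{$src}++;                # true from the second hit onward
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample data for illustration only.
my @sample = (
    "2012-08-24 10:00:00 12 10.0.0.1 GET SG-HTTP-Service 203.0.113.9",
    "2012-08-24 10:00:05 15 10.0.0.1 GET SG-HTTP-Service 203.0.113.9",
    "2012-08-24 10:00:09 11 10.0.0.2 GET SG-HTTP-Service 203.0.113.9",
    "2012-08-24 10:00:12 11 10.0.0.3 GET SG-HTTP-Service 198.51.100.7",
);

my @kept = first_line_per_src('203.0.113.9', @sample);
print "$_\n" for @kept;
```

Two choices here differ from the code in the question and are assumptions about intent. The sketch tests for the IP before marking a source as seen, so a source whose first line does not mention the IP can still be reported later; the question's code marks the source first. It also uses index for a literal comparison and escapes the dots in the address regex, because an unescaped dot in a pattern matches any character.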
