PerlMonks  

correct angle on tackling a problem

by diamondsandperls (Beadle)
on Aug 24, 2012 at 17:41 UTC (#989596)
diamondsandperls has asked for the wisdom of the Perl Monks concerning the following question:

I have a CSV file that has several columns. I created the CSV using Perl, but I need to add code that removes duplicate lines: when creating the file, it should print a line only once per value of either the source field or the destination field, and I want to be able to select which of the two is used.

So basically, if option 1 (the source field) is selected, a line should only be printed if no line with that source address has been printed yet. I will never know the IP addresses in this field ahead of time, because the data is printed on the fly, just as it is now when the CSV is created. Currently I am not using any modules to create the CSV; I just comma-delimit my text.

Below is the sample code

my @textfiles = <*.txt *.log>;
my $input_file;
my $input_fh;
my $src;
my $dst;
my $output_file = "simon.csv";
open(my $output_fh, '>', $output_file) or die "Failed to open $output_file - $!";
print {$output_fh} "uploadfiles,submitter,description,SIP,DIP,Date_occurred_detected,Time_occurred_detected,Report_Severity,Incident_Type_Details\n";
close $output_fh;
foreach my $textfile (@textfiles) {
    if ($textfile =~ /(\d+.\d+.\d+.\d+)/) {
        my $ipaddy = $textfile =~ /(\d+.\d+.\d+.\d+)/;
        print "Processing $textfile\n";
        open(my $input_fh, '<', $textfile) or die "Failed to open $textfile: $!";
        open($output_fh, '>>', $output_file) or die "Failed to open $output_file - $!";
        while (my $line = <$input_fh>) {
            if ($line =~ /\d{4}-\d+-\d+\s\d{2}:\d{2}:\d{2}\s\d+\s\d+.\d+.\d+.\d+/) {
                $src = $line =~ /\d{4}-\d+-\d+\s\d{2}:\d{2}:\d{2}\s\d+\s(\d+.\d+.\d+.\d+)/;
                print {$output_fh} "$1.zip,$newcontent,Malicious activity found when mining proxylog data,$1,";
                $dst = $line =~ /SG-HTTP-Service (\d+.\d+.\d+.\d+)/g;
                print {$output_fh} "$1,$now_Month\/$now_Day\/$now_Year,$now_Hour:$now_Min $am_pm2,3,24\n";
            }
        }
    }
}

Re: correct angle on tackling a problem
by cheekuperl (Monk) on Aug 24, 2012 at 17:52 UTC
    Add a flag that tells you whether the line has already been printed. Check this flag before printing the next time :)
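A minimal sketch of the flag idea, assuming the source address has already been pulled out of each line. Because there is one flag per address, the flags naturally live in a hash keyed by the address. The sample rows and the name %printed are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample rows: [source IP, destination IP].
my @rows = (
    [ '10.0.0.1', '192.168.1.5' ],
    [ '10.0.0.2', '192.168.1.5' ],
    [ '10.0.0.1', '192.168.1.9' ],
);

my %printed;    # one flag per source address: true once that address has been printed

for my $row (@rows) {
    my ($src, $dst) = @$row;
    next if $printed{$src};    # flag already set for this source: skip the line
    print "$src,$dst\n";
    $printed{$src} = 1;        # set the flag so later lines with this source are skipped
}
```

To key on the destination instead, use $printed{$dst} in both places.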
      I am not sure how to tie this into my current code. Could you provide an example? Your help is very much appreciated. I am assuming I use a regex to capture the source address and assign it to a scalar such as $flag. I am not sure how to check whether the source or destination address has already been printed. It will be one or the other, though most of the time it will be the source.

      Thanks in advance.

        foo() unless $seen{$key}++;

        --MidLifeXis

Re: correct angle on tackling a problem
by MidLifeXis (Prior) on Aug 24, 2012 at 18:39 UTC

    Be aware that

    $dst = $line =~ /SG-HTTP-Service (\d+.\d+.\d+.\d+)/g;
    does not do what you think it does. You need to escape the '.' if you want it to match a literal dot in a regexp. If you do not, the '.' matches any character (except a newline). The same problem occurs in several places in your script.
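A small illustration of the difference, using the SG-HTTP-Service pattern from the original script; the two sample strings are made up.

```perl
use strict;
use warnings;

my $loose  = qr/(\d+.\d+.\d+.\d+)/;      # unescaped: each '.' matches any character
my $strict = qr/(\d+\.\d+\.\d+\.\d+)/;   # escaped: each '\.' matches only a literal dot

for my $text ('SG-HTTP-Service 10.0.0.1', 'SG-HTTP-Service 10x0x0x1') {
    my $loose_hit  = $text =~ /SG-HTTP-Service $loose/  ? 'matches' : 'no match';
    my $strict_hit = $text =~ /SG-HTTP-Service $strict/ ? 'matches' : 'no match';
    print "$text: unescaped $loose_hit, escaped $strict_hit\n";
}
```

Both patterns accept the real address, but only the unescaped one also accepts 10x0x0x1.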

    --MidLifeXis

      Thanks for the heads-up on my regex. I never thought of it like that, but what you say makes sense, because . does mean any character.

      I'm not sure how to apply the logic foo() unless $seen{$key}++; to my current code.

        diamondsandperls:

        He means logic like:

        my %seen;
        for my $key (qw(0 1 2 1 2 3 2 3 4 3 4 5)) {
            print "$key\n" unless $seen{$key}++;
        }

        This (untested) script should just print the values 0 through 5 in order. The $key value is just the data you use to tell whether a line is a duplicate. When you first check the %seen hash for the value 0, there won't be an entry for zero, so the print will execute. The same expression also increments the value in the %seen hash to 1, so the next time you see 0 it *won't* print.

        You just need the %seen hash, the data you use for the $key, and an action you want to control.
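Putting the pieces together for the original problem, here is a minimal sketch (not the poster's actual script): pick the column to key on, build $key from it, and let the %seen test decide whether a row is written. Choosing the column through the first command-line argument, the in-memory sample lines, and the reduced set of CSV columns are all assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;

# Field that decides uniqueness: 'src' or 'dst'. Taken from the first
# command-line argument here (an assumption; the original script has no options).
my $dedupe_on = $ARGV[0] // 'src';
die "Usage: $0 [src|dst]\n" unless $dedupe_on eq 'src' || $dedupe_on eq 'dst';

# Dotted-quad pattern with escaped dots, so '.' only matches a literal dot.
my $ip = qr/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;

# Made-up sample lines in the format the original regexes expect.
my @log_lines = (
    "2012-08-24 10:00:01 1 10.0.0.1 GET SG-HTTP-Service 192.168.1.5\n",
    "2012-08-24 10:00:02 1 10.0.0.2 GET SG-HTTP-Service 192.168.1.5\n",
    "2012-08-24 10:00:03 1 10.0.0.1 GET SG-HTTP-Service 192.168.1.9\n",
    "this line does not match and is ignored\n",
);

my %seen;    # chosen address => number of lines seen with it
my @csv;     # rows that would be written to the CSV

for my $line (@log_lines) {
    # List-context captures return the address itself; skip lines that don't match.
    my ($src) = $line =~ /^\d{4}-\d+-\d+\s\d{2}:\d{2}:\d{2}\s\d+\s($ip)/ or next;
    my ($dst) = $line =~ /SG-HTTP-Service ($ip)/ or next;

    my $key = $dedupe_on eq 'src' ? $src : $dst;

    # Keep only the first line for each key; the other CSV columns are omitted here.
    push @csv, "$src.zip,$src,$dst" unless $seen{$key}++;
}

print "$_\n" for @csv;
```

With no argument (or src) the first and second sample rows are kept; passing dst keeps the first and third instead. In the real script the push would become a print to $output_fh.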

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Node Type: perlquestion [id://989596]
Approved by Corion