http://www.perlmonks.org?node_id=950855

mlebel has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I can't seem to figure out what's wrong with my code. I am trying to get a list of matching lines from one file, matched against the contents of another file.

Here is what I am talking about:

I have a list of IP accounting information from a router that looks like this (TmpIPFile):

    Source Destination Packets Bytes
    15.254.32.120 10.2.9.2 5 504
    79.15.122.235 208.43.3.154 21 2092
    79.15.122.235 63.245.217.113 21 2232
    79.15.122.235 209.15.236.80 10 1310
    79.15.122.235 46.37.179.218 34 4065
    63.97.127.34 10.2.9.2 4 471
    79.15.122.235 63.141.200.24 19 1811
    79.15.122.235 72.251.219.10 437 56713
    79.15.122.235 96.7.122.206 215 23318
    79.15.122.235 209.200.154.225 77 6257
    79.15.122.235 64.94.107.23 13 3436
    79.15.122.235 64.74.126.22 23 1527
    17.149.36.162 10.2.9.3 14 3416
    79.15.122.235 184.25.187.120 49 5772
    79.15.122.235 205.251.242.166 21963 32615009
    79.15.122.235 12.239.198.71 26 2946
    79.15.122.235 184.85.247.120 145 18458
    79.15.122.235 184.235.49.15 10 2001
    79.15.122.235 207.171.163.162 19 1393
    79.31.21.75 10.2.9.2 11 3993
    209.68.19.130 10.2.9.2 33 15941
    79.15.122.235 64.94.107.16 4 1375
    79.15.122.235 207.67.0.233 29 3742
    72.247.242.235 10.2.9.2 7 3750
    79.15.122.235 64.145.92.232 9 2364
    79.15.122.235 208.88.180.89 28 4490
    79.15.122.235 94.100.188.227 10 1979
    17.149.36.15 10.2.9.3 14 3404
    79.15.122.235 128.175.60.118 280 15120
    65.54.81.34 10.2.9.2 42 23068
    79.15.122.235 209.236.72.16 102 9765
    79.15.122.235 65.55.33.50 18 5479
    79.15.122.235 17.149.36.197 54 7279
    67.148.147.64 10.2.9.2 553 274036
    79.15.122.235 204.245.63.99 42 9826
    79.15.122.235 207.46.206.74 104 10498
    67.148.147.65 10.2.9.3 57 39207
    17.172.232.80 10.2.9.3 27 6768
    79.15.122.235 217.212.238.134 320 43460
    8.8.8.8 10.2.9.6 84 8006
    79.15.122.235 74.125.226.176 214 46874
    79.15.122.235 23.12.158.224 299 30331
    79.15.122.235 68.67.159.207 48 13422
    79.15.122.235 208.122.28.12 33 2857
    Accounting data age is 0w1d
    Box#

From this list, I want to be able to pick out specific lines, based on a list that looks like this (TmpLookingForIPFile):

    10.2.9.2
    10.2.9.3
    10.2.9.4
    10.2.9.5

With this list, I would like to add up the Bytes field for each IP and pass each IP's total to another sub.

The code that I have thus far for doing the work looks like this:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $FALSE = 0;
    my $TRUE  = 1;
    my $Flag  = $TRUE;

    my $number_list = "TmpIPFile";
    my $looking_for = "TmpLookingForIPFile";
    my $DestDevice  = "Box";
    my %remember;

    if ($Flag == $TRUE){
        open my $NUMBER_LIST, '<', $number_list or die "$number_list: $!";
        while (<$NUMBER_LIST>) {
            next if /^sh|^\s*Source|^$DestDevice|^Accounting|^^M|^$|/; # Skip text and empty lines
            my ($key, $value,) = split;
            push @{ $remember{$key} }, $value;
        }
        close $NUMBER_LIST;

        open my $LOOKING_FOR, '<', $looking_for or die "$looking_for: $!";
        while (<$LOOKING_FOR>) {
            chomp;
            for my $value (@{ $remember{$_} }) {
                print "$_ $value\n";
                # Do your calculations here...
            }
        }
        close $LOOKING_FOR;
    }
Please note that I have a ^^M in there; that is a control character found in the original file. I am not 100% sure, but I believe that the chomp; would take care of this.
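As an aside on the ^M question: chomp removes only a trailing copy of the input record separator $/, which is "\n" by default, so on a file with CRLF ("\r\n") line endings the carriage return survives chomp. A minimal sketch, using a made-up line in the same shape as the data above:

```perl
use strict;
use warnings;

# A sample line as it might be read from a file with CRLF line endings.
my $line = "15.254.32.120 10.2.9.2 5 504\r\n";

my $chomped = $line;
chomp $chomped;              # removes only the trailing "\n" (the default $/)

my $stripped = $line;
$stripped =~ s/\r?\n\z//;    # removes "\n" plus an optional "\r" before it

printf "after chomp: ends with CR? %s\n", ($chomped  =~ /\r\z/ ? "yes" : "no");
printf "after s///:  ends with CR? %s\n", ($stripped =~ /\r\z/ ? "yes" : "no");
```

The leftover "\r" mostly matters for patterns such as /^$/ (a line that is just "\r" is not empty); split with no arguments treats "\r" as whitespace, so the fields it returns are unaffected.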

Right now, I get nothing on the screen when I run this. I tried all kinds of prints and I cannot figure out what I am doing wrong.

Does anyone know what I can do to make this work?

Please note that this code runs inside another script, hence the $Flag portion of it.

Thanks in advance!

P.S. If you don't fully understand, please ask me questions :-)

Re: Generating a List of numbers
by InfiniteSilence (Curate) on Jan 31, 2012 at 02:50 UTC

    For starters, the regex you are using fails. You need to do two things: a) learn how to use the Perl debugger and step through your code (writing reams of Perl code without being able to pinpoint where things are breaking is the path to misery), and b) test your regexes:

        DB<5> p $_
        15.254.32.120 10.2.9.2 5 504
        DB<6> if (m/^[\.d\s]+/g){print 1}
        DB<7> if (m/^[\.d\s\t]+/g){print 1}
        1
    The line you are looking for has nothing but numbers, spaces, tabs, and periods. My test checks only for that, and it succeeds. Yours is trying to eliminate lines instead. That in itself is not why it is failing, but the sheer number of alternatives you added makes the regex sloppy. Use regexes to zero in on what you want in a file rather than on what you are trying to ignore.
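    In that spirit, here is a sketch (ours, not from the reply) that matches the four-column data rows directly and captures the fields, instead of listing every kind of line to skip. The loose dotted-quad pattern assumes only IPv4 addresses appear:

    ```perl
    use strict;
    use warnings;

    # Sample input lines in the shape of the router's IP accounting output.
    my @lines = (
        "Source Destination Packets Bytes",
        "15.254.32.120 10.2.9.2 5 504",
        "",
        "Accounting data age is 0w1d",
        "Box#",
    );

    my $ipv4 = qr/\d{1,3}(?:\.\d{1,3}){3}/;    # loose dotted-quad pattern
    my @rows;
    for my $line (@lines) {
        # Only lines of the form "IP IP integer integer" are kept.
        if (my ($src, $dst, $packets, $bytes) =
                $line =~ /^\s*($ipv4)\s+($ipv4)\s+(\d+)\s+(\d+)\s*$/) {
            push @rows, [ $src, $dst, $packets, $bytes ];
            print "src=$src dst=$dst packets=$packets bytes=$bytes\n";
        }
    }
    ```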

    Celebrate Intellectual Diversity

Re: Generating a List of numbers
by GrandFather (Saint) on Jan 31, 2012 at 02:48 UTC

    The first show stopper is the trailing | in your regex in the first while loop. That causes all the lines in the log file to be skipped.
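    To see why: an alternation that ends in | has an empty last alternative, and the empty pattern matches at the start of every string, so the next if fires for every line. A small demonstration (the two sub names are made up for the example):

    ```perl
    use strict;
    use warnings;

    # Hypothetical helpers: return 1 if "next if" would skip the line.
    sub skip_with_trailing_pipe { return $_[0] =~ /^\s*Source|^Accounting|/ ? 1 : 0 }
    sub skip_without_pipe       { return $_[0] =~ /^\s*Source|^Accounting/  ? 1 : 0 }

    for my $line ("Source Destination Packets Bytes", "15.254.32.120 10.2.9.2 5 504") {
        printf "%-34s with |: %d  without |: %d\n",
            $line, skip_with_trailing_pipe($line), skip_without_pipe($line);
    }
    ```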

    After that, it seems to me that you have the key and value for your remember hash the wrong way around in the push.
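    Concretely, split returns the fields in the order Source, Destination, Packets, Bytes, while the lookup file lists destination addresses, so the hash presumably wants the destination as key and the byte count as value. A sketch of that change, with a few rows copied from the data above:

    ```perl
    use strict;
    use warnings;

    my %remember;
    my @data = (
        "15.254.32.120 10.2.9.2 5 504",
        "63.97.127.34 10.2.9.2 4 471",
        "17.149.36.162 10.2.9.3 14 3416",
    );

    for (@data) {
        # Fields are Source, Destination, Packets, Bytes:
        # key on the destination (field 1) and store the bytes (field 3).
        my ($source, $dest, $packets, $bytes) = split;
        push @{ $remember{$dest} }, $bytes;
    }

    for my $ip (sort keys %remember) {
        print "$ip: @{ $remember{$ip} }\n";
    }
    ```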

    True laziness is hard work
Re: Generating a List of numbers
by JavaFan (Canon) on Jan 31, 2012 at 02:45 UTC
    I would do something like (untested):
        use autodie;

        my $DestDevice = "Box";    # from the question; left undefined, ^$DestDevice reduces to ^ and skips every line
        my %byte_count;

        open my $fh1, "<", "TmpLookingForIPFile";
        while (<$fh1>) {
            chomp;
            $byte_count{$_} = 0;
        }

        open my $fh2, "<", "TmpIPFile";
        while (<$fh2>) {
            next if /^sh|^\s*Source|^$DestDevice|^Accounting|^^M|^$/;  # Do NOT end with |/ here, as it then will always match
            chomp;
            my ($dest, $bytes) = (split)[1, 3];
            $byte_count{$dest} += $bytes if exists $byte_count{$dest};
        }

        while (my ($ip, $bytes) = each %byte_count) {
            my_sub($ip, $bytes);
        }

        sub my_sub {
            my ($ip, $bytes) = @_;
            # ... something that uses the ip and the byte count
        }
Re: Generating a List of numbers
by mlebel (Hermit) on Jan 31, 2012 at 03:29 UTC

    Thank you, everyone. I will test this out and report back tomorrow.

Re: Generating a List of numbers
by mlebel (Hermit) on Feb 01, 2012 at 03:07 UTC

    OK, so I tried as many of the examples as I could. One of them worked, but I couldn't quite do what I needed with it.

    Looks like I am right back to square one. So here is the test scenario that I made. I just need to know what I am doing wrong with my code. (I don't need a "better" way of doing it unless there is a good justification for it.)

    This is the file "ip accounting file":

        Source Destination Packets Bytes
        15.254.32.120 10.2.9.2 5 504
        63.97.127.34 10.2.9.2 4 471
        17.149.36.162 10.2.9.3 14 3416
        79.31.21.75 10.2.9.2 11 3993
        209.68.19.130 10.2.9.2 33 15941
        72.247.242.235 10.2.9.2 7 3750
        17.149.36.15 10.2.9.3 14 3404
        65.54.81.34 10.2.9.2 42 23068
        67.148.147.64 10.2.9.2 553 274036
        67.148.147.65 10.2.9.3 57 39207
        8.8.8.8 10.2.9.6 84 8006

        Accounting data age is 0w1d
        Box#

    Here is the file "include ip file":

        10.2.9.2
        10.2.9.3
        10.2.9.4
        10.2.9.5
        10.2.9.6

    My code is this:

        #!/usr/bin/perl -w
        open TMPINCLUDEIPFILE, "<", "$tmpincludeipfile";
        open TMPIPACCOUNTINGFILE, "<", "$tmpipaccountingfile";
        foreach $Line (<TMPINCLUDEIPFILE>) {
            print "Line = $Line\n";
            foreach $Line1 (<TMPIPACCOUNTINGFILE>) {
                print "Line1 = $Line1\n";
            }
        }
        exit;

    And this code gives me the following output:

        ./testfile.pl
        Line = 10.2.9.2
        Line1 = Source Destination Packets Bytes
        Line1 = 15.254.32.120 10.2.9.2 5 504
        Line1 = 63.97.127.34 10.2.9.2 4 471
        Line1 = 17.149.36.162 10.2.9.3 14 3416
        Line1 = 79.31.21.75 10.2.9.2 11 3993
        Line1 = 209.68.19.130 10.2.9.2 33 15941
        Line1 = 72.247.242.235 10.2.9.2 7 3750
        Line1 = 17.149.36.15 10.2.9.3 14 3404
        Line1 = 65.54.81.34 10.2.9.2 42 23068
        Line1 = 67.148.147.64 10.2.9.2 553 274036
        Line1 = 67.148.147.65 10.2.9.3 57 39207
        Line1 = 8.8.8.8 10.2.9.6 84 8006
        Line1 =
        Line1 = Accounting data age is 0w1d
        Line1 = Box#
        Line = 10.2.9.3
        Line = 10.2.9.4
        Line = 10.2.9.5
        Line = 10.2.9.6

    The output I expected to see is:

        Line = 10.2.9.2
        Line1 = Source Destination Packets Bytes
        Line1 = 15.254.32.120 10.2.9.2 5 504
        Line1 = 63.97.127.34 10.2.9.2 4 471
        Line1 = 17.149.36.162 10.2.9.3 14 3416
        Line1 = 79.31.21.75 10.2.9.2 11 3993
        Line1 = 209.68.19.130 10.2.9.2 33 15941
        Line1 = 72.247.242.235 10.2.9.2 7 3750
        Line1 = 17.149.36.15 10.2.9.3 14 3404
        Line1 = 65.54.81.34 10.2.9.2 42 23068
        Line1 = 67.148.147.64 10.2.9.2 553 274036
        Line1 = 67.148.147.65 10.2.9.3 57 39207
        Line1 = 8.8.8.8 10.2.9.6 84 8006
        Line1 =
        Line1 = Accounting data age is 0w1d
        Line1 = Box#
        Line = 10.2.9.3
        Line1 = Source Destination Packets Bytes
        Line1 = 15.254.32.120 10.2.9.2 5 504
        Line1 = 63.97.127.34 10.2.9.2 4 471
        Line1 = 17.149.36.162 10.2.9.3 14 3416
        Line1 = 79.31.21.75 10.2.9.2 11 3993
        Line1 = 209.68.19.130 10.2.9.2 33 15941
        Line1 = 72.247.242.235 10.2.9.2 7 3750
        Line1 = 17.149.36.15 10.2.9.3 14 3404
        Line1 = 65.54.81.34 10.2.9.2 42 23068
        Line1 = 67.148.147.64 10.2.9.2 553 274036
        Line1 = 67.148.147.65 10.2.9.3 57 39207
        Line1 = 8.8.8.8 10.2.9.6 84 8006
        Line1 =
        Line1 = Accounting data age is 0w1d
        Line1 = Box#
        Line = 10.2.9.4
        Line1 = Source Destination Packets Bytes
        Line1 = 15.254.32.120 10.2.9.2 5 504
        Line1 = 63.97.127.34 10.2.9.2 4 471
        Line1 = 17.149.36.162 10.2.9.3 14 3416
        Line1 = 79.31.21.75 10.2.9.2 11 3993
        Line1 = 209.68.19.130 10.2.9.2 33 15941
        Line1 = 72.247.242.235 10.2.9.2 7 3750
        Line1 = 17.149.36.15 10.2.9.3 14 3404
        Line1 = 65.54.81.34 10.2.9.2 42 23068
        Line1 = 67.148.147.64 10.2.9.2 553 274036
        Line1 = 67.148.147.65 10.2.9.3 57 39207
        Line1 = 8.8.8.8 10.2.9.6 84 8006
        Line1 =
        Line1 = Accounting data age is 0w1d
        Line1 = Box#
        Line = 10.2.9.5
        Line1 = Source Destination Packets Bytes
        Line1 = 15.254.32.120 10.2.9.2 5 504
        Line1 = 63.97.127.34 10.2.9.2 4 471
        Line1 = 17.149.36.162 10.2.9.3 14 3416
        Line1 = 79.31.21.75 10.2.9.2 11 3993
        Line1 = 209.68.19.130 10.2.9.2 33 15941
        Line1 = 72.247.242.235 10.2.9.2 7 3750
        Line1 = 17.149.36.15 10.2.9.3 14 3404
        Line1 = 65.54.81.34 10.2.9.2 42 23068
        Line1 = 67.148.147.64 10.2.9.2 553 274036
        Line1 = 67.148.147.65 10.2.9.3 57 39207
        Line1 = 8.8.8.8 10.2.9.6 84 8006
        Line1 =
        Line1 = Accounting data age is 0w1d
        Line1 = Box#
        Line = 10.2.9.6
        Line1 = Source Destination Packets Bytes
        Line1 = 15.254.32.120 10.2.9.2 5 504
        Line1 = 63.97.127.34 10.2.9.2 4 471
        Line1 = 17.149.36.162 10.2.9.3 14 3416
        Line1 = 79.31.21.75 10.2.9.2 11 3993
        Line1 = 209.68.19.130 10.2.9.2 33 15941
        Line1 = 72.247.242.235 10.2.9.2 7 3750
        Line1 = 17.149.36.15 10.2.9.3 14 3404
        Line1 = 65.54.81.34 10.2.9.2 42 23068
        Line1 = 67.148.147.64 10.2.9.2 553 274036
        Line1 = 67.148.147.65 10.2.9.3 57 39207
        Line1 = 8.8.8.8 10.2.9.6 84 8006
        Line1 =
        Line1 = Accounting data age is 0w1d
        Line1 = Box#
    Is there something wrong with my code? If so, what? Thanks in advance! Marc

      What you have shown (ignoring the bugs - there are some) is a solution to a different problem than the one you initially asked for help with! To avoid wasting our time helping you solve a problem that doesn't actually help you, what are you really trying to do?

      The short answer to "if so what?" is that your inner loop runs to the end of its file so the second time through the outer loop there is nothing left for the inner loop to loop over. However the code you show is bad in too many ways for me to bother with until I know I'm not wasting my time solving non-problems aside from saying "don't nest loops that read from files - it's almost always wrong".
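      The exhausted-handle effect can be reproduced with two in-memory filehandles standing in for the real files. Rewinding the inner handle with seek before each outer pass is one way to keep the nested structure (re-opening the file, as done later in the thread, has the same effect); either way the whole file is still re-read once per outer line, which is why nesting file reads is discouraged:

      ```perl
      use strict;
      use warnings;

      # In-memory files standing in for the "include ip" and "ip accounting" files.
      my $include    = "10.2.9.2\n10.2.9.3\n";
      my $accounting = "15.254.32.120 10.2.9.2 5 504\n17.149.36.162 10.2.9.3 14 3416\n";

      open my $inc_fh, '<', \$include    or die "include: $!";
      open my $acc_fh, '<', \$accounting or die "accounting: $!";

      my %inner_count;    # number of accounting lines seen on each outer pass
      while (my $ip = <$inc_fh>) {
          chomp $ip;
          $inner_count{$ip} = 0;
          seek $acc_fh, 0, 0 or die "seek: $!";    # rewind; without this the 2nd pass reads nothing
          while (my $line = <$acc_fh>) {
              $inner_count{$ip}++;
          }
      }
      print "$_: $inner_count{$_} lines\n" for sort keys %inner_count;
      ```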


        OK. I'm not advanced enough yet to tell, just by looking at the code, whether a solution is for a "different" problem.

        The history on this is that I tried getting help with this code (the code in my last post) on another thread that fell through the cracks. The code that I initially posted here was the code I had gotten from that thread. After getting help here to get it "going", it quickly became apparent that it didn't do what I needed it to do, so I went back to my original question, hoping that your more advanced skills could better help me.

        So what I am trying to achieve, as demonstrated in my last post (hopefully described correctly), is this: for each line read by the first (outer) loop, I want to run through the second (inner) loop, looking on each pass for the information provided by the outer loop. Once this works, the lines would then be filtered with "next if !($Line =~ m/\d*.\d*.\d*.\d*\s*$IpPassedFromTheFirstLoop/);". Whether it's done with inner and outer loops or with separate loops, how can I achieve that?

        That was my original question on the other thread, which took a wrong turn along the way. I'm sticking with that question now.

        Thanks, Marc
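        One way to get exactly that behaviour without nesting file reads (a sketch, not code from the thread): read the accounting lines into an array once, then loop over the array for each IP. Comparing the destination field with eq also sidesteps the regex, whose unescaped dots match any character. The sample data is copied from the post:

        ```perl
        use strict;
        use warnings;

        # Stand-ins for the two files (real code would read each file from disk once).
        my @include_ips = ("10.2.9.2", "10.2.9.3", "10.2.9.6");
        my @accounting  = (
            "Source Destination Packets Bytes",
            "15.254.32.120 10.2.9.2 5 504",
            "209.68.19.130 10.2.9.2 33 15941",
            "17.149.36.162 10.2.9.3 14 3416",
            "8.8.8.8 10.2.9.6 84 8006",
            "Accounting data age is 0w1d",
        );

        my %matches;    # destination IP => matching lines, filled by the nested loops
        for my $ip (@include_ips) {
            print "OuterData = $ip\n";
            for my $line (@accounting) {
                my (undef, $dest) = split ' ', $line;
                next unless defined $dest && $dest eq $ip;   # exact comparison, no regex needed
                print "Line = $line\n";
                push @{ $matches{$ip} }, $line;
            }
        }
        ```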

Re: Generating a List of numbers
by mlebel (Hermit) on Feb 03, 2012 at 00:33 UTC

    Bingo! That was it (chomp). I can't believe I didn't think of that.

    I decided to go with your hash code. I played around with it and it's making a bit more sense to me now, but I will still go and learn it fully. The output that I am dealing with can get pretty big, so efficiency can't hurt (although this code only runs once a month).

    I can't thank you and GrandFather enough for not letting me down and helping me through to the end. I can now finally go and finish this script and put it to work.

    Now, if one of you knows how I could have asked for this kind of help with a shorter, more streamlined, straight-to-the-point post, I am happy to listen to feedback.

    Other than this, thanks a million!

Re: Generating a List of numbers
by mlebel (Hermit) on Feb 02, 2012 at 02:59 UTC

    I read the post "I know what I mean. Why don't you?", so based on that I will rewrite this again. (I'm really not sure how short I can keep it, but I will try.)

    I fixed part of my problem: I needed to open "tmpipaccountingfile" inside the outer loop and close it right after the inner loop ends.

    I shrank "tmpipaccountingfile" to this:

        Source Destination Packets Bytes
        15.254.32.120 10.2.9.2 5 504
        209.68.19.130 10.2.9.2 33 15941
        17.149.36.162 10.2.9.3 14 3416
        17.149.36.15 10.2.9.3 14 3404
        67.148.147.65 10.2.9.3 57 39207
        8.8.8.8 10.2.9.6 84 8006
        Accounting data age is 0w1d
        Box#

    And I put the rest of the data (the IPs to look for) in the script's __DATA__ section, so the script now looks like this:

        #!/usr/bin/perl
        use warnings;
        use strict;

        my $tmpipaccountingfile = "tmpipaccountingfile";
        my $DestDevice = "Box";

        foreach my $OuterData (<DATA>) {
            print "OuterData = $OuterData";
            open TMPIPACCOUNTINGFILE, "<", "$tmpipaccountingfile";
            foreach my $Line (<TMPIPACCOUNTINGFILE>) {
                chomp $Line;
                next if ($Line =~ m/^sh/);
                next if ($Line =~ m/^\s*Source/);
                next if ($Line =~ m/^$DestDevice/);
                next if ($Line =~ m/^Accounting/);
                next if ($Line =~ m/^$/);
                next unless ($Line =~ m/^\s*\d*.\d*.\d*.\d*\s*$OuterData/);
                print "Line = $Line\n\n";
            }
            close TMPIPACCOUNTINGFILE;
        }
        exit;

        __DATA__
        10.2.9.2
        10.2.9.3
        10.2.9.6

    Now this mostly works. The new problem I am running into is that with the script as written, no matching lines are printed at all. But if I replace $OuterData in the line "next unless ($Line =~ m/^\s*\d*.\d*.\d*.\d*\s*$OuterData/);" with a literal IP, for example "next unless ($Line =~ m/^\s*\d*.\d*.\d*.\d*\s*10.2.9.6/);", I get the following output:

        OuterData = 10.2.9.2
        Line = 8.8.8.8 10.2.9.6 84 8006
        OuterData = 10.2.9.3
        Line = 8.8.8.8 10.2.9.6 84 8006
        OuterData = 10.2.9.6
        Line = 8.8.8.8 10.2.9.6 84 8006

    I believe this proves that it works, but it's not what I am looking for. The output I expect to see with the code I provided is this:

        OuterData = 10.2.9.2
        Line = 15.254.32.120 10.2.9.2 5 504
        Line = 209.68.19.130 10.2.9.2 33 15941
        OuterData = 10.2.9.3
        Line = 17.149.36.162 10.2.9.3 14 3416
        Line = 17.149.36.15 10.2.9.3 14 3404
        Line = 67.148.147.65 10.2.9.3 57 39207
        OuterData = 10.2.9.6
        Line = 8.8.8.8 10.2.9.6 84 8006

    I hope that this is clear because I have no clue how to simplify it any further.

    Aaron, I will go and read up on hashes, since from all the posts it seems a hash is probably the way to achieve this. From what you said, it sounds like what I am looking for. If I need a hash to achieve this, how would that be written?

    The results for each group of IPs found would ultimately be pushed to an array so that some calculations can be done for each IP individually. I mention this in case it influences any code you might provide.
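    For per-IP calculations like these, a hash of arrays keyed by destination IP collects the byte counts for every IP in a single pass. A sketch, where calculate is a hypothetical stand-in for the real per-IP calculation and sum comes from the core List::Util module:

    ```perl
    use strict;
    use warnings;
    use List::Util qw(sum);

    my @accounting = (
        "15.254.32.120 10.2.9.2 5 504",
        "209.68.19.130 10.2.9.2 33 15941",
        "17.149.36.162 10.2.9.3 14 3416",
        "17.149.36.15 10.2.9.3 14 3404",
        "67.148.147.65 10.2.9.3 57 39207",
        "8.8.8.8 10.2.9.6 84 8006",
    );

    my %bytes_for;    # destination IP => array of byte counts
    for my $line (@accounting) {
        my (undef, $dest, undef, $bytes) = split ' ', $line;
        push @{ $bytes_for{$dest} }, $bytes;
    }

    # Hypothetical per-IP calculation; here it just totals the bytes.
    sub calculate {
        my ($ip, @bytes) = @_;
        return sum(@bytes);
    }

    for my $ip (sort keys %bytes_for) {
        printf "%s total=%d\n", $ip, calculate($ip, @{ $bytes_for{$ip} });
    }
    ```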

    Now, I must say that you guys are pretty tough, but it's OK: I have thick skin, and I am learning in the process to become a better programmer. So thank you.

    So, lastly, how do i fix this?

      You are not chomping $OuterData. Therefore, it contains a newline at the end which prevents it from matching in the regular expression.
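      A quick check of that explanation, using a line from the post (our own sketch, not the replier's). As an optional extra, \Q...\E makes the dots in the IP match only literal dots, and \b keeps 10.2.9.2 from also matching 10.2.9.20:

      ```perl
      use strict;
      use warnings;

      my $Line      = "15.254.32.120 10.2.9.2 5 504";   # already chomped, as in the script
      my $OuterData = "10.2.9.2\n";                     # as read from <DATA>: newline still attached

      my $before = ($Line =~ m/^\s*\d*.\d*.\d*.\d*\s*$OuterData/) ? 1 : 0;

      chomp $OuterData;                                 # the fix
      my $after = ($Line =~ m/^\s*\d*.\d*.\d*.\d*\s*$OuterData/) ? 1 : 0;

      # Optional tightening: literal dots via \Q...\E, and \b after the address.
      my $strict     = ($Line =~ m/^\s*\S+\s+\Q$OuterData\E\b/) ? 1 : 0;
      my $matches_20 = ("1.2.3.4 10.2.9.20 5 504" =~ m/^\s*\S+\s+\Q$OuterData\E\b/) ? 1 : 0;

      print "before chomp: $before, after chomp: $after, strict: $strict, matches 10.2.9.20: $matches_20\n";
      ```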
      Update: Looping over the file several times makes your algorithm very slow if the file is big. Using a hash, you can avoid this problem:
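      A minimal sketch of that idea (ours, not necessarily the code that accompanied the reply): read the accounting data once, file each line under its destination IP, then look up each wanted IP. An in-memory handle stands in for tmpipaccountingfile:

      ```perl
      use strict;
      use warnings;

      # In-memory copy of the shortened accounting file from the post.
      my $accounting = join("\n",
          "Source Destination Packets Bytes",
          "15.254.32.120 10.2.9.2 5 504",
          "209.68.19.130 10.2.9.2 33 15941",
          "17.149.36.162 10.2.9.3 14 3416",
          "17.149.36.15 10.2.9.3 14 3404",
          "67.148.147.65 10.2.9.3 57 39207",
          "8.8.8.8 10.2.9.6 84 8006",
          "Accounting data age is 0w1d",
          "Box#",
      ) . "\n";
      my @wanted = ("10.2.9.2", "10.2.9.3", "10.2.9.6");

      open my $fh, '<', \$accounting or die "accounting: $!";
      my %lines_for;    # destination IP => matching lines
      while (my $line = <$fh>) {
          $line =~ s/\r?\n\z//;                  # strip LF or CRLF
          my (undef, $dest) = split ' ', $line;
          next unless defined $dest;             # skips blank lines and "Box#"
          push @{ $lines_for{$dest} }, $line;    # header/footer words become harmless extra keys
      }
      close $fh;

      # One hash lookup per wanted IP instead of one pass over the file per IP.
      for my $ip (@wanted) {
          print "OuterData = $ip\n";
          print "Line = $_\n" for @{ $lines_for{$ip} || [] };
      }
      ```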