PerlMonks  

Compare 2 CSV files by two different column and printout

by nicopelle (Acolyte)
on Oct 01, 2013 at 09:31 UTC ( [id://1056457]=perlquestion )

nicopelle has asked for the wisdom of the Perl Monks concerning the following question:

Hi to all.

I've got two files, file1 and file2. file1 is bigger than file2 and looks like this:

machine;subsystem;name;..;..;..;..;..;..;..;
.............
aagdbp01.mydomain.it;PatrolAgent_3181;Patrol Agent;AIX;PRODUZIONE;AGENZIE-;AIX 5-3;PowerPC_POWER6;UNKNOWN;UNKNOWN;UNKNOWN;PRODUZIONE-ND
aagdbp01.mydomain.it;QP1GAGA1;QM WMQ;AIX;PRODUZIONE;AGENZIE;AIX 5-3;PowerPC_POWER6;;AGENZIE;UNKNOWN;PRODUZIONE
aagdbp01.mydomain.it;asampsp;Novell IM Agent;AIX;PRODUZIONE;AGENZIE-;AIX 5-3;PowerPC_POWER6;UNKNOWN;UNKNOWN;UNKNOWN;PRODUZIONE-ND
aagdbp01.mydomain.it;gsionline;Web Server;AIX;PRODUZIONE;AGENZIE-;AIX 5-3;PowerPC_POWER6;UNKNOWN;UNKNOWN;UNKNOWN;PRODUZIONE-ND
.....

and so on.
(Roughly 6k lines of CSV data.) The second file, file2, is smaller (roughly 1k lines) and looks like this:

hostname:priority:classname:string1:string2
.....
aagdbp01:01:SVC.OPCON:opcon
aagdbp01:35:GEN.QMSAG:mqm:QP1GA
aagdbp01:36:AGENZIA.PICOOCL:picoOCL
.....

and so on.
I would like to match the two files (compare file2 against file1) using the criteria described below:
read every line of file2 and, for every line, IF (string1 OR string2) =~ m/subsystem/ AND hostname =~ m/machine/ then SKIP to the next line. The final result would be file1's lines minus the lines that satisfy the above criteria against file2.
Thanks to anyone who wants to spend some time on my silly problem, NicK.

Replies are listed 'Best First'.
Re: Compare 2 CSV files by two different column and printout
by Laurent_R (Canon) on Oct 01, 2013 at 11:07 UTC

    Looking at the first line of each of your files, your machine has the form "aagdbp01.mydomain.it" and your hostname the form "aagdbp01". It seems that hostname =~ m/machine/ is never going to match. So you are done with your work: the output file should simply be file1.

    Or perhaps you want something like this: $machine =~ m/$hostname/. BTW, the full truth is that hostname =~ m/machine/ will actually never be executed, because (besides the defective syntax) (string1 OR string2) =~ m/subsystem/ will probably always be false for basically the same reason (if I can make sense of your data).

      Of course you're right. My mistake in reporting the match. The correct condition is:
      $machine =~ m/$hostname/
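
To illustrate the corrected direction of the match, here is a minimal sketch using the sample values from the question. The helper name host_matches is hypothetical, and anchoring the hostname to the first dot-separated label of machine is an assumption that goes beyond the plain $machine =~ m/$hostname/ agreed on above.

```perl
use strict;
use warnings;

# Sample values taken from the question's data.
my $machine  = 'aagdbp01.mydomain.it';   # first field of a file1 line
my $hostname = 'aagdbp01';               # first field of a file2 line

# Hypothetical helper: true when $hostname is the leading label of $machine.
# \Q...\E quotes regex metacharacters; the anchors avoid partial-name hits
# such as 'aagdbp0' matching 'aagdbp01'.
sub host_matches {
    my ($machine, $hostname) = @_;
    return $machine =~ m/^\Q$hostname\E(?:\.|$)/ ? 1 : 0;
}

printf "%s vs %s: %s\n", $machine, $hostname,
    host_matches($machine, $hostname) ? 'match' : 'no match';

# The original direction: the long name cannot occur inside the short one.
printf "reversed: %s\n",
    $hostname =~ m/\Q$machine\E/ ? 'match' : 'no match';
```

The unanchored form $machine =~ m/$hostname/ would also accept a hostname that appears anywhere inside the machine name, which may or may not be wanted.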
Re: Compare 2 CSV files by two different column and printout
by ww (Archbishop) on Oct 01, 2013 at 10:35 UTC

    Have you expended any effort other than posting your request? What have you tried? Where's your code and how does it fail to satisfy your intent?

    Lacking the above, we're not much inclined to provide specific help -- AKA, do your work for you. See On asking for help and -- at PerlMonks FAQ -- among other helpful explanations of the local value system.

    But to get you started, this is a classic case where the Perl solution will closely parallel what you'd do if there were no computer and you had to do this job with two pieces of paper and a stubby pencil. So try writing down an exact outline of the steps you'd need to follow. Then think about uniqueness (which, in Perl, tends to equate to 'think hashes'), and then study up on hashes in Tutorials.
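
One way the "think hashes" outline could be sketched, under stated assumptions: the sub names index_terms and filter_file1 are hypothetical, string1 and string2 are taken to be the fourth and fifth fields of file2, hostname is taken to be the first dot-separated label of machine, and a "match" is taken to mean that the term occurs as a substring of subsystem. The thread does not confirm that last point. The sample file1 lines are shortened to their first four fields.

```perl
use strict;
use warnings;

# Index file2 by hostname: $terms_for{$hostname} = [ search terms ... ].
# Assumption: string1 and string2 are fields 4 and 5; string2 may be absent.
sub index_terms {
    my @file2_lines = @_;
    my %terms_for;
    for my $line (@file2_lines) {
        my ($hostname, undef, undef, @strings) = split /:/, $line;
        next unless defined $hostname && @strings;
        push @{ $terms_for{$hostname} }, grep { length } @strings;
    }
    return \%terms_for;
}

# Keep a file1 line unless one of its host's terms occurs in the subsystem.
sub filter_file1 {
    my ($terms_for, @file1_lines) = @_;
    my @kept;
    for my $line (@file1_lines) {
        my ($machine, $subsystem) = split /;/, $line;
        # Lines without both fields cannot match, so they are kept as-is.
        if (!defined $machine || !defined $subsystem) {
            push @kept, $line;
            next;
        }
        my ($host) = split /\./, $machine;
        my $terms = $terms_for->{$host} || [];
        next if grep { index($subsystem, $_) != -1 } @$terms;
        push @kept, $line;
    }
    return @kept;
}

my @file1 = (
    'aagdbp01.mydomain.it;PatrolAgent_3181;Patrol Agent;AIX',
    'aagdbp01.mydomain.it;QP1GAGA1;QM WMQ;AIX',
    'aagdbp01.mydomain.it;asampsp;Novell IM Agent;AIX',
);
my @file2 = (
    'aagdbp01:01:SVC.OPCON:opcon',
    'aagdbp01:35:GEN.QMSAG:mqm:QP1GA',
    'aagdbp01:36:AGENZIA.PICOOCL:picoOCL',
);

print "$_\n" for filter_file1(index_terms(@file2), @file1);
```

With this shortened sample, the QP1GAGA1 line is dropped (it contains QP1GA) and the other two lines are printed. Reading real files would replace the two arrays with chomped lines from file1 and file2.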

      Thanks. Please remove my post, if it's possible. Regards, Nicola.

        Please don't. It is considered better to leave (even "bad" or "embarrassing") questions around to help future monks and pilgrims on this site.

        --MidLifeXis

Re: Compare 2 CSV files by two different column and printout
by Anonymous Monk on Oct 01, 2013 at 09:59 UTC
      Thanks. I would like to see if anyone would help me with a few lines of code :-)

        What code have you already written and where do you have problems?

Re: Compare 2 CSV files by two different column and printout
by sundialsvc4 (Abbot) on Oct 01, 2013 at 14:19 UTC

    If you are dealing with six thousand lines, of about this size, and know that it will never become larger, then you can probably be quite brute-force about your approach and it will work just fine. Read the lines from file #1 into an array, then use, say, the grep() function to search for a regular expression such as ^$hostname\: ... thus looking for “starting at the beginning of the line, look for $hostname followed by a colon-character.” Use split() to split up the lines as needed. There are several equally-good ways to do this.

    Also consider the possibility of using Unix command-line tools such as grep, egrep, sort, merge, and especially diff. Sometimes, you discover that you don’t have to “write a program” at all!
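
A rough sketch of the grep() and split() idea, not a complete solution. Since the colon-delimited lines in the question belong to file2, the sketch assumes the array holds those lines. The 'otherhost' entry is invented purely to show a non-matching line.

```perl
use strict;
use warnings;

# Lines as they might be read from the colon-delimited file.
my @lines = (
    'aagdbp01:01:SVC.OPCON:opcon',
    'aagdbp01:35:GEN.QMSAG:mqm:QP1GA',
    'otherhost:10:SVC.DEMO:demo',          # hypothetical, for contrast
    'aagdbp01:36:AGENZIA.PICOOCL:picoOCL',
);

my $hostname = 'aagdbp01';

# Keep only lines that start with "$hostname:".
# \Q...\E protects any regex metacharacters in the hostname.
my @hits = grep { /^\Q$hostname\E:/ } @lines;

# split() breaks each hit into its fields.
for my $hit (@hits) {
    my @fields = split /:/, $hit;
    print join(' | ', @fields), "\n";
}
```

Running grep() once per hostname rescans the whole array each time, which is the brute-force trade-off the post describes: acceptable at a few thousand lines, slower than a hash lookup on much larger inputs.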

    While we don’t offer a program-writing service or homework-service, it certainly is possible to “try sincerely to write it” and then to ask us for specific help.

Re: Compare 2 CSV files by two different column and printout
by Marshall (Canon) on Oct 02, 2013 at 09:15 UTC
#!/usr/bin/perl -w
use strict;

my $file1 = 'aagdbp01.mydomain.it;PatrolAgent_3181;Patrol Agent;AIX;PRODUZIONE;AGENZIE-;AIX 5-3;PowerPC_POWER6;UNKNOWN;UNKNOWN;UNKNOWN;PRODUZIONE-ND
aagdbp01.mydomain.it;QP1GAGA1;QM WMQ;AIX;PRODUZIONE;AGENZIE;AIX 5-3;PowerPC_POWER6;;AGENZIE;UNKNOWN;PRODUZIONE
aagdbp01.mydomain.it;asampsp;Novell IM Agent;AIX;PRODUZIONE;AGENZIE-;AIX 5-3;PowerPC_POWER6;UNKNOWN;UNKNOWN;UNKNOWN;PRODUZIONE-ND
aagbp01.mydomain.it;gsionline;Web Server;AIX;PRODUZIONE;AGENZIE-;AIX 5-3;PowerPC_POWER6;UNKNOWN;UNKNOWN;UNKNOWN;PRODUZIONE-ND
';

my $file2 = 'aagdbp01:01:SVC.OPCON:opcon
aagdbp01:35:GEN.QMSAG:mqm:QP1GA
aagdbp01:36:AGENZIA.PICOOCL:picoOCL
';

# Open the in-memory string as if it were a file.
open (FILE2, '<', \$file2) or die "Can't open file2 - $!\n";
print "File2 search terms are:\n";
while (<FILE2>)
{
    chomp;
    # Take the last two colon-separated fields of each line.
    my ($string1, $string2) = (split /:/)[-2,-1];
    printf "%-20s %-10s\n", $string1, $string2;
}

=prints.....
File2 search terms are:
SVC.OPCON            opcon
mqm                  QP1GA
AGENZIA.PICOOCL      picoOCL
=cut
    Question:
    For example, line 2 of FILE1 has QP1GAGA1; should that match QP1GA?
    Please explain very clearly which lines should be deleted from
    FILE1 and why.
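
To make the question concrete, this small sketch compares the two values under three possible readings of "match". Which reading is intended is not settled in the thread, so none of them is presented as the answer.

```perl
use strict;
use warnings;

my $subsystem = 'QP1GAGA1';   # field 2 of the second file1 line
my $term      = 'QP1GA';      # last field of the second file2 line

# Three candidate definitions of "match".
my $exact     = $subsystem eq $term            ? 1 : 0;   # whole string equal
my $prefix    = index($subsystem, $term) == 0  ? 1 : 0;   # term at the start
my $substring = index($subsystem, $term) != -1 ? 1 : 0;   # term anywhere

print "exact=$exact prefix=$prefix substring=$substring\n";
# prints: exact=0 prefix=1 substring=1
```

Under an exact comparison the QP1GAGA1 line would stay in the output; under a prefix or substring comparison it would be removed.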

Front-paged by Arunbear