PerlMonks  

Comparing two files

by Ronnie (Scribe)
on Oct 04, 2006 at 12:06 UTC (#576305=perlquestion)
Ronnie has asked for the wisdom of the Perl Monks concerning the following question:

I've been tasked with a rather large software update and have hit a road-bump at the first hurdle, which is baffling me (not hard to do, and I've had a very bad week!). I want to compare two files which contain, among other things, PC numbers. I want to compare the new list with the old one and extract the names of the new PCs that have been added since the last run. Simple, I hear you say; well, I thought so too. The following code tells me every PC is new, and working for a Council I KNOW that's simply not true!
    #!/usr/bin/perl -w
    #
    use strict ;
    #
    #
    # Scalars
    #---------#
    my $count_in = 0 ;
    my $count_new = 0 ;
    my $file_new = 'PC_LIST_20060919.txt' ;
    my $file_old = 'Prev_carefirst_PCs.txt' ;
    my $PC_NO = undef ;
    #
    # Arrays
    #--------#
    my @fields = () ;
    #
    # Boolean
    #---------#
    my $FOUND = 0 ;
    #
    # Processing
    #------------#
    print "\n\t\t\t\tTest Starts\n" ;
    open IN1, "<$file_new" or die "\n\tCanny open $file_new :: $!\n" ;
    while (<IN1>) {
        chomp ;
        $count_in ++ ;
        @fields = split /,/, $_ ;
        $PC_NO = $fields[1] ;
        $FOUND = 0 ;
        open IN2, "<$file_old" or die "\n\tCanny open $file_old :: $!\n" ;
        while (<IN2>) {
            chomp ;
            if (/($PC_NO)/) {
                print "\n\tIt matched!!!\n" ;
                $FOUND = 1 ;
            }
        }
        close IN2 or die "\n\tCan't close $file_old :: $!\n" ;
        if (! $FOUND) {
            print "\n\tPC $PC_NO is new!" ;
            $count_new ++ ;
        }
    }
    close IN1 or die "\n\tCan't close $file_new :: $!\n" ;
    print "\n\tThere were $count_in PC's checked!" ;
    print "\n\tThere were $count_new new PC's found!" ;
    print "\n\t\t\t\tTest Ends\n" ;

The files being read were a CSV file saved from an Excel spreadsheet (the new list) and a txt file saved from an Access table (the old list).
The NEW list looks like this:
    Priority Kirkgate, PC7588
    Priority Kirkgate, PC7598
    Priority Kirkgate, PC8590
    Priority.CD Site's, PC8648
    Priority.CD Site's, PC8756
    Priority.CD Site's, PC9020
    Priority.CD Site's, PC9028
    Priority.CD Site's, PC9093

The old list looks like this:
    "B5206","128.1.36.205","000C765D5E68","CN=PC7607.OU=Workstations.O=aberdeen.T=ABERDEEN"
    "PC10018","128.1.37.13","0011090675AA","CN=PC10018.OU=Workstations.O=aberdeen.T=ABERDEEN"
    "PC10152","128.1.40.30","000C76EFD05F","CN=PC10152.OU=Workstations.O=aberdeen.T=ABERDEEN"
    "PC10171","128.1.13.57","000C76EFD027","CN=PC10171.OU=Workstations.O=aberdeen.T=ABERDEEN"
    "PC10266","128.1.38.195","0011092581AB","CN=PC10266.OU=Workstations.O=aberdeen.T=ABERDEEN"
    "PC10335","128.1.34.213","0011092582CE","CN=PC10335.OU=Workstations.O=aberdeen.T=AB
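To see what each format needs before comparing anything, here is a minimal Perl sketch, assuming the lines look exactly like the samples above (the CRLF ending on the second new-list line is a hypothetical added for illustration). It takes the PC number from the second field of a new-list line, stripping all whitespace, and from the CN= part of an old-list line. Note that in the first old-list sample the first field is B5206 while CN= gives PC7607, so those two fields do not always agree.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines in the two formats shown above.
my @new_lines = (
    "Priority Kirkgate, PC7588\n",
    "Priority.CD Site's, PC8648\r\n",   # hypothetical CRLF line ending
);
my @old_lines = (
    qq{"B5206","128.1.36.205","000C765D5E68","CN=PC7607.OU=Workstations.O=aberdeen.T=ABERDEEN"\n},
    qq{"PC10018","128.1.37.13","0011090675AA","CN=PC10018.OU=Workstations.O=aberdeen.T=ABERDEEN"\n},
);

# New list: the PC number is the second comma-separated field and is
# preceded by a space; remove all whitespace (including any \r and \n).
sub pc_from_new_line {
    my ($line) = @_;
    my (undef, $pc) = split /,/, $line, 2;
    $pc =~ s/\s+//g;
    return $pc;
}

# Old list: take the PC number from the CN= part of the last field,
# because the first field is not always the PC number (see "B5206").
sub pc_from_old_line {
    my ($line) = @_;
    return $line =~ /CN=(PC\d+)\./ ? $1 : undef;
}

my @new_pcs = map { pc_from_new_line($_) } @new_lines;
my @old_pcs = map { pc_from_old_line($_) } @old_lines;

print "new: @new_pcs\n";   # new: PC7588 PC8648
print "old: @old_pcs\n";   # old: PC7607 PC10018
```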

The strange thing is that I thought it was my logic that was wrong, but if I create a couple of dummy new and old files with just one entry in each, it works fine! Any ideas what I'm failing to understand here?

2006-10-04 Retitled by Corion, as per Monastery guidelines
Original title: 'Baffled'

Re: Comparing two files
by Corion (Pope) on Oct 04, 2006 at 12:20 UTC

    You haven't printed out the strings you're trying to compare:

        $_ = "Priority Kirkgate, PC7588\n";
        @fields = split /,/, $_ ;
        print "PC string from NEW list: >>$fields[1]<<";

    This shows your PC code with a leading space (" PC7588"). In the old list the number is never preceded by a space (it follows a quote or CN=), so the rest of your code won't match it.
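The effect of that leading space can be reproduced in a few lines. This is a sketch only: the old-list line is hypothetical (a made-up entry for PC7588, with the IP and MAC fields replaced by x), and the match mirrors the original /($PC_NO)/ test, with \Q...\E added so the value is matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical old-list line for a PC that is also on the new list.
my $old_line = q{"PC7588","x","x","CN=PC7588.OU=Workstations.O=aberdeen.T=ABERDEEN"};

# What the original loop extracts from a new-list line.
my $line = "Priority Kirkgate, PC7588\n";
chomp $line;
my @fields = split /,/, $line;
my $PC_NO  = $fields[1];                        # " PC7588" (leading space)
my $raw_match = ($old_line =~ /\Q$PC_NO\E/) ? 1 : 0;

# Trim leading and trailing whitespace (this would also remove a stray \r).
(my $pc = $PC_NO) =~ s/^\s+|\s+$//g;            # "PC7588"
my $trimmed_match = ($old_line =~ /\Q$pc\E/) ? 1 : 0;

print "before trim >>$PC_NO<< matched: $raw_match\n";     # matched: 0
print "after trim >>$pc<< matched: $trimmed_match\n";     # matched: 1
```

Even after trimming, a plain substring match would also accept a line for, say, PC75880; the exact hash lookups suggested in the replies avoid that.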

    Also, it's not really efficient to read through $file_old again for every comparison. I'd read the new computers into a hash and then, while stepping through $file_old once, check whether each computer is known or not:

        use strict;
        my %new_pc;
        ...
        while (<IN1>) {
            chomp;
            $count_in++;
            @fields = split /,/, $_;
            $PC_NO = $fields[1];
            $PC_NO =~ s/\s*//g;
            $new_pc{ $PC_NO } = 1;
        };
        ...
        while (<IN2>) {
            chomp;
            if (/CN=(PC\d+)\.OU/) {
                my $pc = $1;
                if (exists $new_pc{ $pc }) {
                    # $pc is old and new, hence has survived
                    delete $new_pc{ $pc };
                } else {
                    # $pc is old but not new, hence has been removed
                };
            } else {
                warn "Ignoring line >>$_<<";
            };
        };
        # All PCs in %new_pc are now really new:
        for my $pc (keys %new_pc) {
            print "$pc is new\n";
        };
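Since the snippet above elides the open calls, here is a self-contained sketch of the same hash-based approach that can be run as-is. The files are replaced by in-memory strings opened as filehandles (open on a scalar reference); the sample data is hypothetical apart from the line formats, which follow the question, and PC10018 is assumed to appear on both lists.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the two files, in the question's formats.
my $new_data = <<'END';
Priority Kirkgate, PC7588
Priority Kirkgate, PC7598
Priority.CD Site's, PC10018
END
my $old_data = <<'END';
"PC10018","128.1.37.13","0011090675AA","CN=PC10018.OU=Workstations.O=aberdeen.T=ABERDEEN"
"PC10152","128.1.40.30","000C76EFD05F","CN=PC10152.OU=Workstations.O=aberdeen.T=ABERDEEN"
END

# 1. Read the new list into a hash keyed by PC number.
my %new_pc;
open my $in_new, '<', \$new_data or die "Can't open new list: $!";
while (my $line = <$in_new>) {
    chomp $line;
    my (undef, $pc) = split /,/, $line, 2;
    next unless defined $pc;
    $pc =~ s/\s+//g;                  # drop the leading space (and any \r)
    $new_pc{$pc} = 1;
}
close $in_new;

# 2. Walk the old list once; any PC found there is not new.
my @removed;
open my $in_old, '<', \$old_data or die "Can't open old list: $!";
while (my $line = <$in_old>) {
    if ($line =~ /CN=(PC\d+)\.OU/) {
        my $pc = $1;
        if (exists $new_pc{$pc}) {
            delete $new_pc{$pc};      # on both lists: it survived
        } else {
            push @removed, $pc;       # only on the old list: removed
        }
    } else {
        warn "Ignoring line >>$line<<";
    }
}
close $in_old;

# 3. Whatever is left in %new_pc is genuinely new.
my @new = sort keys %new_pc;
print "$_ is new\n" for @new;
print "$_ has been removed\n" for @removed;
```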
Re: Comparing two files
by jdporter (Canon) on Oct 04, 2006 at 14:38 UTC

    I don't know exactly where your problem is, but if I had to do that, I might write it like this, using a module to parse the CSV:

        use IO::File;
        # The new list (e.g. "Priority Kirkgate, PC7588"): PC ID => site
        my %new = map { $_->[1] => $_->[0] }
                  map { chomp; [ split /,\s*/ ] }
                  IO::File->new("<$file_new")->getlines;

        use Tie::Handle::CSV;
        # The old list: PC ID (the CN= part of the fourth field) => its KEY=VALUE parts
        my %old = map { ( $_->{'CN'} => $_ ) }
                  map { +{ map { /(.*)=(.*)/ } split /\.(?=[A-Z]+=)/, $_->[3] } }
                  Tie::Handle::CSV->new($file_old, header => 0)->getlines;

        # now %old and %new are keyed by PC ID.
        # you can use the standard techniques for finding keys that
        # are in one but not the other, e.g.
        my @new_PCs = grep { not exists $old{$_} } sort keys %new;
    We're building the house of the future together.
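jdporter's reply relies on Tie::Handle::CSV from CPAN. If that module isn't available, a sketch like the following uses the core module Text::ParseWords instead (a swapped-in technique, not what the reply uses) to split one quoted old-list line and break its fourth field into KEY=VALUE pairs with the same split as above. The sample line is taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module

my $line = qq{"PC10018","128.1.37.13","0011090675AA","CN=PC10018.OU=Workstations.O=aberdeen.T=ABERDEEN"\n};
chomp $line;

# parse_line splits on the delimiter, honouring the double quotes;
# with $keep = 0 the quotes are removed from each field.
my @fields = parse_line(',', 0, $line);

# Split the fourth field into KEY=VALUE pairs, as in the reply above.
my %dn = map { /(.*)=(.*)/ } split /\.(?=[A-Z]+=)/, $fields[3];

print "$fields[0] $fields[1] $dn{CN} $dn{OU}\n";   # PC10018 128.1.37.13 PC10018 Workstations
```

parse_line also treats single quotes as quote characters, so it suits the old list but would not split new-list entries such as "Priority.CD Site's, PC8648" correctly; for those, a plain split as in the replies is simpler.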
Re: Comparing two files
by dorko (Parson) on Oct 04, 2006 at 15:43 UTC
    If you can get the list of old PC numbers into an array, and then get the list of new PC numbers into a second array, you could then use List::Compare to find the newly added computers.

    Cheers,

    Brent

    -- Yeah, I'm a Delt.
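With List::Compare installed, List::Compare->new(\@new, \@old)->get_unique should return the entries that appear only in the first list, which is exactly the set of added PCs. The core-Perl sketch below computes the same set difference with a hash slice, assuming the PC numbers have already been extracted and trimmed; the sample values are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lists of PC numbers, already extracted from the two files.
my @old = qw(PC10018 PC10152 PC7607);
my @new = qw(PC10018 PC7588 PC7607 PC8648);

# Build a lookup hash from the old list with a hash slice...
my %in_old;
@in_old{@old} = ();

# ...then keep the new-list entries that are not in it (order preserved).
my @added = grep { !exists $in_old{$_} } @new;

print "added: @added\n";   # added: PC7588 PC8648
```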
Re: Comparing two files
by lyklev (Pilgrim) on Oct 04, 2006 at 20:08 UTC

    Your program has a couple of problems:

    First, you are splitting on commas: in the old file the PC number (the first field) has index 0, and index 1 gets you the IP address.

    Next, you are reading the entire second file for each line of the first file, which will give you some serious I/O for big files.

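    Perl is not very good at finding out whether something is in a list: the only way is to scan the list element by element. If you want to do that, think of a hash instead, where an exists lookup takes roughly the same time however many entries there are. Lists are for processing bulk data element by element. Keeping this in mind, let's rewrite the program.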

    Assuming you have two files, pcs_old.csv and pcs_new.csv, both in the format of the old list, start with the opening stuff:

        use strict;
        use warnings;
        open (OLD, "<pcs_old.csv") or die "Can't open pcs_old.csv: $!";
        open (NEW, "<pcs_new.csv") or die "Can't open pcs_new.csv: $!";
        my %old;    # where pc numbers will get stored...
    Read the old file line by line...
        while (<OLD>) {
            chomp;
            my @fields = split /,/;
            my $pc = $fields[0];    # get the first field
            $pc =~ s/"//g;          # get rid of the quotes
            $old{$pc} = 1;          # make an entry in the hash
        }
    Now do the same for the new PC file, but instead of storing the PCs, check whether each one already exists in the hash of old PCs:
        while (<NEW>) {
            chomp;
            my @fields = split /,/;
            my $pc = $fields[0];    # get the first field
            $pc =~ s/"//g;          # get rid of the quotes
            if (! exists $old{$pc}) {
                # the new pc file has a pc which did not exist
                # in the old pc file
                print "new PC found: $pc\n";
            }
        }
        # and cleanup
        close (OLD);
        close (NEW);

Node Type: perlquestion [id://576305]
Approved by Corion