ImpalaSS has asked for the wisdom of the Perl Monks concerning the following question:
Hello,
I have a weird question. I have 2 Perl scripts that do exactly the same thing, except that one creates a file called "newsite1.txt" and the other "newsite2.txt". They run on alternating days: newsite1.pl runs on Sunday, Tuesday, Thursday and Saturday. Each one reads a directory that holds the names of all the cellsites in the Nextel system. My boss wants me to create a tool that tells you which cellsites came up overnight. So if someone clicks the button today (Tuesday) they would get the sites that came up Monday night through Tuesday morning. Here is a little bit of what the text file looks like:
PHI_R10K_2
PHI_TDAP_3
de0040Newark
de0042Wilmington
de0053Christana
de0053Christiana
de0101Odessa
de0102Clayton
de0103Dover
de0180Claymont
de0184Wooddale
de0187Woodcreek
de0205ChestnutKnoll
de0267Chapman
de0314Glasgow
de0348Millside
de0371SouthDover
Only there happen to be about 700 of them. Now, my question is: how can I compare newsite1.txt to newsite2.txt and print the differences most efficiently? I thought about opening one file and comparing every entry against the second file. But if the second file has more cell names, then some would get neglected.
Thanks in advance
Dipul
Re: Comparison Of Files
by chipmunk (Parson) on Dec 05, 2000 at 21:22 UTC
|
Here's a solution that reports the changes in both directions, using a hash for each file. It has been tested.
sub compare_sites {
    my($old, $new) = @_;
    if (-M $new > -M $old) {
        ($old, $new) = ($new, $old);
    }
    open(OLD, $old) or die "Can't open $old: $!\n";
    open(NEW, $new) or die "Can't open $new: $!\n";
    my(%old, %new);
    while (<OLD>) {
        chomp;
        $old{$_} = 1;
    }
    while (<NEW>) {
        chomp;
        $new{$_} = 1;
    }
    close(OLD);
    close(NEW);
    my @old = keys %old;
    delete @old{keys %new};
    delete @new{@old};
    for (sort keys %old) {
        print "$_ went down overnight.\n";
    }
    for (sort keys %new) {
        print "$_ came up overnight.\n";
    }
}
| [reply] [d/l] |
|
sub compare_sites {
    my($old, $new) = @_;
    if (-M $new > -M $old) {
        ($old, $new) = ($new, $old);
    }
    open(OLD, $old) or die "Can't open $old: $!\n";
    open(NEW, $new) or die "Can't open $new: $!\n";
    my %hash;
    my $omsk = 1;
    my $nmsk = 2;
    while (<OLD>) {
        chomp;
        $hash{$_} |= $omsk;
    }
    while (<NEW>) {
        chomp;
        $hash{$_} |= $nmsk;
    }
    close(OLD);
    close(NEW);
    for (sort keys %hash) {
        if ($hash{$_} == 1) {
            print "$_ went down overnight.\n";
        }
    }
    for (sort keys %hash) {
        if ($hash{$_} == 2) {
            print "$_ went up overnight.\n";
        }
    }
}
compare_sites("newsite1.txt", "newsite2.txt");
Now you could just go through the hash once
and push into arrays as well, but then we might as well
use the chipmunk hash. (Dance or Dish?)
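The single pass over the hash mentioned above could look something like the sketch below. The name classify_sites is made up for illustration, and it assumes the two site lists have already been read into arrays, so the file handling from the code above is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: takes two array refs of site names (old list, new list)
# and returns two array refs: sites that went down and sites that came up.
sub classify_sites {
    my ($old_sites, $new_sites) = @_;
    my ($OLD, $NEW) = (1, 2);            # bit flags, as in the code above
    my %seen;
    $seen{$_} |= $OLD for @$old_sites;
    $seen{$_} |= $NEW for @$new_sites;

    my (@down, @up);
    for my $site (sort keys %seen) {     # one pass over the merged hash
        if    ($seen{$site} == $OLD) { push @down, $site }
        elsif ($seen{$site} == $NEW) { push @up,   $site }
        # a value of 3 ($OLD | $NEW) means the site is in both lists
    }
    return (\@down, \@up);
}
```

Called as my ($down, $up) = classify_sites(\@old, \@new); it hands back both lists already sorted, at the cost of holding every site name in one hash.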
Re: Comparison Of Files
by gaspodethewonderdog (Monk) on Dec 05, 2000 at 20:17 UTC
|
well this is untested code, but maybe something like this:
compare_sites("newsite1.txt", "newsite2.txt")
    if day eq "monday..." # pseudo code
compare_sites("newsite2.txt", "newsite1.txt")
    if day eq "tuesday..." # pseudo code
sub compare_sites {
    my $file1 = shift;
    my $file2 = shift;
    my %newsite1;   # lexical, so a second call starts with an empty hash
    open IN1, $file1 or die "Can't open $file1: $!\n";
    while(<IN1>) {
        chomp;
        $newsite1{$_}++;
    } # while
    close IN1;
    open IN2, $file2 or die "Can't open $file2: $!\n";
    while(<IN2>) {
        chomp;
        print "$_ is a new site\n"
            if not defined($newsite1{$_});
    } # while
    close IN2;
} # compare_sites
Hopefully memory isn't a problem... but otherwise this shouldn't be too bad a solution
UPDATE:
changed the code to a function as per tye's suggestion (and so he doesn't think I'm trying to make him look like an idiot I added this comment)... and added pseudo code for calling the function based on dates... :P
UPDATE:
changed the does not exist line to print the site name and not a number... plus it identifies that it is a new site...
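The day test in the pseudo code above could be filled in roughly as follows. This is only a sketch: files_for_weekday is a made-up name, and it assumes the schedule from the question holds exactly, with newsite2.pl running on Monday, Wednesday and Friday (the question only lists the newsite1.pl days).

```perl
use strict;
use warnings;

# Hypothetical helper: given a weekday number as returned by
# (localtime)[6] (0 = Sunday ... 6 = Saturday), return (older, newer).
# Assumption: newsite1.pl runs on Sunday, Tuesday, Thursday and Saturday,
# which are exactly the even weekday numbers.
sub files_for_weekday {
    my ($wday) = @_;
    return $wday % 2 == 0
        ? ('newsite2.txt', 'newsite1.txt')   # newsite1.txt was just rewritten
        : ('newsite1.txt', 'newsite2.txt');  # newsite2.txt was just rewritten
}

my ($old, $new) = files_for_weekday((localtime)[6]);
print "Compare $old (older) against $new (newer)\n";
```

Note that Saturday and Sunday both fall on newsite1.pl, so on Sunday the "older" file is really from Friday, and a skipped run would silently give the wrong order.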
| [reply] [d/l] |
|
One thing. I would change this:
while(<IN2>) {
    chomp;
    print "$_ is a new site\n"
        if not defined($newsite1{$_});
} # while
to this:
while(<IN2>) {
    chomp;
    print "$_ is a new site\n" unless exists $newsite1{$_};
} # while
Checking for the existence of a key is quicker than looking up the value of a hash element.
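Speed aside, the two tests also answer different questions, which a small example may make clearer. The hash contents below are made up for illustration. In the code above every value comes from ++, so both tests give the same answer there.

```perl
use strict;
use warnings;

# Illustrative data only: one key with a true value, one key whose
# value is undef, and (implicitly) one key that is absent altogether.
my %newsite1 = (
    de0040Newark     => 1,
    de0042Wilmington => undef,   # key present, value undefined
);

# exists asks "is the key there?"; defined asks "is the value defined?"
my $key_present   = exists  $newsite1{de0042Wilmington} ? 1 : 0;
my $value_defined = defined $newsite1{de0042Wilmington} ? 1 : 0;
my $absent_key    = exists  $newsite1{de0045Concord}    ? 1 : 0;

print "exists: $key_present, defined: $value_defined, absent: $absent_key\n";
# prints "exists: 1, defined: 0, absent: 0"
```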
|
How about something like this to choose the order of files:
if( -M 'newsite1.txt' > -M 'newsite2.txt' ) {
compare_sites('newsite1.txt', 'newsite2.txt')
} else {
compare_sites('newsite2.txt', 'newsite1.txt')
}
That way if for some reason the update doesn't run one day,
you'll still be comparing the files in proper order.
--
I'd like to be able to assign to an luser
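The -M test above can also be wrapped in a small helper so the caller never has to think about which file is which. A sketch, with oldest_first as a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the two file names ordered (older, newer)
# by modification time. -M gives the file's age in days, so a larger
# value means an older file.
sub oldest_first {
    my ($file_a, $file_b) = @_;
    for my $file ($file_a, $file_b) {
        die "Can't find $file\n" unless -e $file;
    }
    my $a_is_older = (-M $file_a) >= (-M $file_b);
    return $a_is_older ? ($file_a, $file_b) : ($file_b, $file_a);
}
```

Usage would be along the lines of compare_sites(oldest_first('newsite1.txt', 'newsite2.txt')); since the helper returns the pair in the order compare_sites expects.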
|
Hey, I would use that kind of solution, but each file gets updated on alternating days. So on one day newsite1.txt might be the newest file, and on the next newsite2.txt might be. I need a way to just see whether each name in one file is in the other, and if it's not, print it. So if newsite2.txt has a site that newsite1.txt does not, print it, and vice versa.
Thanks
Dipul
| [reply] |
Re: Comparison Of Files
by mdillon (Priest) on Dec 05, 2000 at 21:42 UTC
|
#!/usr/bin/perl -w
use strict;
use Algorithm::Diff qw(traverse_sequences);
die unless @ARGV == 2 and $ARGV[0] ne $ARGV[1];
my @files = @ARGV;
my %data;
push @{$data{$ARGV}}, $_ while <>;
my $a = $data{$files[0]};
my $b = $data{$files[1]};
my (@additions, @deletions);
# Each callback is passed the current index into $a, then the index into $b.
traverse_sequences $a, $b, {
    DISCARD_A => sub { push @deletions, $a->[$_[0]] },
    DISCARD_B => sub { push @additions, $b->[$_[1]] },
};
if (@deletions)
{
    print "Deletions:", $/;
    print for @deletions;
    print $/;
}
if (@additions)
{
    print "Additions:", $/;
    print for @additions;
    print $/;
}
| [reply] [d/l] |
(tye)Re3: Comparison Of Files
by tye (Sage) on Dec 05, 2000 at 21:54 UTC
|
My first idea was a "merge sort" (well, at least the "merge" part of it). Luckily you say that the files are already sorted so this is easy. But I still don't think it is as easy as extending what gaspodethewonderdog came up with (since you said in the chatterbox that you wanted both additions and deletions).
sub compare {
    my( $old, $new )= @_;
    open OLD, "< $old" or die "Can't read $old: $!\n";
    open NEW, "< $new" or die "Can't read $new: $!\n";
    my %old;
    while( <OLD> ) {
        chomp;
        $old{$_}++;
    }
    close OLD;
    my @new;
    while( <NEW> ) {
        chomp;
        # delete returns the stored count if the site was in OLD and
        # undef if it was not, so only sites missing from OLD are new.
        push @new, $_ unless delete $old{$_};
    }
    close NEW;
    my @old= sort keys %old;
    print "New sites:\n\t", join("\n\t",@new), $/;
    print "Old sites:\n\t", join("\n\t",@old), $/;
}
Then use Alabanach's idea for deciding which order to compare the files in.
-
tye
(but my friends call me "Tye")
Re: Comparison Of Files
by wardk (Deacon) on Dec 05, 2000 at 21:25 UTC
|
Perl Cookbook Chapter 4.7
"Finding Elements in one array but not the other"
is a solution that may help
from the cookbook... "Build a hash of the keys of @B and use
as a lookup table, then check each element in @A to see if it is in @B."
compare site1 to site2 then site2 to site1 to
get both sets of missing sites.
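A minimal sketch of that lookup-table idea, run in both directions. The sub name only_in_first and the sample site lists are made up for illustration; this is not the Cookbook's own listing.

```perl
use strict;
use warnings;

# Return the elements of @$list_a that do not appear in @$list_b.
sub only_in_first {
    my ($list_a, $list_b) = @_;
    my %in_b = map { $_ => 1 } @$list_b;      # lookup table built from B
    return grep { !$in_b{$_} } @$list_a;      # keep A's elements missing from B
}

# Sample data, assumed for the example
my @site1 = qw(de0040Newark de0042Wilmington de0053Christiana);
my @site2 = qw(de0040Newark de0042Wilmington de0045Concord de0053Christiana);

my @went_down = only_in_first(\@site1, \@site2);   # in site1 only
my @came_up   = only_in_first(\@site2, \@site1);   # in site2 only
print "came up: @came_up\n";
```

Each call builds one hash, so two calls cost two passes over each list, which is negligible for about 700 names.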
| [reply] |
Re: Comparison Of Files
by decnartne (Beadle) on Dec 05, 2000 at 20:43 UTC
|
if you have no aversion to diff, how about:
#!/usr/bin/perl -w
use strict;
open(INP, "/usr/bin/diff ./newsite1.txt ./newsite2.txt |") or die "pipe: $!\n";
while (<INP>) {
    print substr($_,2) if (/^> /);
    print substr($_,2) if (/^< /);
}
close(INP);
decnartne ~ entranced
|
Two problems. First, you may need to sort both files before you do this. If the order of entries might change between days, then "diff" isn't a great solution.
Second, you'll probably end up printing several lines as being both added and deleted. "diff" isn't great at doing a set difference. It is looking for document edits and so can easily report a big chunk of the "bigger" file as being changed and then show the subset of that chunk that was already there in the "smaller" file (and didn't change).
-
tye
(but my friends call me "Tye")
| [reply] |
|
The 2 files are created from a directory listing of all the sites in the system, so they will automatically be sorted in exactly the same order; however, new sites will be placed within that order. So the files could look like this:
newsite1:
PHI_R10K_2
PHI_TDAP_3
de0040Newark
de0042Wilmington
de0053Christana
de0053Christiana
de0101Odessa
de0102Clayton
de0103Dover
de0180Claymont
de0184Wooddale
de0187Woodcreek
de0205ChestnutKnoll
de0267Chapman
de0314Glasgow
de0348Millside
de0371SouthDover
newsite2:
PHI_R10K_2
PHI_TDAP_3
de0040Newark
de0042Wilmington
de0045Concord # <======= new site
de0053Christana
de0053Christiana
de0101Odessa
de0102Clayton
de0103Dover
de0180Claymont
de0184Wooddale
de0187Woodcreek
de0205ChestnutKnoll
de0267Chapman
de0314Glasgow
de0348Millside
de0371SouthDover
at which point I would want de0045Concord returned.
I hope this helps.
Thanks again
Dipul
|
ouch! you're right... i did some further testing, and let's just say it's pretty ugly...
decnartne ~ entranced
| [reply] |
Re: Comparison Of Files
by gt8073a (Hermit) on Dec 06, 2000 at 05:12 UTC
|
My boss wants me to create a tool that tells you which cellsite came up over night. So if someone clicks the button today (tuesday) they would get the sites that came up monday night through tuesday morning.
use Time::Local;
my $sec = 0;
my $min = 0;
my $close = 17; ## 5:00 pm
my $open = 9; ## 9:00 am
my $spd = 24 * 60 * 60;
my $yes = time - timelocal( $sec, $min, $close, (localtime( time - $spd ))[ 3, 4, 5 ] );
my $morning = time - timelocal( $sec, $min, $open, (localtime)[ 3, 4, 5 ] );
my $dir = '/'; ## where the files reside( rem ending slash )
my @cellsites;
opendir CELLSITES, $dir or die "opendir";
@cellsites =
    map{ $_->[0] }
    grep{ $_->[1] <= $yes && $_->[1] >= $morning }
    map{ [ $_, ( ( -M "$dir$_" ) * $spd ) ]}
    readdir CELLSITES;
closedir CELLSITES;
## print @cellsites to a file
## or to screen, or something
| [reply] [d/l] |
|
@cellsites =
    grep { my $age= $spd * -M $dir.$_;
           $age <= $yes && $morning <= $age
    } readdir CELLSITES;
which I think is a slight improvement. Keep up the good work.
-
tye
(but my friends call me "Tye")
Re: Comparison Of Files
by chipmunk (Parson) on Dec 06, 2000 at 08:51 UTC
|
If you decide to go with a utility solution rather than a Perl solution, an alternative to using `diff` is `comm`, which finds common lines in sorted files. Each line is put into one of three columns depending on whether it is in the first file, the second file, or both files. (There is no column for lines that are in neither file.) The command line arguments let you turn off columns you don't want.
comm -23 newsite1.txt newsite2.txt will print lines that are only in newsite1.txt, and
comm -13 newsite1.txt newsite2.txt will print lines that are only in newsite2.txt
comm -3 newsite1.txt newsite2.txt will print lines that are only in newsite1.txt in the first column, and lines that are only in newsite2.txt in the second column.
| [reply] |
Re: Comparison Of Files
by extremely (Priest) on Dec 06, 2000 at 03:50 UTC
|
This is off from the main question but why run two different
scripts on odd days? Just run one script. Have it copy
the older file back and then create a new file.
use File::Copy;
move '/path/newsite2.txt', '/path/newsite3.txt'; #just in case
move '/path/newsite1.txt', '/path/newsite2.txt'; #Back up
#create newsite1.txt
Really, maintaining just one script is worth the extra
line or two...
--
$you = new YOU;
honk() if $you->love(perl)
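The one-script idea above could be sketched as follows. Everything here is an assumption for illustration: the sub name rotate_and_list, the two directory arguments, and the choice to keep only one backup generation (the post above keeps a second one "just in case").

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical helper: keep last run's listing as newsite2.txt, then
# write a fresh, sorted listing of $site_dir to newsite1.txt.
# Returns (previous file, current file).
sub rotate_and_list {
    my ($site_dir, $list_dir) = @_;
    my $current  = "$list_dir/newsite1.txt";
    my $previous = "$list_dir/newsite2.txt";

    # Back up the old listing before it gets overwritten.
    if (-e $current) {
        move($current, $previous) or die "Can't move $current: $!\n";
    }

    opendir(my $dh, $site_dir) or die "Can't read $site_dir: $!\n";
    my @sites = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    @sites = sort @sites;

    open(my $out, '>', $current) or die "Can't write $current: $!\n";
    print {$out} "$_\n" for @sites;
    close($out) or die "Can't close $current: $!\n";

    return ($previous, $current);
}
```

With this layout newsite1.txt is always the newer file, so the comparison never needs to work out the order. On the very first run the previous file does not exist yet, which the caller would have to check for.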
Re: Comparison Of Files
by 2501 (Pilgrim) on Dec 05, 2000 at 22:50 UTC
|
I like chipmunk's idea to use hashes.
read through both files basically saying:
$masterlist{$fileline}=1;
then do a foreach on the keys of masterlist to get all the records. Because hashes can't have dupe keys, you should be ok. From what you wrote of your attempts, it sounds like you are not as interested in REMOVING items,
because time will take care of that.
| [reply] |
Re: Comparison Of Files
by weingart (Acolyte) on Dec 05, 2000 at 23:31 UTC
|
If they are already sorted,
diff -u file1 file2 | sed -e '/^[^+]/d' -e 's/^+//'
will do the job handily... Of course, using an oldie
tool will do the job even easier, but only if the files
are sorted first:
comm -13 file1 file2
--Toby.
| [reply] |
Re: Comparison Of Files
by Anonymous Monk on Dec 05, 2000 at 23:46 UTC
|
With 700 lines you can easily read one file into a hash and
then compare each row in the other file against that hash.
With larger files (1 MB and up), you may wish to save
a lot of memory by noticing that the files are sorted,
alphabetically, it seems. Also, in this case most of the
lines will be present in both files, so storing the
differing rows will not consume insane amounts of memory :-)
Here is a mergesortish way to do it:
=head1 compare_sorted_files_by_line($filename1, $filename2)
Finds lines that are present in only one of the files, whose names are
given as arguments. This function assumes that the lines in the files are
in alphabetical order.
Returns the unique rows in each file, in two list references. The first one
points to an array containing the rows that are present in $filename1 only,
and the second one similarly for $filename2.
Returns an empty list if either of the files could not be opened for reading.
=cut
sub compare_sorted_files_by_line( $$ )
{
    my($filename1, $filename2) = @_;
    my(@in1only, @in2only); # The unique rows ("matches") are stored in these
    unless(open(FILE1, "< $filename1"))
    { warn "$0: Could not open $filename1: $!\n"; return (); }
    unless(open(FILE2, "< $filename2"))
    { warn "$0: Could not open $filename2: $!\n"; close FILE1; return ();}
    my $line1 = <FILE1>;
    my $line2 = <FILE2>;
    while(defined($line1) and defined($line2))
    {
        my $compare = $line1 cmp $line2;
        if($compare == 0)
        {
            $line1 = <FILE1>;
            $line2 = <FILE2>;
            next;
        }
        elsif($compare > 0)
        {
            push(@in2only, $line2);
            $line2 = <FILE2>;
            next;
        }
        else
        {
            push(@in1only, $line1);
            $line1 = <FILE1>;
        }
    }
    # were there differences at end of file?
    if(defined($line1))
    {
        push(@in1only, $line1);
        push(@in1only, $_) while(<FILE1>);
    }
    if(defined($line2))
    {
        push(@in2only, $line2);
        push(@in2only, $_) while(<FILE2>);
    }
    close FILE1;
    close FILE2;
    # we happen to like strings without newlines.
    chomp(@in1only);
    chomp(@in2only);
    return(\@in1only, \@in2only);
}
-Bass
| [reply] |
Re: Comparison Of Files
by Anonymous Monk on Dec 06, 2000 at 03:02 UTC
|
Use the diff command, outputting the results to a file.
Re: Comparison Of Files
by belg4mit (Prior) on Dec 07, 2000 at 02:45 UTC
|
| [reply] |