Beefy Boxes and Bandwidth Generously Provided by pair Networks Ovid
XP is just a number
 
PerlMonks  

Perl: How to perfectly match specific data between two files and do comparison?

by WWq (Novice)
on Jul 23, 2013 at 14:46 UTC ( #1045885=perlquestion: print w/ replies, xml ) Need Help??
WWq has asked for the wisdom of the Perl Monks concerning the following question:

I have two files (File A & File B) in same format as below. I would like to match certain pattern of data from both files and do matching. My coding below used very long time to generate result. Other than that, It is wrong somewhere which cause incomplete extraction. Any alternative methods or improvement?

I extracted each line name and score from both files and stored them in two output files. Each output file contains extracted name and score. At first, if score in File A is negative value, do ignore the specific line extraction. Else if score in File A is positive value, match name of File A with File B. There will be three conditions and three result reports generated (pass.rpt, fail.rpt and noCheck.rpt).

For those matched names, it will proceed to compare. If File A score > 50 and File B score > 40, print matched name, score from File A (score A) and score from File B (score B) to pass.rpt and pass_counter($pc) plus one for each comparison. Else if <50 and <40, print matched name, score A and score B to fail.rpt and fail_counter($fc) plus one.

Last condition is for those negative values from File A. If names from both files matched, print name, scoreA and score B to noCheck.rpt and noCheck_counter plus one.

File A

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Report : students A

-science

-math

-language

Date : Fri Jul 19 17:00:31 2013

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Name science math lang. score

--------------------------------------------------------------------------

Jane_let [0] (sa) 58.78 r 66.15 0.00 -33

Alfert_pipe (sa) 74.72 r 92.72 0.00 82

Olive_pipe 8 (sa) 64.28 f 25.40 0.00 58

mass/excel/i60 86.21 r 59.90 0.00 68

Anne_let (sa) 51.98 f 12.69 0.00 -39

yuki/099/pipe 76.52 r 94.32 0.00 -82

frey/let/sa/y589 47.79 f 99.00 0.00 78

alan/excel/sa/y589 97.00 f 96.00 0.00 -70

..

..

File B

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Report : students B

-science

-math

-language.

Date : Fri Jul 19 17:00:31 2013

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Name science math lang. score

--------------------------------------------------------------------------

Ash_let 9 (sa) 58.78 r 66.15 0.00 33

Alfert_pipe (sa) 74.72 r 92.72 0.00 57

Olive_pipe 8 (sa) 64.28 f 25.40 0.00 20

mass/excel/i60 86.21 r 59.90 0.00 16

Sam_let (sa) 51.98 f 12.69 0.00 -39

yuki/099/pipe 76.52 r 94.32 0.00 82

frey/let/sa/y589 47.79 f 99.00 0.00 30

alan/excel/sa/y589 67.00 f 96.00 0.00 -90

..

..

coding: use Getopt::Long qw(:config no_ignore_case); use Data::Dumper; use POSIX qw(floor); use strict; use warning; my $orig = ''; my $new = ''; GetOptions('orig=s' => \$orig, 'new=s' => \$new); if (!$orig|!$new) { print "\n\t Help: test.pl -orig <file A> -new <file B>\n"; exit; } open (PASS, ">pass.rpt") || die "ERROR: cannot open"; open (FAIL, ">fail.rpt") || die "ERROR: cannot open"; open (NC, ">noCheck.rpt") || die "ERROR: cannot open"; open (t1, ">t1") || die "ERROR: cannot open"; open (t2, ">t2") || die "ERROR: cannot open"; my (@array, $line, $end1, $slack1, $b1, $THIS, @arr1, @arr2, @tmp1 +, @tmp2, @emp, @emp2, $data1, $data2,$emp1,$emp2,$emp3,$emp4,$ep1,$s1 +,$ep2,$s2,$ncc,$pc,$fc); $ncc = 0; $pc = 0; $fc = 0; fileA_ext(); fileB_ext(); check(); #_________________________________________________________________ +______________________________ sub fileA_ext() { if ($orig =~ /\S+\.gz$/) { open (FileA,"gunzip -c $orig |") || die "ERROR: can't read $ori +g\n"; } else { open (FileA,"$orig") || die "ERROR: can't read $orig\n"; } while (@array = <FileA>) { foreach $line(@array) { if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+( +.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) { if ($line !~ m/\((sa)\)/) { @arr1 = @emp; next if ($line =~ m/Name/); $name1 = "$1"; $score1 = "$12"; my $data1 = join(";",$name1,$score1); push (@arr1, $data1); } if ($line =~ m/\((sa)\)/) { @arr1 = @emp2; @tmp1 = @emp; next if ($line =~ m/Name/); push (@tmp1, $line); #print t "@tmp1\n"; foreach $line (@tmp1) { if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+ +(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) { my $name2 = "$1"; substr($name2, -13) = ''; my $score2 = "$12"; my $data1 = join(";",$name2,$score2); push (@arr1, $data1); $name2 = $score2 =""; #print "@arr1\n\n"; } } } print t1 "@arr1\n\n"; } } } close (FileA); } #_________________________________________________________________ +___________________________ sub FileB_ext() { if ($new =~ /\S+\.gz$/) { open (FileB,"gunzip -c $new |") || die "ERROR: $THIS can't read + $new\n"; } else { open (FileB,"$new") || die "ERROR: $THIS can't read $new\n"; } while (@array = <FileB>) { foreach $line(@array) { if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*) +\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) { #print "$line\n"; if ($line !~ m/\((sa)\)/) { @arr2 = @emp; next if ($line =~ m/Name/); my $name3 = "$1"; my $score3 = "$12"; my $data2 = join(";",$name3,$score3); push (@arr2, $data2); } if ($line =~ m/\((sa)\)/) { @arr2 = @emp2; @tmp2 = @emp; next if ($line =~ m/Name/); push (@tmp2, $line); #print t "@tmp2\n"; foreach $line (@tmp2) { if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+ +(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) { my $name4 = "$1"; substr($name4, -13) = ''; my $score4 = "$12"; my $data2 = join(";",$name4,$score4); push (@arr2, $data2); $name4 = $score4 =""; #print "@arr2\n\n"; } } } print t2 "@arr2\n\n"; } } } close (FileB); } sub check() { foreach $data1 (@arr1) { if ($data1 ne ""){ if ($data1 =~ m/(.*)\;(.*)/) { $ep1 = $emp1; $s1 = $emp2; my $ep1 = "$1"; my $s1 = "$2"; #print r "$ep1 $s1\n\n"; foreach $data2 (@arr2) { if ($data2 ne "") { if ($data2 =~ m/(.*)\;(.*)/) { $ep2 = $emp3; $s2 = $emp4; my $ep2 = "$1"; my $s2 = "$2"; #print R "$ep2 $s2\n"; if ( $ep1 eq $ep2 && $s1 =~ m/-/g) { $ncc++; #print NC "Total match: $ncc\n\n"; print NC "$ep1 $s1 $s2\n"; } if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 > 50 && $ +s2 > 40) { $pc++; print PASS "$ep1 $s1 $s2\n"; } if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 < 50 && $ +s2 < 40) { $fc++; print FAIL "$ep1 $s1 $s2\n"; } } } } } } } print NC "\nTotal match: $ncc\n\n"; print PASS "\nTotal match: $pc\n\n"; print FAIL "\nTotal match: $fc\n\n"; }

expected result:

pass.rpt

---------------

Name scoreA scoreB

Alfert_pipe (sa) 82 57

fail.rpt

--------------

Olive_pipe 8 (sa) 58 20

mass/excel/i60 68 16

frey/let/sa/y589 78 30

noCheck.rpt

-------------

yuki/099/pipe -82 82

alan/excel/sa/y589 -70 -90

Comment on Perl: How to perfectly match specific data between two files and do comparison?
Download Code
Re: Perl: How to perfectly match specific data between two files and do comparison?
by Anonymous Monk on Jul 23, 2013 at 16:35 UTC

    This looks wrong:

    if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+( +.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) { if ($line !~ m/\((sa)\)/) { @arr1 = @emp; next if ($line =~ m/Name/); $name1 = "$1"; $score1 = "$12";

    $12(not "$12") must refer to a previous regex.

Re: Perl: How to perfectly match specific data between two files and do comparison?
by poj (Curate) on Jul 23, 2013 at 18:39 UTC
    What is result if File A score is 50 and File B score is 40 ?. Assuming that anything that isn't NC or PASS is FAIL then your code can be reduced to something like this.
    #!perl use strict; open (PASS, ">pass.rpt") || die "ERROR: cannot open"; open (FAIL, ">fail.rpt") || die "ERROR: cannot open"; open (NC, ">noCheck.rpt") || die "ERROR: cannot open"; # for test my @PASS=(); my @FAIL=(); my @NC=(); my ($ncc,$pc,$fc); # test data if required my @arr1=();# = ('A;-50','B;51','C;51','D;55'); my @arr2=();# = ('A;51' ,'B;39','C;41','F;57'); file_ext('fileA.txt',\@arr1); file_ext('fileB.txt',\@arr2); check(); print "\nTotal PASS match: $pc\n\n"; print "$_\n" for @PASS; print "\nTotal FAIL match: $fc\n\n"; print "$_\n" for @FAIL; print "\nTotal NC match: $ncc\n\n"; print "$_\n" for @NC; sub file_ext { my ($file,$ar) = @_; open (FILE,'<',$file) or die "Could not open $file"; while (<FILE>){ chomp; # name = first field, score = last field my @f = split /\s+/; #print "$f[0] $f[-1]\n"; push @$ar,"$f[0];$f[-1]"; } } sub check { foreach my $data1 (@arr1) { my ($ep1,$s1) = split ';',$data1; foreach my $data2 (@arr2) { my ($ep2,$s2) = split ';',$data2; next if ($ep1 ne $ep2); # negatives in fileA score if ($s1 =~ /-/){ $ncc++; my $line = sprintf "%-20s %4d %4d",$ep1,$s1,$s2; print NC $line."\n"; push @NC,$line; } # pass elsif (($s1 > 50) && ($s2 >40)){ $pc++; my $line = sprintf "%-20s %4d %4d",$ep1,$s1,$s2; print PASS $line."\n"; push @PASS,$line; } #fail else { $fc++; my $line = sprintf "%-20s %4d %4d",$ep1,$s1,$s2; print FAIL $line."\n"; push @FAIL,$line; } } } } =head1 fileA.txt Jane_let [0] (sa) 58.78 r 66.15 0.00 -33 Alfert_pipe (sa) 74.72 r 92.72 0.00 82 Olive_pipe[8] (sa) 64.28 f 25.40 0.00 58 mass/excel/i60 86.21 r 59.90 0.00 68 Anne_let (sa) 51.98 f 12.69 0.00 -39 yuki/099/pipe 76.52 r 94.32 0.00 -82 frey/let/sa/y589 47.79 f 99.00 0.00 78 alan/excel/sa/y589 97.00 f 96.00 0.00 -70 =head1 fileB.txt Ash_let[9] (sa) 58.78 r 66.15 0.00 33 Alfert_pipe (sa) 74.72 r 92.72 0.00 57 Olive_pipe[8] (sa) 64.28 f 25.40 0.00 20 mass/excel/i60 86.21 r 59.90 0.00 16 Sam_let (sa) 51.98 f 12.69 0.00 -39 yuki/099/pipe 76.52 r 94.32 0.00 82 frey/let/sa/y589 47.79 f 99.00 0.00 30 alan/excel/sa/y589 67.00 f 96.00 0.00 -90 =cut
    If the run time is too slow then consider a hash based solution rather than using arrays.
    poj
Re: Perl: How to perfectly match specific data between two files and do comparison?
by Laurent_R (Priest) on Jul 23, 2013 at 22:18 UTC
    Although I do not have enough energy right now to go through your code in enough details, I would tend to think that using a hash is probably much better than using an array.
      Yes hash is a better way. Thanks Laurent_R. =)
Re: Perl: How to perfectly match specific data between two files and do comparison?
by jwkrahn (Monsignor) on Jul 24, 2013 at 02:16 UTC
Re: Perl: How to perfectly match specific data between two files and do comparison?
by Loops (Hermit) on Jul 24, 2013 at 03:08 UTC
    #!/usr/bin/perl use strict; use warnings; sub readReport { my $filename = shift; my %hash; open (my $file, '<', $filename) or die $!; while (<$file>) { next unless /^--------/ .. eof; # Bypass header if (my ($name, $score) = /^\s*(\S+).+\s(\S+)\s*$/) { $hash{$name} = $score; } } return \%hash; } my $reportA = readReport('filea'); my $reportB = readReport('fileb'); my %reports = ( fail => [], ignored => [], noCheck => [], pass => [] ) +; for my $name (sort keys $reportA) { my @scores = ($reportA->{$name}, $reportB->{$name}); my $rpt = "fail"; if ($scores[0] < 0) { $rpt = defined $scores[1] ? 'noCheck' : 'ignored'; $scores[1] //= 0; } elsif ($scores[0] > 50 and $scores[1] > 40) { $rpt = 'pass'; } push $reports{$rpt}, "$name, @scores"; } for my $name (keys %reports) { print "\nReport $name\n----------------------\n" ; print "$_\n" for @{$reports{$name}}; }
    Output:
    Report pass ---------------------- Alfert_pipe, 82 57 Report noCheck ---------------------- alan/excel/sa/y589, -70 -90 yuki/099/pipe, -82 82 Report fail ---------------------- Olive_pipe, 58 20 frey/let/sa/y589, 78 30 mass/excel/i60, 68 16 Report ignored ---------------------- Anne_let, -39 0 Jane_let, -33 0
      Hi Loops, Thanks. What if I want to match name included brackets behind? Some of the names have brackets info such as like (sa)and (tan804k). Those bracket info are needed for name matching. How to match brackets correctly?
        You need to change the regex that pulls the name out of the text, to something like:
        /^\s*(\S+(?:.*\(.*\))?).+\s(\S+)\s*$/
        Okay, here's a slightly updated version that includes that regex, and a minor cleanup:
        use strict; use warnings; sub readReport { my $filename = shift; my %hash; open (my $file, '<', $filename) or die $!; while (<$file>) { next unless /^--------/ .. eof; # Bypass header if (my ($name, $score) = /^\s*(\S+(?:.*\(.*\))?).+\s(\S+)\s*$/ +) { $hash{$name} = $score; } } return \%hash; } my $reportA = readReport('filea'); my $reportB = readReport('fileb'); my %reports = ( fail => [], ignored => [], noCheck => [], pass => [] ) +; for my $name (sort keys $reportA) { my ($a,$b) = ($reportA->{$name}, $reportB->{$name}); if ($a < 0) { push $reports{'noCheck'}, "$name, $a $b" if defined $b; push $reports{'ignored'}, "$name, $a 0" unless defined $b; } elsif ($a > 50 and $b > 40) { push $reports{'pass'}, "$name, $a $b"; } else { push $reports{'fail'}, "$name, $a $b"; } } for my $name (keys %reports) { print "\nReport $name\n----------------------\n" ; print "$_\n" for @{$reports{$name}}; }

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1045885]
Approved by Happy-the-monk
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (4)
As of 2014-04-21 05:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (490 votes), past polls