comment on

I have two files (File A & File B) in same format as below. I would like to match certain pattern of data from both files and do matching. My coding below used very long time to generate result. Other than that, It is wrong somewhere which cause incomplete extraction. Any alternative methods or improvement?

I extracted each line name and score from both files and stored them in two output files. Each output file contains extracted name and score. At first, if score in File A is negative value, do ignore the specific line extraction. Else if score in File A is positive value, match name of File A with File B. There will be three conditions and three result reports generated (pass.rpt, fail.rpt and noCheck.rpt).

For those matched names, it will proceed to compare. If File A score > 50 and File B score > 40, print matched name, score from File A (score A) and score from File B (score B) to pass.rpt and pass_counter($pc) plus one for each comparison. Else if <50 and <40, print matched name, score A and score B to fail.rpt and fail_counter($fc) plus one.

Last condition is for those negative values from File A. If names from both files matched, print name, scoreA and score B to noCheck.rpt and noCheck_counter plus one.

File A

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Report : students A

-science

-math

-language

Date : Fri Jul 19 17:00:31 2013

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Name science math lang. score

--------------------------------------------------------------------------

Jane_let [0] (sa) 58.78 r 66.15 0.00 -33

Alfert_pipe (sa) 74.72 r 92.72 0.00 82

Olive_pipe 8 (sa) 64.28 f 25.40 0.00 58

mass/excel/i60 86.21 r 59.90 0.00 68

Anne_let (sa) 51.98 f 12.69 0.00 -39

yuki/099/pipe 76.52 r 94.32 0.00 -82

frey/let/sa/y589 47.79 f 99.00 0.00 78

alan/excel/sa/y589 97.00 f 96.00 0.00 -70

File B

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Report : students B

-science

-math

-language.

Date : Fri Jul 19 17:00:31 2013

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Name science math lang. score

--------------------------------------------------------------------------

Ash_let 9 (sa) 58.78 r 66.15 0.00 33

Alfert_pipe (sa) 74.72 r 92.72 0.00 57

Olive_pipe 8 (sa) 64.28 f 25.40 0.00 20

mass/excel/i60 86.21 r 59.90 0.00 16

Sam_let (sa) 51.98 f 12.69 0.00 -39

yuki/099/pipe 76.52 r 94.32 0.00 82

frey/let/sa/y589 47.79 f 99.00 0.00 30

alan/excel/sa/y589 67.00 f 96.00 0.00 -90

  
coding:  
  

    use Getopt::Long qw(:config no_ignore_case);
    use Data::Dumper;
    use POSIX qw(floor);
    use strict;
    use warning;
    
    my $orig = '';
    my $new = '';
    
    GetOptions('orig=s' => \$orig, 'new=s' => \$new);
    
    if (!$orig|!$new) {
            print "\n\t Help: test.pl -orig <file A> -new <file B>\n";
            exit;
    }
    
    open (PASS, ">pass.rpt") || die "ERROR: cannot open";
    open (FAIL, ">fail.rpt") || die "ERROR: cannot open";
    open (NC, ">noCheck.rpt") || die "ERROR: cannot open";
    open (t1, ">t1") || die "ERROR: cannot open";
    open (t2, ">t2") || die "ERROR: cannot open";
    
    my (@array, $line, $end1, $slack1, $b1, $THIS, @arr1, @arr2, @tmp1
+, @tmp2, @emp, @emp2, $data1, $data2,$emp1,$emp2,$emp3,$emp4,$ep1,$s1
+,$ep2,$s2,$ncc,$pc,$fc);
    
    $ncc = 0;
    $pc = 0;
    $fc = 0;
    
    fileA_ext();
    fileB_ext();
    check();
    
    #_________________________________________________________________
+______________________________ 
    sub fileA_ext() {
    
    if ($orig =~ /\S+\.gz$/) {
       open (FileA,"gunzip -c $orig |") || die "ERROR: can't read $ori
+g\n";
    } else {
       open (FileA,"$orig") || die "ERROR: can't read $orig\n";
    }
    
    while (@array = <FileA>)    {
    
    foreach $line(@array) {
    
            if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(
+.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
    
    
            if ($line !~ m/\((sa)\)/) {
    
                @arr1 = @emp;
                next if ($line =~ m/Name/);
                        $name1 = "$1";
                $score1 = "$12";
                    
                my $data1 = join(";",$name1,$score1);     
                push (@arr1, $data1);
                
                }
            
            if ($line =~ m/\((sa)\)/) {
    
                @arr1 = @emp2;
                @tmp1 = @emp;
                next if ($line =~ m/Name/);
                push (@tmp1, $line);
                #print t "@tmp1\n";
    
                foreach $line (@tmp1) {
                    
                    if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+
+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
    
                        my $name2 = "$1";
                        substr($name2, -13) = '';
                        my $score2 = "$12";
    
                        my $data1 = join(";",$name2,$score2);     
                        push (@arr1, $data1);
                        $name2 = $score2 ="";
                        #print "@arr1\n\n";
                                    }
                                }
                            }
    print t1 "@arr1\n\n";
    }        
    }
    }
    close (FileA);
    }
    
    #_________________________________________________________________
+___________________________
    
                           
    sub FileB_ext() {
    
    if ($new =~ /\S+\.gz$/) {
       open (FileB,"gunzip -c $new |") || die "ERROR: $THIS can't read
+ $new\n";
    } else {
       open (FileB,"$new") || die "ERROR: $THIS can't read $new\n";
    }
    
    while (@array = <FileB>)  {
    
    foreach $line(@array) {
    
         if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)
+\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
            #print "$line\n";
    
            if ($line !~ m/\((sa)\)/) {
    
                @arr2 = @emp;
                next if ($line =~ m/Name/);
                        my $name3 = "$1";
                my $score3 = "$12";
                    
                my $data2 = join(";",$name3,$score3);     
                push (@arr2, $data2);
                
                }
            
            if ($line =~ m/\((sa)\)/) {
    
                @arr2 = @emp2;
                @tmp2 = @emp;
                next if ($line =~ m/Name/);
                push (@tmp2, $line);
                #print t "@tmp2\n";
    
                foreach $line (@tmp2) {
                    
                    if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+
+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
    
                        my $name4 = "$1";
                        substr($name4, -13) = '';
                        my $score4 = "$12";
    
                        my $data2 = join(";",$name4,$score4);     
                        push (@arr2, $data2);
                        $name4 = $score4 ="";
                        #print "@arr2\n\n";
                                    }
                                }
                            }
    print t2 "@arr2\n\n";
    }         
    }
    }
    close (FileB);
    }
    
    
    sub check() {
    
    foreach $data1 (@arr1) {
        if ($data1 ne ""){
    
            if ($data1 =~ m/(.*)\;(.*)/) {
                $ep1 = $emp1;
                $s1 = $emp2;
                my $ep1 = "$1";
                my $s1 = "$2";
                #print r "$ep1  $s1\n\n";
        
            foreach $data2 (@arr2) {
                if ($data2 ne "") {
    
                    if ($data2 =~ m/(.*)\;(.*)/) {
                        $ep2 = $emp3;
                        $s2 = $emp4;
                        my $ep2 = "$1";
                        my $s2 = "$2";
                        #print R "$ep2 $s2\n";
                            
    
                    if ( $ep1 eq $ep2 && $s1 =~ m/-/g) {
                        
                        $ncc++;
                        #print NC "Total match: $ncc\n\n";
                        print NC "$ep1    $s1    $s2\n";
                                        }
    
                    if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 > 50 && $
+s2 > 40) {
    
                        $pc++;
                        print PASS "$ep1    $s1    $s2\n";
                                            }
    
                    if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 < 50 && $
+s2 < 40) {
    
                        $fc++;
                        print FAIL "$ep1    $s1    $s2\n";
                                            }
    
    
    }
    }
    }
    }
    }
    }
    print NC "\nTotal match: $ncc\n\n";
    print PASS "\nTotal match: $pc\n\n";
    print FAIL "\nTotal match: $fc\n\n";
    
    
    }
[download]

expected result:

pass.rpt

---------------

Name scoreA scoreB

Alfert_pipe (sa) 82 57

fail.rpt

--------------

Olive_pipe 8 (sa) 58 20

mass/excel/i60 68 16

frey/let/sa/y589 78 30

noCheck.rpt

-------------

yuki/099/pipe -82 82

alan/excel/sa/y589 -70 -90

In reply to Perl: How to perfectly match specific data between two files and do comparison? by WWq

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Syntactic Confectionery Delight
	PerlMonks