PerlMonks  

match pattern from two different file

by Anonymous Monk
on Jun 20, 2014 at 03:41 UTC (#1090546=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,
I am currently working on a problem and trying to solve it with Perl, but the code I have written is not working the way it should.
I want to extract the common pattern from two files.

file1 contains:

LOC_Os01g01010.1 3017 : PAS [2993,3017] : AGTAAA AAATGGAGGAATTCTGCCA
LOC_Os01g01010.2 2218 : uORF [7,129] : ATG AAGAAGACGAGACGACGACTTCCCCACTAGGAAACACGACGGAGGCGGAGATGATC

and file2 contains:

LOC_Os01g01010.1 : PS51373 HIPIP High potential iron-sulfur proteins family profile. 2139 - 2208 AATGATTTATCATGTGAGGTGAAAGA-----AGAGCACCGGGTGAACAGTTACACAAGAA L=-1 GAAAGTCCAAAAGCA
LOC_Os01g01010.2 : PS00022 EGF_1 EGF-like domain signature 1. 298 - 309 CtCccTtcGTtC L=(-1)

I am trying to retrieve "LOC_Os01g01010*" (the files contain variations of the name, but LOC_Os01g remains constant).

Can anyone please help me?

    #!/usr/bin/perl
    use strict;
    use warnings;

    open(FH, "outputps_scan_chr1_.out") or die "Can't Open File1\n";
    my @array1=<FH>;
    close(FH);
    open(BH, "chr1_1.txt") or die "Can't Open File2\n";
    my @array2=<BH>;
    close(BH);
    my $pattern='/^LOC_Os0[1-7]g[0-9]*.[0-9]\s';
    foreach my $line1(@array1)
    {
        foreach my $line2(@array2)
        if
        {
            $line1=~ /^$pattern/;
            $line2=~ /^$pattern/;
            $line1==$line2;
            {
                print $line1;
            }
        }
        else
        print "nothing to print\n";
    }

This code gives a compilation error.
Also, in this code I am comparing whole lines. Should I do it like this, given that each line also contains other things besides the pattern? Please help.
Thanks & regards

Re: match pattern from two different file
by wjw (Curate) on Jun 20, 2014 at 04:02 UTC

    Any chance you could put the file contents in code tags? That might make this post readable... :-)

    Update: Reflecting on the code just a bit, have you looked at any of the Perl docs?

        if (whatever) {
            do something;
        }
        else {
            do the_other;
        }
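
For reference, here is a minimal runnable sketch of that if/else shape, applied to the kind of line the question deals with. The helper name classify_line and the simplified prefix test are assumptions made only for this illustration; they are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# classify_line is a hypothetical helper used only to show the syntax.
sub classify_line {
    my ($line) = @_;
    if ($line =~ /^LOC_Os/) {    # the condition goes inside parentheses
        return "match";          # the body goes inside braces
    }
    else {                       # else also needs a braced block
        return "no match";
    }
}

print classify_line("LOC_Os01g01010.1 3017 : PAS"), "\n";
print classify_line("some other line"), "\n";
```

Unlike C, Perl requires the braces even when the block holds a single statement, which is one reason the bare "else print ..." in the original code does not compile.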

    ...the majority is always wrong, and always the last to know about it...

    Insanity: Doing the same thing over and over again and expecting different results...

    A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a fact

      file1 contains:

      LOC_Os01g01010.1 3017 : PAS 2993,3017 : AGTAAA AAATGGAGGAATTCTGCCA
      LOC_Os01g01010.2 2218 : uORF 7,129 : ATG AAGAAGACGAGACGACGACTTCCCCACTAGGAAACACGACGGAGGCGGAGATGATC

      and file2 contains:

      LOC_Os01g01010.1 : PS51373 HIPIP High potential iron-sulfur proteins family profile. 2139 - 2208 AATGATTTATCATGTGAGGTGAAAGA-----AGAGCACCGGGTGAACAGTTACACAAGAA L=-1 GAAAGTCCAAAAGCA
      LOC_Os01g01010.2 : PS00022 EGF_1 EGF-like domain signature 1. 298 - 309 CtCccTtcGTtC L=(-1)

      I am trying to retrieve "LOC_Os01g01010*" (the files contain variations of the name, but LOC_Os01g remains constant).

      Yes, I did write it once with a condition using "and", and that also didn't produce any result.
      Actually, I have tried writing it in so many ways that it has all become jumbled, and now I don't know what I should do.

Re: match pattern from two different file
by vinoth.ree (Parson) on Jun 20, 2014 at 04:50 UTC

      Yes, I did, but it is not giving any output. The output file that gets generated is 0 KB.

Re: match pattern from two different file
by Anonymous Monk on Jun 20, 2014 at 06:42 UTC
    This is a syntax error:

        if # you forgot some parens here, like someone said
        {
            $line1=~ /^$pattern/;
            $line2=~ /^$pattern/;
            $line1==$line2;
            {
                print $line1;
            }
        }

    Also:

        my $pattern='/^LOC_Os0[1-7]g[0-9]*.[0-9]\s';
        ...
        $line1=~ /^$pattern/;
        # $line1 (and 2) will expand to something like /^^LOC_Os.../, which is probably not what you want. Try
        my $pat = qr/^LOC_Os.../;
        $line1 =~ $pat;

      I mean patterns, not lines, will expand to that. They will start with double ^. That is a regex error.
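
To make the difference concrete, here is a small sketch comparing a pattern compiled with qr// against the same text kept in a single-quoted string with the leading slash, as in the original post. The escaped dot and the "+" quantifiers are assumptions about the real ID format, not something the thread confirms. Note that a slash inside an interpolated string is treated as a literal character to match, so the quoted form cannot match a line that begins with LOC_.

```perl
use strict;
use warnings;

# A compiled pattern; "\." matches a literal dot only.
my $pat = qr/^LOC_Os0[1-7]g[0-9]+\.[0-9]+/;

# The same text as a plain string, WITH the stray leading slash.
my $quoted = '/^LOC_Os0[1-7]g[0-9]+\.[0-9]+';

my $line = "LOC_Os01g01010.1 3017 : PAS [2993,3017] : AGTAAA";

# The qr// object can be used directly on the right-hand side of =~.
my $with_qr = ($line =~ $pat) ? 1 : 0;

# Interpolating the string yields the regex ^/^LOC_Os0..., which demands
# a literal "/" as the first character of the line.
my $with_quoted = ($line =~ /^$quoted/) ? 1 : 0;

print "qr pattern matched:     $with_qr\n";
print "quoted pattern matched: $with_quoted\n";
```

With the sample line above, the qr// version matches and the quoted version does not.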

        I removed the if condition and changed the pattern as you suggested. It runs, but the result is not what it should be: the whole file comes out as output.
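
One likely explanation, assuming the print statement now runs unconditionally: a match written as a statement on its own does not filter anything, because its true/false result is discarded. The sketch below contrasts that with using the match as a condition. The pattern and the two sample lines are assumptions for illustration.

```perl
use strict;
use warnings;

my $pat   = qr/^LOC_Os0[1-7]g[0-9]+\.[0-9]+/;
my @lines = (
    "LOC_Os01g01010.1 3017 : PAS\n",
    "2139 - 2208 AATGATTTATC\n",
);

# Without a test: the match result is ignored, so every line is kept.
my @unfiltered;
for my $line (@lines) {
    $line =~ $pat;
    push @unfiltered, $line;
}

# With a test: only lines that match the pattern are kept.
my @filtered;
for my $line (@lines) {
    push @filtered, $line if $line =~ $pat;
}

print scalar(@unfiltered), " line(s) kept without the test\n";
print scalar(@filtered),   " line(s) kept with the test\n";
```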

Re: match pattern from two different file
by vinoth.ree (Parson) on Jun 20, 2014 at 07:18 UTC

        foreach my $line1(@array1)
        {
            foreach my $line2(@array2)
            if
            {
                $line1=~ /^$pattern/;
                $line2=~ /^$pattern/;
                $line1==$line2;
                {
                    print $line1;
                }
            }

    Your code has a syntax error.

    Moreover, eq is for comparing strings; == is for comparing numbers.
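
A short sketch of why this matters for the IDs in the question: under ==, both strings are converted to numbers first, and a string that does not start with a number converts to 0, so two different IDs compare as equal. The variable names are chosen for this illustration only.

```perl
use strict;
use warnings;

my $id1 = "LOC_Os01g01010.1";
my $id2 = "LOC_Os01g01010.2";

# String comparison: the IDs differ in their last character.
my $same_as_strings = ($id1 eq $id2) ? 1 : 0;

# Numeric comparison: both strings numify to 0, so they look equal.
my $same_as_numbers;
{
    no warnings 'numeric';    # silence "isn't numeric" warnings for this demo
    $same_as_numbers = ($id1 == $id2) ? 1 : 0;
}

print "eq says equal: $same_as_strings\n";
print "== says equal: $same_as_numbers\n";
```

With warnings enabled and no local override, Perl reports the non-numeric arguments, which is a useful hint that the wrong operator is in use.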

    Adding code:

        #!/usr/bin/perl
        use strict;
        use warnings;

        open(FH, "<","./file1.txt") or die "Can't Open File1\n";
        open(BH, "<","./file2.txt") or die "Can't Open File2\n";
        my $pattern='^LOC_Os0[1-7]g[0-9]*.[0-9]\s';
        while(my $line1 = <FH>)
        {
            chomp($line1);
            #print "Line1:$line1\n";
            if( $line1=~ /$pattern/)
            {
                while(my $line2 = <BH>)
                {
                    chomp($line2);
                    if($line2 =~ /$pattern/)
                    {
                        if($line1 eq $line2)
                        {
                            print "Matches: ". $line1;
                        }
                        else
                        {
                            print "nothing to print\n";
                        }
                    }
                }
                seek(BH,0,0);
            }
        }
        close(FH);
        close(BH);

    file1.txt

    LOC_Os01g01010.1 3017 : PAS 2993,3017 : AGTAAA AAATGGAGGAATTCTGCCA
    LOC_Os01g01010.2 2218 : uORF 7,129 : ATG AAGAAGACGAGACGACGACTTCCCCACTAGGAAACACGACGGAGGCGGAGATGATC

    file2.txt

    LOC_Os01g01010.1 : PS51373 HIPIP High potential iron-sulfur proteins family profile. 2139 - 2208 AATGATTTATCATGTGAGGTGAAAGA-----AGAGCACCGGGTGAACAGTTACACAAGAA L=-1 GAAAGTCCAAAAGCA
    LOC_Os01g01010.2 : PS00022 EGF_1 EGF-like domain signature 1. 298 - 309 CtCccTtcGTtC L=(-1
    LOC_Os01g01010.2 2218 : uORF 7,129 : ATG AAGAAGACGAGACGACGACTTCCCCACTAGGAAACACGACGGAGGCGGAGATGATC

    All is well

      Thank you for the code, but it prints "nothing to print" multiple times.
      I also tried again using split(), but it prints the same as your code.
      Newly written code:

          #!/usr/bin/perl
          use strict;
          use warnings;

          open(FH, "outputps_scan_chr1_.out") or die "Can't Open File1\n";
          my @array1=<FH>;
          close(FH);
          open(BH, "chr1_1.txt") or die "Can't Open File2\n";
          my @array2=<BH>;
          close(BH);
          my $pat='qr/^LOC_Os0.../';
          foreach my $line1(@array1)
          {
              my @array3= split(/ /, $line1);
              foreach my $line2(@array2)
              {
                  my @array4= split(/ /, $line2);
                  foreach my $line3(@array3)
                  {
                      foreach my $line4(@array4)
                      {
                          $line3=~ $pat;
                          $line4=~ $pat;
                          if ($line3 eq $line4)
                          {
                              print $line3;
                          }
                          else
                          {
                              print "nothing to right\n";
                          }
                      }
                  }
              }
          }

      Hey, do you want to match only the pattern LOC_Os0[1-7]g[0-9]+.[0-9] in each line from both files, or the entire line? Your first post compares entire lines, which is why I was checking for an entire-line match. Please correct me if I am wrong.


      All is well

        Yes, I only want LOC_Os0[1-7]g[0-9]+.[0-9] from both files.
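
Given that clarification, one possible approach (a sketch, not something posted in the thread) is to capture the ID at the start of each line and use a hash to find the IDs that occur in both files. The helper names extract_ids and common_ids, the escaped dot and "+" quantifiers in the pattern, and the second sample ID LOC_Os01g01020.1 are all assumptions made for illustration. The sample lines are shortened from the thread and held in arrays so the example is self-contained.

```perl
use strict;
use warnings;

# Capture the ID at the start of a line (assumed format, see lead-in).
my $id_re = qr/^(LOC_Os0[1-7]g[0-9]+\.[0-9]+)\s/;

# Return the IDs found at the start of the given lines, in order.
sub extract_ids {
    my @ids;
    for my $line (@_) {
        push @ids, $1 if $line =~ $id_re;
    }
    return @ids;
}

# Return the IDs present in both lists, each once, in first-list order.
sub common_ids {
    my ($ids1, $ids2) = @_;
    my %in_second = map { $_ => 1 } @$ids2;
    my %seen;
    return grep { $in_second{$_} && !$seen{$_}++ } @$ids1;
}

# Shortened sample lines; real code would read these from the two files.
my @file1 = (
    "LOC_Os01g01010.1 3017 : PAS [2993,3017] : AGTAAA AAATGGAGGAATTCTGCCA\n",
    "LOC_Os01g01010.2 2218 : uORF [7,129] : ATG AAGAAGACGAGACGACGAC\n",
);
my @file2 = (
    "LOC_Os01g01010.1 : PS51373 HIPIP High potential iron-sulfur proteins family profile.\n",
    "LOC_Os01g01020.1 : PS00022 EGF_1 EGF-like domain signature 1.\n",    # hypothetical ID
);

my @common = common_ids([ extract_ids(@file1) ], [ extract_ids(@file2) ]);
print "$_\n" for @common;
```

The hash lookup avoids the nested loop over both files, so each file is scanned once. To run this on real data, the two arrays could be filled with something like open(my $fh, '<', $path) or die "Can't open $path: $!"; my @lines = <$fh>; for each file.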
