PerlMonks  

match pattern from two different file

by Anonymous Monk
on Jun 20, 2014 at 03:41 UTC (#1090546=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,
I am currently working on a problem and trying to solve it with Perl, but the code I have written is not working the way it should.
I want to extract the common pattern from two files.

file1 contains:

    LOC_Os01g01010.1 3017 : PAS [2993,3017] : AGTAAA AAATGGAGGAATTCTGCCA
    LOC_Os01g01010.2 2218 : uORF [7,129] : ATG AAGAAGACGAGACGACGACTTCCCCACTAGGAAACACGACGGAGGCGGAGATGATC

and file2 contains:

    LOC_Os01g01010.1 : PS51373 HIPIP High potential iron-sulfur proteins family profile. 2139 - 2208 AATGATTTATCATGTGAGGTGAAAGA-----AGAGCACCGGGTGAACAGTTACACAAGAA L=-1 GAAAGTCCAAAAGCA
    LOC_Os01g01010.2 : PS00022 EGF_1 EGF-like domain signature 1. 298 - 309 CtCccTtcGTtC L=(-1)
I am trying to retrieve the "LOC_Os01g01010*" identifiers (the names vary throughout the files, but the "LOC_Os01g" prefix remains constant).

Can anyone please help me?
    #!/usr/bin/perl
    use strict;
    use warnings;
    open(FH, "outputps_scan_chr1_.out") or die "Can't Open File1\n";
    my @array1=<FH>;
    close(FH);
    open(BH, "chr1_1.txt") or die "Can't Open File2\n";
    my @array2=<BH>;
    close(BH);
    my $pattern='/^LOC_Os0[1-7]g[0-9]*.[0-9]\s';
    foreach my $line1(@array1)
    {
        foreach my $line2(@array2)
        if
        {
            $line1=~ /^$pattern/;
            $line2=~ /^$pattern/;
            $line1==$line2;
            {
                print $line1;
            }
        }
        else
        print "nothing to print\n";
    }

This code gives a compilation error.
Also, this code compares whole lines. Should I do it like this, given that each line also contains other things besides the pattern? Any help is appreciated. Thanks & regards

Re: match pattern from two different file
by wjw (Deacon) on Jun 20, 2014 at 04:02 UTC

    Any chance you could put the file contents in code tags? That might make this post readable... :-)

    Update: Reflecting on the code a bit, have you looked at any of the Perl docs?

    if (whatever) {
        do something;
    } else {
        do the_other;
    }
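
As a rough illustration of the point above, here is a sketch of the OP's nested loop with the if/else written in the form Perl requires: a parenthesised condition and braces around each branch. It still compares whole lines with eq, as the original attempt does. The sub name print_common_lines is hypothetical, and, unlike the original, it reports "nothing to print" once after the loops rather than once per non-matching pair.

```perl
use strict;
use warnings;

# Hypothetical helper: print each line of @$first that also appears
# verbatim in @$second. Report "nothing to print" once, after the loops,
# if no pair matched. Returns the number of matching pairs found.
sub print_common_lines {
    my ($first, $second) = @_;
    my $count = 0;
    foreach my $line1 (@$first) {
        foreach my $line2 (@$second) {
            # The condition needs parentheses and each branch needs braces.
            if ($line1 eq $line2) {
                print $line1;
                $count++;
            }
        }
    }
    if ($count == 0) {
        print "nothing to print\n";
    }
    else {
        print "$count matching line(s)\n";
    }
    return $count;
}
```

Called as print_common_lines(\@array1, \@array2), it prints the shared lines and a single summary line.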

    ...the majority is always wrong, and always the last to know about it...

    Insanity: Doing the same thing over and over again and expecting different results...

    A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a fact


      Yes, I did write it once with the conditions combined using "and", and that also didn't produce any result.
      Actually, I have tried writing it in so many ways that it has all become jumbled, and now I don't know what I should do.

Re: match pattern from two different file
by vinoth.ree (Parson) on Jun 20, 2014 at 04:50 UTC

      Yes, I did, but it is not giving any output. The output file generated is 0 KB.

Re: match pattern from two different file
by Anonymous Monk on Jun 20, 2014 at 06:42 UTC
    This is a syntax error:
    if # you forgot some parens here, like someone said
    {
        $line1=~ /^$pattern/;
        $line2=~ /^$pattern/;
        $line1==$line2;
        {
            print $line1;
        }
    }
    Also
    my $pattern='/^LOC_Os0[1-7]g[0-9]*.[0-9]\s';
    ...
    $line1=~ /^$pattern/;  # the pattern will expand to something like /^\/^LOC_Os.../
                           # (a literal "/" followed by a second ^),
                           # which is probably not what you want. Try
    my $pat = qr/^LOC_Os.../;
    $line1 =~ $pat;
      I mean the patterns, not the lines, will expand to that. Each one ends up with a literal "/" followed by a second ^, so the regex compiles but can never match a line that starts with LOC_Os.
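
A small sketch contrasting the two forms: the OP's original pattern string interpolated into /^.../, and a qr// version. In the qr// version the dot is escaped (\.) and [0-9]* is changed to \d+; both tweaks are assumptions about the intended ID format, not something stated in the thread. The sample line is taken from the data posted above.

```perl
use strict;
use warnings;

# A pattern stored in a plain string keeps any slashes you type,
# so '/^LOC...' asks for a literal "/" at the start of the line.
my $as_string = '/^LOC_Os0[1-7]g[0-9]*.[0-9]\s';

# qr// compiles the pattern; here the slashes are delimiters, not data.
my $as_regex = qr/^LOC_Os0[1-7]g\d+\.\d\s/;

my $line = "LOC_Os01g01010.1 3017 : PAS [2993,3017] : AGTAAA\n";

my $string_hit = ($line =~ /^$as_string/) ? 1 : 0;  # needs "/" then a second ^
my $regex_hit  = ($line =~ $as_regex)     ? 1 : 0;

print "string pattern: $string_hit, qr pattern: $regex_hit\n";
# prints "string pattern: 0, qr pattern: 1"
```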

        I removed the if condition and changed the pattern as you suggested. It runs, but the result is not what it should be: the whole file comes out as output.
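
The symptom above has a simple cause: a statement such as "$line1 =~ $pat;" on its own evaluates the match and discards the result, so nothing is filtered and every line reaches the print. A minimal sketch of using the match result to skip non-matching lines; the two input lines are hypothetical sample data.

```perl
use strict;
use warnings;

my $pat = qr/^LOC_Os0[1-7]g\d+\.\d/;

# Hypothetical sample input: one ID line and one line without an ID.
my @lines = (
    "LOC_Os01g01010.1 3017 : PAS [2993,3017] : AGTAAA\n",
    "header line\n",
);

my @kept;
foreach my $line (@lines) {
    # A bare "$line =~ $pat;" would compute true/false and throw it away.
    # Using the result in "next unless" actually filters the line.
    next unless $line =~ $pat;
    push @kept, $line;
}
print @kept;    # prints only the LOC_Os01g01010.1 line
```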

Re: match pattern from two different file
by vinoth.ree (Parson) on Jun 20, 2014 at 07:18 UTC

    foreach my $line1(@array1)
    {
        foreach my $line2(@array2)
        if
        {
            $line1=~ /^$pattern/;
            $line2=~ /^$pattern/;
            $line1==$line2;
            {
                print $line1;
            }
        }

    Your code has a syntax error.

    Moreover, eq is for comparing strings, while == is for comparing numbers.
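
A quick sketch of why the difference matters here: the LOC_Os IDs do not start with digits, so == converts both to 0 and reports them equal, while eq compares the actual text. The two IDs come from the sample data above; numeric warnings are switched off only so the demo prints cleanly.

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" so the demo output stays clean

my $id1 = 'LOC_Os01g01010.1';
my $id2 = 'LOC_Os01g01010.2';

# == converts both sides to numbers first; strings that do not start
# with digits become 0, so these two different IDs compare as "equal".
my $numeric_equal = ($id1 == $id2) ? 1 : 0;

# eq compares the strings character by character.
my $string_equal = ($id1 eq $id2) ? 1 : 0;

print "==: $numeric_equal, eq: $string_equal\n";   # prints "==: 1, eq: 0"
```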

    Adding code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    open(FH, "<","./file1.txt") or die "Can't Open File1\n";
    open(BH, "<","./file2.txt") or die "Can't Open File2\n";
    my $pattern='^LOC_Os0[1-7]g[0-9]*.[0-9]\s';
    while(my $line1 = <FH>)
    {
        chomp($line1);
        #print "Line1:$line1\n";
        if( $line1=~ /$pattern/)
        {
            while(my $line2 = <BH>)
            {
                chomp($line2);
                if($line2 =~ /$pattern/)
                {
                    if($line1 eq $line2)
                    {
                        print "Matches: ". $line1;
                    }
                    else
                    {
                        print "nothing to print\n";
                    }
                }
            }
            seek(BH,0,0);
        }
    }
    close(FH);
    close(BH);

    file1.txt

    LOC_Os01g01010.1 3017 : PAS [2993,3017] : AGTAAA AAATGGAGGAATTCTGCCA
    LOC_Os01g01010.2 2218 : uORF [7,129] : ATG AAGAAGACGAGACGACGACTTCCCCACTAGGAAACACGACGGAGGCGGAGATGATC

    file2.txt

    LOC_Os01g01010.1 : PS51373 HIPIP High potential iron-sulfur proteins family profile. 2139 - 2208 AATGATTTATCATGTGAGGTGAAAGA-----AGAGCACCGGGTGAACAGTTACACAAGAA L=-1 GAAAGTCCAAAAGCA
    LOC_Os01g01010.2 : PS00022 EGF_1 EGF-like domain signature 1. 298 - 309 CtCccTtcGTtC L=(-1)
    LOC_Os01g01010.2 2218 : uORF [7,129] : ATG AAGAAGACGAGACGACGACTTCCCCACTAGGAAACACGACGGAGGCGGAGATGATC

    All is well

      Thank you for the code, but it prints "nothing to print" multiple times.
      I also tried again using split(), but it prints the same as your code.
      Here is the newly written code:

      #!/usr/bin/perl
      use strict;
      use warnings;
      open(FH, "outputps_scan_chr1_.out") or die "Can't Open File1\n";
      my @array1=<FH>;
      close(FH);
      open(BH, "chr1_1.txt") or die "Can't Open File2\n";
      my @array2=<BH>;
      close(BH);
      my $pat='qr/^LOC_Os0.../';
      foreach my $line1(@array1)
      {
          my @array3= split(/ /, $line1);
          foreach my $line2(@array2)
          {
              my @array4= split(/ /, $line2);
              foreach my $line3(@array3)
              {
                  foreach my $line4(@array4)
                  {
                      $line3=~ $pat;
                      $line4=~ $pat;
                      if ($line3 eq $line4)
                      {
                          print $line3;
                      }
                      else
                      {
                          print "nothing to right\n";
                      }
                  }
              }
          }
      }

      Hey, do you want to match only the pattern LOC_Os0[1-7]g[0-9]+.[0-9] in each line of both files, or the entire line? Your first post compares entire lines, which is why I was checking for an entire-line match. Please correct me if I am wrong.
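
Matching only the identifier, rather than the whole line, comes down to a capturing group: the ID is wrapped in ( ) and read back from $1. A sketch, assuming the ID always sits at the start of the line; extract_id is a hypothetical helper name, the dot is escaped so it matches only a literal ".", and \d+ after the dot is an assumption that allows multi-digit suffixes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the LOC_Os identifier at the start of a line,
# or undef if the line does not begin with one.
sub extract_id {
    my ($line) = @_;
    return $line =~ /^(LOC_Os0[1-7]g\d+\.\d+)/ ? $1 : undef;
}

print extract_id("LOC_Os01g01010.1 3017 : PAS [2993,3017] : AGTAAA"), "\n";
print extract_id("LOC_Os01g01010.2 : PS00022 EGF_1 EGF-like domain signature 1."), "\n";
```

Comparing the values returned by extract_id with eq, instead of comparing whole lines, is what makes lines with different trailing content count as a match.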


      All is well

        Yes, I only want LOC_Os0[1-7]g[0-9]+.[0-9] from both files.
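
One common way to do this in Perl is to collect the IDs from each file as hash keys and then keep the keys that exist in both hashes, which avoids the nested loops and the repeated "nothing to print" lines. The sketch below assumes every ID appears at the very start of its own line, as in the samples posted above; the sub names ids_from_fh and common_ids and the command-line usage are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Matches identifiers such as LOC_Os01g01010.1 at the start of a line.
my $id_re = qr/^(LOC_Os0[1-7]g\d+\.\d+)/;

# Read a filehandle and return a hash ref whose keys are the IDs found.
sub ids_from_fh {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $ids{$1} = 1 if $line =~ $id_re;
    }
    return \%ids;
}

# Return, sorted, the IDs present in both hashes.
sub common_ids {
    my ($first, $second) = @_;
    my @both = grep { exists $second->{$_} } keys %$first;
    return sort @both;
}

# Only run against real files when exactly two paths are given.
if (@ARGV == 2) {
    my @sets;
    for my $path (@ARGV) {
        open(my $fh, '<', $path) or die "Can't open $path: $!\n";
        push @sets, ids_from_fh($fh);
        close($fh);
    }
    my @common = common_ids(@sets);
    if (@common) {
        print "$_\n" for @common;
    }
    else {
        print "nothing to print\n";
    }
}
```

Run it as, for example, perl common_ids.pl outputps_scan_chr1_.out chr1_1.txt (the script name is made up). With the sample data from this thread laid out one record per line, both LOC_Os01g01010.1 and LOC_Os01g01010.2 appear in both files, so both would be printed.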

Node Type: perlquestion [id://1090546]
Approved by vinoth.ree