PerlMonks  

Re^4: match pattern from two different file

by vinoth.ree (Monsignor)
on Jun 20, 2014 at 09:29 UTC


in reply to Re^3: match pattern from two different file
in thread match pattern from two different file

OK, then just remove the following if condition from my code and print $line1:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open(FH, "<", "./file1.txt") or die "Can't Open File1\n";
    open(BH, "<", "./file2.txt") or die "Can't Open File2\n";

    my $pattern = '^LOC_Os0[1-7]g[0-9]*.[0-9]\s';

    while (my $line1 = <FH>) {
        chomp($line1);
        #print "Line1:$line1\n";
        if ($line1 =~ /$pattern/) {
            while (my $line2 = <BH>) {
                chomp($line2);
                if ($line2 =~ /$pattern/) {
                    print "Matches: " . $line1;
                }
                else {
                    print "nothing to print\n";
                }
            }
            seek(BH, 0, 0);
        }
    }
    close(FH);
    close(BH);


All is well
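
As written, the nested loop above prints $line1 once for every line of file2 that matches the pattern at all; it never checks that the two lines carry the same locus ID, which may be why the output grows so large. The following is only a sketch of one alternative, not the code from this thread: it indexes the IDs of the second file in a hash and then keeps the lines of the first file whose ID is in that hash. The helper names `locus_id` and `lines_with_shared_id` are hypothetical, the pattern is the one quoted in the thread with capture parentheses added, and the in-memory sample lines are shortened from the posted data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the locus ID at the start of a line
# (e.g. "LOC_Os01g01010.1"), or undef if the line does not start with one.
# The pattern is the thread's own, with parentheses added around the ID.
sub locus_id {
    my ($line) = @_;
    return $line =~ /^(LOC_Os0[1-7]g[0-9]*.[0-9])\s/ ? $1 : undef;
}

# Hypothetical helper: return the lines read from $fh1 whose locus ID
# also occurs somewhere in $fh2.
sub lines_with_shared_id {
    my ($fh1, $fh2) = @_;

    # Read the second file once and remember every ID it contains.
    my %in_file2;
    while (my $line = <$fh2>) {
        my $id = locus_id($line);
        $in_file2{$id} = 1 if defined $id;
    }

    # Keep each line of the first file whose ID was seen in the second.
    my @matches;
    while (my $line = <$fh1>) {
        chomp $line;
        my $id = locus_id($line);
        push @matches, $line if defined $id && $in_file2{$id};
    }
    return @matches;
}

# Demo on in-memory filehandles instead of file1.txt / file2.txt.
my $file1 = "LOC_Os01g01010.1 : PS00022 EGF_1\nLOC_Os01g01100.1 : PS00022 EGF_1\n";
my $file2 = "LOC_Os01g01010.1 3017 : uORF [3,233]\nLOC_Os01g01060.1 920 : K-BOX [778,785]\n";
open my $fh1, '<', \$file1 or die "Can't open in-memory file1: $!\n";
open my $fh2, '<', \$file2 or die "Can't open in-memory file2: $!\n";
print "Matches: $_\n" for lines_with_shared_id($fh1, $fh2);
```

Because the second file is read only once, this also avoids the seek back to the start of file2 for every line of file1.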

Replies are listed 'Best First'.
Re^5: match pattern from two different file
by Anonymous Monk on Jun 20, 2014 at 14:14 UTC
    Thank you very, very much. You are a life saver (maybe I am exaggerating), but I am really thankful for your help.
    God bless you.
Re^5: match pattern from two different file
by Anonymous Monk on Jun 23, 2014 at 05:05 UTC

    Hi vinoth.ree,
    Hello again. Can you help once more?
    I was using the code you wrote, and it was working fine. But my files had redundancy, so the output, as you can imagine from the pattern matching, was huge.
    So before pattern matching, I decided to remove the redundancy from both input files.
    But with the non-redundant input files, the pattern matching does not give the output it should: the resulting output file has multiple entries, making it redundant and bulky again. My files:

    file 1
    LOC_Os01g01010.1 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01010.2 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01019.1 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01030.1 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01040.4 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01040.1 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01040.3 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01040.2 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01050.2 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01050.1 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01060.1 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01070.3 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01070.1 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01070.2 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01080.2 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01080.1 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01080.3 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01090.1 : PS00022 EGF_1 EGF-like domain signature 1.
    LOC_Os01g01100.1 : PS00022 EGF_1 EGF-like domain signature 1.

    file 2
    LOC_Os01g01010.1 3017 : uORF [3,233] : ATG AGCTGGTGGGGATGCTCTAAGAGAACG +AGAGAAGCACAGAGCAGATAAACCACACCCACAGGCACCACCGTCCTTGTTGGTAATGAAGAAGACGAG +ACGACGACTTCCCCACTAGGAAACACGACGGAGGCGGAGATGATCGACGGCGGAGAGAGCTACAGAAAC +ATCGATGCCTCCTGTCCAATCCCCCCATCCCATTCGGTAGTTGGATTGAAGACTACCGAA TAA
    LOC_Os01g01010.2 2218 : uORF [7,129] : ATG AAGAAGACGAGACGACGACTTCCCCAC +TAGGAAACACGACGGAGGCGGAGATGATCGACGGCGGAGAGAGCTACAGAAACATCGATGCCTCCTGTC +CAATCCCCCCATCCCATTCGG TAG
    LOC_Os01g01019.1 1127 : CPE [1010,1127] : TTTTTAAT TTTTCGATAGCCAAATATT +AACTATTTAGCGACTTTATTGTCTGGTGTCCGAAGAAGAATATATGTAAATGACATTACCAT AATAAA + TGTTGAATGCTTCATCAAATTTT
    LOC_Os01g01030.1 2464 : IRES [2366,2464] : TAACT GAATTA GTATTC TA AGAA +T ATGTC AGTTT ACAAT CTTA ATTCT TAA GAAAGT CTAAA AGTCG TGC ATGTGC GTTC +CGA GCACAC ACTTTTTCGT
    LOC_Os01g01040.4 1524 : IRES [1436,1524] : AACTA CATT GTGGAG AT TAGCAA + CGAAAAT GTGCTA GGCCC AGGT GAGCT T TTCTAG TGATT GT TGATA CCTACATA AG +TCA TCTTTCC
    LOC_Os01g01040.1 2508 : IRES [2418,2508] : TGTTG TTGTT GACTA T GTGGT A +CTTTGT GATGC TTGGA CATG TTTAT ATG TGGTG CTATGT TAAAA AATCC TGTTG AAA +TTGTG TCAATTA
    LOC_Os01g01040.3 2583 : IRES [2493,2583] : TGTTG TTGTT GACTA T GTGGT A +CTTTGT GATGC TTGGA CATG TTTAT ATG TGGTG CTATGT TAAAA AATCC TGTTG AAA +TTGTG TCAATTA
    LOC_Os01g01040.2 2482 : IRES [2392,2482] : TGTTG TTGTT GACTA T GTGGT A +CTTTGT GATGC TTGGA CATG TTTAT ATG TGGTG CTATGT TAAAA AATCC TGTTG AAA +TTGTG TCAATTA
    LOC_Os01g01050.2 1996 : IRES [1911,1996] : GTTGG TCTCA TTTTCG TT TGCTG + CTGGTTAC TTGTA TTAAT ACATT ATAGA AAA TGAGTA CA TAAAT AT ACATG ACGA T +ATGA TCC
    LOC_Os01g01050.1 2039 : IRES [1954,2039] : GTTGG TCTCA TTTTCG TT TGCTG + CTGGTTAC TTGTA TTAAT ACATT ATAGA AAA TGAGTA CA TAAAT AT ACATG ACGA T +ATGA TCC
    LOC_Os01g01060.1 920 : K-BOX [778,785] : CTGTGATT
    LOC_Os01g01070.3 1369 : uORF [19,87] : ATG CGAACGAGCACCGGATCCGCTGCGGCT +GCTCGGCGTCGGGTCGGAGGTGAGGTCTCGAAACCC TAG
    LOC_Os01g01070.1 1568 : IRES [1465,1568] : AGCAAG TTTGTT TGGGG AG GATG +TACT GGAATAAG GGTATAGT AGTAGTA GGAAT TATTATG GCAC ATTTG CATGCT TT GG +CATA TGGCACTC TGAGTT TTATT
    LOC_Os01g01070.2 1562 : IRES [1459,1562] : AGCAAG TTTGTT TGGGG AG GATG +TACT GGAATAAG GGTATAGT AGTAGTA GGAAT TATTATG GCAC ATTTG CATGCT TT GG +CATA

    I only want to match the pattern "(LOC_Os0[1-7]g[0-9]*.[0-9])\s".
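
One detail of the quoted pattern is worth checking: the `.` is unescaped, so it matches any single character, not only a literal dot. The sketch below compares the pattern as posted with a stricter variant. The stricter variant, the helper name `extract_id`, and the second test line (with an `x` where the dot should be) are assumptions made for illustration; they do not come from the thread.

```perl
use strict;
use warnings;

# The pattern as quoted in the post, anchored at the start of the line.
# The unescaped "." matches any single character.
my $loose = qr/^(LOC_Os0[1-7]g[0-9]*.[0-9])\s/;

# Assumed stricter variant: a literal dot, and at least one digit on each side.
my $strict = qr/^(LOC_Os0[1-7]g[0-9]+\.[0-9]+)\s/;

# Hypothetical helper: return the captured ID, or undef if the line does not match.
sub extract_id {
    my ($line, $re) = @_;
    my ($id) = $line =~ $re;
    return $id;
}

# The first line is shortened from the posted data; the second is made up.
for my $line ("LOC_Os01g01010.1 3017 : uORF [3,233]",
              "LOC_Os01g01010x1 3017 : uORF [3,233]") {
    my $loose_id  = extract_id($line, $loose);
    my $strict_id = extract_id($line, $strict);
    printf "%s\n  loose:  %s\n  strict: %s\n",
        $line,
        defined $loose_id  ? $loose_id  : 'no match',
        defined $strict_id ? $strict_id : 'no match';
}
```

With the parentheses placed before `\s`, the captured ID does not include the trailing whitespace, which makes it safer to use as a hash key.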
      Use a hash if you want to ignore the duplicates, like this:

      #!/usr/local/bin/perl
      use strict;
      use warnings;

      my $pattern = qr'^LOC_Os0[1-7]g[0-9]*.[0-9]\s';

      # read files
      my $hash1 = parse_file('file1.txt');
      my $hash2 = parse_file('file2.txt');

      # report
      my $count=0;
      for my $loc (sort keys %$hash1){
        if ( exists $hash2->{$loc} ){
          print "$loc\n";
          ++$count;
        }
      }
      print "$count unique patterns in both files\n";

      # parse file
      sub parse_file{
        my $filename = shift;
        my %hash=();
        my $count=0;
        open FH,'<',$filename or die "Can't Open $filename\n";
        while ( my $line = <FH> ){
          if ( $line =~ /($pattern)/ ){
            $hash{$1} = 1;
          }
          ++$count;
        }
        close FH;
        print "$count lines read from $filename\n";
        return \%hash;
      }
      poj
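
If the goal is to remove the redundancy from each input file before comparing them, a `%seen` hash can keep the first line for each locus ID while preserving the input order. This is a minimal sketch that assumes "redundant" means "same locus ID at the start of the line". The helper name `first_line_per_id` is hypothetical, and the sample lines are shortened from the posted data.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first line seen for each locus ID,
# preserving input order. Lines that do not start with a locus ID are skipped.
sub first_line_per_id {
    my ($fh) = @_;
    my (%seen, @kept);
    while (my $line = <$fh>) {
        chomp $line;
        # Pattern from the thread, with the capture placed so that $1
        # excludes the trailing whitespace.
        next unless $line =~ /^(LOC_Os0[1-7]g[0-9]*.[0-9])\s/;
        # $seen{$1}++ is false the first time an ID appears, true afterwards.
        push @kept, $line unless $seen{$1}++;
    }
    return @kept;
}

# Demo on an in-memory filehandle; the second line repeats the first ID.
my $data = "LOC_Os01g01040.1 : PS00022 EGF_1\n"
         . "LOC_Os01g01040.1 : PS00022 EGF_1\n"
         . "LOC_Os01g01050.2 : PS00022 EGF_1\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!\n";
print "$_\n" for first_line_per_id($fh);
```

Unlike iterating over `sort keys`, this keeps the records in the order they appear in the file.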
