Hi vinoth.ree,
Hello again.
Could you help me once again?
I was using the code you wrote, and it was working fine. However, my files contained redundant entries, so the output, as you can imagine from the pattern matching, was huge.
So I decided to remove the redundancy from both input files before pattern matching.
But with the non-redundant input files, the pattern matching is not giving the output it should: the resulting output file has multiple entries, which makes it redundant and bulky again.
My files:
File 1:
LOC_Os01g01010.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01010.2 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01019.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01030.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01040.4 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01040.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01040.3 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01040.2 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01050.2 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01050.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01060.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01070.3 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01070.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01070.2 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01080.2 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01080.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01080.3 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01090.1 : PS00022 EGF_1 EGF-like domain signature 1.
LOC_Os01g01100.1 : PS00022 EGF_1 EGF-like domain signature 1.
File 2:
LOC_Os01g01010.1 3017 : uORF [3,233] : ATG AGCTGGTGGGGATGCTCTAAGAGAACG
+AGAGAAGCACAGAGCAGATAAACCACACCCACAGGCACCACCGTCCTTGTTGGTAATGAAGAAGACGAG
+ACGACGACTTCCCCACTAGGAAACACGACGGAGGCGGAGATGATCGACGGCGGAGAGAGCTACAGAAAC
+ATCGATGCCTCCTGTCCAATCCCCCCATCCCATTCGGTAGTTGGATTGAAGACTACCGAA TAA
LOC_Os01g01010.2 2218 : uORF [7,129] : ATG AAGAAGACGAGACGACGACTTCCCCAC
+TAGGAAACACGACGGAGGCGGAGATGATCGACGGCGGAGAGAGCTACAGAAACATCGATGCCTCCTGTC
+CAATCCCCCCATCCCATTCGG TAG
LOC_Os01g01019.1 1127 : CPE [1010,1127] : TTTTTAAT TTTTCGATAGCCAAATATT
+AACTATTTAGCGACTTTATTGTCTGGTGTCCGAAGAAGAATATATGTAAATGACATTACCAT AATAAA
+ TGTTGAATGCTTCATCAAATTTT
LOC_Os01g01030.1 2464 : IRES [2366,2464] : TAACT GAATTA GTATTC TA AGAA
+T ATGTC AGTTT ACAAT CTTA ATTCT TAA GAAAGT CTAAA AGTCG TGC ATGTGC GTTC
+CGA GCACAC ACTTTTTCGT
LOC_Os01g01040.4 1524 : IRES [1436,1524] : AACTA CATT GTGGAG AT TAGCAA
+ CGAAAAT GTGCTA GGCCC AGGT GAGCT T TTCTAG TGATT GT TGATA CCTACATA AG
+TCA TCTTTCC
LOC_Os01g01040.1 2508 : IRES [2418,2508] : TGTTG TTGTT GACTA T GTGGT A
+CTTTGT GATGC TTGGA CATG TTTAT ATG TGGTG CTATGT TAAAA AATCC TGTTG AAA
+TTGTG TCAATTA
LOC_Os01g01040.3 2583 : IRES [2493,2583] : TGTTG TTGTT GACTA T GTGGT A
+CTTTGT GATGC TTGGA CATG TTTAT ATG TGGTG CTATGT TAAAA AATCC TGTTG AAA
+TTGTG TCAATTA
LOC_Os01g01040.2 2482 : IRES [2392,2482] : TGTTG TTGTT GACTA T GTGGT A
+CTTTGT GATGC TTGGA CATG TTTAT ATG TGGTG CTATGT TAAAA AATCC TGTTG AAA
+TTGTG TCAATTA
LOC_Os01g01050.2 1996 : IRES [1911,1996] : GTTGG TCTCA TTTTCG TT TGCTG
+ CTGGTTAC TTGTA TTAAT ACATT ATAGA AAA TGAGTA CA TAAAT AT ACATG ACGA T
+ATGA TCC
LOC_Os01g01050.1 2039 : IRES [1954,2039] : GTTGG TCTCA TTTTCG TT TGCTG
+ CTGGTTAC TTGTA TTAAT ACATT ATAGA AAA TGAGTA CA TAAAT AT ACATG ACGA T
+ATGA TCC
LOC_Os01g01060.1 920 : K-BOX [778,785] : CTGTGATT
LOC_Os01g01070.3 1369 : uORF [19,87] : ATG CGAACGAGCACCGGATCCGCTGCGGCT
+GCTCGGCGTCGGGTCGGAGGTGAGGTCTCGAAACCC TAG
LOC_Os01g01070.1 1568 : IRES [1465,1568] : AGCAAG TTTGTT TGGGG AG GATG
+TACT GGAATAAG GGTATAGT AGTAGTA GGAAT TATTATG GCAC ATTTG CATGCT TT GG
+CATA TGGCACTC TGAGTT TTATT
LOC_Os01g01070.2 1562 : IRES [1459,1562] : AGCAAG TTTGTT TGGGG AG GATG
+TACT GGAATAAG GGTATAGT AGTAGTA GGAAT TATTATG GCAC ATTTG CATGCT TT GG
+CATA
I only want to match the pattern
"(LOC_Os0[1-7]g[0-9]*.[0-9])\s"
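To make it clearer what I am after, here is a rough Python sketch of the behaviour I would expect. It is only an illustration, not the code I am actually running. It assumes that the IDs always sit at the start of a line, that the dot in my pattern is meant literally (so it is escaped here), and that a line starting with `+` in file 2 continues the previous record, as in the sample above. The names `read_ids`, `read_records` and `match_once` are made up for this example.

```python
import re

# Pattern from the post, anchored at line start and with the dot escaped.
# Both changes are assumptions about the intended meaning.
ID_RE = re.compile(r"^(LOC_Os0[1-7]g[0-9]*\.[0-9])\s")


def read_ids(lines):
    """Collect the unique IDs found at the start of file 1 lines."""
    ids = set()
    for line in lines:
        m = ID_RE.match(line)
        if m:
            ids.add(m.group(1))
    return ids


def read_records(lines):
    """Group file 2 into records: a header line plus its '+' continuation lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("+") and records:
            records[-1].append(line)   # continuation of the previous record
        elif line.strip():
            records.append([line])     # new record header
    return records


def match_once(file1_lines, file2_lines):
    """Return each distinct file 2 record whose ID occurs in file 1, in input order."""
    wanted = read_ids(file1_lines)
    seen = set()
    out = []
    for rec in read_records(file2_lines):
        m = ID_RE.match(rec[0])
        if not m or m.group(1) not in wanted:
            continue
        key = "\n".join(rec)
        if key in seen:                # skip records already printed
            continue
        seen.add(key)
        out.append(key)
    return out
```

It could be called as `match_once(open("file1"), open("file2"))`, where the two file names are placeholders. The `seen` set is what keeps a record from being written more than once, even if the same ID or record turns up again in the input.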