newbio has asked for the wisdom of the Perl Monks concerning the following question:
Program Input:
Following are of interest: **carboxypeptidase** protein $$inhibitor$$ ( **CI** ) , **nanopeptidase** kinase $$inhibitor$$ , **NI** , and others such as , **p(57)** and **polypeptidase** protein $$inhibitor$$ ( **PI** ).
Program Output:
1. Following are of interest: **carboxypeptidase_protein_inhibitor_(CI)** , **nanopeptidase_kinase_inhibitor_(NI)** and others such as , **p(57)** and **polypeptidase_protein_inhibitor_(PI)**.
2. Following are of interest: **carboxypeptidase** protein $$inhibitor$$ ( **CI** ) , nanopeptidase kinase inhibitor , NI , and others such as , p(57) and polypeptidase protein inhibitor ( PI ).
3. Following are of interest: carboxypeptidase protein inhibitor ( CI ) , **nanopeptidase** kinase $$inhibitor$$ , **NI** , and others such as , p(57) and polypeptidase protein inhibitor ( PI ).
4. Following are of interest: carboxypeptidase protein inhibitor ( CI ) , nanopeptidase kinase inhibitor , NI , and others such as , p(57) and **polypeptidase** protein $$inhibitor$$ ( **PI** ).
While I can achieve output 1 using the regular expression substitution shown below, I cannot figure out how to produce output sentences 2, 3 and 4.
if ($line =~ /\*\*([^\*]+)\*\*\s(kinase|isoform|protein|peptide|ligand)\s\$\$([^\$]+)\$\$\s[\(\,]\s\*\*([^\*]+)\*\*\s[\)\,]/) {
    $line =~ s/\*\*([^\*]+)\*\*\s(kinase|isoform|protein|peptide|ligand)\s\$\$([^\$]+)\$\$\s[\(\,]\s\*\*([^\*]+)\*\*\s[\)\,]/**$1_$2_$3_($4)**/g;
    print WF "$line\n";
}
Output sentence 1 is the original sentence with every substitution applied by the code above (there are 3 substitutions in this example, although the number varies with the sentence).
Each of the remaining output sentences (2, 3 and 4) is the original input sentence, except that the original pattern is retained at one substitution location, while the tags (i.e. ** and $$) are removed everywhere else in the sentence. The number of such output sentences therefore equals the number of patterns substituted by the regex above (3 in this example, as shown in output 1). Is there a nice way of getting outputs 2, 3 and 4?
Appreciate your help.
Thanks very much in advance.
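One possible way to get outputs 2, 3 and 4 is sketched below. It is not taken from any reply in this thread. It reuses the pattern from the question, records where each match starts and ends via the built-in `@-` and `@+` arrays, and then builds one sentence per match by stripping the tags from everything outside that match. The helper names `strip_tags` and `isolate_matches` are made up for illustration, and the sketch assumes that `**` and `$$` never appear in the text other than as tags.

```perl
use strict;
use warnings;

# The pattern from the question, compiled once (line-wrap artifacts removed).
my $re = qr/\*\*([^*]+)\*\*\s(kinase|isoform|protein|peptide|ligand)\s\$\$([^\$]+)\$\$\s[(,]\s\*\*([^*]+)\*\*\s[),]/;

# Hypothetical helper: remove the ** and $$ tags from a fragment of text.
sub strip_tags {
    my ($text) = @_;
    $text =~ s/\*\*|\$\$//g;
    return $text;
}

# Hypothetical helper: return one sentence per match of $re. In each sentence
# exactly one match keeps its tags; the text before and after it is stripped.
sub isolate_matches {
    my ($line) = @_;

    # Collect the [start, end) offsets of every match in the untouched line.
    my @spans;
    while ($line =~ /$re/g) {
        push @spans, [ $-[0], $+[0] ];
    }

    my @out;
    for my $span (@spans) {
        my ($start, $end) = @$span;
        push @out,
            strip_tags(substr($line, 0, $start))      # before the match
          . substr($line, $start, $end - $start)      # the match, tags intact
          . strip_tags(substr($line, $end));          # after the match
    }
    return @out;
}

my $line = 'Following are of interest: **carboxypeptidase** protein $$inhibitor$$ ( **CI** ) , **nanopeptidase** kinase $$inhibitor$$ , **NI** , and others such as , **p(57)** and **polypeptidase** protein $$inhibitor$$ ( **PI** ).';

my @sentences = isolate_matches($line);
print "$_\n" for @sentences;
```

For the sample input this should print three lines corresponding to output sentences 2, 3 and 4. Collecting the offsets first and cutting the original string afterwards keeps the positions valid, which would not hold if the line were modified while matching.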
Replies are listed 'Best First'.
Re: regex pattern match problem
by JadeNB (Chaplain) on Aug 05, 2009 at 20:57 UTC
Re: regex pattern match problem
by dwm042 (Priest) on Aug 05, 2009 at 20:09 UTC
Re: regex pattern match problem
by Polyglot (Chaplain) on Aug 05, 2009 at 19:57 UTC
by newbio (Beadle) on Aug 05, 2009 at 20:49 UTC
Re: regex pattern match problem
by newbio (Beadle) on Aug 06, 2009 at 15:16 UTC
by Polyglot (Chaplain) on Aug 10, 2009 at 05:34 UTC