generate regular expression

khoueiry has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: generate regular expression by QM (Parson) on Mar 30, 2006 at 17:03 UTC
I have no idea what you mean by "2 CAC and 2 TTT". Is this in one string? Side by side? Intermixed?

Unless you come up with a language that maps one-to-one to Perl regex, your users are going to be surprised from time to time by your implementation. Wouldn't it be better to have them use a subset of Perl regex directly? You could define what this subset is, and let Perl do all of the heavy lifting.

-QM
--
Quantum Mechanics: The dreams stuff is made of
Re^2: generate regular expression by khoueiry (Initiate) on Mar 30, 2006 at 17:10 UTC
QM, sorry for the lack of precision about the distribution of the motifs. The patterns should be intermixed, and the whole sequence should be treated as one string. If you can point me to CPAN modules that cover some of my needs, I think that will be sufficient.

Thanks a lot,
Pierre
Re^3: generate regular expression by QM (Parson) on Mar 30, 2006 at 18:56 UTC
I think you missed the point. You're asking for a CPAN module to convert "English to Perl Regex". The reason Perl isn't written completely in English is that English is often ambiguous. Just examine this thread -- if we have to ask for clarification, then it's not useful as a regex specification.

You would be better served by dumping the idea of a module to solve your problem and training your users instead. If they can't specify what they want clearly, having it in English or another natural language isn't going to help.

For example, depending on what you mean by "2 CAC and 2 TTT", this might DWYW:

`((() = m/(CAC)/g) == 2) and ((() = m/(TTT)/g) == 2)`

But even with the context-sensitive regex engine, it is awkward (and error-prone) to specify this only in a single regex.

-QM
--
Quantum Mechanics: The dreams stuff is made of
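The counting idiom in QM's one-liner can be tried out in isolation. The following is only a sketch: `count_motif` is a hypothetical helper name, the sequence is the first line of the sample data posted elsewhere in this thread, and whether a query should demand exactly N matches (`==`) or at least N (`>=`) is precisely the kind of ambiguity QM is pointing at.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal motif.
sub count_motif {
    my ($sequence, $motif) = @_;
    # Assigning the match list to an empty list and reading that
    # assignment in scalar context yields the number of matches.
    my $count = () = $sequence =~ /\Q$motif\E/g;
    return $count;
}

my $sequence = 'ATCACCACTTCCTGGACACTACCCTAAACCTTTGAGGA';

my $cac = count_motif($sequence, 'CAC');
my $ttt = count_motif($sequence, 'TTT');
print "CAC: $cac, TTT: $ttt\n";

# "2 CAC and 2 TTT" read as "exactly two of each"
print "query satisfied\n" if $cac == 2 and $ttt == 2;
```

Keeping each count in its own variable makes it easy to switch between the "exactly" and "at least" readings without touching the regexes.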
Re: generate regular expression by wazzuteke (Hermit) on Mar 30, 2006 at 19:14 UTC
Another solution might just be to generate the regular expression based on the user input. I'm assuming a simple command-line script or something similar (I didn't see the source of the 'users query'); it would be easily ported to a web app or something along those lines. You could do something along the lines of:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $regex_cache = {};

die "Usage $0 [CAC <NUM> TTT <NUM>]\n" if ( @ARGV % 2 );

# Where the input would be : CAC 2 TTT 2
my %input = @ARGV;

# Read all of DATA as one string and drop the line breaks
my $data_set = do { local $/; <DATA> };
$data_set =~ s/\s+//g;

for my $in ( keys %input )
{
    # Builds e.g. "(CAC){2}", i.e. two adjacent copies of the motif
    my $reg_key = "($in)\{$input{$in}\}";
    my $reg     = $regex_cache->{$in} ||= qr/($reg_key)/s;
    print "$1\n" if ( $data_set =~ $reg );
}

__DATA__
ATCACCACTTCCTGGACACTACCCTAAACCTTTGAGGA
AATAACCGCTTTGTTGTTGCGATCGCCTAATAAATATC
AGCGTCTTCGTATGATAAACCAATGCGGAAGTACAAAA
```

Now, much like the other comments in this thread, I'm not really sure what type of order you are looking for in the set: whether 2 CAC means 'CACCAC' or 'CAC\w*CAC', etc. Since I may have missed this, the construction of `$reg_key` can be changed to something else. Nevertheless, it will still be able to parse the file based on some sort of input, which is what I believe you were generally looking for.

Sorry if I'm way off base here and my input doesn't help, although I certainly hope it does.

Good luck!

---hA||ta----
`print$_ for(map{chr($_)}split(/\s+/,join(/\B?:\w+[^\s+]/,<DATA>))); __DATA__ 67 111 100 101 32 80 101 114 108`
Re^2: generate regular expression by khoueiry (Initiate) on Apr 01, 2006 at 10:12 UTC
Thanks. By 2 CAC I meant separated motifs (CAC\w*CAC). I will test that.
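Given that clarification, the pattern for N separated motifs can be generated rather than typed by hand. This is a minimal sketch, assuming the separator is always `\w*` as in the post above; `separated_motif_regex` is a hypothetical name, not something from CPAN or BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex matching $count copies of $motif
# separated by any run of word characters (CAC\w*CAC for a count of 2).
sub separated_motif_regex {
    my ($motif, $count) = @_;
    my $quoted  = quotemeta $motif;              # treat the motif literally
    my $pattern = join '\w*', ($quoted) x $count;
    return qr/$pattern/;
}

my $re       = separated_motif_regex('CAC', 2);
my $sequence = 'ATCACTGGTTCCTGGACACTACCCTAAACCTTTGAGGA';

if ($sequence =~ /($re)/) {
    print "matched: $1\n";
}
```

Note that `\w*` does not cross line breaks, so a multi-line sequence would need its whitespace removed first if it is to be treated as one string.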
Re: generate regular expression by doc_faustroll (Scribe) on Mar 30, 2006 at 16:31 UTC
This looks suspiciously like BioPerl territory, and you are looking at something of an app here. Download BioPerl. Read the docs. Think about the problem space.
Re^2: generate regular expression by khoueiry (Initiate) on Mar 30, 2006 at 16:56 UTC
Thanks a lot. I already know BioPerl and its packages. There is no package in BioPerl that handles this. I'm posting on PerlMonks to "seek Perl wisdom" on that issue.
Re: generate regular expression by injunjoel (Priest) on Mar 30, 2006 at 18:53 UTC
Greetings,

In your example you have two different situations: the "and" situation, "2 CAC and 2 TTT", in which case it's really two separate searches; and the "single" search situation, "1 A|T CAC" or "2 C|A TTT T|G".

If your search criteria are submitted as you have specified, you could split the query on "and" or "or" to handle your first situation. Once split, capture the count criterion and test it against the scalar value of your array of matches, using the `(@array) = $string =~ /$pattern/g` idiom.

Untested Idea!

```perl
my $base_data = 'ATCACTGGTTCCTGGACACTACCCTAAACCTTTGAGGA
AATAACCGCTTTGTTGTTGCGATCGCCTAATAAATATC
AGCGTCTTCGTATGATAAACCAATGCGGAAGTACAAAA
TAAAGAGACTGTATTATGTTACT';

# the user-submitted search pattern
my $search_submitted = '2 CAC and 2 TTT';

# split it into chunks if applicable
my @search_chunks = split /and|or/, $search_submitted;

# for each distinct pattern
foreach my $chunk (@search_chunks) {
    # get the count we are looking for and the pattern we want to use
    my ($count, $search_string) = $chunk =~ /\s?(\d+)\s?([ATGCU\s\|]+)/;

    # replace the |'s with character classes
    $search_string =~ s/([ATGCU])\|([ATGCU])/[$1$2]/g;

    # remove all spaces
    $search_string =~ s/\s+//g;

    # run the match and see how many we get
    my (@search_count) = $base_data =~ /$search_string/g;

    # check our results
    if (scalar @search_count >= $count) {
        print "Found it!\n";
    } else {
        print "Nope..." . scalar(@search_count) . "\n";
    }
}
```

Is that sort of what you were thinking of?

-InjunJoel
"I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
Re^2: generate regular expression by khoueiry (Initiate) on Apr 01, 2006 at 10:10 UTC
Thanks a lot. Actually, it is very close to what I was thinking of. I found that I have to split the search into different steps instead of making only one complicated query.

Pierre
Re: generate regular expression by swampyankee (Parson) on Mar 30, 2006 at 18:34 UTC
I think I understand what you're looking for: you want a function (or module) that will take natural language queries ("2 CAC and 2 TTT") and convert them into Perl code, which may or may not include regex.

Now, my regex skills are below par, but I believe your second and third cases can be managed with single regexes. Your first case may require two.

emc
"Being forced to write comments actually improves code, because it is easier to fix a crock than to explain it." -- G. Steele
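To illustrate the "single regex" cases, here is a rough sketch. It assumes those cases are the terms quoted elsewhere in this thread ("A|T CAC" and "C|A TTT T|G"), that "X|Y" means "either base", and that spaces mean concatenation. `term_to_regex` is a hypothetical helper, not an existing module function.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one query term such as "C|A TTT T|G"
# into a single compiled regex ([CA]TTT[TG]).
sub term_to_regex {
    my ($term) = @_;
    my $pattern = '';
    for my $token (split ' ', $term) {
        if ($token =~ /\A[ATGCU](?:\|[ATGCU])+\z/) {
            # "C|A" becomes the character class [CA]
            (my $class = $token) =~ tr/|//d;
            $pattern .= "[$class]";
        }
        elsif ($token =~ /\A[ATGCU]+\z/) {
            # a plain motif is used literally
            $pattern .= $token;
        }
        else {
            die "Unrecognised token '$token' in term '$term'\n";
        }
    }
    return qr/$pattern/;
}

my $sequence = 'ATCACTGGTTCCTGGACACTACCCTAAACCTTTGAGGA';

for my $term ('A|T CAC', 'C|A TTT T|G') {
    my $re    = term_to_regex($term);
    my $count = () = $sequence =~ /$re/g;
    print "$term: $count match(es)\n";
}
```

Rejecting unknown tokens with `die` is a deliberate choice in this sketch: a query that cannot be translated unambiguously fails loudly instead of silently matching something else.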

