Re: about regular expression

in reply to about regular expression

You can grab all the possible values for intron and exon with your regex and then split them up.

Consider replacing your intron/exon elsif blocks with this:

  #new intron elsif block
    elsif(/\s+\/intron="(.+)"\n/) {
        foreach $item (split('\;',$1)) {
            print OUT "Intron\t $item\n";
        }
    }

I replaced all the *s with +s; as I understand it this is more efficient, but I'm no regex guru :) The regex puts everything between the double quotes into $1.

This will print out, based on your input data:

Intron   1-48
Intron   334-385

Now that they are separated, you can do whatever you want with them.
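
If you want to try the idea outside your script, here is a minimal self-contained sketch. The sample line and the helper name `intron_ranges` are my own assumptions for illustration, since the original input data isn't reproduced in this post; adjust the pattern to whatever your file really looks like.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the ranges
# listed in an /intron="..." line, or an empty list if the line doesn't match.
sub intron_ranges {
    my ($line) = @_;
    return unless $line =~ /\s+\/intron="(.+)"\n/;
    return split /;/, $1;    # one element per semicolon-separated range
}

# Assumed sample line, made up to match the output shown above.
my $line = qq{     /intron="1-48;334-385"\n};

print "Intron\t $_\n" for intron_ranges($line);
```

Returning the list instead of printing inside the loop makes it easier to do something else with the ranges later.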

Ryan


Re: Re: about regular expression by particle (Vicar) on Feb 02, 2002 at 16:36 UTC
be careful, ryan. `.+` matches one or more characters; `.*` matches zero or more characters. augustina_s specified that there might be an empty list in her dataset, and the second `.+` would fail to match in that case. also, you don't need to escape the semicolon (;).

~Particle
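
A small sketch of the difference between the two quantifiers on an empty value, for anyone following along. The sample line and the two helper names are assumptions made up for this illustration, not part of anyone's posted code.

```perl
use strict;
use warnings;

# Hypothetical helpers: 1 if the line matches, 0 otherwise.
sub matches_plus { return $_[0] =~ /\/intron="(.+)"\n/ ? 1 : 0 }   # one or more
sub matches_star { return $_[0] =~ /\/intron="(.*)"\n/ ? 1 : 0 }   # zero or more

# Assumed sample line with nothing between the quotes.
my $empty = qq{     /intron=""\n};

print "with .+ : ", (matches_plus($empty) ? "match" : "no match"), "\n";
print "with .* : ", (matches_star($empty) ? "match" : "no match"), "\n";

# The semicolon is an ordinary character in a pattern, so both forms split alike.
my @escaped   = split '\;', '1-48;334-385';
my @unescaped = split /;/,  '1-48;334-385';
print "same result\n" if "@escaped" eq "@unescaped";
```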
Re: Re: Re: about regular expression by ryan (Pilgrim) on Feb 03, 2002 at 04:50 UTC
Yep, point taken. If, as in your later post, a blank set of inputs is meant to output, for example, 'Intron' with nothing after it, then mine fails: it just prints nothing if there is no data for the input line. I didn't know which way is correct, because I lost some of the example code due to some lovely DB errors this site keeps throwing at me.

As for "also, you don't need to escape semi-colon (;)": ahh, the wonders of being an incompetent novice. I'd say it doesn't hurt, but no doubt you'll give me an example of when it can :)
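
For what it's worth, switching to `.*` alone would still print nothing for an empty value, because `split` on an empty string returns an empty list and the loop body never runs. If a bare label is what's wanted (I'm only assuming that is the desired behaviour), something along these lines might do; `intron_lines` is a made-up name for the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the output lines for one captured value.
# An empty value yields a bare "Intron" line instead of no output at all.
sub intron_lines {
    my ($value) = @_;
    my @ranges = split /;/, $value;          # empty string gives an empty list
    return ("Intron\n") unless @ranges;      # explicit case for "no data"
    return map { "Intron\t $_\n" } @ranges;  # one line per range otherwise
}

print intron_lines('1-48;334-385');
print intron_lines('');
```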
