http://www.perlmonks.org?node_id=142908


in reply to about regular expression

You can capture all the intron and exon values with your regex and then split them apart.

Consider replacing your intron/exon elsif blocks with this:
    # new intron elsif block
    elsif (/\s+\/intron="(.+)"\n/) {
        foreach my $item (split('\;', $1)) {
            print OUT "Intron\t $item\n";
        }
    }
I replaced all the *s with +s. As far as I understand, this is more efficient, but I'm no regex guru :) The regex captures everything between the double quotes into $1.

This will print out, based on your input data:
    Intron 1-48
    Intron 334-385
Now that they are separated, you can do whatever you want with them.
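A self-contained sketch of the split approach, for anyone who wants to run it. The helper name feature_lines, the sample lines, and the matching exon branch are all hypothetical: the original input data was lost, so the /intron="a;b" format is only inferred from the regex above. The helper returns lines instead of printing to OUT, which makes the result easy to check.

```perl
use strict;
use warnings;

# Hypothetical helper: turns one feature line into "Intron"/"Exon" output lines.
# The input format (/intron="a;b") is assumed from the regex; the original
# sample data is not available.
sub feature_lines {
    my ($line) = @_;
    my @out;
    if ($line =~ /\s+\/intron="(.+)"\n/) {
        # $1 is everything between the double quotes; split it on semicolons
        push @out, "Intron\t $_" for split /;/, $1;
    }
    elsif ($line =~ /\s+\/exon="(.+)"\n/) {
        # assumed exon branch, written by analogy with the intron one
        push @out, "Exon\t $_" for split /;/, $1;
    }
    return @out;
}

print "$_\n" for feature_lines("     /intron=\"1-48;334-385\"\n");
```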

Ryan

Re: Re: about regular expression
by particle (Vicar) on Feb 02, 2002 at 16:36 UTC
    be careful, ryan.

    .+ matches one or more characters.
    .* matches zero or more characters.

    augustina_s's dataset shows that the list can be empty. The .+ between the quotes needs at least one character, so the regex would not match that line at all.
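    A minimal check of the point above, using a hypothetical record with nothing between the quotes (the record format is assumed from the regex in the parent post):

```perl
use strict;
use warnings;

# Hypothetical record whose list is empty: nothing between the quotes.
my $empty = "     /intron=\"\"\n";

# .+ needs at least one character between the quotes, so this does not match.
my $plus_matches = ($empty =~ /\s+\/intron="(.+)"\n/) ? 1 : 0;

# .* also accepts zero characters, so this matches and $1 is the empty string.
my ($captured) = $empty =~ /\s+\/intron="(.*)"\n/;

print "with .+ : ", ($plus_matches ? "match" : "no match"), "\n";
print "with .* : captured '", (defined $captured ? $captured : 'undef'), "'\n";
```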

    Also, you don't need to escape the semicolon (;).
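    To show that the backslash makes no difference, here is a small sketch that splits a sample list (taken from the output in the parent post) both ways:

```perl
use strict;
use warnings;

my $list = "1-48;334-385";

# ';' has no special meaning in a regex, so both patterns split identically.
my @escaped   = split /\;/, $list;
my @unescaped = split /;/,  $list;

print join(",", @escaped), "\n";    # 1-48,334-385
print join(",", @unescaped), "\n";  # 1-48,334-385
```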

    ~Particle

      Yep, point taken. If a blank set of inputs is meant to output, for example, 'Intron' with nothing after it (as your later post does), then mine fails.

      Mine just prints nothing if there is no data for the input line. I didn't know which behavior was correct, because I lost some of the example code to the DB errors this site keeps throwing at me.
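      A sketch of the other behavior, where an empty list still prints a bare 'Intron' line. The intron_lines helper and the sample records are hypothetical, and switching .+ to .* is just one way to get this result, not necessarily what the later post does:

```perl
use strict;
use warnings;

# Hypothetical helper: turns one intron record into output lines.
# Using .* lets an empty list ("") match, so the record yields a bare
# "Intron" line instead of being skipped silently.
sub intron_lines {
    my ($line) = @_;
    return () unless $line =~ /\s+\/intron="(.*)"\n/;
    my @items = split /;/, $1;      # splitting "" gives an empty list
    return ("Intron") unless @items;
    my @out;
    push @out, "Intron\t $_" for @items;
    return @out;
}

print "$_\n" for intron_lines("     /intron=\"1-48;334-385\"\n");
print "$_\n" for intron_lines("     /intron=\"\"\n");
```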

      also, you don't need to escape semi-colon (;).

      Ahh, the wonders of being an incompetent novice. I'd say it doesn't hurt, but no doubt you'll give me an example of when it can :)
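      For what it's worth, a backslash in front of a punctuation character such as ; only ever means "match this character literally", so there it does no harm. It can hurt in front of a letter or digit, where it may turn the character into a metacharacter or backreference. A small illustrative sketch (the sample string is made up):

```perl
use strict;
use warnings;

my $text = "a;b s c";

# Backslash before a non-word character such as ';' just means "literal ;".
my @semi = split /\;/, $text;      # ("a", "b s c")

# Backslash before a letter can create a metacharacter:
# /s/ matches the letter s, while /\s/ matches any whitespace.
my @letter = split /s/,  $text;    # ("a;b ", " c")
my @space  = split /\s/, $text;    # ("a;b", "s", "c")

print join("|", @semi), "\n";
print join("|", @letter), "\n";
print join("|", @space), "\n";
```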