PerlMonks  

Re: Substring consisting of all the characters until "character X"?

by ELISHEVA (Prior)
on Mar 07, 2011 at 09:00 UTC (#891779)


in reply to Substring consisting of all the characters until "character X"?

My guess is that Perl has no such ready-made subroutine because this kind of fancy substring extraction is usually handled by a regular expression. Regular expressions can handle the huge variety of ways a substring can end: a single character or end of string, a set of terminal characters (space or X or Z, whichever comes first), the first occurrence of a single character or a maximum number of total characters, a terminal string rather than a single terminal character, and many, many more. Some examples:

# from chr 2 to right before the first space, or to the end of $str
# if no space is found
#  - ^.{2} = skip past the first two characters
#  - \S    = not whitespace, \s = whitespace
#  - (\S*) captures zero or more non-whitespace characters
#  - ($str =~ /^.{2}(\S*)/) is a list containing one string,
#    i.e. ($1) where $1 = what was captured by (\S*)
printf "substr(2, first ' ' or end): %s\n", ($str =~ /^.{2}(\S*)/);

# from chr 2 to the lesser of 5 characters or the first space
# \S = not whitespace, \s = whitespace
printf "substr(2, first ' ' or 5 chars): %s\n"
  , ($str =~ /^.{2}(\S{0,5})/);

# from chr 3 to the first X or the end of $str
printf "substr(3, first 'X' or end): %s\n"
  , ($str =~ /^.{3}([^X]*)/);

# from chr 3 to the lesser of the first X or 5 chars
printf "substr(3, first 'X' or 5 chars): %s\n"
  , ($str =~ /^.{3}([^X]{0,5})/);

# from chr 3 to the first occurrence of two or more A's, or to the end
# if no doubled A's are found
# (?:...) groups without capturing, so the match returns only $1
printf "substr(3,two or more A's or end): %s\n"
  , ($str =~ /^.{3}(.*?)(?:AA|$)/);

# from chr 10 to the lesser of 5 chars or the first run of 2 or more A's
printf "substr(10,two or more A's or 5 chars): %s\n"
  , ($str =~ /^.{10}((?:[^A]|A(?!A)){0,5})/);

# from chr 10 to the lesser of 5 chars or the first run of 2 or more X's
printf "substr(10,two or more X's or 5 chars): %s\n"
  , ($str =~ /^.{10}((?:[^X]|X(?!X)){0,5})/);

# from chr 3 to the first occurrence of two or more X's, or to the end
# if no doubled X's are found
printf "substr(3,two or more X's or end): %s\n"
  , ($str =~ /^.{3}(.*?)(?:XX|$)/);

# outputs
substr(2, first ' ' or end): XCDEFDGHIXTAAGRAAAAAA
substr(2, first ' ' or 5 chars): XCDEF
substr(3, first 'X' or end): CDEFDGHI
substr(3, first 'X' or 5 chars): CDEFD
substr(3,two or more A's or end): CDEFDGHIXT
substr(10,two or more A's or 5 chars): IXT
substr(10,two or more X's or 5 chars): IXTAA
substr(3,two or more X's or end): CDEFDGHIXTAAGRAAAAAA theEnd
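The examples above print the captures directly. If you want to keep the extracted piece, the same list-context behaviour lets you assign it to a variable. The following is only a minimal sketch: the sample value of $str is an assumption (its first two characters are a guess, the rest is inferred from the output shown above), and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Assumed sample input: the leading "AB" is a guess; the remainder is
# inferred from the output listed above.
my $str = 'ABXCDEFDGHIXTAAGRAAAAAA theEnd';

# In list context a successful match returns its captures, so the
# parentheses on the left-hand side receive $1 directly.
my ($to_space) = $str =~ /^.{2}(\S*)/;
my ($to_x)     = $str =~ /^.{3}([^X]*)/;

# A failed match returns the empty list, which leaves the variable
# undefined. Here $str is shorter than 99 characters, so ^.{99} fails.
my ($too_far)  = $str =~ /^.{99}(\S*)/;

print "to space: $to_space\n";
print "to X:     $to_x\n";
print "too far:  ", (defined $too_far ? $too_far : '(no match)'), "\n";
```

Checking for an undefined result is how you tell "the start offset is past the end of the string" apart from "an empty substring was found".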

I grant you the syntax of those regular expressions above is somewhat arcane and cryptic. They aren't as obvious to the untrained eye as substr_chr($str,3,'A'). However, they give you much more flexibility to roll your own string endings with just a few keystrokes.
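If you do prefer the more readable call, one of these regexes can be wrapped in a small subroutine of your own. This is just a sketch under assumptions: substr_chr is a hypothetical name (it is not a built-in or core function), $stop is assumed to be a single, non-empty character, and the sample string is the same guessed value whose first two characters are not known from the post.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Perl: returns the characters of $str
# from offset $start up to (not including) the first $stop character,
# or up to the end of $str if $stop never appears.
sub substr_chr {
    my ($str, $start, $stop) = @_;

    # \Q...\E escapes $stop so that characters such as ']' or '^' are
    # taken literally inside the character class. The /s modifier lets
    # '.' match newlines while skipping the first $start characters.
    my ($piece) = $str =~ /^.{$start}([^\Q$stop\E]*)/s;

    return $piece;    # undef when $str is shorter than $start
}

# Assumed sample input (leading "AB" is a guess).
my $str = 'ABXCDEFDGHIXTAAGRAAAAAA theEnd';
print substr_chr($str, 3, 'A'), "\n";    # CDEFDGHIXT
print substr_chr($str, 3, 'X'), "\n";    # CDEFDGHI
```

The trade-off is the one described above: the wrapper reads nicely for the single-character case, but each new kind of ending (a length limit, a terminal string, a run of characters) means going back to the regex anyway.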

Have you had a chance to study perlretut and perlre? If not, consider doing so. If you extract strings based on characters or other textual considerations on a regular basis, you will find regexes a very powerful tool in your toolkit.

Update: fixed typos in output labels

