
Re^3: Looking for a flexible regex...

by BrowserUk (Pope)
on Sep 22, 2013 at 07:10 UTC (#1055171)

in reply to Re^2: Looking for a flexible regex...
in thread Looking for a flexible regex...

See, I said there'd be more to it :)

How would I modify your regex? ... I do have areas of the script that will error if the ranges are in the wrong order

Given that you are now seeking not just to validate a string, but also to break that string up into its components, I wouldn't use a regex. I'd probably do this:

    $s = "1-6,27,105-170,512,670-675";;
    @ranges = map{
        my( $lo, $hi ) = split '-', $_;
        $hi //= $lo;
        die 'Bad input' if $lo > $hi;
        [ $lo, $hi ]
    } split ',', $s;;
    print "@$_" for @ranges;;
    1 6
    27 27
    105 170
    512 512
    670 675

Note that converting single positions to a range of length 1 avoids the need for special-casing later in the code.
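To illustrate the point about special-casing, here is a minimal sketch that counts how many positions a range list covers. The sub name covered_positions is made up for this example; the only thing it relies on is that every entry is a [ lo, hi ] pair, with singles stored as lo == hi.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): count the positions
# covered by a list of [ lo, hi ] pairs.
sub covered_positions {
    my @ranges = @_;
    my $total  = 0;
    # A single position is stored as [ n, n ], so hi - lo + 1 == 1
    # falls out naturally; no separate branch for singles is needed.
    $total += $_->[1] - $_->[0] + 1 for @ranges;
    return $total;
}

my @ranges = ( [ 1, 6 ], [ 27, 27 ], [ 105, 170 ], [ 512, 512 ], [ 670, 675 ] );
print covered_positions(@ranges), "\n";    # prints 80
```

Had the singles been kept as bare numbers, every loop like this one would need to test which shape each element has.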

Now, whether that alone is sufficient validation will depend upon where the string is coming from and what you are doing with the ranges later in the script.

For example, in genomic work, these types of range lists are often (usually) the output from some previous process (Blast or similar), and are thus pretty much guaranteed to be correct; i.e. properly ordered, sorted, non-overlapping, etc.

But, if this was manual input from a user, you might need to be more stringent. Then you have to decide what to do if the user enters:

  1. malformed ranges;
  2. overlapping ranges;
  3. correctly formed but disordered ranges;

Some of those you could correct automatically -- e.g. sort the list -- for others you'd have to report the errors and either die or prompt for corrections.
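One way the three cases might be handled is sketched below. This is only an illustration under assumed policy choices: the sub name parse_ranges and the error messages are invented here, malformed and overlapping ranges are treated as fatal, and disordered ranges are silently sorted. Your own script may want different choices (for instance, merging overlaps instead of dying).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): parse a range list, sort it,
# and die on problems that cannot sensibly be fixed automatically.
sub parse_ranges {
    my ($s) = @_;
    my @ranges;
    for my $item ( split /\s*,\s*/, $s ) {
        # 1. malformed ranges: accept only "N" or "N-M"
        my ( $lo, $hi ) = $item =~ /^\s*(\d+)(?:\s*-\s*(\d+))?\s*$/
            or die "Malformed range: '$item'\n";
        $hi //= $lo;    # a single position becomes a range of length 1
        die "Reversed range: '$item'\n" if $lo > $hi;
        push @ranges, [ $lo, $hi ];
    }

    # 3. disordered ranges: corrected automatically by sorting on the low end
    @ranges = sort { $a->[0] <=> $b->[0] } @ranges;

    # 2. overlapping ranges: reported rather than guessing what was meant
    for my $i ( 1 .. $#ranges ) {
        die "Overlap: @{ $ranges[ $i - 1 ] } and @{ $ranges[$i] }\n"
            if $ranges[$i][0] <= $ranges[ $i - 1 ][1];
    }
    return @ranges;
}

print "@$_\n" for parse_ranges("670-675, 27, 1-6");
```

The overlap check only needs to compare neighbours because the list has already been sorted by its low ends.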

Of course, some people would apply the stringent tests even for input coming from a program that is "pretty much guaranteed not to make those mistakes"; and that's a value judgement you'll have to make yourself.

Also, I'm curious what is going on in your regex -- I've heard about clustering, but never really made use of it before.

It groups sub-elements of the regex so that one can apply quantifiers that affect that sub-group collectively, rather than individual elements.
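A small sketch of the difference, using made-up patterns and test strings rather than anything from the thread: in the first pattern the + applies only to the b, in the second it applies to the whole clustered ab.

```perl
use strict;
use warnings;

my $unclustered = qr/^ab+$/;        # + applies to 'b' only
my $clustered   = qr/^(?:ab)+$/;    # + applies to the whole 'ab' group

for my $str (qw( ab abbb abab )) {
    printf "%-5s unclustered=%-3s clustered=%s\n", $str,
        ( $str =~ $unclustered ? 'yes' : 'no' ),
        ( $str =~ $clustered   ? 'yes' : 'no' );
}
```

Here 'abbb' matches only the unclustered pattern and 'abab' only the clustered one. The (?: ... ) form groups without capturing, so it does not set $1 and friends.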

how does the regex know how to process a string of indeterminate length?

This is an expanded explanation of my original regex:

    qr[
        ^\s*                ## from the start of the string, skip whitespace (if any)
        (?: \d+ - \d+ )     ## then grab (at least one) pair of numbers separated by '-'
        (?:                 ## a group
            \s* , \s*       ### a comma, optionally preceded or followed by whitespace
            \d+ - \d+       ### and another pair of numbers separated by '-'
        )*                  ## zero or more times
        \s*$                ## to the end of string, optionally skipping whitespace if any
    ]x
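To see the "zero or more times" group coping with input of any length, the same regex can be run against a few strings. The test strings below are invented for illustration. Note that, as written, this version only accepts lo-hi pairs, so a bare single number such as 27 is rejected.

```perl
use strict;
use warnings;

# The regex from above, written on one line (the /x flag ignores the spaces).
my $re = qr[ ^\s* (?: \d+ - \d+ ) (?: \s* , \s* \d+ - \d+ )* \s*$ ]x;

# Made-up inputs: one pair, several pairs, extra whitespace,
# a trailing comma, and a list containing a bare single number.
for my $s ( '1-6', '1-6,105-170', ' 1-6 , 105-170 , 670-675 ', '1-6,', '1-6,27' ) {
    printf "%-28s %s\n", "'$s'", ( $s =~ $re ? 'ok' : 'rejected' );
}
```

The first three strings are accepted and the last two rejected: the )* lets the comma-plus-pair group repeat as often as the input requires, while the ^ and $ anchors ensure nothing else is left over.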

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Looking for a flexible regex...
by Anonymous Monk on Sep 22, 2013 at 16:09 UTC
    Thanks for the great explanation. I truly appreciate it. It's been several years since I worked in perl, and regex was never my strong point. ;) I've always loved how supporting/patient the community is here.
