PerlMonks  

Re: split versus =~

by kcott (Archbishop)
on Nov 10, 2022 at 20:20 UTC ( [id://11148117]=note )


in reply to split versus =~

G'day russlo,

Welcome to the Monastery.

"My question is: why?"

As others have pointed out, without any data, we can't really answer that. Here are a few possible reasons (non-exhaustive list):

  • You haven't anchored your regex. Your pattern could match anywhere in the string.
  • You're matching numbers that could be zero-length (\d*); a better choice would be \d+.
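  • You say "X, Y, and Z are all numbers". Strictly speaking, you're matching digits only; 1.23, 1e23, and so on are also numbers. (Note too that \d matches any Unicode decimal digit unless the /a modifier is in effect; use [0-9] if you want only the ASCII digits.)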
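To make the first two points concrete, here's a minimal sketch comparing an unanchored pattern that uses \d* with an anchored one that uses \d+. Your actual regex wasn't shown, so both patterns ($loose and $strict) are my assumptions, for illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical patterns for illustration only; the original
# poster's actual regex was not shown.
my $loose  = qr{(-?\d*)-(\d*)-(\d*)};      # unanchored; zero-length "numbers" allowed
my $strict = qr{^(-?\d+)-(\d+)-(\d+)$};    # anchored; 1 or more digits required

for my $str ('1-2-3', 'abc1-2-3xyz', '--', '1--3') {
    printf "%-14s loose: %-3s strict: %s\n",
        "'$str'",
        ($str =~ $loose  ? 'yes' : 'no'),
        ($str =~ $strict ? 'yes' : 'no');
}
```

The loose pattern accepts all four strings, including '--', which contains no digits at all; the strict pattern accepts only '1-2-3'.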

For future reference, please provide a "Short, Self-Contained, Correct Example" and follow the guidelines in "How do I post a question effectively?".

"Additionally: what can I do to provide the correct splitting that we're looking for here?"

Comment your regex in full. By forcing yourself to document exactly what your regex does, you will more easily spot logic errors and typos. By writing your regex as I've done in the code below, it's very easy to make changes (e.g. at some future point perhaps Z can be negative or the string becomes "W-X-Y-Z"); fiddling around inside a regex which is jammed into a single string with no whitespace is highly error-prone.

As others have already suggested, write a test script. In the code below, I added an "expect failure"; mostly to show you what that outputs. I also noted you mentioned a problem with '-2-3-4'; to be honest, I didn't follow what the problem was, but I added it for testing anyway. Add more tests if you encounter problem input that isn't handled by the regex; you may also need to alter the regex itself if it doesn't cover all eventualities.

Note that with the way I've written the code, you can just add to @tests without needing to change any other part of the code.

You should also provide some validation and error reporting. What should happen if the input doesn't match the regex: an on-screen warning, a logfile entry, or killing the script?
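As a rough sketch of what that validation might look like: parse_xyz() is a hypothetical helper name of my own, and the choice between warn and die is an assumption you'd adapt to your application.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# parse_xyz() is a hypothetical helper, not code from the original thread.
# Returns (X, Y, Z) on success, or an empty list if the input is malformed.
sub parse_xyz {
    my ($str) = @_;
    return unless defined $str;
    my @parts = $str =~ /^(-?\d+)-(\d+)-(\d+)$/;
    return @parts;
}

for my $input ('1-2-3', '1--2-3') {
    # List assignment in boolean context is true only if captures were returned.
    if (my ($X, $Y, $Z) = parse_xyz($input)) {
        print "Parsed '$input': X=$X Y=$Y Z=$Z\n";
    }
    else {
        # Pick whichever policy suits the application:
        warn "Skipping malformed input '$input'\n";
        # die "Malformed input '$input'\n";    # or abort instead
    }
}
```

Returning an empty list on failure keeps the caller's check down to a single if; writing to a logfile instead would only change the else branch.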

Here's my test script:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use constant {
        STR => 0,
        EXP => 1,
    };

    use Test::More;

    my @tests = (
        ['1-2-3', '123'],
        ['-1-2-3', '-123'],
        ['1--2-3', ''],
        ['1-2--3', ''],
        ['1--2--3', ''],
        ['-1--2-3', ''],
        ['-1-2--3', ''],
        ['-1--2--3', ''],
        ['1-2-', ''],
        ['-1-2-', ''],
        ['garbage', ''],
        ['expect', 'failure'],
        ['-2-3-4', '-234'],
    );

    plan tests => 0+@tests;

    my $re = qr{(?x:
        ^               # start of string
        (               # start capture X
            -?          # optional leading minus
            \d+         # 1 or more digits
        )               # end capture X
        -               # required hyphen
        (               # start capture Y
            \d+         # 1 or more digits
        )               # end capture Y
        -               # required hyphen
        (               # start capture Z
            \d+         # 1 or more digits
        )               # end capture Z
        $               # end of string
    )};

    for my $test (@tests) {
        my ($X, $Y, $Z, $got) = ('') x 4;

        if (($X, $Y, $Z) = $test->[STR] =~ $re) {
            $got = "$X$Y$Z";
        }

        ok($got eq $test->[EXP],
            "Testing '$test->[STR]' is "
            . (length $test->[EXP] ? 'GOOD' : 'BAD')
        );
    }

And here's the output:

    1..13
    ok 1 - Testing '1-2-3' is GOOD
    ok 2 - Testing '-1-2-3' is GOOD
    ok 3 - Testing '1--2-3' is BAD
    ok 4 - Testing '1-2--3' is BAD
    ok 5 - Testing '1--2--3' is BAD
    ok 6 - Testing '-1--2-3' is BAD
    ok 7 - Testing '-1-2--3' is BAD
    ok 8 - Testing '-1--2--3' is BAD
    ok 9 - Testing '1-2-' is BAD
    ok 10 - Testing '-1-2-' is BAD
    ok 11 - Testing 'garbage' is BAD
    not ok 12 - Testing 'expect' is GOOD
    #   Failed test 'Testing 'expect' is GOOD'
    #   at ./pm_11148100_re_parse.pl line 55.
    ok 13 - Testing '-2-3-4' is GOOD
    # Looks like you failed 1 test of 13.
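To illustrate the earlier point about how easy a fully commented regex is to change, here's a sketch of the "Z can be negative" variant. The name $re_neg_z is hypothetical, and the requirement itself is only the speculative future change mentioned above, not something you asked for.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical variant: Z may also carry a leading minus.
my $re_neg_z = qr{(?x:
    ^               # start of string
    (               # start capture X
        -?          # optional leading minus
        \d+         # 1 or more digits
    )               # end capture X
    -               # required hyphen
    (               # start capture Y
        \d+         # 1 or more digits
    )               # end capture Y
    -               # required hyphen
    (               # start capture Z
        -?          # optional leading minus: the only change
        \d+         # 1 or more digits
    )               # end capture Z
    $               # end of string
)};

for my $str ('1-2-3', '1-2--3', '1--2-3') {
    if (my ($X, $Y, $Z) = $str =~ $re_neg_z) {
        print "'$str' => X=$X Y=$Y Z=$Z\n";
    }
    else {
        print "'$str' => no match\n";
    }
}
```

Only one line was added inside capture Z. With this variant '1-2--3' matches with Z = -3, so tests that expect such strings to fail would need their expected values updated too.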

— Ken
