PerlMonks  

Re^2: Capturing substrings with complex delimiter, up to a maximum

by jkeenan1 (Deacon)
on Oct 30, 2013 at 01:27 UTC (id://1060257)


in reply to Re: Capturing substrings with complex delimiter, up to a maximum
in thread Capturing substrings with complex delimiter, up to a maximum

Thanks for your rapid response. I tried to adapt your suggestion to a subroutine which I could easily drop into my test file. I found that at first it did not pass all tests.

sub _browseruk_recognize_limited_urls {
    my ($input, $max) = @_;
    my @captures = $input =~ m[(https?://.+?)(?:,(?=http)|$)]g;
    return [ @captures[ 0 .. $max-1 ] ];
}
Results:
capture.t ..
not ok 1 - 1 URL
#   Failed test '1 URL'
#   at capture.t line 18.
not ok 2 - 2 URLs (one containing a comma)
ok 3 - 3 URLs (one containing a comma)
ok 4 - Still only 3 URLs (one containing a comma); reject those over max
1..4
#     Structures begin differing at:
#          $got->[1] = undef
#     $expected->[1] = Does not exist
#   Failed test '2 URLs (one containing a comma)'
#   at capture.t line 23.
#     Structures begin differing at:
#          $got->[2] = undef
#     $expected->[2] = Does not exist
# Looks like you failed 2 tests of 4.
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/4 subtests

Test Summary Report
-------------------
capture.t (Wstat: 512 Tests: 4 Failed: 2)
  Failed tests:  1-2
  Non-zero exit status: 2
Files=1, Tests=4,  1 wallclock secs ( 0.13 usr  0.04 sys +  0.11 cusr  0.05 csys =  0.33 CPU)
Result: FAIL

shell returned 1

However, when I grepped for definedness ...

sub _browseruk_recognize_limited_urls {
    my ($input, $max) = @_;
    my @captures = $input =~ m[(https?://.+?)(?:,(?=http)|$)]g;
    return [ grep { defined($_) } @captures[ 0 .. $max-1 ] ];
}
... all tests passed.
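For readers who want to reproduce this, here is a minimal sketch of a Test::More file along the lines of capture.t. The actual capture.t was not posted, so the example.com inputs and $max = 3 are assumptions; $max = 3 is at least consistent with the failure output above, where the first extra element ($got->[1] for one URL, $got->[2] for two) came back undef.

```perl
use strict;
use warnings;
use Test::More;

# The grep-for-definedness version from the post above.
sub _browseruk_recognize_limited_urls {
    my ($input, $max) = @_;
    my @captures = $input =~ m[(https?://.+?)(?:,(?=http)|$)]g;
    return [ grep { defined($_) } @captures[ 0 .. $max - 1 ] ];
}

my $max = 3;    # assumed; capture.t's real value was not posted

# Hypothetical inputs, shaped after the test names in the output above.
is_deeply(
    _browseruk_recognize_limited_urls('http://a.example.com', $max),
    [ 'http://a.example.com' ],
    '1 URL',
);
is_deeply(
    _browseruk_recognize_limited_urls('http://a.example.com/x,y,https://b.example.org', $max),
    [ 'http://a.example.com/x,y', 'https://b.example.org' ],
    '2 URLs (one containing a comma)',
);
is_deeply(
    _browseruk_recognize_limited_urls(
        'http://a.example.com,http://b.example.com/p,q,http://c.example.com,http://d.example.com',
        $max,
    ),
    [ 'http://a.example.com', 'http://b.example.com/p,q', 'http://c.example.com' ],
    'Only 3 URLs (one containing a comma); reject those over max',
);

done_testing();
```

Because is_deeply compares the whole structure, any trailing undef produced by the slice shows up as an extra element, which is how the failures above were reported.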

Thank you very much.

Jim Keenan

Re^3: Capturing substrings with complex delimiter, up to a maximum
by BrowserUk (Patriarch) on Oct 30, 2013 at 01:53 UTC

    The regex will (can only) return valid matches. It cannot return undef. You should not have to use grep.
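    To illustrate the point, a minimal sketch that runs the same regex in list context over a hypothetical input of two URLs, the first containing a comma. With a single capture group that takes part in every successful match, m//g returns only the captured substrings, so no element can be undef.

```perl
use strict;
use warnings;

# Hypothetical input: two URLs, the first containing a comma.
my $input = 'http://a.example.com/x,y,https://b.example.org';

# In list context, m//g returns the capture group's text for each match.
# The lazy .+? stops only at a comma followed by "http", or at end of string.
my @captures = $input =~ m[(https?://.+?)(?:,(?=http)|$)]g;

print scalar(@captures), "\n";    # 2
print "$_\n" for @captures;       # http://a.example.com/x,y
                                  # https://b.example.org
```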

    Thus, either your adaptation of the code or your test is wrong.

    The only way you can generate undefs with your first implementation is if $max is greater than the number of URLs within the string, in which case the slice @captures[ 0 .. $max-1 ] will generate undefs.
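    A minimal sketch of that effect, assuming a single match and a hypothetical $max of 3: an array slice whose indices run past the end of the array yields undef for each missing index, which matches the $got->[1] = undef that is_deeply reported.

```perl
use strict;
use warnings;

my @captures = ('http://a.example.com');    # only one URL matched
my $max      = 3;                           # hypothetical limit

# Indices 1 and 2 do not exist, so the slice fills them with undef.
my @slice = @captures[ 0 .. $max - 1 ];

print scalar(@slice), "\n";                              # 3
print defined($_) ? "$_\n" : "undef\n" for @slice;       # URL, undef, undef
```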

    Instead of grepping out the extraneous undefs, don't generate them in the first place:

    sub _browseruk_recognize_limited_urls {
        my ($input, $max) = @_;
        my @captures = $input =~ m[(https?://.+?)(?:,(?=http)|$)]g;
        # Parentheses are required: '..' binds more tightly than '?:'.
        return [ @captures[ 0 .. ( @captures < $max ? $#captures : $max - 1 ) ] ];
    }

    But even that is rather silly. You have an array but you want to return an array ref (for some reason?), so you slice the array into a list, wrap it in another (anonymous) array and return a reference to that.

    If the reason for returning a reference is "efficiency", you completely blew any potential gain by slicing and listing (never mind the redundant grepping). Better to simply return that list and assign it to an array in the caller.
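    A sketch of the list-returning alternative, assuming the caller simply wants an array of at most $max URLs. The name recognize_limited_urls_list is hypothetical. Returning the whole array when it is already short enough avoids indexing past its end, so no undefs can appear.

```perl
use strict;
use warnings;

# Hypothetical list-returning variant; the name is illustrative only.
sub recognize_limited_urls_list {
    my ($input, $max) = @_;
    my @captures = $input =~ m[(https?://.+?)(?:,(?=http)|$)]g;
    # Return at most $max captures without slicing past the end.
    return @captures <= $max ? @captures : @captures[ 0 .. $max - 1 ];
}

# The caller assigns the returned list to its own array.
my @urls = recognize_limited_urls_list(
    'http://a.example.com,http://b.example.com/p,q,http://c.example.com,http://d.example.com',
    3,
);
print "$_\n" for @urls;
```

With four URLs in the input and a limit of 3, @urls holds the first three, including http://b.example.com/p,q with its embedded comma.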

    But if you really need a reference, adjust the size of the array you already have and return a reference to that:

    sub _browseruk_recognize_limited_urls {
        my ($input, $max) = @_;
        my @captures = $input =~ m[(https?://.+?)(?:,(?=http)|$)]g;
        $#captures = $max - 1 if @captures >= $max;    ## Adjust size if necessary
        return \@captures;
    }
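    A short sketch of the two Perl features that version relies on; the helper first_n is hypothetical. Assigning to $#array sets the index of the last element, truncating the array in place, and returning a reference to a my array is safe because my creates a fresh array on every call.

```perl
use strict;
use warnings;

# Assigning to $#array sets the last index, discarding later elements.
my @letters = ('a' .. 'e');
$#letters = 2;
print "@letters\n";    # a b c

# Hypothetical helper: keep at most $n items and return a reference.
sub first_n {
    my ($n, @items) = @_;    # 'my' gives a new @items on each call
    $#items = $n - 1 if @items >= $n;
    return \@items;
}

my $r1 = first_n(2, qw(x y z));
my $r2 = first_n(1, qw(p q));
print "@$r1 | @$r2\n";    # x y | p
```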

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
