PerlMonks  

Re^2: Why split function treats single quotes literals as regex, instead of a special case?

by AnomalousMonk (Archbishop)
on Aug 14, 2020 at 04:38 UTC


in reply to Re: Why split function treats single quotes literals as regex, instead of a special case?
in thread Why split function treats single quotes literals as regex, instead of a special case?

The single space character is a special case for split ...
I.e., per split:
As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a string composed of a single space character (such as ' ' or "\x20", but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator.
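A minimal sketch of the difference the quoted documentation describes, contrasting the single-space string with a regex that contains one literal space. The sample string and variable names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample: leading spaces, a double space, a tab, a trailing space.
my $line = "  alpha  beta\tgamma ";

# PATTERN is the string ' ': awk emulation. Leading whitespace is removed
# and any run of whitespace (spaces or tabs) acts as one separator.
my @awk = split ' ', $line;

# PATTERN is the regex / /: no special casing. Every single space is a
# separator, so empty fields appear and the tab is not a separator.
my @literal = split / /, $line;

print scalar(@awk), ": ", join('|', @awk), "\n";          # 3: alpha|beta|gamma
print scalar(@literal), ": ", join('|', @literal), "\n";  # 5 fields, two of them leading empties
```

The second call keeps the leading empty fields because a separator of positive width matched at the start of the string; only trailing empty fields are dropped by default.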
You also write:
Regular expressions are also treated a bit differently than regular expressions in qr//, m// and s///.
I don't understand this statement. Can you elaborate?


Give a man a fish:  <%-{-{-{-<


Replies are listed 'Best First'.
Re^3: Why split function treats single quotes literals as regex, instead of a special case?
by jwkrahn (Abbot) on Aug 14, 2020 at 09:16 UTC

    The regular expression // works differently in split than elsewhere:

    $ perl -le'
    my $x = "1234 abcd 5678";
    print $& if $x =~ /[a-z]+/;
    print $& if $x =~ //;
    print map qq[ "$_"], split /[a-z]+/, $x;
    print map qq[ "$_"], split //, $x;
    '
    abcd
    abcd
     "1234 " " 5678"
     "1" "2" "3" "4" " " "a" "b" "c" "d" " " "5" "6" "7" "8"

    Also, the line anchors /^/ and /$/ don't require the /m option to match lines in a string.

      The regular expression // works differently in split than elsewhere...

      I think I'd consider this just another special-case fixup prior to running split rather than a true difference in the function of m//:

      c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
      "my $x = qq{1234 abcd 5678};
       dd split //, $x;
       dd split /\b|\B/, $x;
      "
      (1 .. 4, " ", "a" .. "d", " ", 5 .. 8)
      (1 .. 4, " ", "a" .. "d", " ", 5 .. 8)
      This is probably just a matter of emphasis and interpretation.

      ... line anchors /^/ and /$/ don't require the /m option to match lines in a string.

      Checking the docs, I recalled seeing this discussed before, but it's another one of those very specialized special cases that evaporates from my memory with time. However, it's not true for the /$/ case; at any rate, the docs say nothing about special-casing it:

      c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
      "my $x = qq{1234 \n abcd \n 5678};
       dd split /^/, $x;
       dd split /$/, $x;
      "
      ("1234 \n", " abcd \n", " 5678")
      "1234 \n abcd \n 5678"
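The same comparison can be written as a small script, with one extra case that is not in the post above: /$/ with an explicit /m, which is an illustrative addition. This is a sketch using the same sample string; the variable names are hypothetical:

```perl
use strict;
use warnings;

my $x = "1234 \n abcd \n 5678";

# split treats /^/ as if it were /^/m: one field per line.
my @by_caret = split /^/, $x;

# Plain /$/ gets no such treatment. Here it can only match at the very
# end of the string, so the string comes back as a single field.
my @by_dollar = split /$/, $x;

# With an explicit /m, $ also matches just before each embedded newline,
# so the newlines end up at the start of the following fields.
my @by_dollar_m = split /$/m, $x;

printf "%d %d %d\n",
    scalar @by_caret, scalar @by_dollar, scalar @by_dollar_m;   # prints "3 1 3"
```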


      Give a man a fish:  <%-{-{-{-<

      The regular expression // works differently in split than elsewhere

      I think it is actually the other way around — in most contexts, m// is special (it refers to the most recent pattern without duplicating that pattern), while in split, // is literally the empty regex, which matches the zero-length empty string.
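A short sketch of that contrast, reusing the sample string from earlier in the thread. The variable names are hypothetical, and $& is used only to show what the empty match operator actually matched:

```perl
use strict;
use warnings;

my $x = "1234 abcd 5678";

# Establish a last successfully matched pattern.
$x =~ /[a-z]+/ or die "expected a match";

# In the match operator, an empty pattern reuses the last successfully
# matched pattern, so this behaves like /[a-z]+/ and matches "abcd".
my $reused = $x =~ // ? $& : undef;

# In split, // is a genuinely empty regex: it matches between every pair
# of characters, regardless of any earlier successful match.
my @chars = split //, $x;

print "m// matched: $reused\n";                       # m// matched: abcd
print "split // fields: ", scalar(@chars), "\n";      # split // fields: 14
```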
