http://www.perlmonks.org?node_id=482881

GrandFather has asked for the wisdom of the Perl Monks concerning the following question:

I wanted to split a string on | characters today and ran into a little bother. My first attempt looked like this:

split "|", $str

which did not do what I expected so I tried:

split "\|", $str;

which produced exactly the same result! My last attempt was:

split "\\|", $str;

which generated the result I was after. What is going on here? The perlfunc entry for split doesn't seem to help much.

use warnings;
use strict;

my $str = "lhs|rhs";

print join " ", split "|", $str;
print "\n" . join " ", split "\|", $str;
print "\n" . join " ", split "\\|", $str;

prints:

l h s | r h s
l h s | r h s
lhs rhs

Perl is Huffman encoded by design.

Re: Surprised by split
by Corion (Patriarch) on Aug 11, 2005 at 09:59 UTC

    split takes a regular expression as its first argument, not a string. What tripped you up was the double interpolation that occurs when a string literal is used as a regular expression: the string is unescaped first, and the result is then compiled as a regex. Just don't do that; use a regular expression literal instead:

    split /\|/, $str;
Re: Surprised by split
by dave_the_m (Monsignor) on Aug 11, 2005 at 10:01 UTC
    You are using a string literal rather than a regular expression literal, so the \ gets removed by the double-quoted string handling code before it gets passed to the regex compiler. Use a regex literal instead:
    $ perl -wle 'print join ",", split /\|/, "abc|def"'
    abc,def
    $

    Dave.

Re: Surprised by split
by salva (Canon) on Aug 11, 2005 at 09:59 UTC
    The problem is that you are quoting the string with double quotes, so escaped characters are unescaped before the string is passed to split. Use // or '' instead:
    split /\|/, 'foo|bar';
Re: Surprised by split
by gellyfish (Monsignor) on Aug 11, 2005 at 10:03 UTC

    You are getting stuffed up by the quoting here. Bear in mind first of all that the first argument to split is a regular expression, and it is often preferred to use // rather than quotes. In your first attempt you are asking to split on nothing or nothing ('|' being the regex alternation character, with an empty pattern on each side), so the string is split between every character. In your second you are getting the same thing, because the '\' is eaten by the double-quoted context before the regex engine ever sees it. The third is correct because you are now escaping the '\', so the regex receives a literal '\|'. You probably want to use either single quotes or // for clarity.

    use warnings;
    use strict;

    my $str = "lhs|rhs";

    print "\n" . join " ", split '\|', $str;
    print "\n" . join " ", split /\|/, $str;

    /J\
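The explanation above can be checked with a minimal sketch that hands each of the three double-quoted literals to split and prints the pattern text the regex compiler actually receives. The helper name `fields_for` is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text on $pattern_text, which is compiled
# as a regex at run time, and join the resulting fields with commas.
sub fields_for {
    my ($pattern_text, $text) = @_;
    return join ",", split $pattern_text, $text;
}

my $str = "lhs|rhs";

# The double-quoted string parser runs first; whatever text survives
# is what the regex compiler sees.
for my $pattern_text ("|", "\|", "\\|") {
    printf "regex sees %-2s -> %s\n",
        $pattern_text, fields_for($pattern_text, $str);
}
```

The first two iterations receive the same one-character pattern, a bare |, and split between every character. Only the third receives \| and yields lhs,rhs.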

Re: Surprised by split
by davorg (Chancellor) on Aug 11, 2005 at 10:06 UTC

    The '\' has a special meaning in a double-quoted string. And _also_ a special meaning in a regex. You'll need two of them in order to break through both special meanings.

    But the first argument to "split" should be a regex. So don't pass it a string :)

    See... now I'm wondering about split qr(\|) and split m/\|/.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      Hmm:
      $ perl -e '$a="m|b|a"; print join(",",split "\\|",$a),$/'
      m,b,a
      $ perl -e '$a="m|b|a"; print join(",",split /\|/,$a),$/'
      m,b,a
      $ perl -e '$a="m|b|a"; print join(",",split m/\|/,$a),$/'
      m,b,a
      $ perl -e '$a="m|b|a"; print join(",",split qr/\|/,$a),$/'
      m,b,a
      $ perl -e '$a="m|b|a"; $b=qr/\|/; print join(",",split $b,$a),$/'
      m,b,a
      $ perl -e '$a="m|b|a"; sub b{return qr/\|/}; print join(",",split b(), $a),$/'
      m,b,a
      $ perl -e '$a="m|b|a"; print join(",",split "[|]",$a),$/'
      m,b,a

      Man, I love perl. :) (Updated a couple of times to add new variations)

      --
      $you = new YOU;
      honk() if $you->love(perl)

Re: Surprised by split
by muntfish (Chaplain) on Aug 11, 2005 at 10:01 UTC

    I think split behaves differently if the first parameter is a normal string instead of a regex. The perlfunc documentation doesn't mention this and I can't remember how exactly it differs. Anyway since | is a regex special character (alternation) you need to escape it. I'd normally write this as:

    split /\|/, $str

    The reason you need two \s is because of the behaviour of double quotes - just try:

    print "|"; print "\|"; print "\\|";

    and you should see what's happening.


    s^^unp(;75N=&9I<V@`ack(u,^;s|\(.+\`|"$`$'\"$&\"\)"|ee;/m.+h/&&print$&
      Some versions of "perlfunc" do mention it:
      As a special case, specifying a PATTERN of space (' ') will split on white space just as split with no arguments does. Thus, split(' ') can be used to emulate awk's default behavior, whereas split(/ /) will give you as many null initial fields as there are leading spaces. A split on /\s+/ is like a split(' ') except that any leading whitespace produces a null first field. A split with no arguments really does a split(' ', $_) internally.
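The three behaviours described in that perlfunc passage can be compared side by side. This is only an illustrative sketch; the sample string and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# Sample input with two leading spaces and a double space in the middle.
my $line = "  foo bar  baz";

my @awk    = split ' ',   $line;   # special case: leading whitespace is skipped
my @single = split / /,   $line;   # every single space is a field boundary
my @ws     = split /\s+/, $line;   # leading whitespace gives an empty first field

print scalar(@awk),    " fields: ", join("|", @awk),    "\n";
print scalar(@single), " fields: ", join("|", @single), "\n";
print scalar(@ws),     " fields: ", join("|", @ws),     "\n";
```

With this input, split ' ' returns three fields, split / / returns six (two of them empty leading fields), and split /\s+/ returns four, the first of which is empty.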


Re: Surprised by split
by jonadab (Parson) on Aug 11, 2005 at 10:12 UTC
    My last attempt was: split "\\|", $str; which generated the result I was after.

    This is known as the "lisp syntax" for regular expressions. The reason it's called that is because in lisp, there is no dedicated quoting mechanism for regular expressions, and so they have to be given in code as quoted strings. Any lisp manual will explain why the double-backslashing of everything is necessary.

    Perl does provide dedicated quoting constructs for regular expressions, and your regular expressions will be much easier to read and maintain if you use them, rather than giving your regular expressions in lisp syntax using regular double-quoted strings.


    "In adjectives, with the addition of inflectional endings, a changeable long vowel (Qamets or Tsere) in an open, propretonic syllable will reduce to Vocal Shewa. This type of change occurs when the open, pretonic syllable of the masculine singular adjective becomes propretonic with the addition of inflectional endings."  — Pratico & Van Pelt, BBHG, p68
Re: Surprised by split
by GrandFather (Saint) on Aug 11, 2005 at 10:17 UTC

    Thanks everyone. I have only occasionally used split and I guess I've not been bitten by the double quote quoting before.

    The lesson is well learned :)


Re: Surprised by split
by Anonymous Monk on Aug 11, 2005 at 14:05 UTC
    As explained by others, this is caused by the backslash being interpreted more than once, or rather by there being too few backslashes to survive both interpretations. That's why I prefer to use [|] (and I think "Perl Best Practices" advises that too); then you can do both:
    split "[|]", $str;
    and
    split /[|]/, $str;
    without having to wonder how many backslashes you need.

    Anonymous Backslash Killer
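A short sketch of the character class approach, showing that the same pattern text works whether it arrives as a string literal or as a regex literal. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $str = "lhs|rhs";

# Inside a character class the pipe is an ordinary character, and the
# pattern contains no backslash for the string parser to consume.
my @from_string = split "[|]", $str;
my @from_regex  = split /[|]/, $str;

print join(",", @from_string), "\n";
print join(",", @from_regex),  "\n";
```

Both lines print lhs,rhs, which matches the "[|]" variation tried earlier in the thread.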

      Sometimes I have to ask the user to specify the delimiter on the file and they might enter a pipe from the command line.

      This is how I was handling them -

      $opt_d = "\\|" if ($opt_d eq "|");

      Since I am just splitting the file on a plain string, I guess /[$opt_d]/ would do the trick!

      Thanks Anonymous Monk

      -SK

        That would cause a problem if $opt_d equals ^. Perl seems to handle the cases /[-]/, /[[]/, /[]]/ as the appropriate single character class, but it trips on /[^]/. I wonder whether that's a bug (or a yet unimplemented feature).

        Of course, you can get all sorts of unexpected nonsense if $opt_d is longer than a single character. Or if it's the empty string.

        Wouldn't quotemeta avoid this problem, as well as the ones described below?
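A minimal sketch of the quotemeta idea for a user-supplied delimiter. The helper `split_on_literal` is hypothetical, and rejecting an empty delimiter is an assumption made here rather than something proposed in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text on $delim taken literally,
# whatever characters the delimiter contains.
sub split_on_literal {
    my ($delim, $text) = @_;
    # Assumption: an empty delimiter is treated as an error, because an
    # empty pattern would split the text between every character.
    die "delimiter must not be empty\n" unless length $delim;
    my $re = quotemeta $delim;    # backslash-escapes regex metacharacters
    return split /$re/, $text;
}

print join(",", split_on_literal("|",  "a|b|c")),   "\n";
print join(",", split_on_literal("^",  "a^b^c")),   "\n";
print join(",", split_on_literal("||", "a||b||c")), "\n";
```

Each call prints a,b,c. Unlike /[$opt_d]/, this handles ^ and multi-character delimiters, since quotemeta escapes every metacharacter before the pattern is compiled.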
      Or you can use regular expression quotes when you're giving a regular expression. Using a character class might be helpful if you don't want to learn the language, but I would rather people just learn the language.
        Or you can use regular expression quotes when you're giving a regular expression.
        What do you mean by that? qr? Then you still need to escape the pipe.
        Using a character class might be helpful if you don't want to learn the language, but I would rather people just learn the language.
        What on earth do you mean by that?

      So you are suggesting that I should use a character class trick to avoid quoting a character that I should know needs to be quoted in a context I should know is a regex and that I should type an extra character to achieve this?

      I'm lazy and will take the good advice of others who have suggested that /\|/ is the way to do it.

