Re: Surprised by split
by Corion (Patriarch) on Aug 11, 2005 at 09:59 UTC
|
split takes a regular expression as its first argument, not a string. What tripped you up was the double interpolation that occurs when a string literal is used as a regular expression. Just don't do that; use a regex literal instead:
split /\|/, $str;
Re: Surprised by split
by dave_the_m (Monsignor) on Aug 11, 2005 at 10:01 UTC
|
You are using a string literal rather than a regular expression literal, so the \ gets removed by the double-quoted string handling code before it gets passed to the regex compiler. Use a regex literal instead:
$ perl -wle 'print join ",", split /\|/, "abc|def"'
abc,def
$
Dave.
Re: Surprised by split
by salva (Canon) on Aug 11, 2005 at 09:59 UTC
|
The problem is that you are quoting the pattern with double quotes, so escaped characters are unescaped before the string is passed to split. Use // or '' instead:
split /\|/, 'foo|bar';
Re: Surprised by split
by gellyfish (Monsignor) on Aug 11, 2005 at 10:03 UTC
|
You are getting stuffed up by the quoting here. Bear in mind first of all that the first argument to split is a regular expression, and it is often preferred to use // rather than quotes. In your first attempt you are asking to split on nothing or nothing ('|' being the regex alternation character, so the pattern is an empty alternation that matches between every character); in your second you are basically getting the same thing, because the '\' is eaten in the double-quoted context (where it merely escapes the '|'); and the third is correct because you are now escaping the '\', so the regex sees a literal '\|'. You probably want to use either single quotes or // for clarity.
use warnings;
use strict;

my $str = "lhs|rhs";

print "\n" . join " ", split '\|', $str;
print "\n" . join " ", split /\|/, $str;
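For comparison, here is a rough sketch of how the three attempts described above behave. The exact original code isn't shown in this reply, so the three split calls are an assumed reconstruction, and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $str = "lhs|rhs";

# Assumed reconstruction of the three attempts (not the original poster's exact code).
my @first  = split "|",   $str;  # pattern is /|/, an empty alternation: splits between every character
my @second = split "\|",  $str;  # double quotes eat the backslash, so the pattern is /|/ again
my @third  = split "\\|", $str;  # the string holds \| so the pattern is /\|/: splits on a literal pipe

# Number of fields each attempt produces.
printf "%d %d %d\n", scalar @first, scalar @second, scalar @third;  # 7 7 2
```

The first two give one field per character (including the pipe itself); only the third gives "lhs" and "rhs".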
/J\
Re: Surprised by split
by davorg (Chancellor) on Aug 11, 2005 at 10:06 UTC
|
The '\' has a special meaning in a double-quoted string. And _also_ a special meaning in a regex. You'll need two of them in order to break through both special meanings.
But the first argument to "split" should be a regex. So don't pass it a string :)
See... now I'm wondering about split qr(\|) and split m/\|/.
--
<http://dave.org.uk>
"The first rule of Perl club is you do not talk about
Perl club." -- Chip Salzenberg
|
$ perl -e '$a="m|b|a"; print join(",",split "\\|",$a),$/'
m,b,a
$ perl -e '$a="m|b|a"; print join(",",split /\|/,$a),$/'
m,b,a
$ perl -e '$a="m|b|a"; print join(",",split m/\|/,$a),$/'
m,b,a
$ perl -e '$a="m|b|a"; print join(",",split qr/\|/,$a),$/'
m,b,a
$ perl -e '$a="m|b|a"; $b=qr/\|/; print join(",",split $b,$a),$/'
m,b,a
$ perl -e '$a="m|b|a"; sub b{return qr/\|/}; print join(",",split b(),$a),$/'
m,b,a
$ perl -e '$a="m|b|a"; print join(",",split "[|]",$a),$/'
m,b,a
Man, I love perl. :) (Updated a couple of times to add new variations)
--
$you = new YOU;
honk() if $you->love(perl)
Re: Surprised by split
by muntfish (Chaplain) on Aug 11, 2005 at 10:01 UTC
|
I think split behaves differently if the first parameter is a normal string instead of a regex. The perlfunc documentation doesn't mention this and I can't remember how exactly it differs. Anyway since | is a regex special character (alternation) you need to escape it. I'd normally write this as:
split /\|/, $str
The reason you need two \s is because of the behaviour of double quotes - just try:
print "|";
print "\|";
print "\\|";
and you should see what's happening.
s^^unp(;75N=&9I<V@`ack(u,^;s|\(.+\`|"$`$'\"$&\"\)"|ee;/m.+h/&&print$&
|
Some versions of perlfunc do mention it:
As a special case, specifying a PATTERN of space (' ') will split on white space just as split with no arguments does. Thus, split(' ') can be used to emulate awk's default behavior, whereas split(/ /) will give you as many null initial fields as there are leading spaces. A split on /\s+/ is like a split(' ') except that any leading whitespace produces a null first field. A split with no arguments really does a split(' ', $_) internally.
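To make that passage concrete, here is a small sketch of the three cases it describes. The sample string in $line and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $line = "  foo bar  baz";   # two leading spaces, two spaces before "baz"

my @awk   = split ' ',   $line;  # awk-style: ("foo", "bar", "baz")
my @space = split / /,   $line;  # one field per single space: ("", "", "foo", "bar", "", "baz")
my @ws    = split /\s+/, $line;  # leading whitespace gives a null first field: ("", "foo", "bar", "baz")

# Show each field in brackets so the empty ones are visible.
print join("", map { "[$_]" } @$_), "\n" for \@awk, \@space, \@ws;
```

Note that only the string ' ' triggers the special case; / / is an ordinary regex matching exactly one space.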
--
$you = new YOU;
honk() if $you->love(perl)
Re: Surprised by split
by jonadab (Parson) on Aug 11, 2005 at 10:12 UTC
|
My last attempt was:
split "\\|", $str;
which generated the result I was after.
This is known as the "lisp syntax" for regular
expressions. The reason it's called that is because
in lisp, there is no dedicated quoting mechanism for
regular expressions, and so they have to be given in
code as quoted strings. Any lisp manual will explain
why the double-backslashing of everything is
necessary.
Perl does provide dedicated
quoting constructs for regular expressions, and your
regular expressions will be much easier to read and
maintain if you use them, rather than giving your
regular expressions in lisp syntax using regular
double-quoted strings.
"In adjectives, with the addition of inflectional endings, a changeable long vowel (Qamets or Tsere) in an open, propretonic syllable will reduce to Vocal Shewa. This type of change occurs when the open, pretonic syllable of the masculine singular adjective becomes propretonic with the addition of inflectional endings."
— Pratico & Van Pelt, BBHG, p68
Re: Surprised by split
by GrandFather (Saint) on Aug 11, 2005 at 10:17 UTC
|
Thanks everyone. I have only occasionally used split and I guess I've not been bitten by the double quote quoting before.
The lesson is well learned :)
Perl is Huffman encoded by design.
Re: Surprised by split
by Anonymous Monk on Aug 11, 2005 at 14:05 UTC
|
As explained by others, this is caused by repeated interpretation of the backslash, or rather by not having enough backslashes to survive being interpreted more than once. That's why I prefer to use [|] (and I think "Perl Best Practices" advises that too); then you can do both:
split "[|]", $str;
and
split /[|]/, $str;
without having to wonder how many backslashes you need.
Anonymous Backslash Killer
|
$opt_d = "\\|" if ($opt_d eq "|");
Since I am just splitting the file on a plain string, I guess /[$opt_d]/ would do the trick! Thanks, Anonymous Monk.
-SK
|
Wouldn't quotemeta avoid this problem, as well as the ones described below?
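A minimal sketch of what that could look like, assuming the delimiter arrives in a variable (the name $delim is hypothetical, standing in for something like a command-line option):

```perl
use strict;
use warnings;

my $str   = "m|b|a";
my $delim = "|";                  # hypothetical delimiter supplied at run time

# quotemeta backslashes every non-word character, so "|" becomes "\|".
my $safe = quotemeta $delim;
my @via_quotemeta = split /$safe/, $str;

# \Q...\E applies the same escaping inline while the pattern is interpolated.
my @via_Q = split /\Q$delim\E/, $str;

print join(",", @via_quotemeta), "\n";  # m,b,a
print join(",", @via_Q), "\n";          # m,b,a
```

Either form treats the delimiter as literal text, so no special-casing of individual characters is needed.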
|
|
Or you can use regular expression quotes when you're giving a regular expression. Using a character class might be helpful if you don't want to learn the language, but I would rather people just learn the language.
|
Or you can use regular expression quotes when you're giving a regular expression.
What do you mean by that? qr? Then you still need to escape the pipe.
Using a character class might be helpful if you don't want to learn the language, but I would rather people just learn the language.
What on earth do you mean by that?
|
|
|
So you are suggesting that I should use a character class trick to avoid quoting a character that I should know needs to be quoted in a context I should know is a regex and that I should type an extra character to achieve this?
I'm lazy and will take the good advice of others who have suggested that /\|/ is the way to do it.
Perl is Huffman encoded by design.