Re: Complex Splitting
by holli (Abbot) on Feb 06, 2007 at 13:40 UTC
|
push @array, $2 ? $2 : $3
while $str =~ /(([A-Z])|\[([A-Z]+)\])/g;
| [reply] [d/l] |
|
push @array, $1
while $str =~ /(?^(\w+)|\[(\w+)\])/g;
| [reply] [d/l] |
Re: Complex Splitting
by rinceWind (Monsignor) on Feb 06, 2007 at 13:40 UTC
|
@arr = $str =~ /(\w|\[\w+\])/g;
Update: thanks to monarch for pointing this out. The above leaves the brackets behind. One option is to split the capture and filter out the undefs thus:
@arr = grep $_, $str =~ /(\w)|\[(\w+)\]/g;
--
Oh Lord, won’t you burn me a Knoppix CD ?
My friends all rate Windows, I must disagree.
Your powers of persuasion will set them all free,
So oh Lord, won’t you burn me a Knoppix CD ? (Missquoting Janis Joplin)
| [reply] [d/l] [select] |
|
| [reply] [d/l] |
Re: Complex Splitting
by johngg (Canon) on Feb 06, 2007 at 14:19 UTC
|
You can do this with a split but the pattern is a lot trickier than that for the global match. Advantage is you don't have to worry about grep'ing out undefined values.
use strict;
use warnings;
use Data::Dumper;
my $str = q{ABC[GHI]XYZ};
my $rxSplit = qr
{(?x) # use extended syntax
(?: # non-capturing group
\[ # literal opening square br.
| # or
\] # literal closing square br.
| # or
(?= # look-ahead, a point
# followed by
[^]]+ # one or more non-closing
# square brs.
(?: # non-capturing group
# then either
\[ # literal opening square br.
| # or
\z # end of string
) # close non-capturing group
) # close look-ahead
) # close non-capturing group
};
my @array = split m{$rxSplit}, $str;
print Data::Dumper->Dump([\@array], [qw{*array}]);
Here's the output.
@array = (
'A',
'B',
'C',
'GHI',
'X',
'Y',
'Z'
);
On balance, I'd go with the global match, something like holli's or eric256's solutions.
Cheers, JohnGG | [reply] [d/l] [select] |
|
I don't think it's a lot trickier. This oughta do it:
@array = grep defined, split /(\[.*?\])|/;
That is, if you split on a bracketed group, capture it. If you don't have a bracketed group, split on nothing. Then filter out where the capture didn't match.
Caution: Contents may have been coded under pressure.
| [reply] [d/l] |
|
Yes, I regretted the word "lot" as soon as I'd posted, made it sound as if I'd sweated blood writing it. Although I like the simpler approach of your solution there is one problem with it; it doesn't get rid of the square brackets, which was required by the OP.Cheers, JohnGG
| [reply] |
|
Re: Complex Splitting - Parse::RecDescent
by imp (Priest) on Feb 06, 2007 at 15:02 UTC
|
use Parse::RecDescent;
use strict;
use warnings;
my $str = "ABC[GHI]XY[Z]1A";
my $grammar = <<'GRAMMAR';
token : '[' /[A-Z]*/ ']' {$return = $item[2]}
| /[A-Z]/
anything : /./
GRAMMAR
my $parser = Parse::RecDescent->new($grammar);
# When a reference to a scalar is passed to Parse::RecDescent it will
# consume the tokens as they are matched. To avoid modifying the origi
+nal
# string a copy will be used
my $copy = $str;
while ($copy ne '') {
if (my $token = $parser->token(\$copy)) {
print "Token: $token\n";
}
else {
my $token = $parser->anything(\$copy);
print "Invalid symbol: $token\n";
}
}
| [reply] [d/l] |
|
That silently ignores whitespace (read up on <skip>).
Also, P::RD is rather slow. I'd even say inexcusably slow if you're just using it as a tokenizer. May I suggest a much faster tokenizer?
use strict;
use warnings;
sub process_token {
my ($token) = @_;
print("Token: $token\n");
}
{
my $str = "ABC[GHI]XY[Z]1A";
for ($str) {
/\G \[ ([A-Z]*) \] /xgcs && do {
process_token("$1");
redo
};
/\G ([A-Z]) /xgcs && do {
process_token("$1");
redo
};
/\G (.) /xgcs && do {
printf("Unexpected '%s' at pos %d\n", $1, pos()-length($1));
redo
};
}
}
| [reply] [d/l] [select] |
Re: Complex Splitting
by tcf03 (Deacon) on Feb 06, 2007 at 14:03 UTC
|
This should do the trick - Its only tested on your string though...
#!/usr/bin/perl
use strict;
use warnings;
my $string = "ABC[GHI]XYZ";
my @array;
splitstring($string);
push @array, split//, $string;
print "$_ " for sort @array;
sub splitstring
{
return unless ( $string =~ /.*\[\w+\].*/ );
if ( $string =~ /.*(\[(\w+)\]).*/ )
{
my $remove = $1;
my $str = $2;
print "rem = $remove\n str = $str\n";
push @array, $str;
$string =~ s/\Q$remove\E//;
}
splitstring($string);
}
UPDATE
I substututed the string with [987]ABC[GHI]XYZ[dfg] and that works as well.
regards
Ted
--
"That which we persist in doing becomes easier, not that the task itself has become easier, but that our ability to perform it has improved."
--Ralph Waldo Emerson
| [reply] [d/l] [select] |
Re: Complex Splitting
by shmem (Chancellor) on Feb 06, 2007 at 14:57 UTC
|
perl -le '$_="[987]ABC[GHI]XYZ[dfg]"; print "$_" for split/(?!\w+\])|\
+[|]/,$_'
but that spits out extra fields. Let's add a grep:
perl -le '$_="[987]ABC[GHI]XYZ[dfg]";print "$_" for grep{$_}split/(?!\
+w+\])|\[|]/,$_'
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
| [reply] [d/l] [select] |
Re: Complex Splitting
by fenLisesi (Priest) on Feb 06, 2007 at 17:53 UTC
|
Here is another attempt at expanding a one-liner into a big script, this one using a canned /regexp?/i:
use strict;
use warnings;
use Regexp::Common qw(balanced);
my @streams = (qw(
[ABC][DEF][HIJ]
ABC[DEF]HI[JK][LMNO]PQ
[A][B]C[DEF]HI[JK][LMNO]PQ
[A][B]C[DEF]HI[JK][LMNO]P[Q]
[ABCDEFHIJKLMNOPQ]
ABCDEF
), q());
my $PATTERN = $RE{balanced}{-parens=>'[]'};
for my $input (@streams) {
print qq($input => );
my @pieces = ();
while ($input =~ s/(\w*)($PATTERN)//) {
my ($prematch, $match) = ($1, $2);
push @pieces, split //, $prematch;
push @pieces, $1 if $match =~ /(\w+)/;
}
push @pieces, split //, $input if length $input;
printf qq(%s\n), join qq( ), @pieces;
}
__END__
which prints:
[ABC][DEF][HIJ] => ABC DEF HIJ
ABC[DEF]HI[JK][LMNO]PQ => A B C DEF H I JK LMNO P Q
[A][B]C[DEF]HI[JK][LMNO]PQ => A B C DEF H I JK LMNO P Q
[A][B]C[DEF]HI[JK][LMNO]P[Q] => A B C DEF H I JK LMNO P Q
[ABCDEFHIJKLMNOPQ] => ABCDEFHIJKLMNOPQ
ABCDEF => A B C D E F
=>
Cheers. | [reply] [d/l] [select] |