Contributed by Anonymous Monk
on Jun 07, 2000 at 03:27 UTC
Description: How do I split on complicated possibilities?
For instance, "a,b,op2(c,d),op3(e,op4(f,g))" should be
split into
a
b
op2(c,d)
op3(e,op4(f,g))
where c,d,e,f,g may be of the same form op(x,y).
In other words, I only want the split to occur at the
highest level of commas and ignore the nested ones.

Answer: How do I split a string on highly structured/nested data?
contributed by lhoward

One approach is to use a parser like
Parse::RecDescent. A real parser
(as opposed to parsing a string with a regular
expression alone) is much more powerful and
can be more appropriate for parsing highly
structured/nested data like your example.
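use Parse::RecDescent;
my $teststr = "a,b,op2(c,d),op3(e,op4(f,g))";
# A list is comma-separated items; an item is either a plain value or a
# function call whose argument list is itself a list.
my $grammar = q {
    content:   /[^\)\(\,]+/
    function:  content '(' list ')'
    value:     content
    item:      function | value
    list:      item ',' list | item
    startrule: list
};
my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";
# This only checks that the text parses; it does not return the pieces.
defined $parser->startrule($teststr) or print "Bad text!\n";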
For other approaches see the discussion on
Balancing Parens.

Answer: How do I split a string on highly structured/nested data?
contributed by Anonymous Monk

This should work more reliably in case there are repeated strings:
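$_ = "a,b,op2(c,d),(e),(f),(f,g),op3(e,op4(f,g))\n";
print;
# Turn the string into a regex that matches itself: every character is
# quoted, and each literal paren also opens or closes a capture group.
(my $re = $_) =~ s/((\()|(\))|.)/$2\Q$1\E$3/gs;
# Matching the string against that regex captures every balanced group.
my @groups = eval { /$re/ };
die $@ if $@ =~ /unmatched/i;   # unbalanced parens make the regex invalid
# Treat each captured group as a single unit so its commas are skipped.
$re = join '|', map { quotemeta } @groups;
print join "\n", /((?:$re|[^,])+)/g;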
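The substitution trick above is easier to follow with the intermediate values printed. What follows is only a sketch of the same idea on a smaller input; the helper name balanced_groups is made up for illustration, and it builds the regex with a plain loop instead of s///.

```perl
use strict;
use warnings;

# balanced_groups is a hypothetical helper, not part of any module.
# It returns the self-matching regex followed by the captured groups.
sub balanced_groups {
    my ($str) = @_;
    my $re = '';
    for my $ch (split //, $str) {
        # "(" opens a capture and matches a literal "(";
        # ")" matches a literal ")" and closes the capture;
        # everything else is quoted so it only matches itself.
        $re .= $ch eq '(' ? '(\('
             : $ch eq ')' ? '\))'
             :              quotemeta $ch;
    }
    my @groups = eval { $str =~ /^$re\z/ };
    die $@ if $@;    # unbalanced parens give an invalid regex
    return ($re, @groups);
}

my ($re, @groups) = balanced_groups("x(a,f(b))");
print "regex: $re\n";            # regex: x(\(a\,f(\(b\))\))
print "group: $_\n" for @groups; # group: (a,f(b))  then  group: (b)
```

Each captured group is a balanced parenthesized chunk, which is why the final regex in the answer can step over it as one unit before looking for the next comma.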

Answer: How do I split a string on highly structured/nested data?
contributed by merlyn

lhoward's grammar seems unnecessarily complicated. Let's simplify it a bit, as well as grabbing what is needed for the answer (the split items):
use Parse::RecDescent;
my $teststr = "a,b,op2(c,d),op3(e,op4(f,g))";
my $grammar = q {
    startrule: list
    list: <leftop: item ',' item>
    item: word '(' list ')' <commit> { "$item[1](".join(",",@{$item[3]}).")" }
        | word
    word: /\w+/
};
my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";
defined (my $result = $parser->startrule($teststr)) or print "Bad text!\n";
print map "<<< $_ >>>\n", @$result;
Yes, there it is. $result is an array ref of the split-apart items.
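
For comparison with the parser-based answers, the same top-level split can also be sketched without any module by walking the string and tracking paren depth. This is a different technique from the ones posted in this thread, not a rewrite of them; the name split_top_level is hypothetical, and the sketch assumes parens are the only nesting characters and that there is no quoting or escaping to honor.

```perl
use strict;
use warnings;

# split_top_level is a hypothetical name, not part of any module.
# It splits on commas that sit outside all parentheses.
sub split_top_level {
    my ($str) = @_;
    my @parts = ('');
    my $depth = 0;
    for my $ch (split //, $str) {
        if ($ch eq ',' && $depth == 0) {
            push @parts, '';    # top-level comma: start a new item
            next;
        }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        die "Unbalanced ')' in '$str'\n" if $depth < 0;
        $parts[-1] .= $ch;      # nested commas stay inside the item
    }
    die "Unbalanced '(' in '$str'\n" if $depth > 0;
    return @parts;
}

print "$_\n" for split_top_level("a,b,op2(c,d),op3(e,op4(f,g))");
```

On the question's example this prints a, b, op2(c,d) and op3(e,op4(f,g)), one per line. A grammar is still the better fit once the input needs more validation than balanced parens.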