PerlMonks  

Balancing Parens

by swiftone (Curate)
on Jun 01, 2000 at 21:42 UTC (#15867=perlquestion)
swiftone has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a parser for a specified format (so I'm stuck with the format). I have no doubt this will lead to many questions, but here's my first:

Given a string of comma-separated elements, where an element can contain a function, and functions can have commas in their arguments, how do I best grab the elements?

After looking over Merlyn's nested C comment parser and the CSV parser from Mastering Regex, I have a working solution. I'm not convinced, however, that this is the easiest/best way to do it. Comments?

#!/usr/bin/perl

$teststr="blah,blah(blah,blah(blah,blah(blah))),blah";
#This is three elements:
# blah
# blah(blah,blah(blah,blah(blah)))
# blah
# I don't have to worry about escaped parens, the file format forbids it.

foreach (&parse_comma($teststr)){
    print "$_\n"; #This just proves that it works
}

sub parse_comma{
    my $commastr=shift;
    my @tags;
    my $count=0;
    my $carrystr="";
    foreach (split(/,/, $commastr)){
        $_=$carrystr.",".$_ if $carrystr;
        $count=s/\(/(/g;
        $count-=s/\)/)/g;
        if($count){
            $carrystr=$_;
        }else{
            $carrystr="";
            push @tags, $_;
        }
    }
    return @tags;
}
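For comparison, here is a minimal sketch of another way to approach the same question: walk the string one character at a time and track the nesting depth, splitting only on commas seen at depth zero. The name `split_top_level` is hypothetical and not part of the original post, and the sketch assumes (as the post states) that the format has no escaped parentheses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split on commas that
# sit outside any parentheses by tracking nesting depth per character.
sub split_top_level {
    my ($str) = @_;
    my @elements;
    my $depth   = 0;
    my $current = '';

    for my $ch (split //, $str) {
        if ($ch eq ',' && $depth == 0) {
            # A comma at depth 0 ends the current element.
            push @elements, $current;
            $current = '';
            next;
        }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        die "Unbalanced ')' in '$str'\n" if $depth < 0;
        $current .= $ch;
    }
    die "Unbalanced '(' in '$str'\n" if $depth > 0;

    push @elements, $current;
    return @elements;
}

print "$_\n" for split_top_level("blah,blah(blah,blah(blah,blah(blah))),blah");
```

Unlike the split-and-rejoin approach, this version reports unbalanced parentheses instead of silently dropping the trailing fragment. Whether that matters depends on how strict the real file format is.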

Replies are listed 'Best First'.
Re: Balancing Parens
by lhoward (Vicar) on Jun 01, 2000 at 22:42 UTC
    Have you considered using Parse::RecDescent? It implements a full-featured recursive-descent parser. A real parser (as opposed to parsing a string with a regular expression alone) is much more powerful and can be more appropriate for parsing highly structured/nested data like yours. I'm not sure exactly what you want to do with the line after you parse it, so my example below doesn't do anything with the data it parses, but it should be a good starting point if you want to try using Parse::RecDescent to parse your data. (It has been a while since I've written a grammar, so it may look a bit rough.)

    use Parse::RecDescent;

    my $teststr="blah1,blah2(blah3,blah4(blah5,blah6(blah7))),blah8";

    my $grammar = q {
        content:   /[^\)\(\,]+/
        function:  content '(' list ')'
        value:     content
        item:      function | value
        list:      item ',' list | item
        startrule: list
    };

    my $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";

    defined $parser->startrule($teststr) or print "Bad text!\n";
      Simplifying the grammar, we get:
      use Parse::RecDescent;

      my $teststr="blah1,blah2(blah3,blah4(blah5,blah6(blah7))),blah8";

      my $grammar = q {
          list: <leftop: item ',' item>
          item: word '(' list ')' <commit> | word
          word: /\w+/
      };

      my $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";

      defined $parser->list($teststr) or print "Bad text!\n";

      -- Randal L. Schwartz, Perl hacker

      Thank you, this appears to be just what I was looking for. It may not be more efficient for this first part, but it looks like it can do 90% of the parsing (of the entire format, not just this one part) for me. I've never worked with yacc-like parsers, so this will be a new experiment for me. Once again, thanks!
        If you've never worked with parsers, swifie, check out the antipodean wizard Damian Conway's article in TPJ on Parse::RecDescent, entitled The man(1) of descent. At 13 pages, this must be the longest article ever in TPJ!
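For readers new to recursive descent, the rules of the simplified grammar in this sub-thread (`list`, `item`, `word`) can be mirrored by hand in plain Perl, one sub per rule. This is only a sketch of the technique, not how Parse::RecDescent works internally; the sub names `parse_list`, `parse_item`, `parse_string`, and `dump_tree`, and the `name`/`args` hash layout, are all assumptions made for illustration. It needs Perl 5.10 or later for the `//` operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Grammar being mirrored:
#   list: item (',' item)*
#   item: word '(' list ')' | word
#   word: /\w+/
# Each parse_* sub reads from $$ref at its current pos() and returns a
# data structure, or dies on a syntax error.

sub parse_list {
    my ($ref) = @_;
    my @items = (parse_item($ref));
    while ($$ref =~ /\G,/gc) {           # each ',' must be followed by an item
        push @items, parse_item($ref);
    }
    return \@items;
}

sub parse_item {
    my ($ref) = @_;
    $$ref =~ /\G(\w+)/gc
        or die "Expected a word at offset " . (pos($$ref) // 0) . "\n";
    my $item = { name => $1 };
    if ($$ref =~ /\G\(/gc) {             # optional '(' list ')'
        $item->{args} = parse_list($ref);
        $$ref =~ /\G\)/gc
            or die "Expected ')' at offset " . (pos($$ref) // 0) . "\n";
    }
    return $item;
}

sub parse_string {
    my ($str) = @_;
    pos($str) = 0;
    my $tree = parse_list(\$str);
    die "Trailing text at offset " . pos($str) . "\n"
        if pos($str) < length $str;
    return $tree;
}

# Print the tree with two spaces of indentation per nesting level.
sub dump_tree {
    my ($items, $indent) = @_;
    for my $item (@$items) {
        print ' ' x $indent, $item->{name}, "\n";
        dump_tree($item->{args}, $indent + 2) if $item->{args};
    }
}

dump_tree(parse_string("blah1,blah2(blah3,blah4(blah5,blah6(blah7))),blah8"), 0);
```

The `/gc` flags together with `\G` keep the match position on the string itself, so each sub simply continues where the previous one stopped. The result is a nested structure rather than a flat list of strings, which is usually what the later stages of a format parser want.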
Re: Balancing Parens
by Anonymous Monk on Aug 17, 2000 at 10:12 UTC
    $_ = "blah,blah(blah,blah(blah,blah(blah))),blah";
    #$_="blah1,blah2(blah3,blah4(blah5,blah6(blah7))),blah8";
    ($re=$_)=~s/((\()|(\))|.)/$2\Q$1\E$3/gs;
    @$ = (eval{/$re/});
    die $@ if $@=~/unmatched/;
    $re = join'|',map{quotemeta}@$;
    print join"\n",/((?:$re|[^,])+)/g;
    
Re: Balancing Parens
by KM (Priest) on Jun 01, 2000 at 22:25 UTC
    Well, I don't know what the real data may look like, but this works for me with your $teststr:

    $teststr="blah,blah(blah,blah(blah,blah(blah))),blah";

    if ($teststr =~ /^(\w*),(.*?),(\w*)$/) {
        print "1: $1\n2: $2\n3: $3\n";
    }

    Cheers,
    KM

      Ah, I should have been more specific. The real data can have a variable number of elements. Thanks anyway.
        Well, be more specific. Show examples of the actual possible data, not pseudo-data that won't look like the actual data. Give us some test cases.

        Cheers,
        KM
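For a variable number of elements, one regex-only option on Perl 5.10 or later is a recursive pattern, which did not exist when this thread was written. The sketch below is an assumption-laden illustration rather than anything posted in the thread: `split_elements`, `$parens`, and `$element` are hypothetical names, and it assumes elements are non-empty and contain no escaped parentheses.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;

# A balanced (...) group; (?-1) recurses into the enclosing capture group,
# so nesting of any depth is matched.
my $parens  = qr/(\((?:[^()]++|(?-1))*+\))/;

# One element: runs of ordinary characters and balanced groups, no
# top-level comma.
my $element = qr/(?:[^,()]++|$parens)+/;

# Hypothetical helper: validate the whole string first, then pull the
# elements out one at a time.
sub split_elements {
    my ($str) = @_;
    $str =~ /\A$element(?:,$element)*\z/
        or die "Not a comma-separated list of balanced elements: '$str'\n";

    my @elements;
    while ($str =~ /\G($element)(?:,|\z)/g) {
        push @elements, $1;
    }
    return @elements;
}

say for split_elements("blah,blah(blah,blah(blah,blah(blah))),blah");
```

The relative reference `(?-1)` is used instead of an absolute group number so that `$parens` keeps working when it is interpolated into a larger pattern and its capture group is renumbered.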
