OS X vs. My Regex

by Pedro Picasso (Sexton)
on Apr 16, 2003 at 17:58 UTC (node #250980)

Pedro Picasso has asked for the wisdom of the Perl Monks concerning the following question:

I've got this GUI encyclopedia builder thing in Tk, and it's mega-cool, but when I run it on Apple's OS X, one of the regular expressions segfaults my application. Is this a problem with the Apple port of Perl? I run this under 5.6.0 and 5.8.0 and get a segmentation fault on each. Is my expression flawed?

Below I'm posting the code that crashes. It's a huge string fed into a relatively large expression. The string has to be sufficiently long and contain something enclosed in square brackets to trigger the problem. Sorry if this looks ugly:

    $text = "\n Prefetching [sets] WebObjects to fetch a related set of EOs when "
          . "fetching a specified set of EOs. This can create fewer fetches and "
          . "free up database time for other applications.\n For instance, let's "
          . "say that you have an Organization EO that had several Office EOs that "
          . "each had an Address EO. Now let's say that you had a component which "
          . "listed all of the Offices and Office Addresses for a specific "
          . "Organization. Usually, you would fetch the Offices for an Organization "
          . "and each repeating table row would try to fault each and every "
          . "\"address\" relationship in each Office. That creates a fetch for each "
          . "relationship fault, and a lot of round trips to the database which use "
          . "a lot of resources and makes your thing slow.\n The solution is to tell "
          . "WebObjects that after you fetch the Offices (using a single fetch), you "
          . "would like to fetch all of their addresses (using another single "
          . "fetch). This can be done with prefetching. Here we go:\n\n "
          . "EOFetchSpecification spec = ...;\n String keypaths() = {\"offices\", "
          . "\"offices.address\"}; // \"offices\" is a relationship in Organization, "
          . "and \"address\" is a relationship in Office.\n "
          . "spec.setPrefetchingRelationshipKeyPaths(new NSArray(keypaths));";

    if ($text =~ /^
            (               # Any amount of:
                (
                    \[      # An open brace
                    .*?     # Any amount of non-brace stuff
                    \]      # A close brace
                )
                |           # Or
                [^\[]       # Anything that's not an open brace.
            )*
            \]              # Followed by a close brace
        /xs
    ) {
        print "The article contains a closing brace \"]\" with no opening brace.";
    }
    else {
        print "Okay, joe.";
    }
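For comparison, the same check can be done without a backtracking regex at all: walk the bracket characters once and remember whether a "[...]" pair is currently open. This is a minimal sketch; the helper name has_stray_close is hypothetical, and it deliberately mirrors the original pattern's non-nesting behaviour (an "[" inside an open pair is ignored, so "[[a]]" counts as having a stray "]").

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text contains a "]" that does not
# close an earlier "[", else 0. Like the original regex, brackets do
# not nest: a pair runs from an "[" to the next "]".
sub has_stray_close {
    my ($text) = @_;
    my $open = 0;                        # currently inside a "[ ... ]" pair?
    while ($text =~ /([\[\]])/g) {       # visit only the bracket characters
        if ($1 eq '[') {
            $open = 1;                   # start (or stay inside) a pair
        }
        elsif ($open) {
            $open = 0;                   # this "]" closes the open pair
        }
        else {
            return 1;                    # "]" with no opening "["
        }
    }
    return 0;
}

my $sample = "\n Prefetching [sets] WebObjects to fetch a related set of EOs.";
print has_stray_close($sample)
    ? "The article contains a closing brace \"]\" with no opening brace.\n"
    : "Okay, joe.\n";
```

It makes a single pass over the text and keeps only one flag of state, so its stack use does not grow with the length of the article.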

Thanks a bunch for looking at this. I'm going to be posting some cool Tk examples from my project in the coming weeks.

-the Pedro Picasso
(sourceCode == freeSpeech)

Edit by dws to clean up formatting

Re: OS X vs. My Regex
by tall_man (Parson) on Apr 16, 2003 at 19:04 UTC
    You are making the regular expression engine do a lot of extra work by:

    1) Using capturing parentheses instead of non-capturing ones (?:...).
    2) Using .*? at all: even a non-greedy dot-star is bad (see Death to Dot Star!). Use a negated character class like [^\]]* instead.
    3) Entering and leaving a parenthesized group for every single character, as with the [^\[] expression following the "|", which ought to have a "+" after it.

    These inefficiencies are probably causing the regular expression engine to go wild and run out of memory. By the way, why not use the Text::Balanced module instead?

    Update: Here is a brushed-up version with my suggestions added:

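        if ($text =~ /^
                (?:                 # Any amount of:
                    (?:
                        \[          # An open brace
                        [^\]]*      # Any amount of non-brace stuff
                        \]          # A close brace
                    )
                    |               # Or
                    [^\[]+          # Anything that's not an open brace.
                )*
                \]                  # Followed by a close brace
            /xs
        ) {
            print "The article contains a closing brace \"]\" with no opening brace.";
        }
        else { print "Okay, joe."; }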
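    Going one step further than suggestion 3, the same test can be written in the "unrolling the loop" form described in Jeffrey Friedl's Mastering Regular Expressions: normal* (special normal*)*. Here "normal" is a run of characters that are neither "[" nor "]", and "special" is one "[...]" pair, so the outer group repeats once per bracket pair instead of once per character. This sketch is meant to keep the original pattern's match semantics; it has not been tested on the OS X 5.6.0/5.8.0 builds from the question, and the wrapper name has_stray_close_re is hypothetical.

```perl
use strict;
use warnings;

# Unrolled form: normal* (special normal*)* followed by a stray "]".
my $stray_close = qr/
    ^
    [^\[\]]*            # normal: text before the first bracket
    (?:
        \[ [^\]]* \]    # special: one "[...]" pair, ending at the first "]"
        [^\[\]]*        # normal: text up to the next bracket
    )*
    \]                  # a "]" that closes no pair
/x;

# Hypothetical wrapper: 1 if $text has a stray "]", else 0.
sub has_stray_close_re {
    my ($text) = @_;
    return $text =~ $stray_close ? 1 : 0;
}

print has_stray_close_re("Prefetching [sets] WebObjects") ? "stray\n" : "ok\n";
```

    Because a "normal" run can never start with "[" and a "special" part always does, there is only one way to divide the text, which keeps backtracking to a minimum when the match fails.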

    Update2: After recommending Text::Balanced for this problem, I decided to try it myself. The following code works, provided you treat this message as an acceptable outcome: "Did not find opening bracket after prefix: "[^\[]*", detected at offset 1216". The message just means you have passed all the brackets, which is fine.

        use Text::Balanced qw(extract_bracketed);

        my $next;
        while ( $next = (extract_bracketed($text, '[]', '[^\[]*'))[0] ) {
            print "found matching brackets: *$next*\n";
        }
        print "found bracket error: $@\n" if $@;
      Thanks. Your example was helpful, and I'd never heard of Text::Balanced before. My expressions will not be such hogs in the future.
      -the Pedro Picasso
      (sourceCode == freeSpeech)
Re: OS X vs. My Regex
by Elian (Parson) on Apr 16, 2003 at 20:11 UTC
    OS X's default stack size is a bit small relative to other systems, so you're probably just banging into that. The Ruby folks ran into this a while back, and there's discussion of it here
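    One way to check this diagnosis is to run the original match in a child process on progressively longer input and see whether, and at what length, the child dies from a signal (SIGSEGV is signal 11). The sketch below assumes a Unix-like system with fork; crash_signal is a hypothetical helper and the input lengths are arbitrary. Comparing the output before and after raising the shell's stack limit with ulimit -s would show whether stack size is the culprit. On Perl 5.10 and later, whose regex engine no longer recurses on the C stack, every length should report "ok".

```perl
use strict;
use warnings;
use POSIX qw(WIFSIGNALED WTERMSIG);

# Hypothetical helper: run $code in a forked child and return the number
# of the signal that killed it (11 is SIGSEGV), or 0 on a normal exit.
sub crash_signal {
    my ($code) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                     # child process
        $code->();
        POSIX::_exit(0);                 # exit without running END blocks
    }
    waitpid($pid, 0);                    # parent: wait, status lands in $?
    return WIFSIGNALED($?) ? WTERMSIG($?) : 0;
}

# Feed the pattern from the question (minus its comments) ever longer
# input. The "x" filler and the lengths are arbitrary choices.
for my $len (1_000, 2_000, 4_000, 8_000, 16_000) {
    my $text = "[sets] " . ("x" x $len);
    my $sig  = crash_signal(sub {
        my $matched = ($text =~ /^((\[.*?\])|[^\[])*\]/s);
    });
    printf "%6d chars: %s\n", $len, $sig ? "killed by signal $sig" : "ok";
}
```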

      Thanks for the link. Knowing about the stack size problem, and how to get around it will be very helpful in the future.

      Boy, those Ruby guys are everywhere.

      -the Pedro Picasso
      (sourceCode == freeSpeech)

Approved by Enlil