OS X vs. My Regex

Pedro Picasso has asked for the wisdom of the Perl Monks concerning the following question:

I've got this GUI encyclopedia builder thing in Tk, and it's mega-cool, but when I run it on Apple's OS X, one of the regular expressions segfaults my application. Is this a problem with the Apple port of Perl? I run this under 5.6.0 and 5.8.0 and get a segmentation fault on each. Is my expression flawed?

Below I'm posting the code that crashes. It's a huge string fed into a relatively large expression. The string has to be sufficiently long and include something enclosed in square braces to cause a problem. Sorry if this looks ugly:

$text = "\n     Prefetching [sets] WebObjects to fetch a related set of EOs when fetching a specified set of EOs.  This can create fewer fetches and free up database time for other applications.\n  For instance, let's say that you have an Organization EO that had several Office EOs that each had an Address EO.  Now let's say that you had a component which listed all of the Offices and Office Addresses for a specific Organization.  Usually, you would fetch the Offices for an Organization and each repeating table row would try to fault each and every \"address\" relationship in each Office.  That creates a fetch for each relationship fault, and a lot of round trips to the database which use a lot of resources and makes your thing slow.\n       The solution is to tell WebObjects that after you fetch the Offices (using a single fetch), you would like to fetch all of their addresses (using another single fetch).  This can be done with prefetching.  Here we go:\n\n        EOFetchSpecification spec = ...;\n      String keypaths() = {\"offices\", \"offices.address\"}; // \"offices\" is a relationship in Organization, and \"address\" is a relationship in Office.\n      spec.setPrefetchingRelationshipKeyPaths(new NSArray(keypaths));";


if ($text =~  /^
                (                   # Any amount of:
                  (
                    \[              # An open brace
                    .*?             # Any amount of non-brace stuff
                    \]              # A close brace
                  )
                  |                 # Or
                  [^\[]             # Anything that's not an open brace.
                 )*
                 \]                 # Followed by a close brace
              /xs
) {
  print "The article contains a closing brace \"]\" with no opening brace.";
} else {print "Okay, joe.";}

Thanks a bunch for looking at this. I'm going to be posting some cool Tk examples from my project in the coming weeks.

-the Pedro Picasso
(sourceCode == freeSpeech)

Edit by dws to clean up formatting


Replies are listed 'Best First'.
Re: OS X vs. My Regex by Elian (Parson) on Apr 16, 2003 at 20:11 UTC
OS X's default stack size is a bit small relative to other systems, so you're probably just banging into that. The Ruby folks ran into this a while back, and there's discussion of it here.
Re: Re: OS X vs. My Regex by Pedro Picasso (Sexton) on Apr 17, 2003 at 15:59 UTC
Thanks for the link. Knowing about the stack size problem and how to get around it will be very helpful in the future. Boy, those Ruby guys are everywhere.

-the Pedro Picasso
(sourceCode == freeSpeech)
Re: OS X vs. My Regex by tall_man (Parson) on Apr 16, 2003 at 19:04 UTC
You are making the regular expression engine do a lot of extra work by:

1) Using capturing parentheses instead of non-capturing ones.
2) Using dot-star. Even a non-greedy dot-star is bad (see Death to Dot Star!). Use a negated character class like `[^\]]` instead.
3) Going in and out of parenthesis levels excessively, as in the `[^\[]` expression following the "|", which ought to have a "+" after it.

These inefficiencies are probably causing the regular expression engine to go wild and run out of memory. By the way, why not use the Text::Balanced module instead?

Update: Here is a brushed-up version with my suggestions added:

if ($text =~  /^
                (?:                 # Any amount of:
                  (?:
                    \[              # An open brace
                    [^\]]*          # Any amount of non-brace stuff
                    \]              # A close brace
                  )
                  |                 # Or
                  [^\[]+            # Anything that's not an open brace.
                 )*
                 \]                 # Followed by a close brace
              /xs

Update2: After recommending Text::Balanced for this problem, I decided to try it for myself. ~~I would have expected the following code to work, but it doesn't (I get the error:~~ and it does work as long as you test for the following message as an acceptable case: `"Did not find opening bracket after prefix: "[^\[]", detected at offset 1216"`. The message just means you have passed all the brackets, which is fine.

use Text::Balanced qw(extract_bracketed);

my $next;
while ( $next = (extract_bracketed($text,'[]','[^\[]'))[0] ) {
    print "found matching brackets: $next\n";
}
print "found bracket error: $@\n" if $@;
Re: Re: OS X vs. My Regex by Pedro Picasso (Sexton) on Apr 17, 2003 at 15:57 UTC
Thanks. Your example was helpful, and I'd never heard of Text::Balanced before. My expressions will not be such hogs in the future.

-the Pedro Picasso
(sourceCode == freeSpeech)
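
For anyone who lands here with the same crash: if the stack limit is the culprit, one way to sidestep it is to not ask the regex engine to repeat a group over the whole string at all. Below is a minimal sketch of a single-pass check. It is not code from anyone in this thread; the sub name `has_unmatched_close` is made up for illustration, and it assumes brackets do not nest, which is how the regexes above treat them.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text contains a "]" with no "["
# open before it, 0 otherwise. Assumes brackets do not nest, like the
# regexes discussed in this thread.
sub has_unmatched_close {
    my ($text) = @_;
    my $open = 0;                      # true while inside "[ ... ]"
    while ($text =~ /([\[\]])/g) {     # step through bracket characters only
        if ($1 eq '[') {
            $open = 1;
        }
        elsif ($open) {
            $open = 0;                 # this "]" closes the pending "["
        }
        else {
            return 1;                  # a "]" with nothing open
        }
    }
    return 0;
}

print has_unmatched_close("a [b] c") ? "unmatched\n" : "okay\n";
print has_unmatched_close("a b] c")  ? "unmatched\n" : "okay\n";
```

Each `//g` match here is a tiny one-character pattern, so the work per step stays small no matter how long the article text is.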

