Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
Here be dragons. :-)

The fundamental problem here lies in the differing NFA/DFA semantics. No, there is no easy way to convert one to the other.

As Mastering Regular Expressions explains so well, NFAs are a brute-force recursive search. If there is a combinatorial explosion of possible ways to start trying to match pattern to text (think about /^(\s*yada\s*)+$/ for a second) you get a situation where an NFA will hit exponential failure modes. (No, Perl didn't fail to see that there was no match, you just didn't let it work for enough years.) But since an NFA is a brute force search, just adjusting the search pattern lets you specify which alternatives come first, and whether this wildcard pattern is greedy while that one is not. Plus things like backreferences are easy to implement.

By contrast DFAs walk the string once, keeping track of all of the places they might be in the pattern at once. This is guaranteed to finish quickly, but good luck if you want to keep track of things like backreferences. (Well actually there are quadratic algorithms that will capture backreferences, but you cannot use them within the pattern.) By an implementation quirk, a DFA will always find the longest overall match. (They can also be designed the find the shortest, the Unix grep utility does this for speed.)

So NFAs and DFAs find different matches by default. NFAs offer more features and finer control. DFAs better guaranteed performance. If you understand all of that then you should be able to follow this explanation of the dangers that lie in wait for someone attempting a native conversion of flex to using Perl's RE engine.

Now in addition to the other ideas you have had (most of which will be workable, but will involve some learning) if you don't mind trying something which is possibly insane you could investigate whether Inline makes it feasible to use the Flex specification directly in Perl...

UPDATE
danger pointed out that I had said that a DFA was a brute force search when I clearly meant that an NFA was. Fixed.


In reply to Re (tilly) 1: Flex, converting regexes, and other Interesting Stuff. by tilly
in thread Flex, converting regexes, and other Interesting Stuff. by Petruchio

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (5)
As of 2024-04-23 18:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found