The Perl Regex Tester

by davido (Archbishop)
on Jul 03, 2012 at 18:15 UTC ( #979754=perlnews )

Over the years I've come across a number of websites that provide regex testing. But it always seems like I'm looking at a "Perl Compatible Regular Expression" through PHP, or some other language's goggles. And while some of them offer a slick interface, they usually only tell whether or not there was a match, and possibly what got captured. They all felt quirky.

So I set out to create my own quirky implementation, but in a way that I consider more useful and applicable to Perl users: The Perl Regex Tester (GitHub repo).

Update (4-26-2013): The live app has moved to Heroku, since the DotCloud "free sandbox" plan has been discontinued. Live site: The Perl Regex Tester.

While the interface may be a bit Spartan (no Ajax, no Flash, no fuss), it works pretty well (at least in my skewed assessment). It provides the following features:

  • A listing of all capture variables that apply: $<digits>, ${^PREMATCH}, @+, $+{name}, and so on.
  • When the /g modifier is set, the regex will be evaluated in list context, and the list returned will also be displayed.
  • The use re qw(debug) output is rendered, so you can see how the compilation of the regex progresses, and how "The Little Engine that Could(n't)" walks through the target string trying to match.
  • A "Link to this test" link is displayed following any test. This allows you to capture the current test and link to it here or anywhere that you're trying to provide some instructional tutoring. Example: Here's a link to a test, written here as [http://...the long url...|Here's a link to a test].
  • Quick links to Regex-relevant Perl POD.
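
For readers who have not met these variables, here is a minimal sketch of what the first two bullets refer to. It is not the tester's own code; the target string and the %seen hash are made up for illustration, and it assumes Perl 5.10 or later (for named captures and the /p modifier).

```perl
use strict;
use warnings;

my $target = 'perl 5.16 was released in 2012';

# /p makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available.
my %seen;
if ( $target =~ /(?<major>\d+)\.(?<minor>\d+)/p ) {
    %seen = (
        '$1'           => $1,
        '$2'           => $2,
        '$+{major}'    => $+{major},
        '${^PREMATCH}' => ${^PREMATCH},
        '${^MATCH}'    => ${^MATCH},
        '$-[0]'        => $-[0],    # offset where the match starts
        '$+[0]'        => $+[0],    # offset just past the end of the match
    );
}
printf "%-14s = %s\n", $_, $seen{$_} for sort keys %seen;

# With /g in list context, every match is returned at once.
my @numbers = $target =~ /(\d+)/g;
print "list context /g: @numbers\n";
```

The capture variables are copied into a hash right away because the next successful match overwrites them.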

The site is currently hosted in a dev account at Dotcloud, and consequently the URL is a little goofy. Someday it may move to a more friendly URL, but I'm taking a wait-and-see approach, as I'd like to be sure that I'm catching all the most significant pitfalls in executing user-supplied regexes first.

Some "gory details":

The site is hosted in a Perl service at Dotcloud, through a Sandbox (development--free) account. These accounts aren't designed to scale, and have no availability guarantees. However, they are generally quite reliable. I could upgrade to a "Live" account which has performance and reliability guarantees, but it's just a pet project. There are four processes, and the entire kit and caboodle consumes up to about 90MB of VM RAM. It sits on top of Plack, which is interfacing with Nginx. The code consists of a 175-line Moo-based model class, and about 50 lines of Mojolicious::Lite code, plus some Mojolicious templates and Twitter Bootstrap CSS, with a few additions.

Much of the model class is devoted to paranoia. Special variables that might be interpolated to discover things like environment variables are escaped in regexes before they're compiled. Compilation of regexes takes place in a Safe compartment, returning a Safe-compartmentalized Regex object. Matches are carried out in a Try::Tiny compartment so that fatal errors can be trapped. An alarm timeout is set so that crazy-inefficient regexes won't chew up too much server time. Capturing the debug info was made easier by using Capture::Tiny. And of course I'm operating under no re 'eval'; (Honorable mention, via an update to this node.) Modifiers are restricted to those that make sense in the context of this tool (which means I currently drop the /c modifier, if you ask for it).
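
To illustrate just the no re 'eval' part of the paragraph above, here is a minimal sketch using only core Perl. It is not the tester's actual model code: compile_user_regex is a hypothetical helper, a plain eval block stands in for Try::Tiny, and the Safe compartment is left out entirely. It relies on the default behaviour that a pattern interpolated at run time may not contain a (?{ ... }) code block unless use re 'eval' is in effect.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern string and trap any
# fatal error. Because "use re 'eval'" is NOT in effect, a pattern that
# contains a (?{ ... }) code block is refused when it is compiled.
sub compile_user_regex {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return ( $re, $@ );
}

my ( $ok_re,  $ok_err )  = compile_user_regex('(\b\w+)');
my ( $bad_re, $bad_err ) = compile_user_regex('(?{ print "pwned" })');

print( defined($ok_re)  ? "compiled: $ok_re\n"  : "rejected: $ok_err" );
print( defined($bad_re) ? "compiled: $bad_re\n" : "rejected: $bad_err" );
```

The second pattern should be rejected with Perl's "Eval-group not allowed at runtime" error, so the embedded print never runs.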

The "Captures" section will display the capture variables that are defined for the current successful match. The "Debug" section will display regardless of whether the match was successful or not, making it a useful tool for figuring out what went wrong.

I would have liked to also display GraphViz2::Parse::Regex, but Dotcloud doesn't have the "graphviz" C libraries installed for it, and I figured I probably shouldn't press my luck with a free account. I also wanted YAPE::Regex::Explain, but for some reason in the context of a Mojolicious web application with Unicode_Strings enabled, it produces no output. And it's fairly outdated anyway.

This was originally written as a quick demonstration of how simple it can be to get something together quickly with Mojolicious, and pushed to Dotcloud. I'll place it in a Github repo in a few days and will follow up here when that's done. The model layer is front-end-agnostic, so I could easily turn it into a command-line tool. If I get around to doing that I'll add YAPE::Regex::Explain support back in.

Please feel free to play around with it and use it. If you find a problem or want to request an additional feature, send me a message and I'll see what I can do.

Enjoy!


Dave

Re: The Perl Regex Tester
by perl.j (Pilgrim) on Jul 03, 2012 at 20:03 UTC
    Awesome. Great. Sweet. Amazing.
    --perl.j
Re: The Perl Regex Tester
by Anonymous Monk on Jul 03, 2012 at 20:40 UTC

      I've read MJD's paper on Using (?{print}) for debugging, but it seems imprudent to enable use re qw(eval) in the context of executing user regexes. Since I'm not certain that I could sanitize the regex well enough to feel good about re 'eval', I'll have to pass on that technique.

      That Tk snippet was interesting. As for Re: validate a form field with regexp?, I could see allowing someone to use English to enter a regex, and then see what it looks like in "indistinguishable from line noise" format. I'll give that some thought.

      I thought my code prevented zero-length regexes, but not zero-length targets. I'll look into it and get that fixed. In fact, I should probably allow zero-length regexes too, but (?:) is essentially the same thing.

      Thanks for your input.

      Update: After looking at Your failed match, I am not sure that there's an issue. It seems to be correctly stating "No Match!", as (\b\w+) shouldn't match against an empty string. I could be misunderstanding.

      The linking feature was an afterthought, and I'm already glad I added it. I see that as being useful.


      Dave

        I could be misunderstanding.

        When I with my eyes look at the form it says "Target string". Sure, it's a little greyed out, but I expected the regex I typed to match against "Target string", not the empty string :)

        Your Visual Regex Explorer link says:
        Uses YAPE::Regex::Explain to give detailed explanations of what any valid regex. (As long as it doesn't use any of the extensions added in perl 5.12 or later. That's a limitation of Y::R::E. If it gets updated, then so will VRegExp.)

        YAPE::Regex::Explain says:

        There is no support for regular expression syntax added after Perl version 5.6, particularly any constructs added in 5.10.
Re: The Perl Regex Tester
by moritz (Cardinal) on Jul 04, 2012 at 08:01 UTC

    When I enter a regex and target string that times out, I get a 404 instead of a proper error message.

    It would also be neat to have a more concise summary than the use re 'debug' output of why a regex match fails.

    Otherwise neatly done.

      I'll look at possibilities for summarizing match failures. That's a good idea.

      I've noticed the 404 issue in the server logs. I think it's because Safe is interfering with alarm. I'm beginning to think that Safe isn't worth the trouble, but those are probably fatal last words.


      Dave

Re: The Perl Regex Tester
by davido (Archbishop) on Jul 16, 2012 at 22:25 UTC

    The grossly inefficient regular expression problem:

    Originally in my code I was using alarm to time out grossly inefficient regexes. This was effective, but too effective; while the alarm was timing out, I was unable to trap the exception it threw, because the exception was happening inside a pattern match atom; eval and Try::Tiny would both miss it unless I installed a custom exception handler. And when I did that, the alarm wouldn't ever stop the match because Perl considers the match an unsafe time to jump into a custom handler.

    In versions without a custom $SIG{ALRM} handler, the alarm would go off, the process would die (because eval and Try::Tiny failed to trap it), and Dotcloud's supervisor would re-spawn the process. The end user would get a 404. This was the lesser of two evils; if a $SIG{ALRM} handler were installed, the process wouldn't time out, and could run away with a cleverly crafted regex. So up until recently, I took the first approach of letting it just die and respawn, to buy some time while I looked for a better alternative.

    The solution:

    The solution turned out to be to integrate Sys::SigAction into the timeout process. After some fiddling, that worked out quite well. I was able to set a fairly short timeout, and instead of seeing a 404, the user who submits a particularly nasty regular expression gets a more friendly (though not terribly useful) experience.
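
The general shape of that kind of timeout can be sketched with the core POSIX module alone. This is not the tester's code and it does not use Sys::SigAction itself; it calls POSIX::sigaction directly, which is, as far as its documentation describes, the mechanism Sys::SigAction wraps. run_with_timeout is a hypothetical helper. The important detail is that the handler is installed as an immediate ("unsafe") one, so it can interrupt work that Perl's deferred signals would otherwise wait out.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Hypothetical helper: run $code and give up after $seconds.
# Returns ( $timed_out, $result ).
sub run_with_timeout {
    my ( $seconds, $code ) = @_;

    my $old = POSIX::SigAction->new;
    my $new = POSIX::SigAction->new(
        sub { die "timeout\n" },
        POSIX::SigSet->new(SIGALRM),
    );
    $new->safe(0);    # immediate delivery, not Perl's deferred signals
    POSIX::sigaction( SIGALRM, $new, $old );

    my $result = eval {
        alarm $seconds;
        my $r = $code->();
        alarm 0;
        $r;
    };
    my $err = $@;

    alarm 0;                               # make sure no alarm is left pending
    POSIX::sigaction( SIGALRM, $old );     # restore the previous handler

    return ( 1, undef ) if $err eq "timeout\n";
    die $err if $err;                      # rethrow anything unrelated
    return ( 0, $result );
}

my ( $timed_out, $matched ) =
    run_with_timeout( 2, sub { 'Target string' =~ /(\b\w+)/ ? 1 : 0 } );
print $timed_out ? "timed out\n" : "matched: $matched\n";
```

Dying from an immediate signal handler is inherently risky, which is presumably why Perl defers signals by default; for a tool whose worker can be respawned, that trade-off may be acceptable.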

    The repo:

    So now that I feel a little more comfortable with how exceptional circumstances are being handled, I guess it's time to follow through on my earlier commitment to make the code available. I've pushed it to a public Github repo at https://github.com/daoswald/retester.git.

    Dotcloud:

    I got a pleasant email from a representative at Dotcloud today thanking me for putting the Regex Tester online, and using Dotcloud to do it. Remember that this is a free "sandbox" app, so they're not making anything on it. I really put up the RE tester as a way to experiment with DC since I've been contemplating using them to deploy a larger private project I'm working on for a $client. Through this and other correspondence with them I'm learning that the folks at Dotcloud are pretty serious about making developers happy with their services. Their current pricing structure really opens the doors for developers to play with pet projects for free, presumably with the hope that as developers gain familiarity with the platform, paying business will follow. My experience with them so far has been good. One application I've been working on will deploy with a couple of web server instances (four processes each), and two database instances (with automatic replication). DC automatically places each instance in a different EC2 availability zone, and handles load balancing and automatic failover without any extra work on the part of the developer. Overall it's a bit more expensive than bare EC2, but they seem to provide enough benefits to justify the cost.


    Dave

      Just curious as to how all of this stuff works, at least hypothetically... if you had an app on dotcloud and threw an ad on the various clients, say web and the two major phone OSes, would the ad pay for the cost of serving it as it scaled?


      --Jimbus aka Jim Babcock
      Wireless Data Engineer and Geek Wannabe
      jim-dot-babcock-at-usa-dot-com

        I guess that depends on how much revenue the ad produces, and how much memory your app consumes.

        Dotcloud's pricing is based on how many horizontally scaled instances you need, and how much vertical memory scaling you need.

        The default configuration with Dotcloud is for there to be four processes running. So your 16MB app will consume 64MB. A single 64MB Perl service costs $8.64/month. The smallest vertical scale you can select is 32MB, so your application would have to consume no more than 8MB per process. There may be a way to configure fewer processes, but I haven't looked for it.

        Now for horizontal scaling: If you want your service running in a couple of instances on multiple availability zones, you can specify to horizontally scale to two instances. Now the $8.64 doubles to $17.28.
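
The arithmetic above can be checked with a few lines of Perl. This is only a sketch: the flat per-megabyte rate is an inference from the figures quoted in this thread ($8.64 for 64MB), not a published Dotcloud price list, and monthly_cost is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumption: a flat rate inferred from the quoted figure of $8.64 per 64MB.
my $RATE_PER_MB = 8.64 / 64;    # dollars per MB per month

# Hypothetical helper: monthly cost of a service of $mb megabytes,
# replicated across $instances horizontally scaled instances.
sub monthly_cost {
    my ( $mb, $instances ) = @_;
    $instances //= 1;
    return sprintf '%.2f', $mb * $RATE_PER_MB * $instances;
}

my $per_process_mb = 16;
my $processes      = 4;
my $service_mb     = $per_process_mb * $processes;    # 64

print "one instance:  \$", monthly_cost($service_mb),      "\n";
print "two instances: \$", monthly_cost( $service_mb, 2 ), "\n";
```

The same inferred rate also reproduces the $30.24/month quoted further down for a 224MB service, which suggests the pricing was linear in reserved memory.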

        The main reason for scaling vertically is to obtain more memory reserved for your application. As you scale vertically you also get more storage space, but that seems mostly an issue for database services.

        The main reason for scaling horizontally is reliability. Dotcloud automatically places each instance in a different availability zone with Amazon EC2. Dotcloud also handles automatic failover and load balancing. For database services that are horizontally scaled, you get automatic database replication as well as failover.

        The Perl Regex Tester is somewhat memory hungry. It uses Moo, Safe, Try::Tiny, Capture::Tiny, Sys::SigAction, Time::HiRes, and Carp within its model class. And in its application class it uses Mojolicious::Lite, which itself pulls in a lot of the greater Mojolicious framework. Plack is also in the stack. And then there's the big unknown: What creepy regex will people throw at it?

        I've enacted timeouts to limit the damage of a grossly inefficient regex, and also limited the amount of output from "use re 'debug'" that I capture. Both of those constraints reduce the potential for time DOS as well as memory consumption. I also limit the size in bytes of the regex and of the target string thrown at it. But still, I've seen the four processes jump to a total memory usage of around 256MB. I could probably clamp down a bit more on the timeouts to ensure that it stays below 224MB (remember, that's divided among four workers). A 224MB service that is not horizontally scaled would cost me $30.24/month. However, since I don't care about redundancy, and I don't need a custom domain name, and I don't really care about performance guarantees for this particular app, I just run it as a free sandbox application. If someday someone decides they want to support converting it to a live (paying) app, I could do that quickly, and I'm sure Dotcloud would be happy to accept payment. ;) By way of comparison, when I first start the application and throw a bunch of simple regular expressions at it, the memory footprint stays between 64MB and 96MB as a total for the four workers. In other words, if I had stricter enforcement of complexity limits, it could live happily on under $13/month.

        In your case, if you're able to manage memory carefully enough to fit into a 64MB instance and you don't require horizontal scaling, you could keep your costs to under $9/month. That seems like it could reasonably be offset through advertising.


        Dave

Re: The Perl Regex Tester
by GlitchMr (Sexton) on Aug 21, 2012 at 10:58 UTC

    I got a 404 when trying to match aaaaaaaaaaaaaaaaaaaa against a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa.
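
For anyone who wants to reproduce this locally, the pair can be generated rather than typed. This is a sketch only: pathological_pair is a hypothetical helper, and n = 20 is an assumption based on counting the characters in the post. The pattern is n optional a's followed by n required a's; a backtracking engine may have to try a very large number of ways of assigning the optional a's before it finds the match, which is why the size of n matters so much.

```perl
use strict;
use warnings;

# Hypothetical helper: build n copies of "a?" followed by n copies of "a",
# plus a target string of n "a" characters.
sub pathological_pair {
    my ($n) = @_;
    my $pattern = ( 'a?' x $n ) . ( 'a' x $n );
    my $target  = 'a' x $n;
    return ( $pattern, $target );
}

my ( $pattern, $target ) = pathological_pair(20);
print "regex:  $pattern\n";
print "target: $target\n";
```

The match itself is deliberately not run here; how long it takes depends on n and on the Perl version.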

      Uh.... neither aaaa... nor a?a?a?... looks much like a fully qualified www address... which makes me wonder if your "404" is intended to be merely figurative or if you really have no clue.

      If figurative, please show error message(s), verbatim.

      Update: Apologies. Memo to self: Read the full thread before commenting. (Sneaky suspicion: given moritz' reply and replies to that, same advice may apply to poster of parent node.)

Re: The Perl Regex Tester
by Anonymous Monk on Apr 18, 2013 at 09:58 UTC
    Hello David, Thanks a lot, but you definitely should make sure the "Freeing REx:" part in the debug window is returning only individual session values. I can see someone else's data: "Freeing REx: "(?:/^(?=.*\bhttp\:\/\/es\.kvk\.nl\/KVK-DataserviceCT\/2012\/"... while the regex I run is \p{Han}{1}|\p{Katakana}{1}|\p{Hiragana}{1} Often a harmless issue, but who knows what your otherwise great Regex Tester can be used for? Cheers!

      I'm not seeing it. Is it possible you're using a computer that already has a session cookie from someone else accessing the same site? Mojolicious is pretty good about not leaking, and no data is actually stored anywhere on the server.

      Unfortunately I've learned that dotCloud will be abandoning their "sandboxes are free" policy. I don't think there's any viable business model there, so it might disappear when they pull the plug on sandbox apps. ...or I could move it to Heroku, but then spreading the word on the change is difficult.


      Dave

Re: The Perl Regex Tester
by davido (Archbishop) on Apr 27, 2013 at 03:19 UTC

    Just an update: As of April 24, 2013, dotCloud pulled the plug on their "free sandbox applications" policy. On the other hand, Heroku still allows single-dyno (whatever that is) applications to run free, so the Perl Regex Tester has moved to Heroku. Its URL is: http://retester.herokuapp.com. Enjoy (while it lasts).

    Heroku provides 750 dyno-hours per month free of charge (for now). There are 744 hours in the typical 31-day month, so that means as long as I'm just running a single process, and as long as Heroku doesn't eliminate that structure, the Perl Regex Tester can live there free of charge.

    If anyone can think of an organization that might want to support the Perl Regex Tester, possibly by hosting it long-term, please let me know.


    Dave
