PerlMonks  

Re: The Perl Regex Tester

by davido (Archbishop)
on Jul 16, 2012 at 22:25 UTC (#982090=note)


in reply to The Perl Regex Tester

The grossly inefficient regular expression problem:

Originally my code used alarm to time out grossly inefficient regexes. This was effective, but too effective: when the alarm fired in the middle of a pattern match, I couldn't trap it. Without a custom $SIG{ALRM} handler, SIGALRM simply terminates the process, so neither eval nor Try::Tiny had anything to catch. And when I did install a handler, the alarm never stopped the match, because Perl's "safe" (deferred) signal handling won't run a custom handler until the current operation, here the entire match, has finished.

In versions without a custom $SIG{ALRM} handler, the alarm would go off, the process would die (eval and Try::Tiny couldn't trap it), and Dotcloud's supervisor would respawn the process. The end user would get a 404. That was the lesser of two evils: with a $SIG{ALRM} handler installed, the process wouldn't time out and could run away on a cleverly crafted regex. So until recently I took the first approach, letting the process die and respawn, to buy time while I looked for a better alternative.

The solution:

The solution turned out to be integrating Sys::SigAction into the timeout logic. After some fiddling, that worked out quite well: I was able to set a fairly short timeout, and instead of a 404, a user who submits a particularly nasty regular expression gets a friendlier (though not terribly useful) response.
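A minimal sketch of the idea, assuming Sys::SigAction's exported timeout_call( $seconds, $coderef ), which returns true if the alarm fired before the code ref finished. As I understand it, Sys::SigAction installs its handler through POSIX::sigaction with Perl's deferred ("safe") delivery turned off, which is what lets the alarm interrupt a match in progress. The helper name match_with_timeout and its returned hash keys are hypothetical; this is not the code in the retester repo.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Sys::SigAction qw( timeout_call );

# Hypothetical helper (not taken from the retester repo): match a
# user-supplied pattern against a target string, giving up after $seconds.
sub match_with_timeout {
    my ( $pattern, $target, $seconds ) = @_;

    # Compile outside the timed block so a syntax error is reported as a
    # bad pattern rather than confused with a timeout. Perl refuses runtime
    # (?{ ... }) code blocks here unless "use re 'eval'" is in effect.
    my $re = eval { qr/$pattern/ };
    return { error => "Bad pattern: $@" } unless defined $re;

    my $matched;
    # timeout_call runs the code ref under an alarm and returns true if the
    # alarm fired before the code ref returned.
    my $timed_out = timeout_call( $seconds, sub {
        $matched = ( $target =~ $re ) ? 1 : 0;
    } );

    return { timed_out => ( $timed_out ? 1 : 0 ), matched => $matched };
}

# A classic backtracking-prone pattern; how long it actually takes depends
# on the perl version, so the outcome is printed rather than assumed.
my $result = match_with_timeout( '^(\w+\s?)*$', ( 'word ' x 30 ) . '!', 2 );
print $result->{timed_out} ? "Timed out\n" : "Finished\n";
```

Compiling first keeps the timed region down to the match itself, which is where catastrophic backtracking spends its time; a byte limit on the pattern keeps compilation cheap.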

The repo:

Now that I'm a little more comfortable with how exceptional conditions are handled, it's time to follow through on my earlier commitment to make the code available. I've pushed it to a public GitHub repo at https://github.com/daoswald/retester.git.

Dotcloud:

I got a pleasant email from a Dotcloud representative today thanking me for putting the Regex Tester online and using Dotcloud to do it. Remember that this is a free "sandbox" app, so they're not making anything on it. I really put up the RE tester as a way to experiment with Dotcloud (DC), since I've been contemplating using them to deploy a larger private project I'm working on for a $client. Through this and other correspondence, I'm learning that the folks at Dotcloud are pretty serious about making developers happy with their services. Their current pricing structure really opens the door for developers to play with pet projects for free, presumably in the hope that paying business will follow as developers gain familiarity with the platform.

My experience with them so far has been good. One application I've been working on will deploy with a couple of web server instances (four processes each) and two database instances (with automatic replication). DC automatically places each instance in a different EC2 availability zone, and handles load balancing and automatic failover without any extra work on the developer's part. Overall it's a bit more expensive than bare EC2, but the benefits seem to justify the cost.


Dave


Re^2: The Perl Regex Tester
by jimbus (Friar) on Jul 24, 2012 at 17:37 UTC

    Just curious how all of this works, at least hypothetically: if you had an app on Dotcloud and put an ad on the various clients (say, web and the two major phone OSes), would the ad revenue pay for the cost of serving it as it scaled?


    --Jimbus aka Jim Babcock
    Wireless Data Engineer and Geek Wannabe
    jim-dot-babcock-at-usa-dot-com

      I guess that depends on how much revenue the ad produces, and how much memory your app consumes.

      Dotcloud's pricing is based on how many instances you need (horizontal scaling) and how much memory each instance gets (vertical scaling).

      Dotcloud's default configuration runs four processes, so an app that uses 16MB per process will consume 64MB in total. A single 64MB Perl service costs $8.64/month. The smallest vertical scale you can select is 32MB, so to fit in it your application would have to use no more than 8MB per process (32MB / 4). There may be a way to configure fewer processes, but I haven't looked for it.

      Now for horizontal scaling: if you want your service running as a couple of instances in multiple availability zones, you can scale horizontally to two instances, and the $8.64 doubles to $17.28.
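      The arithmetic in this reply can be sketched as a small calculator. It assumes pricing scales linearly at $8.64 per 64MB per month (the figures quoted in this thread, $8.64, $17.28 and $30.24, are all consistent with that), four processes per service, and memory reserved in 32MB steps. Those are inferences from this thread, not Dotcloud's published rules, and the function names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumptions inferred from the figures in this thread, not from an official
# price list: $8.64 per 64MB per month, billed linearly; four worker processes
# per service; memory reserved in 32MB steps (32MB is the smallest size).
use constant {
    DOLLARS_PER_64MB => 8.64,
    PROCESSES        => 4,
    STEP_MB          => 32,
};

# Total memory to reserve for a service, given per-process usage in MB.
sub service_mb {
    my ($mb_per_process) = @_;
    my $total = $mb_per_process * PROCESSES;
    my $steps = int( ( $total + STEP_MB - 1 ) / STEP_MB );    # round up
    $steps = 1 if $steps < 1;                                  # 32MB minimum
    return $steps * STEP_MB;
}

# Monthly cost in dollars for a service of $mb, scaled to $instances.
sub monthly_cost {
    my ( $mb, $instances ) = @_;
    $instances = 1 unless defined $instances;
    return sprintf '%.2f', $mb * DOLLARS_PER_64MB / 64 * $instances;
}

my $mb = service_mb(16);    # a 16MB-per-process app
printf "%dMB: \$%s/month, \$%s/month on two instances\n",
    $mb, monthly_cost($mb), monthly_cost( $mb, 2 );
# prints "64MB: $8.64/month, $17.28/month on two instances"
```

      Under the same assumptions, 56MB per process (224MB total) comes to $30.24, matching the figure quoted later in this reply.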

      The main reason for scaling vertically is to obtain more memory reserved for your application. As you scale vertically you also get more storage space, but that seems mostly an issue for database services.

      The main reason for scaling horizontally is reliability. Dotcloud automatically places each instance in a different Amazon EC2 availability zone, and also handles automatic failover and load balancing. For horizontally scaled database services, you get automatic database replication as well as failover.

      The Perl Regex Tester is somewhat memory hungry. It uses Moo, Safe, Try::Tiny, Capture::Tiny, Sys::SigAction, Time::HiRes, and Carp within its model class. And in its application class it uses Mojolicious::Lite, which itself pulls in a lot of the greater Mojolicious framework. Plack is also in the stack. And then there's the big unknown: What creepy regex will people throw at it?

      I've added timeouts to limit the damage from grossly inefficient regexes, and I've also limited how much output from "use re 'debug'" I capture. Both constraints reduce the potential for denial of service by tying up CPU time, as well as memory consumption. I also limit the size in bytes of both the regex and the target string. Still, I've seen the four processes jump to a total memory usage of around 256MB. I could probably tighten the timeouts a bit more to keep it below 224MB (remember, that's divided among four workers). A 224MB service that is not horizontally scaled would cost me $30.24/month. However, since I don't care about redundancy, don't need a custom domain name, and don't really need performance guarantees for this particular app, I just run it as a free sandbox application. If someday someone decides they want to support converting it to a live (paying) app, I could do that quickly, and I'm sure Dotcloud would be happy to accept payment. ;)

      By way of comparison, when I first start the application and throw a bunch of simple regular expressions at it, the memory footprint stays between 64MB and 96MB total across the four workers. In other words, with stricter enforcement of complexity limits, it could live happily at under $13/month.
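      A minimal sketch of the size guards described above, using only core Perl. The limit values and function names are hypothetical; the post doesn't say what limits the tester actually uses. Input size is measured on the UTF-8 encoding, so a multi-byte character counts as more than one byte.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode_utf8 );

# Hypothetical limits; the real values used by the tester aren't stated.
use constant {
    MAX_REGEX_BYTES  => 1_024,
    MAX_TARGET_BYTES => 4_096,
    MAX_DEBUG_CHARS  => 16_384,
};

# Returns an error message if either input is too large, or undef if both fit.
sub check_input_sizes {
    my ( $regex, $target ) = @_;
    return 'Regex too long'
        if length( encode_utf8($regex) ) > MAX_REGEX_BYTES;
    return 'Target string too long'
        if length( encode_utf8($target) ) > MAX_TARGET_BYTES;
    return;    # undef in scalar context: inputs are acceptable
}

# Keeps at most MAX_DEBUG_CHARS characters of captured "use re 'debug'" output.
sub clamp_debug_output {
    my ($debug) = @_;
    return $debug if length($debug) <= MAX_DEBUG_CHARS;
    return substr( $debug, 0, MAX_DEBUG_CHARS ) . "\n[debug output truncated]\n";
}

my $err = check_input_sizes( '^(\w+\s?)*$', 'word ' x 2_000 );
print defined $err ? "$err\n" : "OK\n";    # prints "Target string too long"
```

      Rejecting oversized input before compiling or matching is cheap, and it bounds both the work a single request can trigger and the amount of debug text held in memory.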

      In your case, if you manage memory carefully enough to fit into a 64MB instance and don't require horizontal scaling, you could keep your costs under $9/month. That seems like it could reasonably be offset through advertising.


      Dave
