Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re: Analysis of Regular Expressions

by jffry (Hermit)
on Mar 17, 2010 at 20:24 UTC ( [id://829265]=note: print w/replies, xml ) Need Help??


in reply to Analysis of Regular Expressions

Assumptions:

  • You have access to a large subset of the files that these regular expressions will be used against.
  • OR
  • If these regular expressions are simply being ranked in the abstract, you have some means of constantly accessing a variety of real-life sample files.

Write a daemon that continuously runs each regular expression against an ever growing list of files. The daemon updates a table, and each row contains 2 columns: the regular expression and the average line count.

Your output program simply sorts the table on the average line count column. So it is quick in that regard. However, as the daemon runs each regular expression against more and more files, the ranking may change.

Obviously, newly added regular expressions will have a more volatile rank compared to older ones that have been run against thousands of files. To combat this problem, you could determine a minimum file comparison quantity before the regular expression shows up in the table. For speed, you could have the daemon give priority to newly added regular expressions until their rank stabilizes. In fact, these two points should be configurable as tuning parameter of the daemon.

What I like about this approach is that it throws all the theoretical junk out the door. Brute force can be ugly, but then again, the map is not the territory, and brute force reveals the territory.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://829265]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (3)
As of 2024-04-20 14:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found