
I know most of you already know this, but I have seen a lot of misuse of regexes, so I thought I'd write this for beginners to read.

Many people get 'regex happy' and reach for a regex when a much faster built-in function would do the job. Here's an example:

m/.{$width}/

This is advice a notable Perl monk gave to someone asking how to implement fixed-width columns for data files. The goal was to extract $width characters from a variable. More experienced programmers are shaking their heads right now; they know it would be much more efficient to use this:

substr($someVar,$offSet,$width)

You see, regexes are a very powerful tool, but they are not fast (well, relatively speaking). It is much faster to say, "take this many characters from this variable, starting at this position in the string," than it is to say, "take the input, see if '.' matches the next character, and then repeat that this many times." Also, the regex has to be compiled before it can be used, and because it interpolates $width it is recompiled whenever $width changes, which adds further overhead.
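
To make the comparison concrete, here is a minimal sketch of both approaches pulling the same column out of a fixed-width record. The record layout, the sample data, and the variable names ($line, $offset) are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: 10-char name, 5-char id, 8-char date.
my $line   = 'Alice     0004220010315';
my $offset = 10;   # where the id column starts (assumed layout)
my $width  = 5;    # how many characters the id column holds

# Regex approach: step over $offset characters one '.' at a time,
# then capture the next $width characters.
my ($by_regex) = $line =~ m/^.{$offset}(.{$width})/;

# substr approach: jump straight to $offset and copy $width characters.
my $by_substr = substr($line, $offset, $width);

print "regex:  $by_regex\n";
print "substr: $by_substr\n";
```

Both lines should print the same id field. If you want to measure the difference on your own data rather than take my word for it, the core Benchmark module's cmpthese function can time the two side by side.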

Another common thing is to try to match the whole string, when you only need to match part of it. Here are some examples:

$text =~ s/(.*)($string_1)(.*)($string_2)(.*)/$1$4$3$2$5/ 
#taken from recent post
if ($text =~ m/^.*(\.txt)$/) #used to see if $text ends in .txt

The first example (actually taken from this site) can be improved by taking out the first and last (.*); we don't need to match the beginning and end, because that's not what we are switching around (the regex is used to swap string_1 and string_2). The second regex (just an example I made up now) is meant to check whether a file name ends in .txt. It is extremely wasteful: we only need to match the .txt part, not the whole string. The improved regexes are below:

$text =~ s/($string_1)(.*)($string_2)/$3$2$1/
if ($text =~ m/\.txt$/)
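
Here is a small, self-contained sketch of the two trimmed regexes in action. The sample text and file names are invented for the example, and the \Q...\E quoting is my own addition (it is not in the post the regex came from); it only keeps metacharacters in the interpolated strings from being treated as regex syntax.

```perl
use strict;
use warnings;

# Made-up sample data for illustration.
my $text     = 'say hello to the world today';
my $string_1 = 'hello';
my $string_2 = 'world';

# Swap the two strings without matching the rest of the line.
# \Q...\E (an assumption added here) quotes any regex metacharacters.
(my $swapped = $text) =~ s/(\Q$string_1\E)(.*)(\Q$string_2\E)/$3$2$1/;
print "$swapped\n";

# Check the file extension by anchoring only at the end of the string.
for my $name ('notes.txt', 'notes.txt.bak') {
    my $verdict = $name =~ m/\.txt$/ ? 'ends in .txt' : 'does not end in .txt';
    print "$name: $verdict\n";
}
```

One caveat: if $string_1 occurs more than once, the two versions are not strictly equivalent. The leading greedy (.*) in the original makes it swap the last occurrence, while the trimmed version swaps the first. Also, $ will match just before a trailing newline; use \z instead if that matters for your input.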

There are probably other common mistakes, but I'm just writing about the ones I have recently seen and that are in my head right now. If someone else wants to post a common mistake in a reply to this, to help the beginners, I would very much appreciate it.

The 15 year old, freshman programmer,
Stephen Rawls

In reply to Regex Misuse by srawls
