PerlMonks  

I know most of you already know this, but I have seen a lot of misuse of regexes, so I thought I'd write this for beginners to read.

Many people get 'regex happy' and reach for a regex when a much faster function would do the job. Here's an example:

m/.{$width}/
This is some advice a notable perl monk gave someone asking about how to implement fixed-width columns for data files. The goal was to extract $width characters from a variable. More experienced programmers are shaking their heads right now; they know it would be much more efficient to use this:
substr($someVar,$offSet,$width)
You see, regexes are a very powerful tool, but they are not fast (well, relatively speaking). It is much faster to say, "take this many characters from this variable, starting at this position in the string," than it is to say, "take the input and see if '.' matches the next character, and then repeat that this number of times." Also, the regex has to be compiled before it can be used, and compiled again whenever $width changes, which is overhead that substr does not have.
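To make the comparison concrete, here is a minimal sketch of both approaches pulling columns out of a fixed-width line. The sample record, the column() helper, and the five-character width are assumptions made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical record: three columns, each $width characters wide.
my $record = "alphabravodelta";
my $width  = 5;

# Regex approach from the post: match $width characters from the start.
my ($by_regex) = $record =~ m/^(.{$width})/;

# substr approach: take the same field directly by offset and length.
my $by_substr = substr($record, 0, $width);

# Hypothetical helper: return column number $index (counting from 0).
sub column {
    my ($line, $index, $col_width) = @_;
    return substr($line, $index * $col_width, $col_width);
}

print "$by_regex $by_substr ", column($record, 1, $width), "\n";
```

Both approaches return the same first field here. One difference worth knowing: without the /s modifier, "." does not match a newline, so the regex can fail on data where substr would still return the characters.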

Another common mistake is to match the whole string when you only need to match part of it. Here are some examples:

$text =~ s/(.*)($string_1)(.*)($string_2)(.*)/$1$4$3$2$5/ #taken from recent post
if ($text =~ m/^.*(\.txt)$/) #used to see if $text ends in .txt
The first example (actually taken from this site) can be improved by taking the first and last (.*) out. We don't need to match the beginning and end, because that's not what we are switching around (the regex is used to swap string_1 and string_2). The second regex (just an example I made up now) is meant to check whether a file name ends in .txt. It is extremely wasteful: we only need to match the .txt part, not the whole string. The improved regexes are below:
$text =~ s/($string_1)(.*)($string_2)/$3$2$1/
if ($text =~ m/\.txt$/)
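As a rough sketch, the two trimmed-down regexes could be wrapped up and tried out like this. The helper names ends_in_txt() and swap_strings() and the sample strings are hypothetical, and the \Q...\E quoting is an addition that assumes the two strings should be matched literally rather than as patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name ends in ".txt".
# Only the suffix is matched; nothing before it is captured.
sub ends_in_txt {
    my ($name) = @_;
    return $name =~ m/\.txt$/ ? 1 : 0;
}

# Hypothetical helper: swap $first with a $second that follows it.
# \Q...\E makes regex metacharacters in the strings match literally
# (an assumption; the original interpolates them as patterns).
sub swap_strings {
    my ($text, $first, $second) = @_;
    $text =~ s/(\Q$first\E)(.*)(\Q$second\E)/$3$2$1/;
    return $text;
}

print ends_in_txt("notes.txt"), "\n";
print swap_strings("say foo then bar now", "foo", "bar"), "\n";
```

Text before the first string and after the second is never touched by the substitution, which is why the outer (.*) groups were not needed.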

There are probably other common mistakes, but I'm just writing about the ones in my head right now that I have recently seen. If someone else wants to post a common mistake in a reply to this, to help the beginners, I would very much appreciate that.

The 15 year old, freshman programmer,
Stephen Rawls


In reply to Regex Misuse by srawls
