Ovid has asked for the wisdom of the Perl Monks concerning the following question:
Some of the comments in a node about a regex problem got me to thinking about the maintainability of regexes, versus alternate solutions. The regex in question, after some patching (with heartfelt thanks to Dermot and others for mega-help), looks like the following:

    $data =~ s/
        (                           # Capture to $1
            <a\s                    # <a and a space character
            (?:                     # Non-capturing parens
                [^>](?!href)        # All non > not followed by href
            )*                      # zero or more of them
            .?
            href\s*                 # href followed by zero or more space characters
        )
        (                           # Capture to $2
            &\#61;\s*               # = plus zero or more spaces
            (                       # Capture to $3
                &[^;]+;             # some HTML character code (probably " or ')
            )?                      # which might not exist
            (?:                     # Non-grouping parens
                .(?!\3)             # any character not followed by $3
            )+                      # one or more of them
            .?
            (?:
                \3                  # $3
            )?                      # (which may not exist)
        )
        (                           # Capture to $4
            [^>]+                   # Everything up to final >
            >                       # Final >
        )
    /$1 . decode_entities($2) . $4/gsexi;

Note that the regex is complicated enough that I've even indented the comments to help some poor programmer behind me maintain it. As it turns out, it still has two very subtle problems (irrelevant to this discussion) that arise only under rare circumstances. How would you even find those problems? Heck, if I were really evil, I could put the regex on one line and make the task virtually impossible for the average programmer:

    $data =~ s/(<a\s(?:[^>](?!href))*.?href\s*)(&\#61;\s*(&[^;]+;)?(?:.(?!\3))+.?(?:\3)?)([^>]+>)/$1.decode_entities($2).$4/gsei;

When I made the original post, tilly pointed out right away that he wouldn't use a regex to solve the problem (gasp!). That got me to thinking: since I love regexes, I tend to use them a lot. They're fast (if properly written), but many programmers don't grok them. Heck, even some of my simpler regexes are complicated:

    $number =~ /((?:[\d]{1,6}\.[\d]{0,5})|(?:[\d]{0,5}\.[\d]{1,6})|(?:[\d]{1,7}))/;

That one just guarantees that a user-entered number fits my format. Aack!
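One way to keep a pattern like the number check readable is to give each alternative a name with qr//, assemble the pieces under /x, and pin the behaviour down with a table of inputs. The sketch below does that and, for comparison, spells the same rules out without a regex at all. It assumes the whole input is supposed to be the number, so it adds \A and \z; as written, the pattern above is unanchored and succeeds on any string that merely contains a digit, such as 'abc123'. The names ($int_then_dot, $dot_then_frac, $plain_int, matches_regex, digits_only, matches_plain) are made up for illustration.

```perl
use strict;
use warnings;

# Each accepted shape of a number gets a name. The pieces mirror the
# three alternatives of the original pattern.
my $int_then_dot  = qr/ \d{1,6} \. \d{0,5} /x;  # 1-6 digits, a dot, 0-5 digits
my $dot_then_frac = qr/ \d{0,5} \. \d{1,6} /x;  # 0-5 digits, a dot, 1-6 digits
my $plain_int     = qr/ \d{1,7} /x;             # 1-7 digits, no dot

# ASSUMPTION: the whole string must be a number, hence \A and \z.
my $number_re = qr/
    \A
    (?: $int_then_dot | $dot_then_frac | $plain_int )
    \z
/x;

sub matches_regex {
    my ($s) = @_;
    return $s =~ $number_re ? 1 : 0;
}

# True if every character is 0-9 (the empty string counts as true).
# Note: tr/0-9// is ASCII-only, while \d also accepts other Unicode
# digits, so the two checks are only guaranteed to agree on ASCII input.
sub digits_only {
    my ($t) = @_;
    return ($t =~ tr/0-9//) == length($t);
}

# The same rules written out without a regex.
sub matches_plain {
    my ($s) = @_;
    my $dots = ($s =~ tr/.//);              # count the dots
    return 0 if $dots > 1;
    if ($dots == 0) {                       # integer: 1-7 digits
        my $ok = digits_only($s) && length($s) >= 1 && length($s) <= 7;
        return $ok ? 1 : 0;
    }
    my $pos  = index($s, '.');
    my $int  = substr($s, 0, $pos);
    my $frac = substr($s, $pos + 1);
    return 0 unless digits_only($int) && digits_only($frac);
    my ($i, $f) = (length($int), length($frac));
    my $ok = ($i >= 1 && $i <= 6 && $f <= 5)    # like $int_then_dot
          || ($i <= 5 && $f >= 1 && $f <= 6);   # like $dot_then_frac
    return $ok ? 1 : 0;
}

for my $input ('123', '.5', '5.', '1234567', '12345678', '123456.123456', 'abc123') {
    printf "%-15s regex=%d plain=%d\n", $input, matches_regex($input), matches_plain($input);
}
```

Each printed row shows both checks side by side; if the two columns ever disagree, that is a quick hint that one of them has drifted from the intended format.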
tilly's comment, however, got me to thinking: how do Perlmonks create maintainable regexes, or do they avoid them in favor of more obvious solutions? I pride myself on writing clear, maintainable code with tons of comments. My beloved regexes, however, are the fly in my ointment of clarity. How do YOU deal with this?
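On the "maintainable regexes" side of the question: since Perl 5.10 (well after this thread), named captures let each group say what it holds instead of relying on $1 through $4. Here is a sketch that renames the groups of the one-line anchor-tag regex piece for piece, lays it out under /x, and then checks that the renamed version produces exactly the same output as the original on a few sample strings, which is a cheap way to refactor a hairy pattern without changing what it does. decode_basic_entities is a hypothetical core-Perl stand-in for decode_entities from HTML::Entities (it only knows numeric entities and a handful of named ones), and fix_links_original, fix_links_named and the sample strings are likewise invented for the example.

```perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for HTML::Entities::decode_entities so the
# sketch runs on core Perl. Handles numeric entities and five named ones.
sub decode_basic_entities {
    my ($text) = @_;
    my %named = (quot => '"', apos => "'", amp => '&', lt => '<', gt => '>');
    $text =~ s/&(?:\#(\d+)|(\w+));/defined $1 ? chr($1) : (exists $named{$2} ? $named{$2} : "&$2;")/ge;
    return $text;
}

# The one-liner from the post, unchanged except for the decoder.
sub fix_links_original {
    my ($data) = @_;
    $data =~ s/(<a\s(?:[^>](?!href))*.?href\s*)(&\#61;\s*(&[^;]+;)?(?:.(?!\3))+.?(?:\3)?)([^>]+>)/$1.decode_basic_entities($2).$4/gsei;
    return $data;
}

# Same pattern with the groups renamed. Named groups are still numbered
# 1-4 in order of their opening parens, so the structure is unchanged.
my $anchor_href = qr{
    (?<prefix>                  # group 1: "<a ... href"
        <a \s
        (?: [^>] (?!href) )*    # non-'>' chars not followed by "href"
        .? href \s*
    )
    (?<value>                   # group 2: the entity-encoded value
        &\#61; \s*              # an encoded '=' plus optional spaces
        (?<quote> &[^;]+; )?    # group 3: optional encoded quote
        (?: . (?!\k<quote>) )+  # chars not followed by that quote
        .?
        (?: \k<quote> )?        # the closing quote, if any
    )
    (?<rest> [^>]+ > )          # group 4: rest of the tag up to '>'
}xsi;

sub fix_links_named {
    my ($data) = @_;
    $data =~ s{$anchor_href}{
        $+{prefix} . decode_basic_entities($+{value}) . $+{rest}
    }ge;
    return $data;
}

my @samples = (
    '<a href&#61;&quot;http://example.com/&quot; target&#61;&quot;_top&quot;>Example</a>',
    '<A HREF &#61; &#39;page.html&#39;>Page</A>',
    '<a href&#61;plain.html>Plain</a>',
    'no links here',
);

for my $s (@samples) {
    my ($old, $new) = (fix_links_original($s), fix_links_named($s));
    my $verdict = $old eq $new ? 'same' : 'DIFFERENT';
    print "$verdict: $new\n";
}
```

Because the rename keeps every group in the same position, the comparison should report "same" for every sample; if it ever doesn't, the refactor changed behaviour somewhere, which is exactly the kind of regression that is hard to spot by reading the pattern.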
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just go to the link and check out our stats.
Replies are listed 'Best First'.
RE (tilly) 1: Regexes vs. Maintainability
by tilly (Archbishop) on Sep 29, 2000 at 15:05 UTC
RE: Regexes vs. Maintainability
by japhy (Canon) on Sep 29, 2000 at 00:16 UTC
by Ovid (Cardinal) on Sep 29, 2000 at 01:12 UTC
by japhy (Canon) on Sep 29, 2000 at 04:21 UTC
Re: Regexes vs. Maintainability
by 2501 (Pilgrim) on Sep 29, 2000 at 03:51 UTC
Re: Regexes vs. Maintainability
by hil (Sexton) on Sep 29, 2000 at 05:42 UTC
by mirod (Canon) on Sep 29, 2000 at 11:46 UTC
RE: Regexes vs. Maintainability
by Jonathan (Curate) on Sep 29, 2000 at 17:29 UTC