I'm not sure why you think it would be any easier to maintain than the non-regex solutions?

hdb's solution is extremely clever and obviously vastly superior in performance. Most impressive.

Nevertheless, the reasons I'd prefer to have to maintain the regex solution include:

The algorithm find_substring() uses is extremely clever, perhaps even a little subtle. I always assume that's likely to be a disadvantage for future maintainability.
And, indeed, after ten minutes of close inspection, I'm still not entirely sure I fully understand find_substring(). By Kernighan's Metric, if I'm not smart enough to understand it, I'm certainly not half smart enough to maintain it.
Infinite loops and manually iterated string indexes always make me nervous. They are opportunities for off-by-one errors and overlooked edge-case behaviours to lurk...or to creep in when the code is subsequently updated.
I genuinely prefer functional or declarative styles of programming. The regex solution describes exactly what it does (provided you're fluent in the regex dialect...which I am), whereas find_substring()'s implementation is not at all self-explanatory (to me).
Continuing on with the functional/imperative contrast: the regex solution uses exactly one automatically-preset read-only variable. In contrast, find_substring() uses a couple of manually-assigned read-write variables. In my view that means the latter has several extra places where future well-intentioned modifications could quietly break something.
I think that the regex solution would also be much easier to integrate into a larger parsing system, when the current simple text processing task subsequently grows more complicated (which it inevitably will).
I can debug the behaviour of the regex-based solution visually using Regexp::Debugger. I'd have to use perl -d to debug find_substring(). <shudder>
The find_substring() implementation somehow reminds me of my years of coding in C, and at this point I really don't need that kind of post-traumatic flashback undoing all the therapy. ;-)
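
To make the declarative/imperative contrast in the points above a little more concrete, here is a minimal sketch. It is not the regex solution or the find_substring() from this thread: both subs, their names, and the deliberately simplified task (find the shortest unit that a string repeats a whole number of times) are assumptions made up purely for illustration.

```perl
use strict;
use warnings;

# Declarative sketch (hypothetical): the pattern states the shape of the
# answer, and the only variable involved is the capture $1, which the
# regex engine sets for us.
sub repeat_unit_regex {
    my ($str) = @_;
    return $str =~ /\A(.+?)\1*\z/s ? $1 : undef;
}

# Imperative sketch (hypothetical): the same task with manually managed
# indexes. $unit and $pos are both read-write, and every update to them
# is a place where an off-by-one error could creep in.
sub repeat_unit_loop {
    my ($str) = @_;
    my $len = length $str;
    for (my $unit = 1; $unit <= $len; $unit++) {
        next if $len % $unit;    # unit length must divide the string length
        my $pos = $unit;
        while ($pos < $len
            && substr($str, $pos, $unit) eq substr($str, 0, $unit)) {
            $pos += $unit;
        }
        return substr($str, 0, $unit) if $pos >= $len;
    }
    return undef;                # only reached for the empty string
}

for my $s (qw(abcabcabc aaaa abcd)) {
    printf "%-10s regex=%s loop=%s\n",
        $s, repeat_unit_regex($s), repeat_unit_loop($s);
}
```

Both subs should agree on every input; the difference is only in how many moving parts a future maintainer has to keep in their head.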

Damian

In reply to Re^4: Finding repeat sequences. by DamianConway
in thread Finding repeat sequences. by BrowserUk

Just another Perl shrine
	PerlMonks