PerlMonks  

Re^7: Finding repeat sequences.

by BrowserUk (Pope)
on Jun 21, 2013 at 00:20 UTC (#1040045)


in reply to Re^6: Finding repeat sequences.
in thread Finding repeat sequences.

Why would the code be updated? -- I don't know ...

Why would anyone modify it, when it performs the required task as is? -- Again, I don't know.

So you try to predict the future; ie, guess. I don't.

If at some point in the future the code needs modification; I'll adapt it. Then.

If at some point after that, the code requires modification again, or I encounter another task that I realise it can be adapted to, then I'll consider trying to generalise it. But right now, it has one, and only one, very specific purpose.

And I'll willingly and knowingly trade the near 3 orders of magnitude performance gain for that task now, against any potential savings against potential future maintenance costs.

I'm very firmly of the opinion -- based on my years of experience -- that premature generalisation has cost this industry far more, both financially and in terms of its reputation for spending a fortune developing huge, all-encompassing, singing & dancing solutions that never work and, quietly or otherwise, just end up in the bit bucket, than premature optimisation ever has or ever will.

And look...you're now modifying the code! That itself is a potential source of bugs. You may never have accidentally left a "debugging print" in code, but I certainly have. I've even shipped code with them left in.

Hand on heart, no, I never have.

But then, I don't use test harnesses that steal my output and summarise it to a bunch of meaningless statistics.

Equally, nor do I do my explorations on my 'live' code. (Ie. The function in the actual application is very unlikely to be an anonymous subroutine value in a hash, to a key called hdb. Nor is it likely to be called find_substring().)

In fact, it is quite likely to not look much like hdb's implementation at all. Now I've found and understood the algorithm, I'll almost certainly re-write it to better fit with the nature of the application.

Eg. I'd probably pass in a reference to the bitstring, convert it to the bytestring internally, and return a packed tuple that encapsulates the compressed bitvector as (say):

return pack 'L L Q*', $reps, $bits, substr( $$bitvector, 0, int( ( $_ + 63 ) / 64 ) * 8 );
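
For anyone wanting to try the packed-tuple idea, here is a minimal, self-contained sketch of it. It is only an illustration under stated assumptions, not the code from the application: the names pack_unit and unpack_unit are hypothetical, $bits is assumed to be the unit length in bits (the role $_ plays in the one-liner), and the template uses 'a*' for the payload instead of 'Q*' because the payload here is a raw byte string.

```perl
use strict;
use warnings;

# Hypothetical helper: pack a repeat count, the unit length in bits, and the
# leading part of a bit vector (rounded up to whole 64-bit words) into one string.
sub pack_unit {
    my ( $reps, $bits, $bitvector_ref ) = @_;
    my $bytes = int( ( $bits + 63 ) / 64 ) * 8;    # round up to 64-bit words
    # 'a*' copies the payload as raw bytes (an assumption; the post shows 'Q*').
    return pack 'L L a*', $reps, $bits, substr( $$bitvector_ref, 0, $bytes );
}

# Hypothetical inverse: recover ( $reps, $bits, $payload ) from the packed string.
sub unpack_unit {
    my ($packed) = @_;
    return unpack 'L L a*', $packed;
}

# Build a small bit vector whose first six bits are 1,0,0,1,0,1.
my $bitvector = '';
vec( $bitvector, $_, 1 ) = 1 for 0, 3, 5;
$bitvector .= "\0" x ( 16 - length $bitvector );   # pad to 16 bytes

my $packed = pack_unit( 7, 6, \$bitvector );
my ( $reps, $bits, $unit ) = unpack_unit($packed);

printf "reps=%d bits=%d payload bytes=%d\n", $reps, $bits, length $unit;
```

The two leading 'L' fields occupy four bytes each, so a caller can read the repeat count and bit length without touching the payload.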

This thread is all about algorithm, not implementation. (Which still leaves me wondering if hdb's algorithm couldn't be encapsulated into a regex?)

Sure, sometimes you need subtlety. And sometimes you have to write "manual" code.... But I will continue to believe that such code should be the exception, not the paradigm.

I completely agree; but were this a paradigmatic problem, I probably wouldn't have needed to ask for help.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^8: Finding repeat sequences.
by DamianConway (Beadle) on Jun 21, 2013 at 02:03 UTC
    So you try to predict the future; ie, guess.

    No. I simply try to develop coding habits that make the inherently unpredictable nature of the future less error-prone and less fraught, basing those habits on repeated patterns of behaviour and outcome that I have identified from observing the past.

    I'm very firmly of the opinion -- based on my years of experience -- that premature generalisation has cost this industry far more, both financially and in terms of its reputation for spending a fortune developing huge, all-encompassing, singing & dancing solutions that never work and, quietly or otherwise, just end up in the bit bucket, than premature optimisation ever has or ever will.

    I'm sure you're right. But you seem to be berating me for something I neither suggested nor advocate.

    I didn't prefer the regex solution because it was already more generalized; I preferred it because it was more descriptive (to me), less error-prone (for me), and easier to rework or enhance (by me) should that eventually become necessary.

    And I definitely wasn't suggesting premature optimization. After all, I'm not the one who would:

    ...willingly and knowingly trade the near 3 orders of magnitude performance gain for that task now, against any potential savings against potential future maintenance costs.

    ;-)

    I was merely saying that I believe that code maintainability is generally more important than code performance. Which is why I still prefer the regex-based solution, even though it's three orders of magnitude slower.

    I doubt we're ever going to agree on this...which is fine. But I'm certainly not going to apologize for making maintainability my own higher priority, nor for advocating it as a priority for most developers.

    Damian
      No. I simply try to develop coding habits that make the inherently unpredictable nature of the future less error-prone and less fraught.

      Translation: You expend extra energy now, to potentially save energy in the future. That is guessing!

      And if you guess wrong; you didn't just waste that extra energy; you potentially cost more energy undoing the product of that extra energy in order to accommodate the real future requirement.

      No matter how you dress that equation in "experience", there is no way to make doing something now that you didn't need to do; in order to potentially save some immeasurable amount of effort that you might need to expend in the future; balance. Never has, and never will.

      I'm sure you're right. But you seem to be berating me for something I neither suggested nor advocate.

      Certainly not "berating". A frank exchange of views for the purpose of perhaps modifying our positions. (NB: Our own, not each other's.)

      I assume it is of some interest to you, as you are still taking part.

      I was merely saying that I believe that code maintainability is generally more important than code performance. Which is why I still prefer the regex-based solution, even though it's three orders of magnitude slower.

      This makes no sense to me.

      It implies (not states) that you would condemn (strong word for effect) the users of the application to waiting 4 1/2 weeks instead of 1 hour; 15 weeks instead of 1 day; just because this piece of code might need to be modified at some unspecified time in the future.

      As a purely academic nicety, stating that you favour maintenance over performance is always a vote winner; but in the real world, code that does what it needs to do in a timely fashion is of far more importance than whether it was written in a declarative or functional style; or even if it might require the programmer to expend some effort to (re-)understand it in 6 months or 2 years from now.

      Indeed, most users would say: "That's his damn job"!

      Beyond old-farts like me playing 1stP shooters; I've never heard a user say he wishes his application ran more slowly.

      It is the height of something for programmers to favour (potentially) saving a little of their time (for which they are well paid), in a future that may never arrive, over the time of the users who are generally paying (directly or otherwise) for the privilege of using the application.


        You expend extra energy now, to potentially save energy in the future.

        Not really. It was vastly easier for me to code—and then improve—the regex-based approach than it was to even understand the index()-based one. In fact, in this instance, both solutions I submitted had worked perfectly first time. It doesn't get much lower-energy than that.

        And, for me, that's the entire point of developing good coding practices and habits: once you have them, they require far less work initially, and (if you choose them carefully) they also lead to far less work in the long term.

        No matter how you dress that equation in "experience", there is no way to make doing something now that you didn't need to do; in order to potentially save some immeasurable amount of effort that you might need to expend in the future; balance.

        I'm sure you wouldn't write that unless you truly believed it. But, if you truly believe it, then there really isn't anything more to discuss. I cannot recall reading a statement about code development and maintenance that accords less well with virtually every experience I have had myself. Nor one with which I would more fundamentally disagree.

        Doing something now that you didn't need to do; in order to potentially save an immeasurable amount of effort in the future...that, in my view, is the very essence of good software engineering.

        Damian
