PerlMonks  

Re: Re: Re: Parsing Text into Arrays..

by BrowserUk (Patriarch)
on Jan 22, 2003 at 19:12 UTC


in reply to Re: Re: Parsing Text into Arrays..
in thread Parsing Text into Arrays..

Seems you are anticipating a response from me, so here it is--sans venom (Good ol' Optrex :^).

I don't believe that you can consider a parser correct if it accepts something other than the specification it was intended to parse.

I agree.

However (there had to be a however :), given that the specification was 2 simple examples of sample input, it's difficult to derive a complete specification or a full set of test cases. My original version already exceeded the specification inasmuch as it attempted to handle the case of delimiters embedded in quotes.

Whilst you can sit down and derive a set of possible test cases that it should handle, in the absence of an accurate, full specification, all you can do is guess. I also think that, given the nature of PM, it falls on the OP to test and, where necessary, modify any solutions provided. If I were writing for production purposes, and being paid to do it, I would do considerably more testing, but then I would also demand more information up front regarding the nature of the task.

That in no way contradicts your completeness argument, but it does add a little context to the equation.

In a much later post, castaway posted a real sample of the (predefined to him/her) protocol:

({"startup-req-3",5,"PerlMud-i3",0,"*gcs",0,0,0,0,0,0,0,"PerlLib v1","PerlLib v1","Perl v5.6.1","Perl","restricted access","blah@blah.de",(["tell":1,"ucache":1,]),0,})

which appears to indicate that no spaces are included (except within double quotes), which contrasts with the earlier simple examples. If the real specification states that there will be no inter-term white space, then it would simplify the problem and probably speed up the processing.
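To illustrate why a no-whitespace rule simplifies things, here is a minimal sketch of a strict recursive-descent parser for that sample. It is not the code posted earlier in this thread. All sub names are hypothetical, and it assumes things the sample only suggests: no inter-term white space, no escaped quotes inside strings, integers as the only bare values, and a comma after every element (including the last).

```perl
use strict;
use warnings;

# Hypothetical names throughout (parse_packet, _value, _array, _mapping).
# Assumptions: no whitespace between terms, no escaped quotes inside
# strings, and every element is followed by a comma, as in the sample.

sub parse_packet {
    my ($text) = @_;
    pos($text) = 0;                      # start matching at the beginning
    my $value = _value(\$text);
    die "trailing input at offset " . pos($text) . "\n"
        unless pos($text) == length $text;
    return $value;
}

# Parse one term at the current position: array, mapping, string or integer.
sub _value {
    my ($ref) = @_;
    return _array($ref)   if $$ref =~ /\G\(\{/gc;
    return _mapping($ref) if $$ref =~ /\G\(\[/gc;
    return $1             if $$ref =~ /\G"([^"]*)"/gc;
    return $1 + 0         if $$ref =~ /\G(-?\d+)/gc;
    die "unexpected input at offset " . pos($$ref) . "\n";
}

# Called after '({' has been consumed; reads elements up to the closing '})'.
sub _array {
    my ($ref) = @_;
    my @items;
    until ($$ref =~ /\G\}\)/gc) {
        push @items, _value($ref);
        $$ref =~ /\G,/gc or die "expected ',' at offset " . pos($$ref) . "\n";
    }
    return \@items;
}

# Called after '([' has been consumed; reads key:value pairs up to '])'.
sub _mapping {
    my ($ref) = @_;
    my %map;
    until ($$ref =~ /\G\]\)/gc) {
        my $key = _value($ref);
        $$ref =~ /\G:/gc or die "expected ':' at offset " . pos($$ref) . "\n";
        $map{$key} = _value($ref);
        $$ref =~ /\G,/gc or die "expected ',' at offset " . pos($$ref) . "\n";
    }
    return \%map;
}

my $data = parse_packet('({"startup-req-3",5,(["tell":1,"ucache":1,]),0,})');
print scalar(@$data), " top-level elements\n";
```

Because every match is anchored with `\G` and nothing skips white space, each term either matches at the current offset or the parse dies there. Under these assumptions the sketch also dies on the malformed strings discussed further down (`({({})`, `({foo})` and `({})})`) instead of returning a partial result.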

A last point about all of this. You are receiving packets over a network connection. So I'm guessing that just the network part takes much longer than even the slowest solution. Thus the network time is going to swamp the parse time by a great deal, and to me would suggest there's no point in optimizing this.

True, speed isn't everything (or indeed anything unless the code 'works'), but I eschew the idea that taking longer than necessary is ok if the process is IO-bound anyway.

There are very few cases these days where a computer is either single-tasking, or doing only one task.

In the case of a MUD, whilst some part of the process is reading and passing the input from the network, concurrent with this, there is almost certainly another part of the program that is processing the input received and updating state as a consequence of it; another part that is getting, validating and actioning user (keyboard/mouse) input. If it is a graphical MUD, then maintaining the screen is likely to require every spare cycle it can get and then some.

Even if this is a single-threaded/tasking MUD that cannot itself do any concurrent processing, the user is quite likely to be running a browser, compiler, chat progs etc. simultaneously, and any extra cycles the MUD spends parsing network input detract from them and anything else running. If the parser were being used at the server end for receiving information back from multiple clients on separate threads, the performance would become critical.

Back to the Completeness argument.

Maybe the antithesis of the article you cite is the adage (possibly by one Paul Schindler, but I haven't confirmed this, nor looked to check his credentials) of "'The perfect' is the enemy of 'the good'".

Or, as I have seen it written, "'Good enough' is good enough" and "OP = OP" (with apologies to my Dutch friends if the latter is mis-interpreted).

In this case, if the server sent me a string of ({({}) or ({foo}) or ({})}), I would consider the output routine at the server broken rather than the input parser. Of course, corruption can occur during transmission, but I would expect that to be detected and corrected by the transmission protocol rather than needing to account for the possibility in the parser.

Of course, if the protocol/parser involved was responsible for running a nuclear power station, a heart monitor, the space shuttle (or commercial aeroplane) or the stock market, I would feel differently about this, but we are talking about a MUD.

Finally, I am not at all sure that this is an 'answer' to your post exactly, nor even contradictory to it. It's more that there's more than one point of view and that neither is necessarily wrong. Sometimes one is more applicable to a given situation than another.


Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Replies are listed 'Best First'.
Re: Re: Re: Re: Parsing Text into Arrays..
by demerphq (Chancellor) on Jan 24, 2003 at 00:02 UTC
    Ok here goes browseruk...

    given the specification was 2 simple examples of sample input, it's difficult to derive a complete specification or a full set of test cases

    Yah. Fair point. (Look what I did with my original attempt in the readmore :-) My view however is that given the overall context it is reasonable to assume that unbalanced delimiters are illegal. Also that delimiters embedded in quotes should be handled ok. Now I agree that this is an assumption. (as in Ass-u-me) But likewise I think you would agree that it's hardly an extreme one.

    If I were writing for production purposes, and being paid to do it, I would do considerably more testing, but then I would also demand more information up front regarding the nature of the task.

    That in no way contradicts your completeness argument, but it does add a little context to the equation.

    Again a fair point and a reasonable position. I still feel that in this case the need to handle unbalanced delimiters is not that unreasonable. Also I think that from this position I would grant handling barewords as strings, but I think that handling ({foo}) as [ 'o' ] can be fairly construed as a bug. Either it should have been accepted as [ 'foo' ] or rejected as an invalid sentence. But I suppose the specs could have indicated that that was the correct parse but.... ;-)

    which appears to indicate that no spaces are included (except within double quotes), which contrasts with the earlier simple examples. If the real specification states that there will be no inter-term white space, then it would simplify the problem and probably speed up the processing.

    I don't really see how it would have improved the situation much, but presumably it reflects your seriously-deep-end-regex approach (which I say with more respect than it may sound; I'm in a good mood right now, please forgive me ;-), which I admit I didn't look into too much. I think I would probably assume the solution needs to be white space tolerant just out of conventional practice. But then again I've never coded any python. :-) It does seem justifiable to take either approach however given the lack of specs as you pointed out earlier.

    but I eschew the idea that taking longer than necessary is ok if the process is IO-bound anyway

    There are very few cases these days where a computer is either single-tasking, or doing only one task.

    Sure there's something to what you say, but I think that when you start talking about optimization there's a line where the question goes from "how do I make my perl code faster" to "which other native compiled language should I convert my script to". It's always a hard call to make though, and I agree that while confined to perl (or any other language) a good programmer keeps in mind differences in speed when selecting his way to do things. Of course correctness does have to come first, but after that everything is a balancing act.

    In this case, if the server sent me a string of ({({}) or ({foo}) or ({})}), I would consider the output routine at the server broken rather than the input parser. Of course, corruption can occur during transmission, but I would expect that to be detected and corrected by the transmission protocol rather than needing to account for the possibility in the parser.

    Sorry but this part I just don't agree with. Of course if something sends you ({})}) then it's their error, but you wouldn't have even known that there was an error. You might after a certain period become concerned that things weren't right, perhaps some serious fault would have occurred later on that caused you to become suspicious, but without a lot of guesswork and research and quite possibly some added code to catch the error you really couldn't have said what it was. And for me that's the last place I want to be. Admittedly these things happen, and not all of my code is bullet proof, but I think this type of thing counts as basic handling. Like putting .. or die ... on little throw away scripts where you know the file is there your honor. You do it 'cause in the long run it saves you a lot of hassle.

    The other thing for me about this is that these seem to me to be basic test cases. If you fail these then there are probably, not necessarily perhaps but probably, valid sentences that you will also fail. At least I would never feel comfortable that there were none. I suppose it's true that lacking a formal proof of your code you can rarely be certain that your code is completely error free, but these cases seem so basic that I can't agree that failing them is acceptable.

    but we are talking about a MUD

    Um, ok. I think we see this in pretty different ways. For the record, I had no idea what a MUD was and frankly didn't care; to me this was a bunch of interesting questions rolled up into one that, for me, have practical uses. For instance parsing a config file or a complicated command line. Or a safe undumper. So having that kind of error catching is essential. From your point of view it was a tiny cog in a big machine, which to a certain extent isn't that important. Controlling its input probably is the least of your worries, so move on to other things. I guess that's fair. I certainly have written code that does minimal checking in the expectation that most error cases are simply impossible. (Usually because they have been dealt with elsewhere but not always)

    I think that there's another issue here. We're posting on a site where people come to learn stuff. Scary thought as it may be, there's a good chance there's somebody out there that may have learned something from one or more of our posts. I certainly come here and learn from people, all the time actually; it's why I come back. So I think that when we post we need to put a little extra effort into being correct and also complete, accepting it when someone calls us on it, and lastly checking the quality of our colleagues' contributions. I've been embarrassed by being woefully wrong in the past, but in the end I appreciated that someone told me of it. Even if what they said gave me nothing more than a bad conscience and the thought that I was wrong, I learned from it at least by figuring out if they were right or not.

    Going back to these people reading our posts, I think the point is that one of them might think the solution is "complete and correct". I suppose that it can be argued that if they don't check into it independently then it's their own fault, but I personally hope that some of the stuff I've learned here doesn't need to be researched further.

    Anyway, the above is more an explanation of my approach than a comment on this situation. Even if we see this differently it's quite possibly been (or will be) useful to somebody to see the different views and the comments associated. I certainly will remember your multi-tasking point the next time the subject of optimization comes up.

    Cheers BrowserUk

    --- demerphq
    my friends call me, usually because I'm late....

      demerphq++ (and more if I could).

      If you take a look at Re: Re: Parsing Text into Arrays.., you'll see that not only did I have to go back and clean up my code, I had to go back again and correct a major flaw that I later found.

      The truth is, that were I currently in a position of needing to employ a coder, I would probably rather employ you than myself.

      My efforts here are nothing more than a learning exercise (for me) and a hobby. I hope that posters benefit from my efforts, but my primary goal is the enjoyment of the doing and learning for my own benefit. I'm much too enamored with the fun of coding and exploring perl.

      I use PM as a ready source of varied, real-life problems for me to solve that cause me to stretch myself into areas that I might otherwise never go. That, and the occasional good debate on perl-related matters (I wish there were more!) are my motivation for coming here.

      There are many points you raise that I would love to discuss further, but that would probably be seen as grandstanding, so suffice it to say: thank you for taking the time to respond so thoughtfully and thoroughly.

      I guess I should adopt a theorbtwo style Caveat implementor clause as a signature.


