Re: Re: Re: Perl vs. Python: Looking at the Code
by danger (Priest) on Apr 05, 2002 at 10:59 UTC
Thanks for the code followup --- I do like the list method you showed rather than my lambda hack (like I said, it's been some time since I actually played with Python ... something around 1.5.x, which didn't even have += back then, iirc). You are right, Python has certainly improved in speed: from 31 to 13 secs just by switching from 2.0.1 to 2.2.1c2, and then to 11 secs using xreadlines(). Cheating by reading whole files into memory and then working on them brought it down to 8 secs --- but the same cheat on the Perl version took it from 7 to 3 secs. The relative improvement is better for the Perl cheat because, once we have the newline and byte counts, we can get a "word" count via s/\s+//g without building a list. I couldn't find a way to do that in Python, so len(string.split()) was the best I could do there.
Also, I did get a Python version working with fileinput, but it was vastly slower and has awkward semantics for dealing with individual files while you iterate through them (i.e., rather than an 'eof' test to see if you are at the end of the current file, you get an 'isfirstline()' test to see if you just read the first line of a new file ... this makes for awkward logic, in my opinion).
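On the word-count point above: for readers on a present-day Python 3, here is a minimal sketch of counting "words" in an in-memory buffer without materializing a list. It is not the code that was timed in this thread. It relies on generator expressions, which did not exist in the 2.2 interpreter discussed here, and the function name wc_counts is made up for illustration.

```python
import re
import sys


def wc_counts(data):
    """Return (lines, words, bytes) for one file's contents, wc-style.

    `data` is the whole file as a bytes object (the "cheat" of slurping
    the file into memory). The name wc_counts is hypothetical.
    """
    n_bytes = len(data)
    n_lines = data.count(b"\n")
    # Walk the runs of non-whitespace lazily and count them;
    # no list of words is ever built.
    n_words = sum(1 for _ in re.finditer(rb"\S+", data))
    return n_lines, n_words, n_bytes


if __name__ == "__main__":
    # Print one wc-like row per file named on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            lines, words, size = wc_counts(fh.read())
        print(f"{lines:8d}{words:8d}{size:8d} {path}")
```

Whether this beats len(data.split()) in practice is an open question that would need measuring; the sketch only shows that the count can be had without the intermediate list.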
All said and done, I'm not interested in relatively small differences in the number of characters, but one of the strong concluding statements in your original post was:
The points I've shown above are concrete examples of why, even with best coding practices, character for character, and due to language design issues you will save characters in Python
And I think, once you look at the context of actually writing programs, rather than syntax fragments, your statement won't really hold up. Although, perhaps now others may see that Python isn't necessarily as verbose as it is often made out to be. Your updated version looks quite nice :-)
As for your comment that the circa 3:2 speed differential is negligible, I suggest that depends on the application domain and the kind of work you usually do. Further, the speed difference can be much more significant --- using regexen appears to be much slower in Python. For example, I wrote a simple grep script (takes a pattern, reads stdin, prints the lines that match) in each language and ran it with the pattern "a.*e.*i.*o.*u" on my /usr/dict/words file, i.e., find all words containing the vowels in order, though not necessarily contiguously. The Python version took 7.5 secs, the Perl version took 1 sec, and the C grep on my box took 0.2 secs. (Incidentally, my words file is non-standard and contains 263,533 entries, of which 47 match the pattern given.) For myself, this renders your 'if the languages were equal on every other count' qualifier somewhat moot.
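For anyone who wants to repeat the comparison, a grep script of the kind described might look roughly like the sketch below. This is a reconstruction in present-day Python 3, not the script that produced the timings above, and the function name grep is chosen only for illustration.

```python
import re
import sys


def grep(pattern, lines):
    """Yield each line that contains a match for `pattern`."""
    regex = re.compile(pattern)  # compile once, outside the loop
    for line in lines:
        if regex.search(line):
            yield line


if __name__ == "__main__":
    # Expects exactly one argument (the pattern) and reads lines from stdin.
    if len(sys.argv) == 2:
        sys.stdout.writelines(grep(sys.argv[1], sys.stdin))
```

Saved under some name of your choosing, it would be run with the pattern as its only argument and the words file on stdin.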
As for module documentation --- doc strings are nice for what they do, but rather limited. In fact, the primary documentation for Python and its libraries is a set of LaTeX files. Perl's POD isn't as flexible or as powerful as LaTeX, but it is simple and it is embeddable, which are pretty good properties, and provides a standard documentation model (and utilities) for all of Perl and its modules.
Some of the other points you raised are valid: a standard language reference capable of supporting multiple implementations can be a good thing versus just a reference implementation; fewer rules and fewer styles can certainly help beginners (though they can also be constraining to experienced programmers); Python's intrinsic OO model is simpler and cleaner; argument passing in Python is nicer. However, Perl 6 looks poised to address most of these, though I don't expect to see any kind of release before summer 2003. Something you didn't mention is that Python ships with a pretty sizeable library --- although CPAN remains unmatched in any language.
Anyway, perhaps I'll see you at the next PM meeting and we can follow this up over a beer or three :-)