Re: RFC - Regular Expressions Tutorial, the Basics (for BEGINNERS)
by ww (Archbishop)
on Jan 22, 2007 at 19:11 UTC
brusimm, this seems a good start; even meritorious!
...sufficiently so that I hope I will not offend with observations on a few things that seem to me to be shortcomings. So, if you care,
You seem to be a bit categorical when, even in the context framed by your RFC, you might do better to qualify remarks. For example,
"Patterns are used to locate text strings within text lines."... to which I say, 'yep, but also across line-endings, and not necessarily only for what the target audience might consider a "text string".'
I would urge great care in language in comments, as well. For example, in "example 1"
print "There, brus showed up.\n" ; # if the pattern is found, print it!
Your mileage may vary, but I read the comment as at least mildly MISleading, because I see potential for confusion on the reader's part between the hardcoded output "brus showed up" and the captured "brus".
BTW, here or, at the latest, in the next section might be the time, despite the added complexity entailed, to introduce the wisdom of testing whether a match actually DID succeed.
where the truncated comment might be used to foreshadow a later segment of your tut that demonstrates the dangers of using a presumed but untested match.
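For what it's worth, the kind of test I have in mind can be sketched like this. It is only an illustration under my own assumptions: the input lines and the @report array are made up, and the pattern is simply the "brus" from your example.

```perl
use strict;
use warnings;

# Hypothetical input lines; only the first contains the pattern.
my @lines = ("I saw brus at the store", "nobody here");

my @report;
for my $line (@lines) {
    # Use the capture only in the branch where the match succeeded.
    # A failed match does not reset $1, so outside this branch it
    # may be undefined or left over from an earlier match.
    if ($line =~ /(brus)/) {
        push @report, sprintf("matched %s in: %s", $1, $line);
    }
    else {
        push @report, sprintf("no match in: %s", $line);
    }
}

print "$_\n" for @report;
```

The point for a beginner is the if/else itself: the capture is trusted only after the match has been seen to succeed.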
I hold (again, YMMV) that economy of language helps the reader to understand.
Let’s say we knew whoever wrote the line, had issues with typing, for example, I had too much coffee and wrote the following:
might better be phrased (trying not to impose too much of a
Let's say I'd had too much coffee, and might have mis-typed the test string.
I'd also urge greater care (or, perhaps, selectivity) in language such as:
Notice the control character of “s” before the forward slash. Check it out with this script:
While your reference to the "s" before the slash as a "control character" is arguably CORRECT, as a tutor you may want to consider your audience's preconceptions, prior knowledge, and frame of reference. Many who are new to regexen may have enough computer experience to simply slide over a semi-familiar phrase, confident in a preconception that a "control character" is a byte in the range 0x00-0x1F; something quite different from what you intend.
I could go on a bit more, but perhaps this is a better time to simply offer the thoughts that: