Comments on a portion of Chapter 6. I apologize for any formatting glitches.
No need for acknowledgement. When do you hope to go to print - or, until when are you seeking feedback?
1. Which/That
Paraphrasing a style guide I have on hand:
"That" introduces a phrase that is essential to the meaning of the word it modifies, and the phrase is not set off with commas. "Which" introduces a non-essential phrase, and is set off with commas.
You seem to use "which" exclusively. For example, in the following paragraph, both instances should be "that".
Perl's powerful ability to manipulate text comes in part from its inclusion of a computing concept known as regular expressions. A regular expression (often shortened to regex or regexp) is a pattern which describes characteristics of a string of text. A regular expression engine interprets a pattern and applies it to strings of text to identify those which match.
2. Cross-Referencing
I would modify your cross-referencing convention slightly, by including "see" or "see also" in parens and linking to the corresponding heading (e.g. <h3>) wherever possible. Consider this text:
You may use a string in other contexts, such as boolean or numeric; its contents will determine the resulting value (Coercion).
Which links to this text:
Unlike other languages, where a variable can hold only a particular type of value…
I would prefer "(see Coercion)", linked to the <h3>Coercion heading. (Or perhaps this markup is only for print, and a moot point.)
3. Fixity
I suggest reorganizing this paragraph:
The fixity of an operator is its position relative to its operands. The mathematic operators tend to be infix operators, where they appear between their operands. Other operators are prefix, where they appear before their operands; these tend to be unary operators, such as the prefix increment operator ++$x or the mathematical and boolean negation operators (-$x and !$x, respectively). Postfix operators appear after their operands (such as postfix increment $x++). Circumfix operators surround their operands, such as the anonymous hash and anonymous array creation operators or quoting operators ({ ... } and [ ... ] or qq{ ... }, for example). Postcircumfix operators surround some operands but follow others, as in the case of array or hash indices ($hash{ ... } and $array[ ... ], for example).
To this:
Fixity is an operator’s position relative to its operands:
- Infix operators appear between their operands. Most mathematical operators are infix, such as multiplication ($x * $y).
- Prefix operators appear before their operands, and postfix operators appear after them. These tend to be unary, such as mathematical negation (-$x), boolean negation (!$x), and postfix increment ($x++).
- Circumfix operators surround their operands. Examples include anonymous hash creation ({ ... }) and quoting operators (qq{ ... }).
- Postcircumfix operators surround some operands and follow others, as in the case of array or hash indices ($hash{ ... } and $array[ ... ]).
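A short snippet after the list might make the categories concrete. This is only a sketch; the variable names ($x, $y, @nums, %count) are mine, not the chapter's:

```perl
use strict;
use warnings;
use feature 'say';

my ( $x, $y ) = ( 3, 4 );

my $product   = $x * $y;         # infix: the operator sits between its operands
my $negated   = -$x;             # prefix, unary: numeric negation
my $is_false  = !$x;             # prefix, unary: boolean negation
$y++;                            # postfix, unary: $y is now 5
my $anon_list = [ $x, $y ];      # circumfix: anonymous array constructor
my @nums      = ( $x, $y );      # setup only, for the array index below
my %count     = ( hot => 1 );    # setup only, for the hash index below
my $second    = $nums[1];        # postcircumfix: follows the name, surrounds the index
my $hot       = $count{hot};     # postcircumfix: follows the name, surrounds the key

say "$product $negated $second $hot";    # prints "12 -3 5 1"
```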
4. First-Class?
The following sentence leaves me wondering what a "first-class entity" is:
Regexes are first-class entities in modern Perl when created with the qr// operator.
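My guess, and it is only a guess at what the draft means, is that a regex built with qr// is an ordinary value: you can store it in a variable or data structure, pass it to a function, and interpolate it into a larger regex. If so, a small example would help. In the sketch below, count_matches and %pattern_for are hypothetical names of my own:

```perl
use strict;
use warnings;
use feature 'say';

# qr// returns a compiled regex that behaves like any other scalar value
my $hot  = qr/hot/;
my $meal = qr/meal/;

# stored in a data structure
my %pattern_for = ( hot => $hot, meal => $meal );

# passed to a function (count_matches is a hypothetical helper)
sub count_matches {
    my ( $regex, @strings ) = @_;
    return scalar grep { $_ =~ $regex } @strings;
}

# interpolated into a larger regex
my $hot_meal = qr/$hot\s+$meal/;

my $kind  = ref $hot;
my $count = count_matches( $hot_meal, 'a hot meal', 'a cold meal', 'hot  meal' );

say $kind;     # prints "Regexp"
say $count;    # prints "2"
```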
5. Greediness
I’d like to reorganize your Greediness section by moving non-greedy quantifiers up and creating a new heading for Regex Anchors.
Greediness
The + and * quantifiers by themselves are greedy quantifiers; they match as long a string as possible. This is particularly pernicious when using the tempting-but-troublesome .* to match "any amount of anything":
# a poor regex
my $hot_meal = qr/hot.*meal/;
say 'Found a hot meal!' if 'I have a hot meal' =~ $hot_meal;
say 'Found a hot meal!'
if 'I did some one-shot, piecemeal work!' =~ $hot_meal;
The problem is more obvious when you expect to match a short portion of a string. Greedy quantifiers always try to match as much of the input string as possible first, backing off only when it's obvious that the match will not succeed. Thus you may not be able to fit all of the results into the four boxes in 7 Down if you go looking for "loam" with a greedy quantifier. But you can turn a greedy quantifier into a non-greedy quantifier by appending the ? quantifier:
my $minimal_greedy_match = qr/hot.*?meal/;
In this case, the regular expression engine will prefer the shortest possible potential match, increasing the number of characters identified by the .*? token combination only if the current number fails to match. Because * matches zero or more characters, the minimal potential match for this token combination is zero characters:
say 'Found a hot meal' if 'ilikeahotmeal' =~ /$minimal_greedy_match/;
If this isn't what you want, use the + quantifier to match one or more items:
my $minimal_greedy_at_least_one = qr/hot.+?meal/;
unlike( 'ilikeahotmeal', $minimal_greedy_at_least_one );
like( 'i like a hot meal', $minimal_greedy_at_least_one );
The ? quantifier modifier also applies to the ? (zero or one matches) quantifier as well as the range quantifiers. In every case, it causes the regex to match as few characters as possible.
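Capturing what each form actually matches might make the contrast easier to see. A sketch, with sample strings of my own invention:

```perl
use strict;
use warnings;
use feature 'say';

my $text = 'one hot meal, then another hot meal';

# greedy .* runs to the last "meal"; non-greedy .*? stops at the first
my ($greedy) = $text =~ /(hot.*meal)/;
my ($lazy)   = $text =~ /(hot.*?meal)/;

# the same ? modifier applied to a range quantifier and to the ? quantifier
my ($range_greedy) = 'aaaa' =~ /(a{2,4})/;
my ($range_lazy)   = 'aaaa' =~ /(a{2,4}?)/;
my ($opt_greedy)   = 'aaaa' =~ /(a?)/;
my ($opt_lazy)     = 'aaaa' =~ /(a??)/;

say "[$greedy]";                       # prints "[hot meal, then another hot meal]"
say "[$lazy]";                         # prints "[hot meal]"
say "[$range_greedy] [$range_lazy]";   # prints "[aaaa] [aa]"
say "[$opt_greedy] [$opt_lazy]";       # prints "[a] []"
```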
In general, the greedy quantifiers .+ and .* are tempting but dangerous tools. For simple programs that need little maintenance, they may be quick and easy to write, but non-greedy matching seems to match human expectations better. If you find yourself writing a lot of regular expressions with greedy matches, test them thoroughly with a comprehensive and automated test suite with representative data to lessen the possibility of unpleasant surprises.
Regex Anchors
Regex anchors force a match at a specific position in a string. \A ensures that any match will start at the beginning of the string, and \Z ensures that any match will finish at the end of the string. For example, to find a four-letter word that starts with “l” and ends with “m”, you can use this expression:
my $seven_down = qr/\Al${letters_only}{2}m\Z/;
[ Actually, you might pick an example letter more prominent than "l"...]
If you're not fortunate enough to have a Unix word dictionary file available [what if I am fortunate enough?], the word boundary metacharacter (\b) matches only at the boundary between a word character (\w) and a non-word character (\W):
my $seven_down = qr/\bl${letters_only}{2}m\b/;
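A runnable comparison of the two forms could go here. In this sketch I assume $letters_only is defined as qr/[a-zA-Z]/ (the excerpt doesn't show its definition), and $anchored, $bounded and $strict_end are names of my own. The last pattern uses \z, which is not mentioned in the draft, to show that \Z still matches before a trailing newline:

```perl
use strict;
use warnings;
use feature 'say';

# ASSUMPTION: the chapter's $letters_only is not shown in this excerpt
my $letters_only = qr/[a-zA-Z]/;

my $anchored   = qr/\Al${letters_only}{2}m\Z/;    # the whole string must be the word
my $bounded    = qr/\bl${letters_only}{2}m\b/;    # the word may sit inside a longer string
my $strict_end = qr/\Al${letters_only}{2}m\z/;    # \z allows no trailing newline

say 'anchored matches "loam"'            if 'loam'         =~ $anchored;
say 'anchored rejects "a loam field"'    if 'a loam field' !~ $anchored;
say 'bounded matches "a loam field"'     if 'a loam field' =~ $bounded;
say 'bounded rejects "gloams"'           if 'gloams'       !~ $bounded;
say '\Z accepts a trailing newline'      if "loam\n"       =~ $anchored;
say '\z rejects a trailing newline'      if "loam\n"       !~ $strict_end;
```

All six lines print.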
As with Perl itself, there's more than one way to write a regular expression. Consider choosing the most expressive and maintainable one.
Metacharacters
...