
Re: Modern Perl: The Book: The Draft

by hbm (Hermit)
on Sep 21, 2010 at 18:44 UTC (#861131)

in reply to Modern Perl: The Book: The Draft

Comments on a portion of Chapter 6. I apologize for any formatting glitches.

No need for acknowledgement. When do you hope to go to print - or, until when are you seeking feedback?

1. Which/That

Paraphrasing a style guide I have on hand:

"That" introduces a phrase that is essential to the meaning of the word it modifies, and the phrase is not set off with commas. "Which" introduces a non-essential phrase, and is set off with commas.

You seem to use "which" exclusively. For example, in the following paragraph, both instances should be "that".

Perl's powerful ability to manipulate text comes in part from its inclusion of a computing concept known as regular expressions. A regular expression (often shortened to regex or regexp) is a pattern which describes characteristics of a string of text. A regular expression engine interprets a pattern and applies it to strings of text to identify those which match.

2. Cross-Referencing

I would modify your cross-referencing convention slightly, by including "see" or "see also" in parens and linking to the corresponding heading (e.g. <h3>) wherever possible. Consider this text:

You may use a string in other contexts, such as boolean or numeric; its contents will determine the resulting value (Coercion).

Which links to this text:

Unlike other languages, where a variable can hold only a particular type of value…

I would prefer "(see Coercion)", linked to the <h3>Coercion heading. (Or perhaps this markup is only for print, and a moot point.)

3. Fixity

I suggest reorganizing this paragraph:

The fixity of an operator is its position relative to its operands. The mathematic operators tend to be infix operators, where they appear between their operands. Other operators are prefix, where they appear before their operands; these tend to be unary operators, such as the prefix increment operator ++$x or the mathematical and boolean negation operators (-$x and !$x, respectively). Postfix operators appear after their operands (such as postfix increment $x++). Circumfix operators surround their operands, such as the anonymous hash and anonymous array creation operators or quoting operators ({ ... } and [ ... ] or qq{ ... }, for example). Postcircumfix operators surround some operands but follow others, as in the case of array or hash indices ($hash{ ... } and $array[ ... ], for example).

To this:

Fixity is an operator’s position relative to its operands:
  • Infix operators appear between their operands. Most mathematical operators are infix, such as multiplication ($x * $y).
  • Prefix operators appear before their operands, and postfix operators appear after. These tend to be unary, such as mathematical negation (-$x), boolean negation (!$x), and postfix increment ($x++).
  • Circumfix operators surround their operands. Examples include anonymous hash creation ({ ... }) and quoting operators (qq{ ... }).
  • Postcircumfix operators surround some operands and follow others, as in the case of array or hash indices ($hash{ ... } and $array[ ... ]).
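
As an illustration of the list above, here is a minimal sketch with one operator of each fixity. The variable names are invented for the example and do not come from the draft.

```perl
use strict;
use warnings;

my $x = 5;
my $y = 3;

my $product = $x * $y;    # infix: the operator sits between its operands
my $negated = -$x;        # prefix: the operator comes before its operand
my $before  = $x++;       # postfix: yields the old value, then increments $x

my $array_ref = [ 1, 2, 3 ];          # circumfix: [ ... ] surrounds its operands
my %hash      = ( answer => 42 );
my $from_hash = $hash{answer};        # postcircumfix: { ... } follows the hash name and surrounds the key
my $from_list = $array_ref->[1];      # postcircumfix: [ ... ] follows the reference and surrounds the index

print "$product $negated $before $x $from_hash $from_list\n";
```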

4. First-Class?

The following sentence leaves me wondering: what is a "first-class entity"?

Regexes are first-class entities in modern Perl when created with the qr// operator.
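
One possible reading, which the draft does not confirm, is that a regex built with qr// behaves like any other value: it can be stored in a scalar, passed to a function, kept in a data structure, and interpolated into a larger pattern. A rough sketch under that assumption (count_matches and the sample strings are hypothetical):

```perl
use strict;
use warnings;

# A compiled regex is an ordinary scalar value.
my $hot_meal = qr/hot\s+meal/;

# Hypothetical helper: takes any compiled regex as its first argument
# and returns how many of the remaining strings it matches.
sub count_matches {
    my ($regex, @strings) = @_;
    return scalar grep { $_ =~ $regex } @strings;
}

my $count = count_matches( $hot_meal, 'a hot meal', 'a cold meal', 'hot  meal!' );

# Compiled regexes can be stored in data structures ...
my %patterns = ( meal => $hot_meal, digits => qr/\d+/ );

# ... and interpolated into larger patterns.
my $very_hot = qr/very\s+$hot_meal/;
my $matched  = 'a very hot meal' =~ $very_hot ? 1 : 0;

print "count=$count matched=$matched type=", ref($hot_meal), "\n";
```

If that is the intended meaning, a sentence saying so in the text would answer the question.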

5. Greediness

I’d like to reorganize your Greediness section by moving non-greedy quantifiers up and creating a new heading for Regex Anchors.


The + and * quantifiers by themselves are greedy quantifiers; they match as long a string as possible. This is particularly pernicious when using the tempting-but-troublesome "match any amount of anything" pattern .*:

# a poor regex
my $hot_meal = qr/hot.*meal/;
say 'Found a hot meal!' if 'I have a hot meal' =~ $hot_meal;
say 'Found a hot meal!' if 'I did some one-shot, piecemeal work!' =~ $hot_meal;

The problem is more obvious when you expect to match a short portion of a string. Greedy quantifiers always try to match as much of the input string as possible first, backing off only when it's obvious that the match will not succeed. Thus you may not be able to fit all of the results into the four boxes in 7 Down if you go looking for "loam" with a greedy quantifier.

[But] you can turn a greedy quantifier into a non-greedy quantifier by appending the ? quantifier:

my $minimal_greedy_match = qr/hot.*?meal/;

In this case, the regular expression engine will prefer the shortest possible potential match, increasing the number of characters identified by the .*? token combination only if the current number fails to match. Because * matches zero or more characters, the minimal potential match for this token combination is zero characters:

say 'Found a hot meal' if 'ilikeahotmeal' =~ /$minimal_greedy_match/;

If this isn't what you want, use the + quantifier to match one or more items:

my $minimal_greedy_at_least_one = qr/hot.+?meal/;
unlike( 'ilikeahotmeal', $minimal_greedy_at_least_one );
like( 'i like a hot meal', $minimal_greedy_at_least_one );

The ? quantifier modifier also applies to the ? (zero or one matches) quantifier as well as the range quantifiers. In every case, it causes the regex to match as few characters as possible.

In general, the greedy modifiers .+ and .* are tempting but dangerous tools. For simple programs which need little maintenance, they may be quick and easy to write, but non-greedy matching seems to match human expectations better. If you find yourself writing a lot of regular expressions with greedy matches, test them thoroughly with a comprehensive and automated test suite with representative data to lessen the possibility of unpleasant surprises.
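
To make the difference concrete, here is a small sketch that captures what each form actually matches. The sample string is made up for the illustration and is not from the draft.

```perl
use strict;
use warnings;

my $text = 'hot soup, hot meal, cold meal';

# Greedy: .* grabs as much as it can, then backs off to the LAST "meal".
my ($greedy) = $text =~ /(hot.*meal)/;

# Non-greedy: .*? grows one character at a time and stops at the FIRST "meal".
# The match still begins at the leftmost "hot".
my ($minimal) = $text =~ /(hot.*?meal)/;

# .*? may match zero characters; .+? needs at least one.
my $zero_ok   = 'hotmeal' =~ /hot.*?meal/ ? 1 : 0;
my $one_least = 'hotmeal' =~ /hot.+?meal/ ? 1 : 0;

print "greedy:  $greedy\n";
print "minimal: $minimal\n";
print "zero_ok=$zero_ok one_least=$one_least\n";
```

Note that the non-greedy match is not necessarily the shortest substring overall, since the engine still starts from the leftmost position where a match is possible.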

Regex Anchors

Regex anchors force a match at a specific position in a string. \A ensures that any match will start at the beginning of the string, and \Z ensures that any match will finish at the end of the string. For example, to find a four-letter word that starts with “l” and ends with “m”, you can use this expression:

my $seven_down = qr/\Al${letters_only}{2}m\Z/;

[ Actually, you might pick an example letter more prominent than "l"...]

If you're not fortunate enough to have a Unix word dictionary file available [what if I am fortunate enough?], the word boundary metacharacter (\b) matches only at the boundary between a word character (\w) and a non-word character (\W):

my $seven_down = qr/\bl${letters_only}{2}m\b/;
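
The two anchoring styles can be compared side by side. This is only a sketch: the definition of $letters_only is not part of the excerpt, so a lowercase letter class stands in for it, and the candidate strings are invented.

```perl
use strict;
use warnings;

# Assumption: the draft's $letters_only is not shown in this excerpt;
# a lowercase letter class stands in for it here.
my $letters_only = qr/[a-z]/;

# \A and \Z: the whole string must be the four-letter word.
my $whole_string = qr/\Al${letters_only}{2}m\Z/;

# \b: the four-letter word may sit anywhere inside a longer string.
my $whole_word = qr/\bl${letters_only}{2}m\b/;

my @candidates = ( 'loam', 'loamy', 'rich loam soil', 'slalom' );

my @string_hits = grep { $_ =~ $whole_string } @candidates;
my @word_hits   = grep { $_ =~ $whole_word } @candidates;

print "whole string: @string_hits\n";
print "whole word:   @word_hits\n";
```

The \A...\Z form suits input that arrives one word per line, such as a dictionary file, while the \b form suits running text.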

As with Perl itself, there's more than one way to write a regular expression. Consider choosing the most expressive and maintainable one.



Re^2: Modern Perl: The Book: The Draft
by chromatic (Archbishop) on Sep 24, 2010 at 22:38 UTC

    Thanks for your suggestions; I've taken most of them. (That versus which is a hot button most style guides can't explain without resorting to "But that's how we've always heard it.")

    I hope to have a camera-ready PDF for the printer by the end of the month.
