How about this approach:
# ASSUME: words are sorted lexically ascending.
use warnings;
use strict;

my $last_anchor = qr{ \z \A }xms;               # init to never-matching pattern
my $first       = qr{ [A-Za-z] }xms;            # first letter of a word
my $anchor      = qr{ $first [AEIOUaeiou] }xms; # letter followed by a vowel
my $letters     = qr{ [_A-Za-z] }xms;           # rest of the word
# sequence number, colon, then one or more spaces, e.g. "1930: "
my $sequence_number_field = qr{ \d+ : [ ]+ }xms;

WORD:
while (<DATA>) {
    next WORD
        unless m{ $sequence_number_field
                  (?! $last_anchor)   # NO match to last anchor
                  ($anchor $letters*) # capture anchoring word to $1
                }xms;
    my $word = $1;
    # remember the two-letter anchor of this word for the next lines
    $last_anchor = qr{ @{[ substr $word, 0, 2 ]} }xms;
    print "$word: $_";
}
__DATA__
1: xxAardvark <-- first if really Aardvark
2: Abacus
3: Actuary
4: Additive
5: Aeolian <-- first if no Aardvark
1930: Nails <-- #Na here
1931: Naked
1932: Name
1933: Napkin
1934: Narcotics
1935: Narrow
1936: Nature
1937: Nausea
1938: Navel
1939: Navy
1940: Nazi
1941: Nearsighted <-- #Ne here
1942: Neck
1943: Necklace
1944: Necktie
1945: Necromancer
1946: Need
1947: Needle
1948: Negligee
1949: Neighbour
1950: Neighbourhood
1951: Nephew
1952: Neptune
1953: Nerd
1954: Nervous_Breakdown
1955: Nest
1956: Net
1957: Nettles
1958: New
1962: New_Year
1959: News
1960: Newspaper
1961: Newspaper_Reporter
1963: Nickname <-- #Ni here
1964: Niece
1965: Night
1966: Nightclub
1967: Nightgown
1969: Nightingale
1968: Nightmare
1970: Ninepins
1971: Nipples
1972: Nobility <-- #No here
1973: Noise
1974: Noodles
1975: Noose
1976: North
1977: Northern_Lights
1978: Nose
1979: Notary
1980: Notebook
1981: November
1982: Nuclear_Bomb <-- #Nu here
1984: Numbers
1983: Numbness
1985: Nuns
1986: Nuptial
1987: Nurse
1988: Nursing
1989: Nuts
1990: Nymph
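The same filtering can also be done without rebuilding a regex after every match: keep the last anchor as a plain string and compare it with `eq`. This is a minimal sketch, not a replacement for the script above. It assumes the sequence-number prefix has already been stripped so each element is a bare word, still sorted ascending, and `first_per_anchor` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first word for each new two-letter
# anchor (a letter followed by a vowel). Assumes @words is sorted ascending
# and contains bare words (no "1930: " sequence-number prefix).
sub first_per_anchor {
    my @words = @_;
    my $last_anchor = '';    # no anchor seen yet
    my @firsts;
    for my $word (@words) {
        # Skip words that do not start with letter + vowel (e.g. "Nymph").
        next unless $word =~ m{ \A ( [A-Za-z] [AEIOUaeiou] ) }xms;
        my $anchor = $1;
        # Same anchor as the previous matching word: not a "first".
        next if $anchor eq $last_anchor;
        push @firsts, $word;
        $last_anchor = $anchor;
    }
    return @firsts;
}

my @firsts = first_per_anchor(qw(
    Aeolian Nails Naked Name Nearsighted Neck Nickname Niece
    Nobility Noise Nuclear_Bomb Numbers Nymph
));
print "$_\n" for @firsts;
```

Like the lookahead version, `$last_anchor` only changes when a word actually matches, so non-matching words such as "Nymph" do not reset it.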