How about this approach:
# ASSUME: words are sorted lexically ascending.
use warnings;
use strict;
my $last_anchor = qr{ \z \A }xms; # init to never-matching pattern
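my $first = qr{ [A-Za-z] }xms; # first letter of a word
my $anchor = qr{ $first [AEIOUaeiou] }xms; # first letter followed by a vowel
my $letters = qr{ [_A-Za-z] }xms; # word characters, underscore included
my $sequence_number_field = qr{ \d+ : [ ]+ }xms; # e.g. "1930: ", one or more spaces after the colon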
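WORD:
while (<DATA>) {
    # Skip any line whose word does not start a new two-letter group.
    next WORD
        unless m{ $sequence_number_field
                  (?! $last_anchor)    # NO match to last anchor
                  ($anchor $letters*)  # capture anchoring word (group 1)
                }xms;
    my $word = $1;
    # Remember the first two letters so later words in the same group are skipped.
    $last_anchor = qr{ @{[ substr $word, 0, 2 ]} }xms;
    print "$word: $_";
}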
__DATA__
1: xxAardvark <-- first if really Aardvark
2: Abacus
3: Actuary
4: Additive
5: Aeolian <-- first if no Aardvark
1930: Nails <-- #Na here
1931: Naked
1932: Name
1933: Napkin
1934: Narcotics
1935: Narrow
1936: Nature
1937: Nausea
1938: Navel
1939: Navy
1940: Nazi
1941: Nearsighted <-- #Ne here
1942: Neck
1943: Necklace
1944: Necktie
1945: Necromancer
1946: Need
1947: Needle
1948: Negligee
1949: Neighbour
1950: Neighbourhood
1951: Nephew
1952: Neptune
1953: Nerd
1954: Nervous_Breakdown
1955: Nest
1956: Net
1957: Nettles
1958: New
1962: New_Year
1959: News
1960: Newspaper
1961: Newspaper_Reporter
1963: Nickname <-- #Ni here
1964: Niece
1965: Night
1966: Nightclub
1967: Nightgown
1969: Nightingale
1968: Nightmare
1970: Ninepins
1971: Nipples
1972: Nobility <-- #No here
1973: Noise
1974: Noodles
1975: Noose
1976: North
1977: Northern_Lights
1978: Nose
1979: Notary
1980: Notebook
1981: November
1982: Nuclear_Bomb <-- #Nu here
1984: Numbers
1983: Numbness
1985: Nuns
1986: Nuptial
1987: Nurse
1988: Nursing
1989: Nuts
1990: Nymph