These special cases are fairly easy to accommodate:
-
To prevent 4 consecutive digits from being wrongly identified as a year, specify that the digits occur at the start of the right-hand string:
if (my @m = $right =~ /^\d{2}(\d{2})(.*)/)
#                      ^ Add this
Within a regex, the special character ^ means “match at the start of the string” (or at the start of each line when the /m modifier is in effect).
-
To remove spaces, use the substitution operator (with the /g modifier for global replacement):
$right =~ s/\s+//g;
-
Same as for 2.
-
To prevent the string from ending in a dash, use the substitution operator again:
$right =~ s/-$//;
$ is another special regex character: it means “match at the end of the string” (or just before a final newline; with the /m modifier, at the end of each line).
See “Metacharacters” in perlre.
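Put together, the three changes might look something like the sketch below. The sub name tidy_right and the sample inputs are invented for illustration, and the order of the steps is an assumption; this is not the code from the original solution, only a demonstration of the anchored match and the two substitutions.

```perl
use strict;
use warnings;

# Hypothetical helper: demonstrates the three regex techniques only.
sub tidy_right {
    my ($right) = @_;

    # Anchored with ^, so only a leading run of 4 digits is treated as a
    # year; the last two digits of that year are kept.
    if (my @m = $right =~ /^\d{2}(\d{2})(.*)/) {
        $right = $m[0] . $m[1];
    }

    $right =~ s/\s+//g;   # remove all whitespace
    $right =~ s/-$//;     # drop a single trailing dash

    return $right;
}

print tidy_right('2015-AB 12'), "\n";   # leading year is shortened
print tidy_right('AB-2015 -'),  "\n";   # year not at the start: digits untouched
```

Without the ^ anchor, the second input would also match, because the regex engine would find the 4 digits in the middle of the string.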
Note the value of using a test-driven approach: I was able to add the 4 new input/output pairs to %data, make changes to get the new test cases to pass, and know that these modifications did not invalidate the original solution (because the 4 original test cases still pass).
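For anyone who hasn't seen the table-driven style before, here is a minimal sketch of the idea using Test::More. The sub shorten_right is a hypothetical stand-in for the real routine, and the input/output pairs in %data are made up for the example rather than taken from the original problem.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the routine under test.
sub shorten_right {
    my ($right) = @_;
    $right =~ s/^\d{2}(\d{2})/$1/;   # shorten a leading 4-digit year
    $right =~ s/\s+//g;              # remove all whitespace
    $right =~ s/-$//;                # drop a single trailing dash
    return $right;
}

# Input => expected output (illustrative pairs only).
my %data = (
    '2015-AB' => '15-AB',
    'AB-2015' => 'AB-2015',
    'AB 12'   => 'AB12',
    'AB-12-'  => 'AB-12',
);

# One test per pair; sorting the keys keeps the order stable between runs.
for my $input (sort keys %data) {
    is(shorten_right($input), $data{$input}, "input '$input'");
}

done_testing();
```

Adding a new special case then means adding one more pair to %data and rerunning; any earlier case that breaks shows up as a failed test straight away.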
Hope that helps,