When you say "perl spits that line back out as", do you mean "prints to your terminal as"? Perhaps Perl is printing the character, but your terminal cannot display it.
If I take this page and save it as emdash.html, and run:
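    open(my $fh, '<', 'emdash.html') or die "Cannot open emdash.html: $!";
    while (<$fh>) {
        if (/Pierrefonds/) {
            print;                                  # the matching line, as read
            print join ' ', map { ord } split //;   # decimal value of each character
            print "\n";
        }
    }
    close $fh;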
Both lines containing Bernard Patry's riding name appear in my xterm as "PierrefondsDollard". However, looking at the values of each character printed below each line, I can see that the first one contains an extra, undisplayed character with decimal value 151. That's the em dash.
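If picking the odd value out of a long row of numbers gets tedious, a small variation can mark the hidden bytes inline instead. This is only a sketch: the helper name show_hidden is made up for illustration, and the sample string is built by hand rather than read from emdash.html.

```perl
use strict;
use warnings;

# Replace every byte outside printable ASCII (0x20-0x7E) with its
# decimal value in angle brackets, so it cannot go unnoticed.
sub show_hidden {
    my ($line) = @_;
    $line =~ s/([^\x20-\x7E])/sprintf('<%d>', ord $1)/ge;
    return $line;
}

# Hand-built stand-in for the line from the saved page (an assumption).
my $sample = "Pierrefonds" . chr(151) . "Dollard";
print show_hidden($sample), "\n";    # prints Pierrefonds<151>Dollard
```

Note that a trailing newline would also show up as <10>, so chomp the line first if you run this inside the loop above.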
The fun part is that character 151 (0x97) isn't actually an em dash in ISO-8859-1; positions 128 through 159 are not printable characters there. The em dash at 151 comes from the Windows Latin 1 character set (Windows-1252), which isn't directly compatible with ISO-8859-1. This could explain why it doesn't display correctly in your (or at least, my) terminal. See http://www.cs.tut.fi/~jkorpela/www/windows-chars.html for more details.
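One way to confirm this, assuming the page really is Windows-1252, is to decode the bytes with the core Encode module and see what character 151 turns into. A rough sketch, again using a hand-built sample string rather than the real file:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hand-built stand-in for the raw bytes of the line (an assumption).
my $bytes = "Pierrefonds" . chr(151) . "Dollard";

# Interpret the bytes as Windows-1252; byte 151 should become U+2014.
my $text = decode('cp1252', $bytes);
printf "U+%04X\n", ord(substr($text, 11, 1));    # prints U+2014

# Re-encode as UTF-8 and list the byte values, as in the loop earlier.
my $utf8 = encode('UTF-8', $text);
print join(' ', map { ord } split //, $utf8), "\n";
```

In the UTF-8 output the single byte 151 is replaced by the three bytes 226 128 148. Whether the dash then displays depends on your terminal being set up for UTF-8.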