I’m trying to extract International Phonetic Alphabet (IPA) symbols from HTML source code. Internet Explorer’s View > Source window displays the symbols as actual characters (i.e., not as UTF escape codes or numeric entities). I can also copy the symbols from that window and paste them into Notepad as unformatted Arial text, and they still appear as characters rather than codes.
However, when I have Perl extract these symbols from the HTML source and write them to Excel via Spreadsheet::WriteExcel, I get junk. (Spreadsheet::WriteExcel’s default ‘write’ font is Arial. When I open the resulting Excel file, it makes no difference which font, including IPA-specific fonts, I choose for a given cell: the contents are still junk.)
Can you explain what’s going on? Is there a fix? I’m not familiar with UTF-8 handling in Perl, though I suspect that’s where I’ll need to look.
The website I’m trying to use is www.dictionary.com. Search for a word like ‘hello’ and click Show IPA. The transcription returned between the slashes is what I want to extract and have Perl write to a spreadsheet.