I'm not very experienced with either of these modules, but, as ikegami points out, some of your code seems strange. For example, if you're going to put the result of decode_entities in another scalar anyway, why use the hard-to-read decode_entities(my $new = $old) rather than the more natural my $new = decode_entities $old? Have you looked at $decodedParsedContentWithDecodeEntities? I'd take a look at that, because you have unexpected behaviour, and it's good to know what's happening at every step of the way.
Also note that the Text::Sentence documentation says:
"The split_sentences function takes a scalar containing ascii text as an argument and returns an array of sentences that the text has been split into."
That is, it says it expects ASCII text, which you're explicitly not giving it.
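If you want to confirm what you're actually handing to the sentence splitter, something like the following might help. It's only a sketch using core Perl: non_ascii_chars is a helper name I made up, and $sample is a stand-in for your decoded content, not your real data.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): returns the distinct
# non-ASCII characters found in a string, formatted as U+XXXX labels.
sub non_ascii_chars {
    my ($text) = @_;
    my %seen;
    return map  { sprintf 'U+%04X', ord($_) }
           grep { !$seen{$_}++ }
           $text =~ /([^\x00-\x7F])/g;
}

# Stand-in for the decoded content; substitute your own variable here.
my $sample = "caf\x{E9} na\x{EF}ve";
my @found  = non_ascii_chars($sample);
print @found ? "non-ASCII: @found\n" : "pure ASCII\n";
```

An empty list means the text is plain ASCII; anything else tells you exactly which characters fall outside what the documentation promises to handle.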
I'm also puzzled how you can get the gor : BLAH line at all. It seems that you're printing lines of the form word : words (why?), with the left-hand side a word in the right-hand side, but gor doesn't appear in the right-hand side of the output that you displayed.
UPDATE: For that matter, have you looked at $decodedParsedContent itself? A quick look at the non-XS part of the source for HTML::Entities reveals that it just substitutes decimal, then hexadecimal, then named entities, in that order. One can imagine a strange scenario where, say, the expansion of a decimal entity creates a hexadecimal entity, which the next pass then expands as well; it's possible, though (I imagine) unlikely, that you're seeing that here.
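To make that kind of scenario concrete, here is a minimal sketch of a decoder that runs three substitution passes one after another, so that the output of an earlier pass can form an entity that a later pass expands. It is a simplified stand-in written for illustration, not the actual HTML::Entities code; naive_decode and the tiny %named table are my own inventions.

```perl
use strict;
use warnings;

# Illustration only: a tiny, assumed table of named entities.
my %named = (amp => '&', lt => '<', gt => '>', quot => '"');

# Hypothetical three-pass decoder: decimal, then hexadecimal, then named.
sub naive_decode {
    my ($text) = @_;
    $text =~ s/&\#(\d+);/chr($1)/eg;                    # pass 1: decimal
    $text =~ s/&\#[xX]([0-9a-fA-F]+);/chr(hex($1))/eg;  # pass 2: hexadecimal
    $text =~ s/&(\w+);/exists $named{$1} ? $named{$1} : "&$1;"/eg;  # pass 3: named
    return $text;
}

# "&#38;" is a decimal entity for "&". Once pass 1 expands it, the text
# reads "&#x41;", which pass 2 then treats as a hexadecimal entity.
print naive_decode('&#38;#x41;'), "\n";   # prints "A", not "&#x41;"
```

A decoder that scanned the input only once would leave &#x41; in the output, so comparing the raw and decoded strings side by side should show whether anything like this double expansion is going on.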