http://www.perlmonks.org?node_id=886938


in reply to Re: Convert PDF file into HTML file
in thread Convert PDF file into HTML file

Oh, yeah, part of the fun of working with text from PDF: in order to position the text nicely on the page, for kerning (putting letters closer together to fill visual gaps between them) or justification (making spaces wider so the right side lines up with the margin), the PDF writer software may have cut the text up into small substrings and placed each one on the page individually.

It's up to you to puzzle the pieces back together again.

The text in a PDF very rarely comes as one chunk.
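
To give an idea of what puzzling the pieces back together can look like, here is a rough sketch in Python. It assumes you already have each piece with its position and width from whatever PDF library you use; the `Fragment` record, the `join_fragments` function and the two thresholds are all made up for illustration, not part of any real module. It also assumes the usual PDF coordinates, where y grows upward, so a larger y means higher on the page.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical record for one positioned piece of text."""
    x: float      # left edge of the piece
    y: float      # baseline position (larger = higher on the page)
    width: float  # rendered width of the piece
    text: str


def join_fragments(fragments, line_tol=2.0, space_gap=1.5):
    """Rebuild lines of text from individually placed fragments.

    line_tol  -- pieces whose baselines differ by at most this much
                 are treated as being on the same line (assumed value)
    space_gap -- a horizontal gap larger than this becomes a space
                 (assumed value; kerning gaps are smaller)
    """
    # Group pieces into lines, top of the page first.
    lines = []  # list of (baseline_y, [fragments on that line])
    for frag in sorted(fragments, key=lambda f: (-f.y, f.x)):
        if lines and abs(lines[-1][0] - frag.y) <= line_tol:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))

    result = []
    for _, frags in lines:
        # Left to right within the line.
        frags.sort(key=lambda f: f.x)
        pieces = [frags[0].text]
        for prev, cur in zip(frags, frags[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > space_gap:
                pieces.append(" ")  # wide gap: treat it as a word break
            pieces.append(cur.text)
        result.append("".join(pieces))
    return result
```

The thresholds depend on font size and on how aggressively the PDF writer justified the text, so expect to tune them per document. Multi-column layouts and rotated text would need more work than this.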