Re: Perl variant of linux tool strings

by duct_tape (Hermit)
on Mar 23, 2005 at 20:46 UTC

in reply to Perl variant of linux tool strings

Not a module, but the Perl Power Tools project includes a few implementations of the 'strings' tool.
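For anyone who just wants the core idea inline, here is a minimal sketch of what a 'strings'-like scan does: pull out every run of printable ASCII characters of at least some minimum length. This is not the Perl Power Tools code; the sub name extract_strings and the default minimum of 4 are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from Perl Power Tools): return every run of
# at least $min_len printable ASCII characters (space through tilde,
# plus tab) found in $data.
sub extract_strings {
    my ($data, $min_len) = @_;
    $min_len = 4 unless defined $min_len;
    return $data =~ /([\x20-\x7E\t]{$min_len,})/g;
}

# Only read a file when run as a script with an argument.
if (@ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $data = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    print "$_\n" for extract_strings($data, 4);
}
```

Run it as a script with a file name as its only argument. It slurps the whole file into memory, so it suits documents better than very large binaries.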

Re^2: Perl variant of linux tool strings
on Mar 23, 2005 at 21:04 UTC
    I would like to collect words from a PDF or Word document. So far Perl Power Tools does a very good job. Thanks!

      For collecting words from PDF documents, you can use the ps2ascii utility, which comes with Ghostscript. It runs the document through Ghostscript using a special output device that emits only ASCII text. Since Ghostscript can handle PDFs too, ps2ascii works fine on them (although I did have compatibility problems with some PDFs, depending on the generating program and the version of Ghostscript).

      This doesn't work for Word documents, of course.
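      A rough sketch of wiring ps2ascii into a word collector from Perl, assuming ps2ascii is on your PATH and writes the text to standard output when given only an input file. The sub name count_words and the definition of a "word" (letters and apostrophes) are my own assumptions, not anything ps2ascii provides.

```perl
use strict;
use warnings;

# Hypothetical helper: split plain text into lowercase words and count
# how often each one occurs. A "word" here is a letter followed by any
# number of letters or apostrophes.
sub count_words {
    my ($text) = @_;
    my %count;
    $count{lc $1}++ while $text =~ /([A-Za-z][A-Za-z']*)/g;
    return \%count;
}

# Only call ps2ascii when run as a script with a file argument.
if (@ARGV) {
    my $pdf = shift @ARGV;
    # List-form pipe open bypasses the shell, so odd file names are safe.
    open my $ps, '-|', 'ps2ascii', $pdf or die "Cannot run ps2ascii: $!";
    my $text = do { local $/; <$ps> };   # slurp all extracted text
    close $ps or die "ps2ascii failed on $pdf";
    my $count = count_words($text);
    # Most frequent words first, ties broken alphabetically.
    printf "%6d %s\n", $count->{$_}, $_
        for sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
}
```

      Give it the PDF as its only argument. How well the words come out depends on how cleanly ps2ascii can extract text from that particular file.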

        OP, you may have some luck loading the MS Word document into (Star|Open)Office, printing to PDF, then chucking it at ps2ascii. Since formatting is exactly what *office finds hardest to get right, and ASCII keeps little remnant of it, I guess you could have a lot of luck.


        As ambrus points out below, of course, if you can read the Word doc into *office then you can just export ASCII from there. Sorry, it has been a rather long day.

        You may also want to trawl through a list of filters; I found this one, which looks like it may have some tools that could help.


        Pereant, qui ante nos nostra dixerunt!

