in reply to Re^2: Word Frequency in Particular Sentences
in thread Word Frequency in Particular Sentences
And here is some code for getting the text out of a PDF, using an excellent little CPAN module called CAM::PDF. (If you don't know how to install CPAN modules, just ask.)
This goes through a PDF page by page, grabbing the text, and then saves it all to a text file. Note that it holds all of the text in memory before writing, so if your PDF is huge you may want to modify this to do it in chunks (the 367-page PDF I tested it on only took a few seconds, though).
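    #!/usr/bin/perl
    use warnings;
    use strict;
    use CAM::PDF;

    my $pdf_path = $ARGV[0] or die "No pdf specified\n";
    my $pdf = CAM::PDF->new($pdf_path) or die "$CAM::PDF::errstr\n";

    # Collect the text of every page (pages are numbered from 1)
    my $text = '';
    for my $page (1 .. $pdf->numPages) {
        $text .= $pdf->getPageText($page);
    }

    # Save everything to pdftext.txt in the current directory
    open my $file, '>', 'pdftext.txt' or die "Can't open pdftext.txt: $!\n";
    print $file $text;
    close $file;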
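If you do want the "in chunks" version, here is a rough sketch of one way to do it: print each page's text as soon as it is extracted instead of building one big string. The helper name write_pages is my own invention, not part of CAM::PDF, and I am assuming getPageText can return undef for a page with nothing extractable. As far as I know CAM::PDF still reads the PDF itself into memory, so this only avoids keeping a second copy of all the text around.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not part of CAM::PDF): write each page's text to
# the filehandle $out as soon as it is extracted, so only one page of
# text is held at a time. $pdf can be any object that provides
# numPages() and getPageText($n), as CAM::PDF does.
sub write_pages {
    my ($pdf, $out) = @_;
    my $count = 0;
    for my $page (1 .. $pdf->numPages) {
        my $text = $pdf->getPageText($page);
        next unless defined $text;    # assumption: undef means no text came back
        print {$out} $text;
        $count++;
    }
    return $count;                    # number of pages actually written
}

if (my $pdf_path = shift @ARGV) {
    require CAM::PDF;                 # loaded only when a PDF is actually given
    my $pdf = CAM::PDF->new($pdf_path) or die "$CAM::PDF::errstr\n";
    open my $out, '>', 'pdftext.txt' or die "Can't open pdftext.txt: $!\n";
    my $n = write_pages($pdf, $out);
    close $out or die "Can't close pdftext.txt: $!\n";
    print "Wrote text from $n pages\n";
}
else {
    warn "Usage: $0 file.pdf\n";
}
```

Run it the same way as the original, with the PDF path as the only argument. Since write_pages only needs a filehandle, you could also hand it STDOUT and pipe the text straight into whatever does your word counting.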
I'm a peripheral visionary... I can see into the future, but just way off to the side.