by warmsuns (Initiate)
on Feb 21, 2013 at 20:54 UTC
warmsuns has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to use the executable pdftotext to convert a PDF file to text format, but I don't know why it doesn't work. By the way, I am a new learner, and I don't know what "-raw" means. Thanks a lot!

#printf("hi, this is perl pdf_extractor1.\n");
$filename1 = $ARGV[0];
#printf("hi, this is FILENAME $filename1.\n");
$filename1 =~ s/\.pdf|\.PDF/\.txt/;
#printf("1, this is FILENAME $filename1.\n");
system("./xpdf-2.01/xpdf/pdftotext -raw $ARGV[0]");

Then I tried to run pdftotext directly from the shell. It turned out that it only succeeds when I first move into the directory where the pdftotext file is. I am confused :(

yao@ubuntu:~/perl$ xpdf-2.01/xpdf/pdftotext 1.pdf
bash: xpdf-2.01/xpdf/pdftotext: Permission denied
yao@ubuntu:~/perl$ xpdf-2.01/xpdf/pdftotext 2.pdf
bash: xpdf-2.01/xpdf/pdftotext: Permission denied
yao@ubuntu:~/perl$ cd xpdf-2.01
yao@ubuntu:~/perl/xpdf-2.01$ cd xpdf
yao@ubuntu:~/perl/xpdf-2.01/xpdf$ pdftotext /home/yao/perl/1.pdf
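A minimal sketch of the same call with error reporting, assuming the executable still lives at ./xpdf-2.01/xpdf/pdftotext relative to wherever the script is started (the $pdftotext path and the helper names txt_name_for and run_pdftotext are hypothetical). It passes the computed .txt name as pdftotext's optional second argument (usage is `pdftotext [options] <PDF-file> [<text-file>]`), uses the list form of system so no shell reinterprets the file name, and returns the reason for a failure instead of failing silently. As for -raw: as I understand the pdftotext man page, it keeps the text in the order it appears in the PDF's content stream rather than the reading order pdftotext normally tries to reconstruct.

```perl
use strict;
use warnings;

# Hypothetical location of the executable, relative to the current directory.
my $pdftotext = './xpdf-2.01/xpdf/pdftotext';

# Derive "paper.txt" from "paper.pdf" (case-insensitive, only at the end);
# names without a .pdf suffix just get ".txt" appended.
sub txt_name_for {
    my ($pdf) = @_;
    my $txt = $pdf;
    $txt .= '.txt' unless $txt =~ s/\.pdf\z/.txt/i;
    return $txt;
}

# Run "pdftotext -raw" on one file; return '' on success, else an error message.
sub run_pdftotext {
    my ($exe, $pdf) = @_;
    # List form of system: no shell, so spaces in $pdf pass through unchanged.
    system($exe, '-raw', $pdf, txt_name_for($pdf));
    return "could not start $exe: $!" if $? == -1;    # e.g. "Permission denied"
    return sprintf('%s killed by signal %d', $exe, $? & 127) if $? & 127;
    return sprintf('%s exited with status %d', $exe, $? >> 8) if $? >> 8;
    return '';
}

if (@ARGV) {
    my $err = run_pdftotext($pdftotext, $ARGV[0]);
    die "$err\n" if $err;
}
```

With a non-executable pdftotext, the script would stop with "could not start ./xpdf-2.01/xpdf/pdftotext: Permission denied" instead of printing nothing.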

Re: no responses for the execution of the code
by vinoth.ree (Monsignor) on Feb 22, 2013 at 05:39 UTC

    Hi warmsuns, since you need to convert a PDF to text, you can use one of the existing CPAN modules.




    Untested code:

    use strict;
    use warnings;
    use utf8;
    use CAM::PDF;
    use CAM::PDF::PageText;

    my $FileName = shift || die "Usage: command line argument missing. Give a PDF file as argument\n";
    # CAM::PDF->new returns undef on failure and sets $CAM::PDF::errstr
    my $Pdf_Obj = CAM::PDF->new($FileName) || die "$CAM::PDF::errstr\n";
    print Text_From_Page(1);

    # Render the text of one page (pages are numbered from 1)
    sub Text_From_Page {
        my $pg_num = shift;
        return CAM::PDF::PageText->render($Pdf_Obj->getPageContentTree($pg_num));
    }

      Thank you so much. I didn't expect so many responses from the forum; it feels so good! I am a thesis student in computer science, and my professor asked me to study the Perl code that some earlier students wrote, whose purpose is to get the semantic head from a paper. It takes time to read what others have already finished, and I have so many questions that I just don't know where to go for help :) Thanks again to all of you!

Re: no responses for the execution of the code
by aitap (Deacon) on Feb 23, 2013 at 16:30 UTC

    The first file (xpdf-2.01/xpdf/pdftotext) cannot be run, either because it doesn't have the execute bit set (run chmod +x on it) or because the filesystem is mounted with the noexec option (remount it with the exec option, or move the file to another filesystem).
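    A small Perl sketch of that check, in case you want the script itself to explain the failure; diagnose_exe is a hypothetical helper name. It uses the -e, -f and -x file tests. Perl's -x is by default based only on the file's mode bits and your uid/gid, so it can still say "looks executable" for a file on a noexec filesystem; in that case the remount/move advice above is the one that applies.

```perl
use strict;
use warnings;

# Explain why running $path might give "Permission denied".
# Note: by default -x only inspects mode bits, so a noexec mount is NOT detected here.
sub diagnose_exe {
    my ($path) = @_;
    return 'does not exist'                    unless -e $path;
    return 'is not a regular file'             unless -f _;   # "_" reuses the last stat
    return 'has no execute bit; try chmod +x'  unless -x _;
    return 'looks executable (if it still fails, check for a noexec mount)';
}

print "$_: ", diagnose_exe($_), "\n" for @ARGV;
```

    For example, running it with xpdf-2.01/xpdf/pdftotext as the argument would print which of these cases applies.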

    The pdftotext in your last command is run from your $PATH, not from your current directory.
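    To see which pdftotext a bare `pdftotext` command would pick up, here is a sketch that walks $PATH the way the shell does for a command name without a slash. which_all is a hypothetical helper built on File::Spec->path and File::Spec->catfile; the first entry it reports is normally the one the shell runs.

```perl
use strict;
use warnings;
use File::Spec;

# Every executable regular file called $name on $PATH, in search order.
sub which_all {
    my ($name) = @_;
    return grep { -f $_ && -x _ }
           map  { File::Spec->catfile($_, $name) } File::Spec->path();
}

my @hits = which_all('pdftotext');
print @hits ? "pdftotext on \$PATH: @hits\n" : "no pdftotext on \$PATH\n";
```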

    Sorry if my advice was wrong.
Re: no responses for the execution of the code
by Anonymous Monk on Feb 22, 2013 at 01:56 UTC


    Look up chdir in the perl docs.

    Put the source PDF file into the same directory as the pdftotext executable. Put a chdir into your code so that you move to the same directory as the executable, and always check the return value of chdir.

    This makes it simple to test when you are starting out, but it probably isn't the best way to do things in production.
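    A minimal sketch of that approach, assuming the PDF has been copied next to the executable as suggested; run_in_dir is a hypothetical helper. It checks chdir's return value, runs the command with the list form of system, and changes back to the starting directory (found with Cwd's getcwd) so the rest of the script is not affected.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Run @cmd from inside $dir, then return to the original directory.
# Returns ($?, '') from system(), or (-1, message) if chdir into $dir failed.
sub run_in_dir {
    my ($dir, @cmd) = @_;
    my $old = getcwd();
    chdir $dir or return (-1, "cannot chdir to $dir: $!");   # always check chdir
    system(@cmd);
    my $status = $?;
    chdir $old or die "cannot chdir back to $old: $!\n";
    return ($status, '');
}
```

    For example, run_in_dir('xpdf-2.01/xpdf', './pdftotext', '-raw', '1.pdf') would run pdftotext from inside that directory and hand back a status in the same format as $?.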


