Reading output of external program without Shell

by Anonymous Monk
on Nov 15, 2017 at 10:29 UTC ( [id://1203454]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

I need to read the output of an external program (pdftotext.exe) and use it programmatically in my scripts. The following script works without any problem:

use strict;
use warnings;

my $cmd = qq(pdftotext "a.pdf" - |);
my $pid = open (FILE, $cmd) or die "Couldn't spawn [$cmd]: $! / $?";
print my $text = do { local($/); <FILE> };
close FILE;

The problem is that this works only if I run the script from the Shell (Windows 10). Now I need to find a way to run it without a Shell being there (I need to build an exe file myself and hide the Shell using PerlApp). In that case the resulting exe doesn't work: there is no error message, but $text stays empty.

Any suggestions?

Replies are listed 'Best First'.
Re: Reading output of external program without Shell
by haukex (Archbishop) on Nov 15, 2017 at 10:39 UTC

    One thing I see in your code is that you're not checking the return value of close. I wrote about the topic of running external commands at length here. From the several possibilities listed there, in this case I might suggest you try IPC::Run3 first, because it seems to work well on Windows. One thing to keep in mind is whether pdftotext.exe is always going to be in your PATH environment variable, and if not, you should use absolute pathnames (using a suitable module for handling those, like File::Spec or Path::Class). Also, I am guessing that this program may have the ability to write its output to a file, in which case you might just want to have it do that, using File::Temp to generate suitable temp files, and then reading the file back in.
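
A minimal sketch of the point about checking close on a piped open, using only core Perl. The helper name capture_stdout is hypothetical, and the running perl ($^X) stands in for pdftotext so that the example is self-contained. As far as I know, the list form of a piped open needs Perl 5.22 or newer on Windows. Whether it behaves any better inside a windowless PerlApp build is untested here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run a command given as a list,
# without a shell, and return everything it wrote to stdout.
sub capture_stdout {
    my @cmd = @_;

    # List form of a piped open: the command is not passed through a shell.
    open( my $fh, '-|', @cmd )
        or die "Couldn't spawn [@cmd]: $!";

    my $text = do { local $/; <$fh> };    # slurp all output

    # For a pipe, close() waits for the child and returns false if the
    # child exited non-zero (then $! is 0 and $? holds the wait status).
    unless ( close($fh) ) {
        die $!
            ? "Error closing pipe from [@cmd]: $!"
            : "[@cmd] exited with status " . ( $? >> 8 ) . "\n";
    }
    return $text;
}

# Stand-in for pdftotext: the running perl itself.
my $out = capture_stdout( $^X, '-e', 'print q{hello from child}' );
print "captured: '$out'\n";
```

With the close check in place, a child that exits non-zero raises an error in the parent instead of silently leaving $text empty.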

Re: Reading output of external program without Shell
by eyepopslikeamosquito (Archbishop) on Nov 15, 2017 at 12:29 UTC

    Inelegant in the extreme, but I dug out some old Windows-only code that I used years ago on an ancient Perl and that still seems to work with the latest Strawberry Perl.

    use strict;
    use warnings;

    sub read_file_contents {
        my $fname = shift;
        open( my $fh, '<', $fname ) or die "error: open '$fname': $!\n";
        local $/ = undef;    # slurp mode
        my $s = <$fh>;
        close($fh);
        return $s;
    }

    # Run a Windows executable synchronously.
    # Return a three element list:
    #   the return code; the stdout of the command; and the stderr of the command.
    # Die if something goes wrong.
    sub run_cmd_sync {
        my ( $exe, $cmd, $workdir ) = @_;
        defined($workdir) or $workdir = ".";
        require Win32::Process;
        my $tmpout = "klink-out-$$.tmp";
        my $tmperr = "klink-err-$$.tmp";
        -f $exe or die "error: file '$exe' not found";
        local *SAVOUT;
        local *SAVERR;

        # save original stdout and stderr
        open( SAVOUT, ">&STDOUT" ) or die "error: open SAVOUT: $!";
        open( SAVERR, ">&STDERR" ) or die "error: open SAVERR: $!";
        open( STDOUT, '>', $tmpout ) or die "error: can't redirect stdout";
        open( STDERR, '>', $tmperr ) or die "error: can't redirect stderr";
        Win32::Process::Create(
            my $hProc,    # process object
            $exe,         # executable
            $cmd,         # command line
            1,            # inherit handles
            Win32::Process::NORMAL_PRIORITY_CLASS(),
            $workdir      # working dir
        ) or die "error: Win32::Process::Create: $^E ($!)";
        my $pid = $hProc->GetProcessID();

        # parent continues (redirect back to original) ...
        close(STDOUT);
        close(STDERR);
        open( STDOUT, ">&SAVOUT" ) or die "error: open SAVOUT: $!";
        open( STDERR, ">&SAVERR" ) or die "error: open SAVERR: $!";
        print "started exe:$exe (cmd:$cmd) ok, pid=$pid.\n";
        my $rc = 0;
        $hProc->Wait( Win32::Process::INFINITE() ) or die "error: Wait: $^E ($!)";
        $hProc->GetExitCode($rc) or die "error: GetExitCode: $^E ($!)";
        my $outstr = read_file_contents($tmpout);
        my $errstr = read_file_contents($tmperr);
        unlink($tmpout) or die "error: unlink '$tmpout': $!\n";
        unlink($tmperr) or die "error: unlink '$tmperr': $!\n";
        return ( $rc, $outstr, $errstr );
    }

    my ( $rc, $outstr, $errstr ) = run_cmd_sync(
        $^X,
        'perl -e "print q{hello stdout}; print STDERR q{hello stderr}"',
        '.'
    );
    print "rc='$rc'\n";
    print "stdout='$outstr'\n";
    print "stderr='$errstr'\n";

    Running the above program produces:

    started exe:C:\Strawberry\perl\bin\perl.exe (cmd:perl -e "print q{hello stdout}; print STDERR q{hello stderr}") ok, pid=3132.
    rc='0'
    stdout='hello stdout'
    stderr='hello stderr'

    In case it's of use, running this simpler and more portable test program tt1.pl works from the Windows shell at least, but will probably hit the same problems you are currently suffering if run without a shell.

    # Test program tt1.pl
    use strict;
    use warnings;

    # Run a command without invoking the command shell.
    #   exe is the command name
    #   @_ contains the command line arguments (including argv[0])
    sub run_cmd_noshell {
        my $exe = shift;
        print "run '$exe' with args:\n '@_'\n";
        system { $exe } @_;
        my $rc = $? >> 8;
        $rc == 0 or warn "error: exit code=$rc\n";
    }

    run_cmd_noshell($^X, $^X, '-le', 'print q{hello one};');
    run_cmd_noshell($^X, 'perl', '-le', 'print q{hello two}; exit 42;');

    produces the following output:

    run 'C:\Strawberry\perl\bin\perl.exe' with args:
     'C:\Strawberry\perl\bin\perl.exe -le print q{hello one};'
    hello one
    run 'C:\Strawberry\perl\bin\perl.exe' with args:
     'perl -le print q{hello two}; exit 42;'
    hello two
    error: exit code=42
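
tt1.pl only looks at `$? >> 8`, which cannot tell a program that never started apart from one that ran and returned a status. A slightly fuller sketch, following the pattern documented under system in perlfunc, might look like this. The helper name run_and_describe is hypothetical, and the signal branch is mostly relevant on Unix-like systems.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and describe how it ended.
sub run_and_describe {
    my ( $exe, @argv ) = @_;    # @argv includes argv[0], as in tt1.pl
    system { $exe } @argv;
    if ( $? == -1 ) {
        # The program could not be started at all; $! says why.
        return "failed to execute: $!";
    }
    elsif ( $? & 127 ) {
        # The low 7 bits of $? hold the signal that killed the child.
        return sprintf "died with signal %d", $? & 127;
    }
    # Otherwise the high byte of $? holds the exit value.
    return sprintf "exited with value %d", $? >> 8;
}

print run_and_describe( $^X, $^X, '-e', 'exit 42' ), "\n";    # exited with value 42
```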

Re: Reading output of external program without Shell
by salva (Canon) on Nov 15, 2017 at 11:15 UTC
    I faced a similar issue recently, and my conclusion was that there is some bug in perl's handling of pipes that is triggered when perl is run as a Windows application (instead of a console one).

    My advice would be to redirect the output of the external program to an external file and read it afterwards from perl.

Re: Reading output of external program without Shell
by karlgoethebier (Abbot) on Nov 15, 2017 at 16:03 UTC

    Apropos pdftotext: what about CAM::PDF to extract the text? As far as I remember, it comes with such a feature. Regards, Karl

    Update: Just found an example on my box. I didn't remember that it is so simple:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use CAM::PDF;
    use feature qw(say);

    my $file = shift;
    my $pdf  = CAM::PDF->new($file);
    say $pdf->getPageText(1);
    __END__

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

      Hi karlgoethebier. I tried CAM::PDF in the past. In the future I may opt for it, but at the moment I think the output quality of pdftotext (from xpdf) is unbeatable (but I'll run some more tests).

        Thanks for your kind reply. BTW, as far as I vaguely remember, this doesn't call a shell:

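        #!/usr/bin/env perl
        use strict;
        use warnings;
        use IPC::Run qw( run );
        use feature qw(say);

        my $command = q(cat);
        # Pass the command as an array ref so that no shell is involved, and
        # capture stdout in $text (run itself only returns true or false).
        run( [ $command, $0 ], '>', \my $text ) or die "$command failed: $?";
        say $text;
        __END__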

        Best regards, Karl


Re: Reading output of external program without Shell
by Anonymous Monk on Nov 15, 2017 at 13:24 UTC

    Thank you all!

    In the end I opted for writing the content of the PDF to a temp file and reading it back in. Not very elegant though...

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    my $temp = new File::Temp( UNLINK => 0, SUFFIX => '.txt' );
    system ("pdftotext","-enc", "UTF-8","myfile.pdf","$temp");
    local $/=undef;
    open(my $fh, '<:encoding(UTF-8)', $temp) or die "Could not open file '$temp' $!";
    print my $string = <$fh>;

      Good choice to use File::Temp and the list form of system, but don't forget to check its return value - like system(...)==0 or die "system failed, \$?=$?";

      Not very elegant though...

      I think this is a case where reliability is elegant :-) Temp files are unlikely to have filename conflicts, are created in locations where they don't disturb the user (unlike some programs that create temp files in the user's home directory and/or use fixed names), and you get automatic cleanup (although I'm not sure why you set UNLINK=>0). Overall they're a good solution, and in fact IPC::Run3 makes heavy use of them, AFAIK it's one of the reasons it's so portable.

        Strongly agree – "it is elegant if it works, reliably and consistently, and does not take a lot of head-banging to develop." Such "inelegant" procedures also have the advantage of being easier to debug, or to change to meet future requirements, because they do employ a temporary file whose contents can be inspected. (And also because, "first, one step runs and runs to completion, then, the next step begins," and so on.)
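
Putting the suggestions from this sub-thread together (a temp location from File::Temp, the list form of system, and a checked return value), a minimal sketch might look like the following. The helper name run_to_tempfile is hypothetical, and the running perl ($^X) stands in for pdftotext so that the example is self-contained. It uses a private temp directory rather than a File::Temp file object, which avoids any question of whether a second process can open a file that File::Temp is still holding open. That is a design choice of this sketch, not something the thread settled.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: run @cmd with an output filename appended as its last
# argument, then return the contents of that file. No pipe is involved.
sub run_to_tempfile {
    my @cmd = @_;

    # A private temp directory, removed automatically at program exit.
    my $dir     = tempdir( CLEANUP => 1 );
    my $outfile = File::Spec->catfile( $dir, 'out.txt' );

    # List form of system, with the return value checked.
    system( @cmd, $outfile ) == 0
        or die "system [@cmd] failed, \$?=$?";

    open( my $fh, '<:encoding(UTF-8)', $outfile )
        or die "Could not open file '$outfile': $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close($fh) or die "Could not close '$outfile': $!";
    return $text;
}

# Stand-in for: run_to_tempfile( 'pdftotext', '-enc', 'UTF-8', 'myfile.pdf' )
my $child = 'open my $o, q{>}, $ARGV[0] or die; print $o q{hello from child}';
print run_to_tempfile( $^X, '-e', $child ), "\n";
```

While debugging, CLEANUP => 1 can be dropped so that the intermediate file stays around for inspection.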
