PerlMonks  

Re^2: Extracting text from pptx

by welle (Beadle)
on Mar 08, 2013 at 18:46 UTC


in reply to Re: Extracting text from pptx
in thread Extracting text from pptx

Please bear in mind that I'm a novice with Perl.

I just can't understand the whole script: where is the unzipping part? If I run the script (on Windows), I simply get an error message ("System not able to find the directory" plus "Failed to extract required information from <file>"). I guess it must have to do with this setting in the script:

my $unzip = "/usr/bin/unzip";


Re^3: Extracting text from pptx
by Corion (Pope) on Mar 08, 2013 at 19:12 UTC

    So, what have you done to find out where $unzip is then used?

    Also, the error message would suggest to me that somewhere, the program expects some other program, possibly unzip.exe, to exist. What have you done to find out whether that is really the case?

    Likely an unzip utility can be found in the unxutils package.
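    One way to check whether an external unzip is actually available is sketched below. The find_program helper is a hypothetical name, not part of the script under discussion, and the list of Windows extensions is an assumption; the only thing taken from the thread is the "/usr/bin/unzip" setting.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable
# called $name in the directories listed in PATH, or undef if none.
sub find_program {
    my ($name) = @_;

    # Assumption: on Windows, try a few common executable extensions.
    my @exts = $^O eq 'MSWin32' ? ('', '.exe', '.bat', '.cmd') : ('');

    for my $dir (File::Spec->path) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

my $unzip = "/usr/bin/unzip";    # the setting quoted from the script

if (-x $unzip) {
    print "$unzip exists and is executable\n";
}
else {
    my $found = find_program('unzip');
    print defined $found
        ? "$unzip is missing, but an unzip was found at $found\n"
        : "$unzip is missing and no unzip was found in PATH\n";
}
```

    If the second or third message is printed, the hard-coded path is the likely cause of the error, and $unzip would need to point at whatever unzip program is really installed.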

Re^3: Extracting text from pptx
by jms53 (Monk) on Mar 09, 2013 at 03:44 UTC
    That looks like a Unix command!
    (OK, that IS a Unix path to a program; you'll want to find the equivalent call for something like 7zip or the like, and replace "/usr/bin/unzip" with that.)
    J -

      Here's a one-liner for *nix

      unzip -lp <filename>.pptx ppt/slides/* ppt/notesSlides/* | perl -wne 'while ( /<a:t>(.*?)<\/a:t>.*?(?=(<a:t>|<\/p:txBody>))/g ) {print "$1" and print "\n" if $2 eq "<\/p:txBody>"}'
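      For Windows, where there may be no unzip program at all, the same idea can be sketched in Perl alone. This is a rough sketch, not the script from the thread: it assumes the core module IO::Uncompress::Unzip is available, pptx_text is a hypothetical name, and it takes over the one-liner's assumption that text runs appear as plain <a:t>...</a:t> elements. It does not decode XML entities such as &amp;, and it returns text in archive order, which need not be slide order.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: return every <a:t>...</a:t> text run found in the
# slide and notes XML members of a .pptx (which is a zip archive).
sub pptx_text {
    my ($file) = @_;

    my $zip = IO::Uncompress::Unzip->new($file)
        or die "Cannot open $file: $UnzipError\n";

    my @text;
    my $status;
    for ($status = 1; $status > 0; $status = $zip->nextStream) {
        my $name = $zip->getHeaderInfo->{Name};

        # Only slides and speaker notes, as in the one-liner above.
        next unless $name =~ m{^ppt/(?:slides|notesSlides)/[^/]+\.xml$};

        # read() overwrites $buff on each call, so accumulate in $xml.
        my ($xml, $buff) = ('');
        $xml .= $buff while ($status = $zip->read($buff)) > 0;
        last if $status < 0;

        push @text, $xml =~ m{<a:t>(.*?)</a:t>}gs;
    }
    die "Error reading $file: $UnzipError\n" if $status < 0;

    return @text;
}

# Usage: perl thisfile.pl presentation.pptx
if (@ARGV) {
    print "$_\n" for pptx_text($ARGV[0]);
}
```

      Unlike the one-liner, this prints one text run per line rather than joining the runs of a text body, so the output will need regrouping if paragraph boundaries matter.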
