I'm guessing that was directed towards me.
Apologies for not answering sooner; I only just found this post.
Anyway, here's the code. Keep in mind that I'm by no means an experienced coder. This code was taken from a much larger project, so some of the silly variable names follow a logic you can't see here, and I left out some logging and reporting. I have only tested it on Windows, but it's designed to work on most Linux distros and OS X as well:
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
# IDENTIFY OS
my $OS;
if ($^O =~ /mswin/i) {$OS = "Windows"; print "OS detected: Windows\n"}
elsif ($^O =~ /linux/i) {$OS = "Linux"; print "OS detected: Linux\n"}
elsif ($^O =~ /darwin/i) {$OS = "Mac"; print "OS detected: Mac OS X\n"}
else {
    print "\nUnable to detect OS type, choose your OS:\n\nWindows    Any version of Microsoft Windows\nMac        Any flavour of Mac OS X\nLinux      Linux of some sort\n\n";
    do {
        chomp ($OS = <STDIN>);
        print "\nIncorrect OS type. Try again.\n\n" unless $OS eq "Windows" or $OS eq "Mac" or $OS eq "Linux";
    } until ($OS eq "Windows" or $OS eq "Mac" or $OS eq "Linux");
}
# IDENTIFY SCRIPT PATH
my $script = File::Spec->rel2abs( __FILE__ );
$script =~ /(.*)[\/\\](.*)/; # split at the last slash or backslash
my $scriptpath = $1;
if (-d "$scriptpath/scripts/docx2txt") {
    # print "\nScript folder found.\n";# comment out
} else {
    do {
        print "\nThe script path found automatically (${scriptpath}) is not correct.\nPlease drag and drop the aligner script here and press enter. (If your OS doesn't support drag & drop, copy-paste the path here. You can paste by right clicking in the window or right clicking the icon in the top left corner of this window.)\n";
        chomp ($script = <STDIN>);
        # quotes are optional (pasted paths often have none); \s* absorbs leading and trailing whitespace
        if ($script =~ /^\s*["']?(.*)[\/\\][^"']*["']?\s*$/) {
            $scriptpath = $1;
        }
        print "\nScript folder identified correctly.\n" if -d "$scriptpath/scripts/docx2txt";
    } until (-d "$scriptpath/scripts/docx2txt");
}
# DRAG AND DROP INPUT FILE
my $file1_full;
print "\n\nDrag and drop your input file here and press enter.\n";
chomp ($file1_full = <STDIN>);
$file1_full =~ s/^\s+//; # strip leading whitespace
$file1_full =~ s/\s+$//; # strip trailing whitespace
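$file1_full =~ /^["']?(.*)[\/\\]([^"']*)["']?$/
    or die "Can't split '$file1_full' into folder and file name.\n";
my $folder = $1;
my $file1 = $2;
# if the name has no extension, keep it whole instead of reusing a stale $1
my ($f1, $ext) = $file1 =~ /(.*)\.(.*)/ ? ($1, lc $2) : ($file1, "");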
# CONVERT DOCX TO UTF-8 TXT
if ($OS eq "Windows") {
    # create config file, run docx2txt.exe modded to use win config file
    open (my $config_in, "<", "$scriptpath/scripts/docx2txt/docx2txt.config") or die "Can't open file: $!";
    unlink "$scriptpath/scripts/docx2txt/docx2txt_win.config";
    open (my $config_out, ">>", "$scriptpath/scripts/docx2txt/docx2txt_win.config") or die "Can't open file: $!";
    while (<$config_in>) {
        # point the unzip setting at the bundled unzip.exe
        s/^unzip *=>.*$/unzip => '$scriptpath\\scripts\\docx2txt\\unzip\\unzip.exe',/;
        print $config_out $_;
    }
    close $config_in;
    close $config_out;
    system ("\"$scriptpath\\scripts\\docx2txt\\docx2txt_win.exe\" \"$folder/$file1\" \"$folder/${f1}.txt\"");
} else { # linux and mac both use the original docx2txt.pl and both have unzip at /usr/bin/unzip
    system ("perl \"$scriptpath/scripts/docx2txt/docx2txt.pl\" \"$folder/$file1\" \"$folder/${f1}.txt\"");
}
#work with the txt file from now on
$file1 = "${f1}.txt";
# CHECK FILE SIZE, ABORT IF 0
my $file_1_size = -s "$folder/$file1";
if (!$file_1_size) { # -s returns undef if the file was never created
    print "\n\nThe file conversion seems to have failed: the generated file is missing or empty. ABORTING.\n\n";
    sleep 3;
    die;
}
# DONE
print "\n$file1 created ($file_1_size bytes).\nPress enter to quit.\n";
<STDIN>;
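The folder/name/extension split that the code above does with regexes could probably also be done with the core File::Basename module. Here is a minimal sketch, assuming a hypothetical helper called split_dropped_path (it is not part of the script above) and assuming the dropped path is wrapped in at most one pair of matching quotes:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: split a dragged-and-dropped path into
# (folder, base name, lower-cased extension).
sub split_dropped_path {
    my ($raw) = @_;
    $raw =~ s/^\s+|\s+$//g;          # strip surrounding whitespace
    $raw =~ s/^(["'])(.*)\1$/$2/;    # strip one pair of matching quotes
    # the suffix pattern captures the last ".something" of the file name
    my ($base, $dir, $suffix) = fileparse($raw, qr/\.[^.]*/);
    $dir    =~ s{[/\\]$}{};          # drop the trailing separator
    $suffix =~ s/^\.//;              # drop the leading dot
    return ($dir, $base, lc $suffix);
}

my ($folder, $f1, $ext) = split_dropped_path(q{ "/home/me/My Docs/Report.DOCX" });
print "$folder | $f1 | $ext\n";     # prints: /home/me/My Docs | Report | docx
```

fileparse follows the path rules of the OS it runs on, so backslash-separated paths are only split correctly when this runs on Windows. A file name without a dot simply comes back with an empty extension.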
Now, this requires docx2txt.pl on *nix, and docx2txt_win.exe and unzip.exe on Windows. It looks for these under scripts/docx2txt (unzip.exe in the unzip subfolder); I have uploaded the necessary files here. Of course you can get your own unzip binary and generate the .exe yourself with pp, which is what I did, or just use the .pl on Windows as well if your users can be expected to have perl installed.
The file won't be up here for long, so here's a summary in case someone reads this when I've already yanked it:
Docx2txt needs to unzip the docx (zip) files. To make this work on Windows, I have modded the original perl script to use a different config file (docx2txt_win.config), which the main script generates at runtime, filling in the path to unzip.exe (scripts/docx2txt/unzip) according to what folder it's in. Then I generated an executable (docx2txt_win.exe) out of this slightly modified script.
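To make the config rewriting step concrete, here is a minimal sketch of it as a stand-alone function. The helper name rewrite_unzip_line and the sample config lines are made up for illustration; the only thing taken from the script is that the relevant line starts with "unzip =>".

```perl
use strict;
use warnings;

# Hypothetical helper: return the config line with the unzip path filled in.
# Lines that do not start with "unzip =>" are returned unchanged.
sub rewrite_unzip_line {
    my ($line, $unzip_path) = @_;
    # The path is interpolated into the replacement as-is, so its
    # backslashes are not treated as regex or escape characters.
    $line =~ s/^unzip\s*=>.*$/unzip => '$unzip_path',/;
    return $line;
}

# Made-up sample lines standing in for docx2txt.config.
my @sample = (
    "unzip => '/usr/bin/unzip',\n",
    "# other settings stay untouched\n",
);

# Assumed install folder, for illustration only.
my $unzip = 'C:\tools\scripts\docx2txt\unzip\unzip.exe';
print rewrite_unzip_line($_, $unzip) for @sample;
```

One caveat with this approach: the path ends up inside single quotes in the generated config, so an install folder whose name contains a single quote would produce a broken config file.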
On Linux and OS X systems, the original .pl is used without modifications, as these OSes can reasonably be expected to have an unzip utility at /usr/bin/unzip.
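If you would rather not rely on that expectation, the script could check for unzip before starting the conversion. This is a rough sketch, assuming a hypothetical helper find_executable that is not part of the script above; it looks in /usr/bin first and then in the directories on PATH:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first executable file named $name found
# in the given directories, or undef if there is none.
sub find_executable {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# File::Spec->path returns the directories listed in PATH.
my $unzip = find_executable("unzip", "/usr/bin", File::Spec->path);
print defined $unzip
    ? "unzip found at $unzip\n"
    : "No unzip found; docx2txt.pl may not be able to unpack the file.\n";
```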