Re^2: Read files not subdirectories
by wrkrbeee (Scribe) on Jan 29, 2015 at 23:04 UTC
|
That's perfect!! Also, thanks for your advice concerning the use of WHILE in lieu of FOREACH! :-))
Re^2: Read files not subdirectories
by wrkrbeee (Scribe) on Jan 30, 2015 at 02:55 UTC
|
Could I please ask another question? After adding "next unless -f $file" the program runs, but nothing after that statement executes. As a test, I inserted a simple PRINT statement immediately after the "next unless" statement, and received nothing. If I uncomment the "next if" statement and omit the "next unless", then the simple PRINT statement works, but the program crashes trying to execute the write statement. In sum, it seems that the "next unless" filters out every file. Does that make any sense?
#! /usr/bin/perl -w
use strict;
use warnings;
use lib "c:/strawberry/perl/site/lib";
use HTML::Strip;
my $hs = HTML::Strip->new();
my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
my $files_dir = 'C:\Dwimperl\Perl\1993';
opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";
while (my $file = readdir($dir_handle) ) {
    next unless -f $file;
    #next if $file eq '.' or $file eq '..';
    open my $file_handle, "/dwimperl/perl/1993/$file" or die "failed to open '$file' <$!>";
    while (my $line = <$file>) {
        my $clean_text = $hs->parse( ' ' );
        print $write_dir "$file\n";
        $hs->eof;
    }
}
close();
closedir $dir_handle;
|
...
my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
...
opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";
while (my $file = readdir($dir_handle) ) {
...
open my $file_handle, "/dwimperl/perl/1993/$file" or die "failed to open '$file' <$!>";
while (my $line = <$file>) {
Read lines from the file handle ($file_handle), not from the file name ($file).
...
print $write_dir "$file\n";
...
$write_dir is a string holding a directory path, not a file handle. If the first argument to print is not an open file handle, print will fail. To write to a file, first open it in write mode, then use the print FILEHANDLE LIST syntax; see print.
To copy or move files, see File::Copy.
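As a minimal sketch of the print FILEHANDLE LIST form mentioned above: the temporary directory and the file name clean.txt are assumptions made for illustration (they stand in for the real $write_dir and are not from the original post).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical output location: a throwaway temp directory stands in
# for the real $write_dir from the thread.
my $write_dir = tempdir( CLEANUP => 1 );
my $out_path  = File::Spec->catfile( $write_dir, 'clean.txt' );

# Open a file *inside* the directory for writing. The directory path
# itself is only a string and cannot be printed to.
open my $out_fh, '>', $out_path
    or die "failed to open '$out_path' for write: $!";

# print FILEHANDLE LIST: note there is no comma after the handle.
print {$out_fh} "some cleaned text\n";

close $out_fh or die "failed to close '$out_path': $!";
```

Wrapping the handle in braces is optional for a simple scalar, but it makes it obvious which argument is the handle and which is the data.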
|
Thank you! I apologize for the inconvenience.
|
You are welcome. Pointing out the errors was no inconvenience. Actually, the OP's reply may not have been meant for me, as it was posted as a reply to the OP's own post. Then again, that might just be a matter of not being familiar with PerlMonks.
|
On many systems, changing a directory while you are scanning it (creating, renaming, or deleting files in it) can interfere with the scan, causing it to skip entries, to list the same file more than once, and so on. (This is true no matter which high-level language, Perl or otherwise, is used to do it.)
Therefore, I suggest that you first read the entire list of files into an in-memory list, which you can do very easily in Perl by calling readdir in list context. Then iterate through that list, checking whether each entry is a directory and so on. Finish retrieving the list for the directory you are in before you start processing it.
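The "read the whole list first, then process it" approach might look like the sketch below. The helper name plain_files is hypothetical, and scanning the current directory ('.') is only for demonstration.

```perl
use strict;
use warnings;
use File::Spec;

# Return the names of the plain files (not subdirectories) in $dir.
# plain_files is a hypothetical helper name, not from the thread.
sub plain_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "failed to open '$dir': $!";
    # List context: readdir returns every remaining entry at once,
    # so the scan is finished before any file is touched.
    my @entries = readdir $dh;
    closedir $dh;
    # readdir returns bare names, so rebuild the full path before -f.
    my @files = grep { -f File::Spec->catfile( $dir, $_ ) } @entries;
    @files = sort @files;
    return @files;
}

# Demonstration only: list the plain files in the current directory.
print "$_\n" for plain_files('.');
```

Because the -f test runs on the full path, this also avoids the problem of every entry being skipped when the script is not run from inside the directory being scanned.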
Of course, “file finding” is such a common requirement that there are many modules for it, such as File::Find, which ships with Perl. If you need to “take a walk through a directory tree,” there are plenty of tour-guides . . .
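For the tree-walking case, a rough File::Find sketch follows. Unlike the readdir loops in this thread, it descends into subdirectories; the helper name files_under is hypothetical and the starting directory '.' is only for demonstration.

```perl
use strict;
use warnings;
use File::Find;

# Collect the path of every plain file under $top, recursing into
# subdirectories. files_under is a hypothetical helper name.
sub files_under {
    my ($top) = @_;
    my @found;
    find(
        sub {
            # Inside the callback, $_ is the bare name (find has
            # changed into the containing directory) and
            # $File::Find::name is the path including $top.
            push @found, $File::Find::name if -f $_;
        },
        $top,
    );
    return @found;
}

# Demonstration only: walk the current directory.
print "$_\n" for files_under('.');
```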
Re^2: Read files not subdirectories
by wrkrbeee (Scribe) on Jan 30, 2015 at 17:01 UTC
|
Could I ask another question, please? The code below runs, but fails to write/save the HTML-stripped text files. With a simple print statement, I've determined that the "second" WHILE statement must return FALSE, as the program never makes it this far. I am grateful for any insight!
#! /usr/bin/perl -w
use strict;
use warnings;
use lib "c:/strawberry/perl/site/lib";
use HTML::Strip;
my $hs = HTML::Strip->new();
#Where I will store the end results;
my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
#Where the files with the HTML tags are located;
my $files_dir = 'C:\Dwimperl\Perl\1993';
#Open the directory where the target files with HTML tags are located;
#Why am I doing this? Stores file names in a directory handle?
opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";
#Loop through each entry/file in the directory;
#What is readdir doing here? It's not really reading anything;
#Is it simply advancing us to the next entry?;
#Seems like the real READ occurs via the OPEN statement below;
while (my $file = readdir($dir_handle) ) {
    next unless -f $file;
    #next if $file eq '.' or $file eq '..';
    #Open the current file so I can strip the HTML tags ??? ;
    open my $file_handle, '<', $file or die "failed to open '$file' <$!>";
    #Read the current file one line at a time??;
    while (my $line = <$file_handle>) {
        ########The WHILE statement above must return FALSE cuz the program never makes it here;
        #Strip the HTML tags??;
        my $clean_text = $hs->parse( ' ' );
        #Save the clean (no HTML tags) text file in a new file/location??;
        print $write_dir "$file\n";
        $hs->eof;
    }
}
close();
closedir $dir_handle;
|
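#!perl
use strict;
use warnings;
my $files_dir = 'C:\Dwimperl\Perl\1993';
opendir (my $dir_handle, $files_dir)
    or die "failed to open '$files_dir': $!";
while (my $filename = readdir($dir_handle)){
    # readdir returns bare names, so test the full path
    next unless -f $files_dir.'/'.$filename;
    print "$filename\n";
}
closedir $dir_handle;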
poj
|
#!perl
use strict;
use warnings;
use HTML::Strip;
my $hs = HTML::Strip->new();
my $files_dir = 'C:\Dwimperl\Perl';
my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
opendir (my $dir_handle, $files_dir)
    or die "failed to open '$files_dir': $!";
while (my $filename = readdir($dir_handle)){
    # readdir returns bare names, so build the full path before testing
    next unless -f $files_dir.'/'.$filename;
    print "Processing $filename\n";
    open my $fh_in, '<', $files_dir.'/'.$filename
        or die "failed to open '$filename' for read: $!";
    open my $fh_out, '>', $write_dir.'/'.$filename
        or die "failed to open '$filename' for write: $!";
    my $count=0;
    while (my $line = <$fh_in>) {
        my $clean_text = $hs->parse($line);
        print $fh_out "$clean_text\n";
        ++$count;
    }
    # reset the parser state before the next file
    $hs->eof;
    close $fh_in;
    close $fh_out or die "failed to close '$filename': $!";
    print "$count lines read from $filename\n";
}
closedir $dir_handle;
poj
|
We're close: it writes the files to the output location, but the files are empty (0 KB). Any ideas?
|
Works! Very grateful for your time and patience with me. You're the best!
|
Hi poj, your script will print the file names. Where are we going here?
|
|
Hi poj, I corrected a couple of silly mistakes on my part (e.g., ensuring my portable hard drive is available/plugged in, and actually opening the output file for output). It now gives me a "failed to open" for the output file at line 12. Here is the revised code. I apologize for the hassle.
#! /usr/bin/perl -w
use strict;
use warnings;
use lib "c:/strawberry/perl/site/lib";
use HTML::Strip;
my $hs = HTML::Strip->new();
#Where I will store the end results;
my $write_dir = 'F:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
open (my $outfile_hand, '>', $write_dir) || die "failed to open '$write_dir' <$!>";
#Where the files with the HTML tags are located;
my $files_dir = 'C:\Dwimperl\Perl';#\1993';
#Open the directory where the target files with HTML tags are located;
#Why am I doing this? Stores file names in a directory handle?
opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";
#Loop through each entry/file in the directory;
#What is readdir doing here? It's not really reading anything;
#Is it simply advancing us to the next entry?;
#Seems like the real READ occurs via the OPEN statement below;
while (my $file = readdir($dir_handle) ) {
    next unless -f $file;
    #next if $file eq '.' or $file eq '..';
    #Open the current file so I can strip the HTML tags ??? ;
    open my $file_handle, '<', $file or die "failed to open '$file' <$!>";
    #Read the current file one line at a time??;
    while (my $line = <$file_handle>) {
        ########The WHILE statement above must return FALSE cuz the program never makes it here;
        #Strip the HTML tags??;
        my $clean_text = $hs->parse( ' ' );
        #Save the clean (no HTML tags) text file in a new file/location??;
        print $outfile_hand "$file\n";
        $hs->eof;
    }
}
close();
closedir $dir_handle;
|