Re: Read files not subdirectories

Re^2: Read files not subdirectories by wrkrbeee (Scribe) on Jan 29, 2015 at 23:04 UTC
That's perfect!! Also, thanks for your advice concerning the use of WHILE in lieu of FOREACH! :-))
Re^2: Read files not subdirectories by wrkrbeee (Scribe) on Jan 30, 2015 at 02:55 UTC
Could I please ask another question? After using "next unless -f $file" the program runs, but fails to execute anything thereafter. As a test, I inserted a simple PRINT statement immediately after the "next unless" statement, and received nothing. If I uncomment the "next if" statement, and omit the "next unless", then the simple PRINT statement works, but the program crashes trying to execute the write statement. In sum, it seems that the "next unless" filters out all obs. Make any sense?

    #! /usr/bin/perl -w
    use strict;
    use warnings;
    use lib "c:/strawberry/perl/site/lib";
    use HTML::Strip;
    my $hs = HTML::Strip->new();
    my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
    my $files_dir = 'C:\Dwimperl\Perl\1993';
    opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";
    while (my $file = readdir($dir_handle) ) {
        next unless -f $file;
        #next if $file eq '.' or $file eq '..';
        open my $file_handle, "/dwimperl/perl/1993/$file" or die "failed to open '$file' <$!>";
        while (my $line = <$file>) {
            my $clean_text = $hs->parse( ' ' );
            print $write_dir "$file\n";
            $hs->eof;
        }
    }
    close();
    closedir $dir_handle;
Re^3: Read files not subdirectories by parv (Parson) on Jan 30, 2015 at 03:17 UTC
Consult a beginner-level Perl book ("Beginner Perl", for example) to understand the difference between a file and a file handle, and how `print` and its various forms use the currently selected file handle.

    ...
    my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
    ...
    opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";
    while (my $file = readdir($dir_handle) ) {
        ...
        open my $file_handle, "/dwimperl/perl/1993/$file" or die "failed to open '$file' <$!>";
        while (my $line = <$file>) {

To read a line, actually use the file handle, not a file path.

    ...
    print $write_dir "$file\n";
    ...

The directory path is a string, not a file handle. If there is no such open file handle, print will fail. To write to a file through a specific file handle, open the file in write mode and use the `print FILEHANDLE LIST` syntax; see print. To copy or move files, see File::Copy.
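A minimal sketch of the `print FILEHANDLE LIST` form mentioned above. The helper name `write_then_read` and the scratch file name are assumptions made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): write one line to
# $path through a write handle, then read the file back through a read handle.
sub write_then_read {
    my ($path, $text) = @_;

    # '>' opens for writing; print needs this handle, not the path string.
    open(my $out_fh, '>', $path) or die "failed to open '$path' for write: $!";
    print $out_fh "$text\n";    # print FILEHANDLE LIST: no comma after the handle
    close($out_fh) or die "failed to close '$path': $!";

    # '<' opens for reading; the handle, not the name, goes in the angle brackets.
    open(my $in_fh, '<', $path) or die "failed to open '$path' for read: $!";
    my @lines = <$in_fh>;
    close($in_fh);
    return @lines;
}

# 'scratch.txt' is an assumed file name, created in the current directory.
print write_then_read('scratch.txt', 'hello');
unlink 'scratch.txt';
```

With no handle given, the last `print` goes to the currently selected handle, which is STDOUT unless `select` has changed it.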
Re^3: Read files not subdirectories by wrkrbeee (Scribe) on Jan 30, 2015 at 03:24 UTC
Thank you! Apologize for the inconvenience.
Re^4: Read files not subdirectories by parv (Parson) on Jan 30, 2015 at 03:27 UTC
~~You are welcome. I was not inconvenienced to point out the errors.~~ ~~Actually, OP's reply may not be a direct reply to me as it was a reply to OP's own post. Then again, that might just be the case of not being familiar with perlmonks.~~
Re^3: Read files not subdirectories by sundialsvc4 (Abbot) on Jan 30, 2015 at 15:15 UTC
On many systems, doing something to a file ... even, just opening it ... can interfere with a directory-scan, causing it to end prematurely, to list the same file more than once, and so on. (And this would be true no matter what high-level language e.g. Perl was being used to do it.) Therefore, I suggest that you first retrieve the entire list of files into an in-memory list ... which you can very easily do in Perl just by using the list context. Then, iterate through the in-memory list that you have just retrieved, checking to see if they are or aren’t directories and so-on. Start and finish the task of retrieving the list, for any given directory that you are now “in” ... then process the list. Of course, “file finding” is such a common requirement that there are many CPAN modules like File::Find. If you need to “take a walk through a directory tree,” there are plenty of tour-guides . . .
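A rough sketch of that "retrieve the list first, then process it" approach, assuming a hypothetical helper named `plain_files` (the name and the use of File::Spec are my choices, not something from the posts above). Calling `readdir` in list context returns every entry at once, so the directory handle can be closed before any file is opened.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the names of the plain files (not
# subdirectories) found directly inside $dir.
sub plain_files {
    my ($dir) = @_;

    opendir(my $dh, $dir) or die "failed to open '$dir': $!";
    my @entries = readdir($dh);    # list context: all entries in one call
    closedir($dh);                 # the scan is finished before any processing

    # -f must be given the full path; a bare name is checked against
    # the current working directory instead of $dir.
    my @files = grep { -f File::Spec->catfile($dir, $_) } @entries;
    return sort @files;
}

# Example use: list the plain files in the current directory.
print "$_\n" for plain_files('.');
```

Because '.' and '..' are directories, the `-f` test skips them without a separate check.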
Re^2: Read files not subdirectories by wrkrbeee (Scribe) on Jan 30, 2015 at 17:01 UTC
Could I ask another question, please? The code below runs, but fails to write/save the HTML-stripped text files. With a simple print statement, I've determined that the "second" WHILE statement must return FALSE, as the program never makes it this far. I am grateful for any insight!

    #! /usr/bin/perl -w
    use strict;
    use warnings;
    use lib "c:/strawberry/perl/site/lib";
    use HTML::Strip;
    my $hs = HTML::Strip->new();
    #Where I will store the end results;
    my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
    #Where the files with the HTML tags are located;
    my $files_dir = 'C:\Dwimperl\Perl\1993';
    #Open the directory where the target files with HTML tags are located;
    #Why am I doing this? Stores file names in a directory handle?
    opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";
    #Loop through each entry/file in the directory;
    #What is readdir doing here? It's not really reading anything;
    #Is it simply advancing us to the next entry?;
    #Seems like the real READ occurs via the OPEN statement below;
    while (my $file = readdir($dir_handle) ) {
        next unless -f $file;
        #next if $file eq '.' or $file eq '..';
        #Open the current file so I can strip the HTML tags ??? ;
        open my $file_handle, '<', $file or die "failed to open '$file' <$!>";
        #Read the current file one line at a time??;
        while (my $line = <$file_handle>) {
            ########The WHILE statement above must return FALSE cuz the program never makes it here;
            #Strip the HTML tags??;
            my $clean_text = $hs->parse( ' ' );
            #Save the clean (no HTML tags) text file in a new file/location??;
            print $write_dir "$file\n";
            $hs->eof;
        }
    }
    close();
    closedir $dir_handle;
Re^3: Read files not subdirectories by poj (Abbot) on Jan 30, 2015 at 17:31 UTC
Is your script located in the same folder as the HTML files? If not, add the directory to get the full path, like this:

    #!perl
    use strict;
    use warnings;
    my $files_dir = 'C:\Dwimperl\Perl\1993';
    opendir (my $dir_handle, $files_dir);
    while (my $filename = readdir($dir_handle)){
        next unless -f $files_dir.'/'.$filename;
        print "$filename\n";
    }

poj
Re^4: Read files not subdirectories by poj (Abbot) on Jan 30, 2015 at 18:05 UTC
I'm guessing you want to process each line and write it out (untested):

    #!perl
    use strict;
    use warnings;
    use HTML::Strip;
    my $hs = HTML::Strip->new();
    my $files_dir = 'C:\Dwimperl\Perl';
    my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
    opendir (my $dir_handle, $files_dir);
    while (my $filename = readdir($dir_handle)){
        # test the full path, not the bare name
        next unless -f $files_dir.'/'.$filename;
        print "Processing $filename\n";
        open my $fh_in, '<', $files_dir.'/'.$filename
            or die "failed to open '$filename' for read";
        open my $fh_out, '>', $write_dir.'/'.$filename
            or die "failed to open '$filename' for write";
        my $count=0;
        while (my $line = <$fh_in>) {
            my $clean_text = $hs->parse($line);
            print $fh_out "$clean_text\n";
            ++$count;
        }
        $hs->eof;
        print "$count lines read from $filename\n";
    }

poj
Re^4: Read files not subdirectories by wrkrbeee (Scribe) on Jan 30, 2015 at 18:22 UTC
We're close, writes the files to output location, but the files are empty (size 0 kb). Ideas?
Re^4: Read files not subdirectories by wrkrbeee (Scribe) on Jan 30, 2015 at 18:27 UTC
Works! Very grateful for your time and patience with me. You're the best!
Re^4: Read files not subdirectories by wrkrbeee (Scribe) on Jan 30, 2015 at 17:44 UTC
Hi poj, your script will print the file names. Where are we going here?
Re^5: Read files not subdirectories by poj (Abbot) on Jan 30, 2015 at 18:09 UTC
Re^4: Read files not subdirectories by wrkrbeee (Scribe) on Jan 30, 2015 at 18:05 UTC
Hi poj, corrected a couple of stupid things on my part (e.g., ensuring my portable hard drive is available/plugged in, and actually opening the output file for output). Now gives me a "failed to open" for the output file at line 12. Here is the revised code. I apologize for the hassle.

    #! /usr/bin/perl -w
    use strict;
    use warnings;
    use lib "c:/strawberry/perl/site/lib";
    use HTML::Strip;
    my $hs = HTML::Strip->new();
    #Where I will store the end results;
    my $write_dir = 'F:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
    open (my $outfile_hand, '>', $write_dir) || die "failed to open '$write_dir' <$!>";
    #Where the files with the HTML tags are located;
    my $files_dir = 'C:\Dwimperl\Perl';#\1993';
    #Open the directory where the target files with HTML tags are located;
    #Why am I doing this? Stores file names in a directory handle?
    opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";
    #Loop through each entry/file in the directory;
    #What is readdir doing here? It's not really reading anything;
    #Is it simply advancing us to the next entry?;
    #Seems like the real READ occurs via the OPEN statement below;
    while (my $file = readdir($dir_handle) ) {
        next unless -f $file;
        #next if $file eq '.' or $file eq '..';
        #Open the current file so I can strip the HTML tags ??? ;
        open my $file_handle, '<', $file or die "failed to open '$file' <$!>";
        #Read the current file one line at a time??;
        while (my $line = <$file_handle>) {
            ########The WHILE statement above must return FALSE cuz the program never makes it here;
            #Strip the HTML tags??;
            my $clean_text = $hs->parse( ' ' );
            #Save the clean (no HTML tags) text file in a new file/location??;
            print $outfile_hand "$file\n";
            $hs->eof;
        }
    }
    close();
    closedir $dir_handle;
Re^5: Read files not subdirectories by poj (Abbot) on Jan 30, 2015 at 18:26 UTC

