spickles has asked for the wisdom of the Perl Monks concerning the following question:
Monks -
I've written a script to read in a list of nursing homes that was copied and pasted off the web. I pull this file in and write an output file that removes empty lines. I then pull the output file back in, and now I want to step through it and write certain elements to a database. There are three lines that contain information that I want to skip, as shown by the matches. I've been doing a lot of troubleshooting and can't seem to figure out where I've gone wrong. When I run the code, I get no errors, but no data in the database either.
#!c:/xampp/perl/bin/perl
use strict;
use warnings;
use DBI;
use dbConnect_nursing;
sub printVariables(@_)
{
foreach my $variable (@_)
{
print $variable . "\n";
}
print "###############################################\n";
}
sub processLine
{
my @temp;
chomp($_[0]);
unshift(@temp, $_[0]);
my $var = shift(@temp);
return $var;
}
################################### connect to the database ############################################
# data source name
my $dsn = "DBI:$dbConnect::db_platform:$dbConnect::db_database:$dbConnect::db_host:$dbConnect::db_port";
# perl DBI connect
my $connect = DBI->connect($dsn, $dbConnect::db_user, $dbConnect::db_pw, {'RaiseError' => 1});
################################### connect to the database ############################################
my @file_array;
my $in_file = "c:\\nursing_homes.txt";
my $out_file = "c:\\nursing_homes_out.txt";
if (-e $out_file)
{
unlink $out_file;
}
open INPUT,'<',$in_file or die "Can't open file " . $in_file . "\n$!\n"; #Open for read
open OUTPUT,'>',$out_file or die "Can't open file " . $out_file . "\n$!\n"; #Open for write
while (<INPUT>)
{
chomp ($_);
next if $_ =~ /^\s*$/; # skip over blank lines
print OUTPUT $_ . "\n";
}
close INPUT;
close OUTPUT;
open INPUT,'<',$out_file or die "Can't open file " . $out_file . "\n$!\n"; #Open for read
while (<INPUT>)
{
my $name = $connect->quote($_);
next;
my $address1 = $connect->quote($_);
next;
my $address2 = $connect->quote($_);
next;
my $phone = $connect->quote($_);
next;
next if (($_ =~ /^.*Council.*$/) || ($_ =~ /^Continuing.*$/) || ($_ =~ /^Mapping.*$/));
next if (($_ =~ /^.*Council.*$/) || ($_ =~ /^Continuing.*$/) || ($_ =~ /^Mapping.*$/));
next if (($_ =~ /^.*Council.*$/) || ($_ =~ /^Continuing.*$/) || ($_ =~ /^Mapping.*$/));
my $overall = $connect->quote($_);
next;
my $inspections = $connect->quote($_);
next;
my $staffing = $connect->quote($_);
next;
my $quality = $connect->quote($_);
next;
my $programs = $connect->quote($_);
next;
my $beds = $connect->quote($_);
next;
my $ownership = $connect->quote($_);
next;
my $query_string = "INSERT INTO nursing_homes (name, address1, address2, phone, overall, inspections, staffing, quality, programs, beds, ownership) VALUES ($name, $address1, $address2, $phone, $overall, $inspections, $staffing, $quality, $programs, $beds, $ownership)";
#printVariables($name, $address1, $address2, $phone, $overall, $inspections, $staffing, $quality, $programs, $beds, $ownership, $query_string);
my $query_handle = $connect->prepare("INSERT INTO nursing_homes (name, address1, address2, phone, overall, inspections, staffing, quality, programs, beds, ownership) VALUES ($name, $address1, $address2, $phone, $overall, $inspections, $staffing, $quality, $programs, $beds, $ownership)");
$query_handle->execute();
}
close INPUT;
$connect->disconnect();
__END__
Sample data is below:
AARON MANOR REHABILITATION & NURSING CENTER
100 ST CAMILLUS WAY
FAIRPORT, NY 14450
(585) 377-4000
Resident Council
Mapping & Directions
4 out of 5 stars
4 out of 5 stars
3 out of 5 stars
4 out of 5 stars
Medicare and Medicaid
140
For profit - Corporation
ABSOLUT CTR FOR NURSING & REHAB ALLEGANY LLC
2178 NORTH FIFTH STREET
ALLEGANY, NY 14706
(716) 373-2238
Resident & Family Councils
Mapping & Directions
3 out of 5 stars
4 out of 5 stars
1 out of 5 stars
4 out of 5 stars
Medicare and Medicaid
37
For profit - Corporation
ABSOLUT CTR FOR NURSING & REHAB AURORA PARK LLC
292 MAIN STREET
EAST AURORA, NY 14052
(716) 652-1560
Resident Council
Mapping & Directions
1 out of 5 stars
1 out of 5 stars
2 out of 5 stars
4 out of 5 stars
Medicare and Medicaid
320
For profit - Corporation
ABSOLUT CTR FOR NURSING & REHAB DUNKIRK LLC
447 449 LAKE SHORE DRIVE WEST
DUNKIRK, NY 14048
(716) 366-6710
Resident Council
Mapping & Directions
1 out of 5 stars
2 out of 5 stars
1 out of 5 stars
2 out of 5 stars
Medicare and Medicaid
40
For profit - Corporation
Re: Process Text File and Write to Database
by johngg (Canon) on Nov 20, 2009 at 21:27 UTC
Others have pointed out your misunderstanding of next. Here are some other pointers.
You could use join and the string multiplier (see Multiplicative Operators in perlop) to save a lot of typing in your printVariables subroutine.
$ perl -e '
> sub printVariables
> {
> print join qq{\n}, @_, q{#} x 10, q{};
> }
>
> $v1 = 123;
> $v2 = 456;
> printVariables( $v1, $v2 );
>
> @arr = qw{ pete john mike };
> printVariables( @arr );'
123
456
##########
pete
john
mike
##########
$
You don't seem to call it but your processLine subroutine goes a very long way around the houses to achieve the same result as a
chomp $line;
in the body of your code would have done.
You don't need to unlink a pre-existing file if you are about to open it for writing.
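A minimal sketch of that point, using a scratch file from File::Temp as a hypothetical stand-in for c:\nursing_homes_out.txt: opening with '>' truncates whatever was already in the file, so nothing from the earlier run survives.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for the real output file
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "stale contents from an earlier run\n";
close $tmp_fh or die "Can't close $path: $!";

# Opening with '>' truncates the existing file, so no unlink is needed first
open my $out, '>', $path or die "Can't open $path for writing: $!";
print {$out} "fresh line\n";
close $out or die "Can't close $path: $!";

# Read the file back to see what survived
open my $in, '<', $path or die "Can't open $path for reading: $!";
my @lines = <$in>;
close $in;

print @lines;    # only the line written after the second open
```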
You open "c:\\nursing_homes.txt" for reading and process it to remove blank lines, writing the changes to "c:\\nursing_homes_out.txt", which you then re-open and read in your database insertion loop. Unless you need that processed file elsewhere, why bother? Just work on the original file in your main database insertion loop and include the next if $_ =~ /^\s*$/; line there.
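Something along these lines would do the blank-line filtering in the same pass as the processing. It is only a sketch: the in-memory string stands in for the pasted file, and the @kept array is a hypothetical placeholder for whatever the real loop does with each line.

```perl
use strict;
use warnings;

# In-memory stand-in for the pasted file, with an empty and a whitespace-only line
my $pasted = "AARON MANOR REHABILITATION & NURSING CENTER\n\n"
           . "100 ST CAMILLUS WAY\n   \nFAIRPORT, NY 14450\n";
open my $in, '<', \$pasted or die "Can't open in-memory file: $!";

my @kept;
while ( my $line = <$in> ) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip blank lines in the same pass
    push @kept, $line;           # the real script would process the line here
}
close $in;

print "$_\n" for @kept;
```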
Why do you initialise $query_string but then not use it in the my $query_handle = $connect->prepare( ... ); line, re-typing exactly the same SQL instead? Seems a bit wasteful of effort to me.
Rather than using concatenation
... die "Can't open file " . $out_file . "\n$!\n";
just interpolate into the string as you've already done with the $! variable
... die "Can't open file $out_file\n$!\n";
I hope these points are helpful.
Update: Corrected cut'n'paste error where I'd copied an earlier piece of test code with a shorter subroutine name in the call, pvar rather than printVariables
Re: Process Text File and Write to Database
by toolic (Bishop) on Nov 20, 2009 at 18:37 UTC
It looks like you never call $query_handle->execute();
because of the unconditional next; in your while loop:
while (<INPUT>)
{
my $name = $connect->quote($_);
next;
That first $connect->quote call is the only statement that ever runs in your loop. That doesn't seem right.
Re: Process Text File and Write to Database
by keszler (Priest) on Nov 20, 2009 at 18:39 UTC
Your second while loop is effectively:
open INPUT,'<',$out_file or die "Can't open file " . $out_file . "\n$!\n"; #Open for read
while (<INPUT>)
{
my $name = $connect->quote($_);
next;
}
See next.
I call the execute at the bottom. It looks like I don't understand the use of 'next' in this context, so I'm open to other suggestions as to how to read that file line by line and add the records to the database. When I reach a line that contains the word 'profit' or 'Government', that is the end of a record. Within a record, I need to skip lines that match 'Councils', 'Continuing', and 'Mapping' as these are irrelevant lines of data. The problem is that not all records contain all three of those lines, so I can't just arbitrarily increase a counter. I was having a good deal of success reading the file to an array and then using a 'for' loop, but I stumble across the cases where I need to toss out those irrelevant lines.
It looks like I don't understand the use of 'next' in this context,
Yes. 'next' means "abandon the rest of the current loop iteration (running only a continue block, if there is one) and start the next iteration."
Every time you called 'next', your program skipped directly to the next iteration of the loop without going any further - in your original program, that's why only the
my $name = $connect->quote($_);
next;
parts of the loop were being executed.
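A tiny standalone illustration of that behaviour (the @reached array and the list of words are made up for the demo): every statement placed after an unconditional next is skipped on every iteration.

```perl
use strict;
use warnings;

my @reached;    # records which statements actually ran
for my $line ( 'name', 'address', 'phone' ) {
    push @reached, "before next: $line";
    next;                                  # jumps straight to the next iteration
    push @reached, "after next: $line";    # never executed
}

print "$_\n" for @reached;
```

Only the three "before next" entries are collected, which mirrors why only the $name assignment ran in the original loop.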
This is quick, ugly, and prone to fail at the slightest change in the data, but maybe it'll get you started:
open INPUT,'<',$out_file or die "Can't open file " . $out_file . "\n$!\n"; #Open for read
while (<INPUT>)
{
    my $name = $connect->quote($_);
    # scalar() reads exactly one line; a bare <INPUT> inside an argument list
    # is in list context and would slurp the rest of the file
    my $address1 = $connect->quote(scalar <INPUT>);
    my $address2 = $connect->quote(scalar <INPUT>);
    my $phone = $connect->quote(scalar <INPUT>);
    # read past the optional Council/Continuing/Mapping lines;
    # $line ends up holding the first line that is not one of them
    my $line;
    while (defined($line = <INPUT>)) {
        last unless $line =~ /(Council|^Continuing|^Mapping)/;
    }
    my $overall = $connect->quote($line);
    my $inspections = $connect->quote(scalar <INPUT>);
    my $staffing = $connect->quote(scalar <INPUT>);
    my $quality = $connect->quote(scalar <INPUT>);
    my $programs = $connect->quote(scalar <INPUT>);
    my $beds = $connect->quote(scalar <INPUT>);
    my $ownership = $connect->quote(scalar <INPUT>);
    my $query_string = "INSERT INTO nursing_homes (name, address1, address2, phone, overall, inspections, staffing, quality, programs, beds, ownership) VALUES ($name, $address1, $address2, $phone, $overall, $inspections, $staffing, $quality, $programs, $beds, $ownership)";
    #printVariables($name, $address1, $address2, $phone, $overall, $inspections, $staffing, $quality, $programs, $beds, $ownership, $query_string);
    my $query_handle = $connect->prepare("INSERT INTO nursing_homes (name, address1, address2, phone, overall, inspections, staffing, quality, programs, beds, ownership) VALUES ($name, $address1, $address2, $phone, $overall, $inspections, $staffing, $quality, $programs, $beds, $ownership)");
    $query_handle->execute();
}
You define $query_string but don't use it, and you're not checking return values for the prepare and execute calls. You must.
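A less position-dependent way to group the lines might look like the sketch below. It follows the rule described earlier in the thread (a line containing 'profit' or 'Government' ends a record; Council, Continuing and Mapping lines are noise). No database is involved: the records are just collected into arrays, the input is two records copied from the sample data, and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Two records copied from the sample data in the question
my $data = <<'END_DATA';
AARON MANOR REHABILITATION & NURSING CENTER
100 ST CAMILLUS WAY
FAIRPORT, NY 14450
(585) 377-4000
Resident Council
Mapping & Directions
4 out of 5 stars
4 out of 5 stars
3 out of 5 stars
4 out of 5 stars
Medicare and Medicaid
140
For profit - Corporation
ABSOLUT CTR FOR NURSING & REHAB ALLEGANY LLC
2178 NORTH FIFTH STREET
ALLEGANY, NY 14706
(716) 373-2238
Resident & Family Councils
Mapping & Directions
3 out of 5 stars
4 out of 5 stars
1 out of 5 stars
4 out of 5 stars
Medicare and Medicaid
37
For profit - Corporation
END_DATA

open my $in, '<', \$data or die "Can't open in-memory file: $!";

my ( @records, @fields );
while ( my $line = <$in> ) {
    chomp $line;
    next if $line =~ /^\s*$/;                           # blank line
    next if $line =~ /Council|^Continuing|^Mapping/;    # irrelevant lines
    push @fields, $line;
    if ( $line =~ /profit|Government/ ) {               # last line of a record
        push @records, [@fields];                       # store a copy of the fields
        @fields = ();
    }
}
close $in;

for my $record (@records) {
    printf "%2d fields: %s\n", scalar @$record, $record->[0];
}
```

Each stored record could then be checked for the expected eleven fields before being inserted. A facility whose name happens to contain "profit" or "Council" would confuse this, so the patterns may need tightening for the full data set.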
Re: Process Text File and Write to Database
by graff (Chancellor) on Nov 21, 2009 at 18:05 UTC
You may want to try grabbing the full HTML data for the page, and using a parser module on that (HTML::Parser or HTML::TokeParser), in case the markup in the web page provides some structural information that you can use (like record boundaries and field labels).
On the other hand, if the blank lines that you are throwing away happen to represent boundaries between records, you should be using them as record separators, rather than throwing them away. Look up the section in the perlvar documentation about $INPUT_RECORD_SEPARATOR ($/) -- if blank lines are used only at record boundaries, then setting $/=""; (empty string) causes perl to read a complete, multi-line record on each iteration of while(<>){...}.
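To see paragraph mode in isolation, here is a small sketch that reads from an in-memory string rather than a real file. The two short records are invented, and the assumption (as above) is that blank lines occur only between records.

```perl
use strict;
use warnings;

# Hypothetical input in which blank lines separate the records
my $data = "NAME ONE\n1 FIRST ST\n\n\nNAME TWO\n2 SECOND ST\n";
open my $in, '<', \$data or die "Can't open in-memory file: $!";

my @records;
{
    local $/ = "";    # paragraph mode: one or more empty lines end a record
    while ( my $record = <$in> ) {
        my @lines = split /[\r\n]+/, $record;    # one element per line of the record
        push @records, \@lines;
    }
}
close $in;

printf "record %d: %s\n", $_ + 1, join( ' | ', @{ $records[$_] } )
    for 0 .. $#records;
```

Using local inside a bare block keeps the change to $/ from leaking into the rest of the program. Note that a line containing only spaces or tabs does not count as empty here, so such lines would not split records.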
Apart from that, you should be using placeholders in your insert statement -- prepare it once (before the loop) and execute it repeatedly (in the loop); this makes the "quote()"-ing of values unnecessary.
In case it's true that blank lines in the data represent record boundaries, here's an example of how it could work:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect( " ...whatever... " );
my @insert_fields = qw{
name address1 address2 phone
overall inspections staffing quality
programs beds ownership
};
my $insert_sql = 'insert into nursing_homes ('.
join( ', ', @insert_fields ).
') values ('.
join( ', ', ('?') x @insert_fields ).
')';
my $insert_sth = $dbh->prepare( $insert_sql );
$/ = ""; # set input_record_separator to empty string (paragraph mode)
# just put the input file name on the command line when running the script
# (or pipe the data to the script's STDIN)
while (<>) # each iteration reads up to a blank line
{
my @lines = grep !/ Councils?$|^Mapping|^Continuing/, split( /[\r\n]+/ );
if ( @lines != @insert_fields ) { # skip records that won't work
print "Record # $. has wrong number of fields:\n$_\n";
next; # if you redirect STDOUT to a file, you can deal with these later
}
$insert_sth->execute( @lines );
}
$insert_sth->finish;
$dbh->disconnect;
(not tested, but it compiles, and the sql statement comes out right)
If the copy/pasted text contains "extra" blank lines within records, the simple paragraph-mode approach above won't work. Try to find some other reliable indicator of record boundaries and use that instead, then remove the blank lines by just altering that grep statement a bit:
@lines = grep !/^\s*$| Councils?$|^Mapping|^Continuing/, split( /[\r\n]+/ );
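As a quick sanity check of that filter, the sketch below runs it over one record from the question's sample data, with a whitespace-only line added by hand for illustration. The $record variable is just a stand-in for whatever chunk of text your record-splitting logic produces.

```perl
use strict;
use warnings;

# One sample record from the question, plus an artificial whitespace-only line
my $record = join "\n",
    'ABSOLUT CTR FOR NURSING & REHAB ALLEGANY LLC',
    '2178 NORTH FIFTH STREET',
    'ALLEGANY, NY 14706',
    '(716) 373-2238',
    '   ',
    'Resident & Family Councils',
    'Mapping & Directions',
    '3 out of 5 stars',
    '4 out of 5 stars',
    '1 out of 5 stars',
    '4 out of 5 stars',
    'Medicare and Medicaid',
    '37',
    'For profit - Corporation';

# Drop blank, Council(s), Mapping and Continuing lines; keep everything else
my @lines = grep !/^\s*$| Councils?$|^Mapping|^Continuing/,
    split /[\r\n]+/, $record;

print scalar(@lines), " fields kept\n";
print "$_\n" for @lines;
```

The eleven surviving lines line up with the eleven column names in @insert_fields, which is what the field-count check in the loop relies on.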