PerlMonks
Re: Process Text File and Write to Database
by graff (Chancellor) on Nov 21, 2009 at 18:05 UTC ( [id://808608]=note )
You may want to try grabbing the full HTML data for the page, and using a parser module on that (HTML::Parser or HTML::TokeParser), in case the markup in the web page provides some structural information that you can use (like record boundaries and field labels).
On the other hand, if the blank lines that you are throwing away happen to represent boundaries between records, you should be using them as record separators, rather than throwing them away. Look up the section in the perlvar documentation about $INPUT_RECORD_SEPARATOR ($/) -- if blank lines are used only at record boundaries, then setting $/=""; (empty string) causes perl to read a complete, multi-line record on each iteration of while(<>){...}.
Apart from that, you should be using placeholders in your insert statement -- prepare it once (before the loop) and execute it repeatedly (in the loop); this makes the "quote()"-ing of values unnecessary.
In case it's true that blank lines in the data represent record boundaries, here's an example of how it could work: (not tested, but it compiles, and the sql statement comes out right)
If the copy/pasted text contains "extra" blank lines within records, the simple paragraph-mode approach above won't work. Try to find some other reliable indicator of record boundaries and use that instead, then remove the blank lines by just altering that grep statement a bit:
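
A minimal sketch of the paragraph-mode and placeholder approach described above, not the code from the original post. The table name `contacts`, the field names in `@fields`, and the "label: value" layout of each record are all assumptions made for illustration; adjust them to the real data. The `$dbh` argument is expected to be an ordinary DBI database handle.

```perl
use strict;
use warnings;

# Hypothetical column names; the real ones depend on the pasted text.
my @fields = qw(name phone email);

# Read blank-line-separated records from a filehandle in paragraph mode.
# Each record is assumed to consist of "label: value" lines.
sub read_records {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    my @records;
    while ( my $para = <$fh> ) {
        # Split the record into lines, keeping only lines with visible content.
        my @lines = grep { /\S/ } split /\n/, $para;
        my %rec;
        for my $line (@lines) {
            my ( $label, $value ) = $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/
                or next;    # skip lines that do not look like "label: value"
            $rec{ lc $label } = $value;
        }
        push @records, \%rec if %rec;
    }
    return \@records;
}

# Insert the records using placeholders: prepare once, execute per record.
# Returns the SQL text so it can be inspected.
sub insert_records {
    my ( $dbh, $records ) = @_;
    my $sql = sprintf 'INSERT INTO contacts (%s) VALUES (%s)',
        join( ', ', @fields ), join( ', ', ('?') x @fields );
    my $sth = $dbh->prepare($sql);    # prepared before the loop
    for my $rec (@$records) {
        # Hash slice in column order; fields missing from a record are
        # passed as undef, which DBI binds as NULL.
        $sth->execute( @{$rec}{@fields} );
    }
    return $sql;
}
```

To use it, open the text file, pass the handle to `read_records`, and hand the result to `insert_records` together with a handle obtained from `DBI->connect($dsn, $user, $password, { RaiseError => 1 })`. If records are delimited by something other than blank lines, set `$/` to that delimiter instead; the `grep { /\S/ }` step then takes care of discarding the blank lines inside each record.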