First off, very nice how you improved on the insert query, for example when building the $field_placeholders.
In order to get the relations right, you'll learn about another nifty feature of Perl: hashes. A hash is an unordered collection of key-value pairs.
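If hashes are new to you, here is a small sketch of the basics. The keys and values are made up for illustration and are not taken from your data:

```perl
use strict;
use warnings;

# A plain hash: key => value pairs, in no guaranteed order.
my %article = (
    url      => 'http://example.com/a',   # hypothetical value
    sys_time => '2010-01-01 12:00:00',    # hypothetical value
);

# Look up and add values by key.
print "URL: $article{url}\n";
$article{title} = 'Some title';

# A reference to an anonymous hash uses curly braces and the arrow operator.
my $hr_article = { url => 'http://example.com/b' };
$hr_article->{sys_time} = '2010-01-02 08:30:00';

# Sort the keys when printing, because hash order is not predictable.
foreach my $key (sort keys %article) {
    print "$key => $article{$key}\n";
}
```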
We're going to loop through the files, accumulating the data per file in $hr_output and accumulating all per-file data in $ar_data.
The prefixes hr_ and ar_ are not required by Perl, but they help me to distinguish between references to hashes (hashref, or hr) and references to arrays (arrayref, or ar).
my $ar_data = [];
# Opening, reading, and extracting column content of each concord.n.txt file
foreach my $output_concord_file (@output_concord_files) {
    # Note: for the concord file, no processing implied -> stored by default in $output_concord_file
    open(my $fh, '<:utf8', $output_concord_file) or die "Couldn't open $output_concord_file: $!";
    my $hr_output = {};
    $hr_output->{concord_file} = $output_concord_file;
    while (<$fh>) {
        if ($_ =~ /=\[=(.*)=\]=/) {
            $hr_output->{url} = $1;
        }
        if ($_ =~ /=\[\+(.*)\+\]=/) {
            $hr_output->{sys_time} = $1;
        }
        if ($_ =~ /=E-(.*)=event/) {
            push(@{$hr_output->{events}}, $1);
        }
    }
    push(@{$ar_data}, $hr_output);
    close($fh);
}
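To make the resulting structure concrete, here is a self-contained sketch that runs the same three regexes over an in-memory sample instead of a real file. The sample lines are only my guess at the marker format, derived from the regexes themselves, so adjust them to what your concord files really contain:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample content; the real concord files may look different.
my $sample = join "\n",
    'intro =[=http://example.com/page1=]= text',
    'more =[+2010-01-01 12:00:00+]= text',
    'a =E-earthquake=event b',
    'c =E-flood=event d',
    '';

# Open a filehandle on the string instead of on a file.
open(my $fh, '<', \$sample) or die "Couldn't open in-memory sample: $!";

my $hr_output = {};
while (my $line = <$fh>) {
    if ($line =~ /=\[=(.*)=\]=/)   { $hr_output->{url}      = $1; }
    if ($line =~ /=\[\+(.*)\+\]=/) { $hr_output->{sys_time} = $1; }
    if ($line =~ /=E-(.*)=event/)  { push @{$hr_output->{events}}, $1; }
}
close($fh);

$Data::Dumper::Sortkeys = 1;   # stable key order in the dump
print Dumper($hr_output);
```

The dump shows one hashref with the scalar values url and sys_time, plus an events key that holds an arrayref with one entry per matched event.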
I'm not quite sure what you intended to do to those HTML pages, and I'm not sure that what's happening is what you really want, so I'm just going to skip that :P
Next up is inserting all accumulated data:
if (@{$ar_data}) {
    # If there is any data, prepare the queries.
    # Extract the column names from the first element of $ar_data.
    # Assumption: 'events' is not a column of article (it holds an arrayref
    # and is inserted into its own table below), so it is left out here.
    my @fields = grep { $_ ne 'events' } keys %{$ar_data->[0]};
    my $fieldlist = join(',', @fields);
    my $field_placeholders = join(',', map {'?'} @fields);
    my $sth_insert_article = $dbh->prepare( qq(
        INSERT INTO article ($fieldlist) VALUES ($field_placeholders)
    ));
    my $sth_select_last_insert_id = $dbh->prepare( qq( SELECT LAST_INSERT_ID() ));
    my $sth_insert_event = $dbh->prepare( qq(
        INSERT INTO event (event) VALUES (?)
    ));
    my $sth_insert_article_event = $dbh->prepare( qq(
        INSERT INTO article_event_index (id_article, id_event) VALUES (?,?)
    ));
    # Now we loop over and insert all data
    foreach my $hr_output (@{$ar_data}) {
        # Insert article data
        my $inserted_records = $sth_insert_article->execute(@{$hr_output}{@fields}); # Ok, I admit, this one is really tricky to read :)
        if ($inserted_records != 1) {
            die "Error inserting article [$hr_output->{url}], only [$inserted_records] got inserted: " . $sth_insert_article->errstr;
        }
        # Get article ID
        my $article_id = ($dbh->selectcol_arrayref($sth_select_last_insert_id))->[0];
        # Insert events
        foreach my $event (@{$hr_output->{events}}) {
            $inserted_records = $sth_insert_event->execute($event);
            if ($inserted_records != 1) {
                die "Error inserting event [$event], only [$inserted_records] got inserted: " . $sth_insert_event->errstr;
            }
            # Get event ID
            my $event_id = ($dbh->selectcol_arrayref($sth_select_last_insert_id))->[0];
            # Insert article_event combo
            $inserted_records = $sth_insert_article_event->execute($article_id, $event_id);
            if ($inserted_records != 1) {
                die "Error inserting article-event [$article_id, $event_id], only [$inserted_records] got inserted: " . $sth_insert_article_event->errstr;
            }
        }
    }
}
This follows the loop I described in my previous post.
There are a few tricks here, which I'll leave for you to figure out (you can still ask though :)).
The most unreadable trick is @{$hr_output}{@fields}, a hash slice on a hashref: it produces a list of the values stored under each of the @fields keys, in the order of @fields.
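In case the slice syntax is hard to picture, here is a minimal sketch with made-up values that compares it with the long-hand equivalent:

```perl
use strict;
use warnings;

my $hr_output = {
    url      => 'http://example.com/page1',   # hypothetical value
    sys_time => '2010-01-01 12:00:00',        # hypothetical value
};
my @fields = ('sys_time', 'url');

# Hash slice on a hashref: one value per key, in the order of @fields.
my @values = @{$hr_output}{@fields};

# The long-hand equivalent, one lookup at a time.
my @same = map { $hr_output->{$_} } @fields;

print join(' | ', @values), "\n";   # 2010-01-01 12:00:00 | http://example.com/page1
```

Because the values come out in the order of @fields, they line up with the column list and the placeholders that were built from the same @fields.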