PerlMonks  

DBI/load data infile issue

by jimbus (Friar)
on Jul 08, 2005 at 15:54 UTC ( #473462=perlquestion )
jimbus has asked for the wisdom of the Perl Monks concerning the following question:

This is driving me insane...

I have a perl script that digests some info and loads it into mysql, and it was running happily, albeit a bit slowly, inserting line by line using DBI.

While researching mysql optimization, I found out that I should be writing the data to a file and bulk loading it with "load data infile...".

When I got that running, I found that I was getting 3557 records as opposed to the 3599 I was expecting, and the last record was munged (all defaults or nulls). I checked the tmp file and it had 3599 lines; I printed out the command from the perl script, appended it to mysql -e, and got 3599 records.

I thought, maybe, it was the pipe not clearing before the script moved on, so I tried to find out how to flush things, but couldn't find anything. I even tried running a commit...

I slept on it and this morning decided that since it works from the command line, I'd just rip out the DBI stuff and run the command-line version with backticks... 3557 records!

ARGHHHHHHHHHHHHHHHHH!

I saw something similar in "Unable to load entire CSV file into DB", but didn't see a resolution or any ideas I thought applied to me...

Can anyone help me, before I go insane?

Thanks,

Jimbus

Never moon a werewolf!

Re: DBI/load data infile issue
by bofh_of_oz (Hermit) on Jul 08, 2005 at 16:04 UTC
    A code example, anyone?

    Seriously, it is nice to discuss ideas, but it helps to see the piece of code that produces the error in order to actually debug it... even though the error might not be caused by the code itself.

    --------------------------------
    An idea is not responsible for the people who believe in it...

      Code w/o DBI

      use DBI;
      use Logs;
      use Net::FTP;
      use Time::Local;
      use Time::Timezone;
      use POSIX;
      use File::Temp qw/ tempfile /;

      open (IN, "/home/reports/ftp/WSB/test.txt");
      #open (IN, "gzip -dc /tmp/arc/0620.gz |");

      ($cntfh, $cntfn) = tempfile('WSB_count_XXXX', DIR => '/tmp');
      ($urlfh, $urlfn) = tempfile('WSB_urls_XXXX', DIR => '/tmp');
      chmod 0666, $urlfn;
      chmod 0666, $cntfn;

      my $url_str = "mysql -u root -prep0rts reports -e\"load data infile '"
                  . $urlfn . "' into table tmpurl fields terminated by '|' (tstamp,url,count)\"";
      my $cnt_str = "mysql -u root -prep0rts reports -e\"load data infile '"
                  . $cntfn . "' into table tmpwsb fields terminated by '|' (tstamp,op100,op700,total)\"";
      print "$url_str\n";
      print "$cnt_str\n";

      @lines = <IN>;
      $lines = @lines;
      print "lines: $lines\n";

      # parse records and increment counters
      foreach $line (@lines) {
          ($time, $opcode, $url) = parse($line);
          if ($opcode == "100") {
              $tstamp{$time}++;
              $op100{$time}++;
              if (length $url > 0) {
                  my $tmpkey = $url . "<::>" . $time;
                  $urls{$tmpkey}++;
              }
          } elsif ($opcode == "700") {
              $tstamp{$time}++;
              $op700{$time}++;
              if (length $url > 0) {
                  my $tmpkey = $url . "::" . $time;
                  $urls{$tmpkey}++;
              }
          }
      }

      @time = sort keys %tstamp;
      print "Done parsing: @time[0]: @time[$#time]\n";

      for ($ii = @time[0]; $ii <= @time[$#time]; $ii++) {
          $tmpTime  = strftime "%Y-%m-%d %H:%M:%S", localtime($ii);
          $tmp100   = $op100{$ii} || 0;
          $tmp700   = $op700{$ii} || 0;
          $tmpTotal = $tstamp{$ii} || 0;
          print $cntfh "$tmpTime|$tmp100|$tmp700|$tmpTotal\n";
      }
      print "Done writing counts\n";

      # enter URLs into database
      foreach $key (sort keys %urls) {
          my ($url, $tstamp) = split (/<::>/, $key);
          $tmpTime = strftime "%Y-%m-%d %H:%M:%S", localtime($tstamp);
          print $urlfh "$tmpTime|$url|$urls{$key}\n";
      }
      print "Done writing urls\n";

      `$cnt_str`;
      print "Done writing count DB\n";
      `$url_str`;
      print "Done writing url DB\n";

      Code with DBI:

      use DBI;
      use Logs;
      use Net::FTP;
      use Time::Local;
      use Time::Timezone;
      use POSIX;
      use File::Temp qw/ tempfile /;

      open (IN, "/home/reports/ftp/WSB/test.txt");
      #open (IN, "gzip -dc /tmp/arc/0620.gz |");

      ($cntfh, $cntfn) = tempfile('WSB_count_XXXX', DIR => '/tmp');
      ($urlfh, $urlfn) = tempfile('WSB_urls_XXXX', DIR => '/tmp');
      chmod 0666, $urlfn;
      chmod 0666, $cntfn;

      my $dbh = DBI->connect('dbi:mysql:reports', 'root', 'rep0rts',
                             {RaiseError => 1, PrintError => 0})
          or Die ("load data", "WSB", "Couldn't connect to database" . $dbh->errstr);

      my $url_str = "load data infile '" . $urlfn
                  . "' into table tmpurl fields terminated by '|' (tstamp,url,count)";
      my $cnt_str = "load data infile '" . $cntfn
                  . "' into table tmpwsb fields terminated by '|' (tstamp,op100,op700,total)";
      print "$url_str\n";
      print "$cnt_str\n";

      @lines = <IN>;
      $lines = @lines;
      print "lines: $lines\n";

      # parse records and increment counters
      foreach $line (@lines) {
          ($time, $opcode, $url) = parse($line);
          if ($opcode == "100") {
              $tstamp{$time}++;
              $op100{$time}++;
              if (length $url > 0) {
                  my $tmpkey = $url . "<::>" . $time;
                  $urls{$tmpkey}++;
              }
          } elsif ($opcode == "700") {
              $tstamp{$time}++;
              $op700{$time}++;
              if (length $url > 0) {
                  my $tmpkey = $url . "::" . $time;
                  $urls{$tmpkey}++;
              }
          }
      }

      @time = sort keys %tstamp;
      print "Done parsing: @time[0]: @time[$#time]\n";

      for ($ii = @time[0]; $ii <= @time[$#time]; $ii++) {
          $tmpTime  = strftime "%Y-%m-%d %H:%M:%S", localtime($ii);
          $tmp100   = $op100{$ii} || 0;
          $tmp700   = $op700{$ii} || 0;
          $tmpTotal = $tstamp{$ii} || 0;
          print $cntfh "$tmpTime|$tmp100|$tmp700|$tmpTotal\n";
      }
      print "Done writing counts\n";

      # enter URLs into database
      foreach $key (sort keys %urls) {
          my ($url, $tstamp) = split (/<::>/, $key);
          $tmpTime = strftime "%Y-%m-%d %H:%M:%S", localtime($tstamp);
          print $urlfh "$tmpTime|$url|$urls{$key}\n";
      }
      print "Done writing urls\n";

      $dbh->do($cnt_str);
      print "Done writing count DB\n";
      $dbh->do($url_str);
      print "Done writing url DB\n";
      $dbh->disconnect();
      print "just done\n";
      Never moon a werewolf!
        Make sure to close your tmp file before starting the loader. This will flush all data to the file.
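        A minimal, self-contained sketch of the effect (hypothetical file names, not the poster's code): data printed to a File::Temp handle can sit in perl's output buffer until the handle is closed, which is exactly how a reader such as mysql's loader can end up seeing fewer lines than were written.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $fn) = tempfile('flush_demo_XXXX', TMPDIR => 1);
print $fh "record $_\n" for 1 .. 1000;

# At this point some of the 1000 lines may still be in the output
# buffer rather than on disk.  close() flushes the buffer and reports
# any write error, so do it before handing $fn to "load data infile".
close($fh) or die "can't close $fn: $!";

open my $in, '<', $fn or die "can't reopen $fn: $!";
my $count = () = <$in>;    # count the lines actually on disk
close $in;
print "$count lines on disk\n";
unlink $fn;
```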

        Raafschild

        Disclaimer: Whatever errors I find may be unrelated to the problem itself as I repeatedly got distracted while writing this comment...

        - (A trait of senior monks that's rubbed off on me): use strict - it helps...

        - Shouldn't

        for ($ii = @time[0]; $ii <= @time[$#time]; $ii++)
        be using $time[0] and $time[$#time]? There are also other lines using that notation...
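        For reference, a tiny strict-mode example of the difference (standalone, not the poster's data): $time[0] is a single element, while @time[0] is a one-element slice that perl warns about.

```perl
use strict;
use warnings;

my @time = (1107730800, 1107732000, 1107734398);

my $first = $time[0];         # single element
my $last  = $time[$#time];    # last element; $time[-1] also works

# @time[0] would "work" here too, but under warnings perl complains:
#   Scalar value @time[0] better written as $time[0]
print "$first $last\n";
```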

        - In your "Code with DBI" make PrintError=>1 and redirect the output for late night reading ;)

        - Check those output files to make sure that there are no "character combinations" that have a special meaning in SQL... It could be where you're losing the data...

        --------------------------------
        An idea is not responsible for the people who believe in it...

Re: DBI/load data infile issue
by sk (Curate) on Jul 08, 2005 at 16:05 UTC
    I am not sure how much of this is completely Perl related. Make sure you are checking for errors on the DBI handle. If possible, you can post your code and we can try it out.

    Just for debugging: upload the file directly using mysql. Go to the mysql prompt and use the load data command, then check the number of records that got in; it might be the file and not the script. Also, if I remember correctly, you can do  show warnings; and get the list of warnings generated by mysql.
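    For example, at the mysql prompt it would look something like this (path taken from the script's printed output, table and column names from the posted code):

```sql
-- run the load by hand, then inspect what mysql complained about
LOAD DATA INFILE '/tmp/WSB_count_YAkf'
  INTO TABLE tmpwsb
  FIELDS TERMINATED BY '|'
  (tstamp, op100, op700, total);

SHOW WARNINGS;   -- lists rows mysql truncated, coerced, or skipped
```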

    -SK

      Not to be snippy, but my original post does say I've done exactly what you suggested and got the expected results:

      I checked the tmp file and it had 3599 lines; I printed out the command from the perl script, appended it to mysql -e, and got 3599 records.

      The results from the code listed above with PrintError=1 and trace=2:

      [reports@clarkkent]/home/reports/ftp/WSB(29): ./b.pl
      load data infile '/tmp/WSB_urls_CDyy' into table tmpurl fields terminated by '|' (tstamp,url,count)
      load data infile '/tmp/WSB_count_YAkf' into table tmpwsb fields terminated by '|' (tstamp,op100,op700,total)
      lines: 72182
      Done parsing: 1107730800: 1107734398
      Done writing counts
      Done writing urls
      Done writing count DB
      Done writing url DB
      just done
      [reports@clarkkent]/home/reports/ftp/WSB(30):

      I can cut and paste the load command printed out by the script to "mysql -u root -pXXXX -e" and run it from the unix command line and I get the correct results.

      When I run the same command from the perl script in backticks, I get the same results as I do using DBI.

      Never moon a werewolf!
Re: DBI/load data infile issue
by BaldPenguin (Friar) on Jul 08, 2005 at 16:46 UTC
    Have you tried parsing the file without loading it all into memory?
    open (IN, "/home/reports/ftp/WSB/test.txt");
    ....
    while ( <IN> ) {
        my $line = $_;
        ....
    }
    I realize you won't get the total counts before parsing, but you could see if the load into memory had something to do with it.

    Don
    WHITEPAGES.COM | INC


Re: DBI/load data infile issue
by runrig (Abbot) on Jul 08, 2005 at 16:58 UTC
    Have you determined which records don't get loaded? Are they always the last ones? Perhaps the load does some background processing, and returns to the parent script before it finishes, and the backgrounded process dies. Try adding a 'sleep N' command after the load command (or nohup'ing it)?

    Update: or I think raafschild may have it above. (I didn't think of it since I didn't realize that the load file itself was being created in the perl program).

Re: DBI/load data infile issue
by sk (Curate) on Jul 08, 2005 at 17:14 UTC
    Sorry, I did not read carefully about your trying to load the file manually into mysql.

    If it is not too hard to change the code, can you try to open a proper file handle (just by using  open (OUT, ">myfile"); ) and then write to this handle instead of the one generated by File::Temp? Also, before you call $dbh->do(), can you  close(OUT) or die $!;? See if this makes any difference.
