PerlMonks  

Batch loading data into Oracle - Perl to the rescue!

by hotyopa (Scribe)
on Jan 28, 2003 at 05:44 UTC [id://230512]

hotyopa has asked for the wisdom of the Perl Monks concerning the following question:

A job that I know is crying out for a Perl solution, but it's been a while and I can't quite get my head around it:

I'm using (or at least about to use) Oracle's 'sqlldr' utility to batch load some data into a database. I have about 128 files (some having up to 1,000,000 rows) with names like

'n1822.txt','n2327.txt'
which are to be loaded into tables with almost the same names -
'n1822','n2327'
(ie without the .txt)

Now, the sqlldr command (I'm on a Linux box) uses a control file to tell it how the data is formatted, which can be either a separate text file, or embedded in the data file as a header (as below). This is my control file:

LOAD DATA
INFILE *  -- tells sqlldr the data is in this file
INTO TABLE {table name}
(
  person_id		POSITION(1:9)		CHAR(9),
  surname		POSITION(10:34)		CHAR(25),
  given_names		POSITION(35:59)		CHAR(25)
)
BEGINDATA
{data begins here}
and we call sqlldr from the command line as follows:

# sqlldr USERID=scott/tiger control='n1822.txt' log='log.txt' bad='bad.txt' direct=true

What I imagine I need is two scripts - one which prepends the control info onto the beginning of every .txt file in my import directory, inserting the {table name} part, and another which calls sqlldr for each file in the directory.

I would love some suggestions about the key ingredients of these scripts. I'm particularly interested to know whether there is an efficient way to concatenate the small control file on to the top of the large data file.
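For the concatenation step, one memory-friendly approach (a sketch, not from the thread; the file names and the helper name are mine) is to write the header to a temporary file, stream the large data file across in fixed-size chunks, then rename the temporary file over the original:

```perl
#!/usr/bin/perl
# Sketch: prepend a small control-file header to a large data file
# without slurping the whole file into memory.
use strict;
use warnings;
use File::Copy qw(move);

sub prepend_header {
    my ($header, $file) = @_;
    my $tmp = "$file.tmp";
    open my $out, '>', $tmp  or die "Couldn't open $tmp for writing: $!";
    open my $in,  '<', $file or die "Couldn't open $file for reading: $!";
    print {$out} $header;
    # copy the big data file across in 64K chunks
    my $buf;
    print {$out} $buf while read($in, $buf, 1 << 16);
    close $in;
    close $out or die "Couldn't close $tmp: $!";
    move($tmp, $file) or die "Couldn't rename $tmp over $file: $!";
}
```

This never holds more than one chunk of the data file in memory, so a million-row file is no problem; the rename at the end is cheap since both files live in the same directory.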

Thanks!

*~-}hotyopa{-~*

Replies are listed 'Best First'.
Re: Batch loading data into Oracle - Perl to the rescue!
by t0mas (Priest) on Jan 28, 2003 at 06:48 UTC
    As you said, you can keep the data in a separate file, so you don't need to concatenate anything at all.

    Create a separate control file 'n1822.ctl' like:

    LOAD DATA
    INFILE 'n1822.txt'
    INTO TABLE {table name}
    (
      person_id		POSITION(1:9)		CHAR(9),
      surname		POSITION(10:34)		CHAR(25),
      given_names	POSITION(35:59)		CHAR(25)
    )

    Point your loader command to that file like:

    # sqlldr USERID=scott/tiger control='n1822.ctl' log='log.txt' bad='bad.txt' direct=true


    /brother t0mas

      Thanks t0mas, that's what I've done. I still wanted to automate the creation of these control files, though, so here's what I did. I wrote a generic control file called control.ctl:

      LOAD DATA
      INFILE '<table>.txt'
      INTO TABLE <table>
      (
        person_id		POSITION(1:9)		CHAR(9),
        surname		POSITION(10:34)		CHAR(25),
        given_names	POSITION(35:59)		CHAR(25)
      )

      Then wrote the following script to write a control file for each data file in the directory:

      #!/usr/bin/perl
      # configuration variables
      $lpath = "/u04/tmp/log/";
      $cpath = "/u04/tmp/ctl/";
      $cfile = "control.ctl";
      $dirname = "/u04/tmp/txt/";

      # load the control file template
      open(SOURCE, "< $cpath$cfile")
          or die "Couldn't open $cpath$cfile for reading: $!\n";
      $ctl = join '', <SOURCE>;
      close SOURCE;

      # process all data files in the directory
      opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
      while (defined($file = readdir(DIR))) {
          next unless $file =~ /\.txt$/;
          ($table = $file) =~ s/\.txt$//;
          #print "$file -> $table \n";

          # write the control file for this table
          open(COUT, "> $cpath$table.ctl")
              or die "Couldn't open $cpath$table.ctl for writing: $!";
          ($tmp = $ctl) =~ s/<table>/$table/g;
          print COUT $tmp;
          close COUT;

          # call SQL*Loader
          $cmd = "sqlldr USERID=user/id\@service "
               . "control=$cpath$table.ctl "
               . "log=$lpath$table.log "
               . "bad=$lpath$table.bad direct=true";
          $status = system($cmd);
      }
      closedir(DIR);

      I'm sure this code could be cleaned up a bit, but it does the job.

      *~-}hotyopa{-~*

Re: Batch loading data into Oracle - Perl to the rescue!
by rdfield (Priest) on Jan 28, 2003 at 09:13 UTC
    The more interesting bit of the problem would be correctly extracting the relevant metadata - are you sure that the lengths of the columns in the data dictionary match the lengths of the corresponding fields in the text file? Is the column order the same? How are numbers represented in the file? You may need a hand-coded translation/mapping configuration file/table in order to handle the mapping accurately.

    Are you using 9i? There are some nice ETL features (such as using a file as a table).

    rdfield

      Thanks for the reply. I am using 9i, but the files will be used only for querying, and are rather large, so there would be a bit too much of a performance hit to just use them as external tables.

      To answer your first question, the data dictionary is exactly the same for each file. So what I am looking to do is to loop through every file in the directory, replace the table name in the control file, then invoke sqlldr.

      *~-}hotyopa{-~*

Node Type: perlquestion [id://230512]
Approved by dorko