Import data into MS-Access

This is a script that will import data into a Access-Database, using Active States "PerlScript"-Extension, along with a part in VBScript. This is very fast, at least faster than using win32::odbc or even win32::ole working directly with access-objects).

it is also a nice example for mixing script-languages for the MS-Scripting Host.

Installation:

1) Copy the source to a file file (eg. txt2mdb.wsf) and add this files` directory to %PATH (the windows system path, not the hash %PATH!)

2) Create 2 subdirs in that directory named XDS and MDB.

XDS is the dir where the recordSpecs go into

MDB will hold empty databases

Invocation:

txt2mdb.wsf <recordSpec> <[new[:template]:]Database> <Table> file1 file2 file3

#Simple Invocation, the database exists:
txt2mdb.wsf XXSUM.XDS c:\temp\xxum.mdb "EK"  c:\satnt\xxsum.e

#simply the same:
txt2mdb.wsf XXSUM c:\temp\xxum.mdb EK c:\satnt\xxsum.e

#Create the Database from an empty one <recordSpec>.mdb:
txt2mdb.wsf XXSUM new:c:\temp\xxum.mdb EK c:\satnt\xxsum.e

#if the name of the empty database differs from the recordSpec:
txt2mdb.wsf OLAP_1 neu:OLAP:c:\temp\xxum.mdb O1 c:\satnt\olap.txt
[download]

Searching

The script will search the subdirs "mdb" and "xds" (for every member of %PATH) for appropriate files. The first will be used.

record Specifications

The script supports three kinds of specs

recordSpecs for CSV-files

recordSpecs for line-based files with fixed length

special recordSpecs using perl code

Example for CSV-files

 Input-File:
 -------------
 730112200;12345;XXX;F999087
 
 recordSpec:
 -----------------
 <?xml version="1.0" encoding="ISO-8859-1"?>
 <record kind="csv" separator=";">
     <ignore regex="(^$|^[ \t]+$|^[ \t]*#)/" />
     <field name="ANR"  > $_ =~ s/00$//; </Field>
     <field name="VKNR"   />
     <filler              >
     <field name="fbpos"  />
</datensatz>
[download]

Fields will be separated at ";"

Lines that match the regex in the ignore-tags will be ignored.

Fields must be listed in the same order they have in the table.

If there is perl-code in the field-tag, this code will be evaluated for every record.

The actual value of the field is $_, it is possible to change that value.

Example for fixed length files

 Inputfile:
 -------------
 #Arztnr;VKNR;fbpos
 730112200---01234----F999087
 
 recordSpec:
 -----------------
 
 <?xml version="1.0" encoding="ISO-8859-1"?>
 <record kind="csv" separator=";">
     <ignore regex="(^$|^[ \t]+$|^[ \t]*#)/" />
     <field name="Arztnr" pack="A7" />
     <filler              pack="X5" />
     <field name="VKNR"   pack="A5" />
     <filler              pack="X4" />
     <field name="fbpos"  pack="A7" />
</datensatz>
[download]

Lines that match the regex in the ignore-tags will be ignored.

Fields must be listed in the same order they have in the table.

The length of the field will be determined by the pack-attribute.

If there is perl-code in the field-tag, this code will be evaluated for every record.

The actual value of the field is $_, it is possible to change that value.

Pack-Atribute:

A - normal Ascii-char

X - Ignore (for Filler)

x - Step back

Special Example

 Inputfile:
 -------------
 ...
 ...
 Arzt: 73011220
 ...

 ...

 K 012345\tF999087
 ...
 
 
 recordSpec:
 -----------------
 
<?xml version="1.0" encoding="ISO-8859-1"?>
<record kind="special">
    <special>
    <![CDATA[

        # remember Arztnr
        if ( /^Arzt: ([0-9]{7})/ )
        {
      $satz{"ArztNr"}=$1;
    }

        # record is fully read
        if ( /^K ([0-9]{5})\t([A-Z][0-9]{6})/ )
        {
            # remember VKNR and fbpos 
            $satz{"VKNR"}  = $1;
            $satz{"fbpos"} = $2;

            # write record
            &update();
            
            # Important!: reset data for next record
            %record=();
        }
    ]]>
    </special>
</record>
[download]

The code in the special-tag will be packed into a Function, that is called for every line of the inputfile.

The Hash %record is defined outside the function, so it is there to store values for the record.

calling &update() writes the values to the databes (keys are the field-names of the database)

Caveats:

Currently the CSV-lines are simply splitted at the separator. What would you prefer: Text::CSV, regex, ...?

No support for "real" fixed length, records (will come if there is somebody interested. i didn´t need it so i didn´t implement it.)

Finally, here is code:

<job id="PERLandVBS">

    <script language="VBScript">

        dim Fields()
        dim Constants()
        dim Constants_init
        dim Constants_values()
        dim jet
        dim db
        dim rs
        
        function openDatabase (Database, Table)
                on error resume next

                set jet = CreateObject ("DAO.DBEngine.36")
                if Err.Description <> "" then openDatabase = Err.Descr
+iption: exit function
                
                
                set db  = jet.OpenDatabase(Database)
                if Err.Description <> "" then openDatabase = Err.Descr
+iption: exit function

                set rs  = db.OpenRecordset(Table)
                if Err.Description <> "" then openDatabase = Err.Descr
+iption: exit function
                
                openDatabase = 1
        end function
        
        function closeDatabase ()
                on error resume next
                
                rs.Close
                if Err.Description <> "" then closeDatabase = Err.Desc
+ription: exit function
                
                db.Close
                if Err.Description <> "" then closeDatabase = Err.Desc
+ription: exit function
                
                closeDatabase = 1
        end function


        sub rememberFields (Names)
            
            redim preserve Fields(Ubound(Names))
            
            for i=0 to Ubound(Names)
                Fields(i) = Names(i)
            next
        
        end sub        

        sub rememberConstant (name, wert)

            'WScript.Echo ("rememberConstant " & name & " -> " & wert)
            '
            
            if ( Constants_init ) then
                redim preserve Constants(Ubound(Constants))
                redim preserve Constants_values(Ubound(Constants))
            else
                redim Constants (0)
                redim Constants_values(0)
                Constants_init = 1
            end if
            
            Constants(Ubound(Constants)) = name
            Constants_values(Ubound(Constants)) = wert
        
        end sub        
    
        
        function importFields (values)
            on error resume next
            dim i
            
            rs.AddNew
            if Err.Description <> "" then importFields = Err.Descripti
+on: exit function
            
            dim Fieldname
            
            for i=0 to UBound(values)
                
                Fieldname = Fields(i)
                
                if values(i) <> "" then
                    rs.fields(Fieldname).value = values(i)
                    if Err.Description <> "" then importFields = Err.D
+escription & " (" & Fields(i) & ": " & values(i) & ")" : exit functio
+n
                end if
            
            next
            
            if Constants_init then
                for i=0 to UBound(Constants)
                    if Constants(i) <> "" then
                        rs.fields(Constants(i)).value = Constants_valu
+es(i)
                        if Err.Description <> "" then importFields = E
+rr.Description & " (" & Fields(i) & ": " & values(i) & ")" : exit fun
+ction
                    end if

                next
            end if
            
            rs.Update
            if Err.Description <> "" then importFields = Err.Descripti
+on: exit function
            
            importFields = 1
            
        end function
        
    </script>
   
    <script language="PerlScript">
        use strict;

        use XML::Parser;
        use Unicode::String qw ( utf8 );
        use Data::Dumper;
        
        DESTROY {}
        
        our $VERSION = '0.3';
        
        wshprint ("TXT2DB Version $VERSION\n");

        use Win32::API;
        use File::Copy;
        
        Win32::API->Import( "winmm.dll", "LONG timeGetTime()" );
        
        my @PATH;
        for ( keys %ENV )
        {
            if ( /^path$/i )
            {
                @PATH = split ";", $ENV{$_};
                last;
            }
        }
        
        wshdie 
        (
            "Too less arguments:\n" .
            "invocation: cscript.exe txt2db.wsf <recordSpec> <[new[:te
+mplate]:]Database> <Table> <file1..n>\n"
        ) if $main::WScript->Arguments->Count < 3;
        
        my $recordSpec = $main::WScript->Arguments(0);
        wshdie ("No recordSpec specified!\n") unless defined $recordSp
+ec;
        my $frecordSpec = findxds ( $recordSpec );
        
        my $Database        = $main::WScript->Arguments(1);
        wshdie ("No Database specified!\n") unless defined $Database;
        $Database = finddb ( $Database );
        
        my $Table          = $main::WScript->Arguments(2);
        wshdie ("No Table specified!\n") unless defined $Table;
        
        my @files;
        my $packstr;

        for (my $i=3; $i<$main::WScript->Arguments->Count;$i++)
        {
             push @files, $main::WScript->Arguments($i);
        }
        
        wshdie ("No files specified for import !\n") unless @files;
        
        my $kind;
        my $separator;
        my @Fields;
        my @Constants;
        my @ignore;
        my $ret;
        my $lines_file  = 0;
        my $records_file  = 0;
        my $saetze_gesamt = 0;
        my $specialcode = "";
        my %satz;
        my $firstupdate = 1; 
        
        my $startzeit;
        my $laufzeit;
        my $timeLeft;
        my $pzeit;
        
        my $bytes_gesamt = 0;
        my $bytes_gelesen = 0;

        &event ("read recordSpec");
        read_recordSpec();
    

        for my $k ( @Constants ) 
        {
            my @dup = grep { (not defined $_->{'filler'}) && ($_->{'na
+me'} eq $k->{'name'}) } @Fields;
            
            if ( scalar @dup > 0 )
            {
                wshdie ("Constant '$k->{'name'}' already declared as F
+ield!\n");
            }
            
            &rememberConstant ($k->{name}, $k->{wert});
        }        
        
        if ( $specialcode )
        {
            $specialcode .= "\n}";
            eval $specialcode;
        }
        
        wshdie ("Error compiling <special>-code: $@") if $@;
        
        &event ("open Database <$Database>, Table: <$Table>");
        $ret = openDatabase($Database, $Table);
        wshdie ("Error opening Database: $ret\n") unless $ret == 1;
        
        rememberFields ( [ map { $_->{name} } grep { not defined $_->{
+'filler'} } @Fields ] ); 
                
        &event ("starting Import...");
        $startzeit = timeGetTime() / 1000;
        
        import();

        &event ("close Database");        
        $ret = closeDatabase();
        wshdie ("Error closing the Database: $ret\n") unless $ret == 1
+;
        
        &event ("end Import");        

        sub read_recordSpec
        {
            my $parser = new XML::Parser (Style=>'Subs',Handlers => {C
+har=>\&code});

            $parser->parsefile ($frecordSpec);

            if ( $kind eq "fix" )
            {
                wshdie ("Error reading recordSpec: specified \"separat
+or\" along with kind=\"fix\"!\n")
                    if $separator;
                wshdie ("Error reading recordSpec: No fields specified
+!\n") unless @Fields;
            }
            elsif ( $kind eq "csv" )
            {
                wshdie ("Error reading recordSpec: No \"separator\" sp
+ecified!\n")
                    unless $separator;
                wshdie ("Error reading recordSpec: no Fields specified
+!\n") unless @Fields;
            }
            elsif ( $kind eq "special" )
            {
            }            
            else
            {
                wshdie ("Error reading recordSpec: Unknown value for f
+ür \"kind\"!\n");
                
            }
        }

        sub field
        {
            my $expat = shift;
            my $Field = utf8(shift)->latin1;
            my %attribute = map { utf8($_)->latin1 } @_;

            wshdie ("Error reading recordSpec: No fieldname specified!
+\n") unless defined $attribute{'name'};

            if ( $kind eq "fix" )
            {
                wshdie ("Error reading recordSpec: No Attribute \"pack
+\" specified ($attribute{'name'})!\n") unless defined $attribute{'pac
+k'};
            }

            $packstr.= $attribute{'pack'};
            
            push @Fields, \%attribute;
        }
        
        sub filler
        {
            my $expat = shift;
            my $Field = utf8(shift)->latin1;
            my %attribute = map { utf8($_)->latin1 } @_;

            if ( $kind eq "fix" )
            {
                wshdie ("Error reading recordSpec: No Attribute \"pack
+\" specified ($attribute{'name'})!\n") unless defined $attribute{'pac
+k'};
                $packstr.= $attribute{'pack'};
            }
            
            if ( $kind eq "csv" )
            {
                push @Fields, { filler => 1 };
            }

        }

        sub record
        {
            my $expat = shift;
            my $Field = utf8(shift)->latin1;
            my %attribute = map { utf8($_)->latin1 } @_;

            $kind     = lc($attribute{kind})         if defined $attri
+bute{kind};
            $separator = qr/$attribute{separator}/ if defined $attribu
+te{separator};
        }

        sub ignore
        {
            my $expat = shift;
            my $Field = utf8(shift)->latin1;
            my %attribute = map { utf8($_)->latin1 } @_;

            my $re = qr/$attribute{regex}/;
            push @ignore, $re;
        }

        sub code
        {
            my $expat = shift;
            my $text = utf8(shift)->latin1;
            
            return if $text =~ /^$/;
            return if $text =~ /^[ \t\n]+$/ms;
            
            my @context = $expat->context;
            
            if ( $context[-1] =~ /^special$/i )
            {
                &specialc ($text);
            }
            else
            {
                $Fields[-1]->{code} .= $text;
            }
        }
        
        
        sub specialc
        {
            my $text = shift;
            
            $specialcode = "\$specialcode = sub { \$_ = \$_[0]; " unle
+ss $specialcode;
            $specialcode .= "$text";
        }
        
        sub constant
        {
            my $expat = shift;
            shift;
            my %attribute = map { utf8($_)->latin1 } @_;
            
            wshdie ("Error reading recordSpec: No Attribute \"name\" s
+pecified!\n") unless defined $attribute{'name'};
            wshdie ("Error reading recordSpec: No Attribute \"value\" 
+specified ($attribute{'name'})!\n") unless defined $attribute{'wert'}
+;
            
            my $ts = $main::WScript->StdIn;
            
            if ( $attribute{'wert'} =~ /!([^!]+)!$/ )
            {
                my $a = "";
                while ( $a eq "" )
                {
                    wshprint ("Enter constant value ($attribute{'name'
+}):");
                    $a = $ts->ReadLine();
                    chomp $a;
                    
                    if ( $attribute{"match"} )
                    {
                        $a = "" unless $a =~ m/$attribute{"match"}/;
                    }
                }
                
                $attribute{'wert'} = $a;
            }
            
            push @Constants, \%attribute;
        }

        sub import
        {
            $saetze_gesamt = 0;
            
            for my $Field ( @Fields )
            {
                if  ( $Field->{'code'} )
                {
                    my $code = "\$Field->{'code'}=sub { \$_=\$_[0]; $F
+ield->{code}; return \$_; }";
                    eval $code;
                    wshdie ("Error reading recordSpec: Error in Userco
+de: $@\n") if $@;
                }
            }
            
            for ( @files )
            {
                for ( glob ($_) )
                {
                    $bytes_gesamt += (stat $_)[7];
                }
            }            

            for ( @files )
            {
                $records_file = 0;
                $lines_file = 0;
                
                for ( glob ($_) )
                {
                    import_file ($_);
                }
            }
        }

        my $i;
        
        sub import_file
        {
            my $file = shift;
            $pzeit = timeGetTime()/1000;
            
            %satz = ();
            
            &event ("open file <$file>");
            
            open IN, $file or wshdie ("Import-Error: Cannot open <$fil
+e>!\n");

            while ( my $line = <IN> )
            {
                $lines_file++;
                

                $bytes_gelesen += length($line);
                
                chomp ($line);
                
                my $no = 0;
                
                for ( @ignore )
                {
                    $no = 1, next if $line =~ $_;
                }

            
                next if $no;
                
                
                
                my @f;

                if ( $kind eq "fest" )
                {
                    $records_file++;
                    $saetze_gesamt++;
                    
                    eval
                    {
                        @f = unpack ($packstr, $line);
                    };
                    
                    wshdie ("Error using unpack: $@") if $@;
                }
                elsif ( $kind eq "special" )
                {
                    chomp $line;
                    my $s = &$specialcode ($line) || 0;
                    $records_file+=$s;
                    $saetze_gesamt+=$s;
                }
                else
                {
                    $records_file++;
                    $saetze_gesamt++;
                
                    @f = split ($separator, $line);
                }

                if ( timeGetTime() / 1000 > int($pzeit) + 5 )
                {
                        
                    $laufzeit = (timeGetTime() / 1000) - $startzeit;
                    $timeLeft = (($bytes_gesamt/$bytes_gelesen) * $lau
+fzeit) - $laufzeit;
                    &event ( sprintf ("%.2f", ($bytes_gelesen/$bytes_g
+esamt) * 100), "% imported, (", ft (int($timeLeft)), " minutes left).
+");
                    $pzeit = timeGetTime()/1000;
                }
                
                next if $kind eq "special";

                my $j = 0;

                for my $Field ( @Fields )
                {
                    my $sub = $Field->{'code'};
                    $f[$j] = &$sub ($f[$j]) if $sub;
                    
                    if ( defined $Field->{filler} )
                    {
                        $f[$j] = undef;
                    }
                    
                    $j++;
                    
                }
                
                my $r = importFields ([grep { defined $_ } @f]);
                wshdie ("Error writing record: $r in line $lines_file\
+n") unless $r == 1;

            }
            
            &event ("close file <$file>, $records_file records importe
+d.");
            
            close IN;
        }
        
        
        
        sub update
        {
            my @f;
        
            if ( $firstupdate )
            {
                $firstupdate = 0;
                my $f = [ sort keys %satz ];
                
                rememberFields ($f);
            }
            
            
            for ( sort keys %satz )
            {
                push @f, $satz{$_};
            }
            
            my $r = importFields (\@f);
            wshdie ("Error writing record: $r in line $lines_file\n") 
+unless $r == 1;
            
        }
        
        sub status
        {
            my $laufzeit;
            my $percent;
            

            return (timeLeft => $timeLeft, percent => $percent);
        }
    
        sub findxds
        {
            my $recordSpec = shift;
            $recordSpec =~ s/\.xds$//i;

            my $e = "";

            for ( "./", @PATH)
            {
                $e = "$_/xds/$recordSpec.xds" if -f "$_/xds/$recordSpe
+c.xds" && -e "$_/xds/$recordSpec.xds";
            }

            wshdie ("recordSpec '$recordSpec' not found!\n") unless $e
+ ;

            return $e;

        }
        
        sub finddb
        {
            my $Database = shift;
            my $sb;
            my $e;
            
            if ( $Database =~ /^new:([^:]+:)?(.+)$/ )
            {
                if ( $1 )
                {
                    
                    $sb = $1;
                    $Database = $2;
                    $sb =~ s/:$//;
                }
                else
                {
                    
                    $sb = $recordSpec;
                    $Database = $2;
                }
                
                for ( "./", @PATH)
                {
                    if ( -f "$_/mdb/$sb.mdb" && -e "$_/mdb/$sb.mdb" )
                    {
                        if ( -e "$Database" )
                        {
                            wshdie ("Database \"$Database\" existiert 
+bereits!\n");
                        }
                        else
                        {
                            copy "$_/mdb/$sb.mdb", "$Database";
                            last;
                        }
                    }
                }
            }

            wshdie ("Database '$Database' not found!\n") unless -e $Da
+tabase;

            return $Database;

        }        

        sub wshdie
        {
            my $message = join "", @_;
            
            unless ( $message =~ /\n$/ms )
            {
                my @caller = caller;
                $message .= " in Module $caller[0] line $caller[2]!";
            }
            
            wshprint ($message);
            $main::WScript->Quit (-1);
            exit;
        }
        
        sub wshprint
        {
            $main::WScript->Echo ( join ("", @_) );
        }
        
        sub ft
        {
            my $t = $_[0];
            my $m = int($t / 60);
            my $s = $t-($m*60);
            return sprintf("%2d", $m) . ":" . sprintf("%2d", $s);
        }
        
        sub event
        {
            my @t = localtime;
            wshprint (sprintf("%02d", $t[2]) .  ":" . sprintf("%02d", 
+$t[1]) . ":" . sprintf("%02d", $t[0]) . " ". join ("", @_) );
        }
[download]

Comment on Import data into MS-Access Select or Download Code


There's more than one way to do things
	PerlMonks