Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW

SQL vs Perl table comparing

by LiTinOveWeedle (Scribe)
on Nov 07, 2002 at 18:21 UTC ( #211167=perlquestion: print w/replies, xml ) Need Help??
LiTinOveWeedle has asked for the wisdom of the Perl Monks concerning the following question:

Hi my dear Monks.
I wrote part of provisioning interface which manage data from inventory management into performance managent. Data are stored into SQL, Oracle database. In predefined time my script compare SQL table with its SQL local backup copy to find out changes... So I am comparing two SQL tables with identical structure to find:

1. records existing only in first table
2. records existing only in second table
3. records with identical primary keys, but with different content.
4. record which are totaly identicals.

To do that I am using DBI::Oracle module and ->selectall_hashref() method to fetch whole content of both tables to two hash data structures. Than I have simple procedure to compare both hashes and as result I get three hashes (see points 1-3, I don't take care about identical record - point 4) This works good for now. I am using something like this:

foreach $key ( keys %{$hash1} ) { if ( exists( $hash2->{$key} ) ) { if ( not compare( $hash1->{$key}, $hash2->{$key} ) ) { $hash3->{$key} = $hash1->{$key}; } delete( $hash1->{$key} ); delete( $hash2->{$key} ); } }

where $hash1 and $hash2 are references to input tables and output are hash1(see point 1.), hash2(see point 2.) and hash3(see point 3.)

Problem is, that in this time tables contains about hundreds of records. Now we are going to store about 10.000 records in tables!!! Because I am reading whole table into perl variable (into the memory):

1. I want to know, if perl can handle so big data structure or is this just HW sizing problem?
2. Is there more efficient way to do this comparision in perl?
3. Is there more efficient way how to compare two tables with same results directly in SQL?

->selectall_hashref() method return hash of arrays, where each array is one row in SQL table and hash key is in my example sql primary key. Table has about 20 collumns.

Hope that the code I wrote bellow is good. I tried to memorized it, but my brain work worse than sql database :-). Thank you for help brothers.

Regards LTOW

Li Tin O've Weedle
mad Tsort's philosopher

Replies are listed 'Best First'.
Re: SQL vs Perl table comparing
by CubicSpline (Friar) on Nov 07, 2002 at 18:46 UTC
    I think you are right to question your approach here. Relational databases were made to do the kind of thing you're talking about here. Let's look at how you'd do these things.

    Let's assume you have Table and BackupTable, which have the exact same columns and a primary key named 'Key'.

    1. records existing only in first table

    SELECT * FROM Table WHERE NOT EXISTS (SELECT * FROM BackupTable WHERE Table.Key = BackupTable.Key)

    2. records existing only in second table

    Same thing as 1, but switch the table names.

    3. records with identical primary keys, but with different content.

    The best I can think of here is to get a list of the primary key values that are in both tables but have different values.

    SELECT Key FROM Table LEFT JOIN BackupTable ON Table.Key = BackupTable +.Key WHERE Table.column1 <> BackupTable.column1 OR Table.column2 <> B +ackupTable.column2 ......
    Hopefully, Oracle provides some sort of utility to do this, but if I had to do it through straight SQL those are the things I would try.

    Good Luck.

    "No one tosses a Dwarf!"

Re: SQL vs Perl table comparing
by FamousLongAgo (Friar) on Nov 07, 2002 at 18:42 UTC
    Perl can handle very big data structures - your constraint will be physical memory.

    However, what you are looking to do is what RDBMs are designed to do well. You should use SQL to get the results you want ( this is MySQL dialect, look up the proper syntax for joins on your own system): 1 SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.key = t2.key WHERE t2.key IS NULL
    2 SELECT t2.* FROM t1 LEFT JOIN t2 ON t2.key = t1.key WHERE t1.key IS NULL
    3SELECT t1.* FROM t1, t2 WHERE t1.key = t2.key AND NOT t1.content = t2.content
    4 SELECT t1.* FROM t1, t2 WHERE t1.key = t2.key AND t1.field1 = t2.field1 AND...
    Of all these, only #3 may be tricky, if you don't have a specific field whose value you can check. But even there I suspect there is a clean SQL solution that will work faster than Perl. If in doubt, benchmark and see how it goes.
Re: SQL vs Perl table comparing
by grantm (Parson) on Nov 07, 2002 at 23:31 UTC

    For a really quick solution, you could dump each table to a CSV file (ordered by primary key) and then run diff against them.

Re: SQL vs Perl table comparing
by rdfield (Priest) on Nov 08, 2002 at 10:23 UTC
    Have a look at using the 'intersect' and 'minus' functions within Oracle SQL. Very easy to use and produce all the results you require without messing about with all of the horribly complicated and easy to get wrong content comparison SQL that's been proposed. Which, btw, will all fail where there are null values (null != null).


Re: SQL vs Perl table comparing
by Anonymous Monk on Jul 11, 2013 at 10:52 UTC

    Hi , Im working in ETL testing were i used to extract the data from Oracle 11g both Source and Target by running queries and put them result extracts into excel sheet and validate the same using excel macro . But i heard that , this can be done using python , perl , or Unix scripting . im looking forward suggestion / answers for my requirement . Any example script is welcome

      You need to start learning one or more of those things you mentioned, try some things, and come back with a specific question (perl questions if you decided to go with perl). Also, it's not good form to cross-post without mentioning that you have cross-posted the question between forums.

      E.g., start learning perl, write something simple, come back with specific questions, learn how to use the DBI library to access data in databases, write something simple, and come back with more specific questions.

      Something like this should do:

      use DBI; use Spreadsheet::WriteExcel; my $DB=DBI::connect( ... connection args for oracle DB ... ); my $ar = $DB->selectall_arrayref('select * from table'); my $XLWB = Spreadsheet::WriteExcel->new('table.xls'); my $XLWS = $XLWB->add_worksheet('table'); my $row = 0; $XLWS->write_row($row,0,$_) for @$ar;


      When your only tool is a hammer, all problems look like your thumb.

        Spreadsheet::WriteExcel is superceeded by Excel::Writer::XLSX.

        But I think the poster wants to avoid putting all the data in a spreadsheet where they currently run macros to compare data 8-) ... and wants you to write a generic table compare utility

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://211167]
Approved by joe++
and the monks are mute...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (4)
As of 2018-05-26 22:43 GMT
Find Nodes?
    Voting Booth?