PerlMonks  

Testing database updates.

by chrestomanci (Priest)
on Apr 18, 2012 at 08:49 UTC
chrestomanci has asked for the wisdom of the Perl Monks concerning the following question:

Greetings wise brothers, I seek your wisdom in the matter of preserving order and correctness, and of preventing mistakes.

I am working on re-factoring and tidying a log parsing script. The script reads a custom log file, extracts relevant information, and then writes summary information into a database and a number of text and XML report files.

I need to make sure that the new version makes the same overall database changes as the old. I don't care about the order of inserts, or about any reads, just that the overall state at the end is what it should be.

The idea I had for my test script was to dump the database to text after running the script under test and compare it with the expected output, or better, to somehow ingest the whole database into a big Perl data structure and then use is_deeply() (from Test::More) to compare the actual database state with the expected one.
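A minimal sketch of that is_deeply() approach, assuming an already-connected DBI handle; the table names in the usage comment are hypothetical, and sorting the stringified rows makes the comparison independent of insert order:

```perl
use strict;
use warnings;

# Canonicalise a list of rows (array-of-arrayrefs) so that insert
# order does not matter: stringify each row, then sort the strings.
sub canon {
    my ($rows) = @_;
    return [ sort map { join "\x1F", map { defined $_ ? $_ : '\N' } @$_ } @$rows ];
}

# Snapshot every table of interest into one big hash, keyed by table
# name. $dbh is a connected DBI handle; the table list is up to you.
sub snapshot {
    my ($dbh, @tables) = @_;
    my %state;
    for my $table (@tables) {
        $state{$table} = canon( $dbh->selectall_arrayref("SELECT * FROM $table") );
    }
    return \%state;
}

# With Test::More loaded, the comparison is then just:
# is_deeply( snapshot($dbh, qw(jobs results)), $expected_state,
#            'database state matches reference' );
```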

Is there a more efficient way of doing this? Is there already a module on CPAN for this purpose, or one that stores everything in a big Perl hash? I searched for "perl test database" and found modules such as Test::MockDBI and DBD::Mock, but they appear to exist to create error conditions for database testing rather than to record what DB changes are made.

Some background: the current log script is a horrible mass of spaghetti with a lot of technical debt, and every time features are added (frequently) bugs are introduced and time is wasted. Naturally there is no specification for what the script should be doing other than the source code, and the occasional comment. I am looking to re-factor the code by breaking it up into a number of smaller, well specified modules, each with tests, but I need to make sure that the new version produces the same reports as the old.

Re: Testing database updates.
by BrowserUk (Pope) on Apr 18, 2012 at 09:18 UTC

    Most DBs have a dump utility that will produce a plain text file as output. These can then be diff'd in the usual ways.

    Start with a known state -- eg. empty -- and use the existing script to perform a range of inserts, updates and deletes. Dump the result and call it the reference.

    Start with the same known state, use the refactored script to perform the same operations and dump the result. Diff the two dumps.
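    That workflow might be scripted along these lines, assuming the sqlite3 command-line tool is on the PATH; the script and database file names are hypothetical:

```perl
use strict;
use warnings;

# Build the shell command that dumps an SQLite file to a text file.
# (sqlite3's .dump writes SQL statements to stdout.)
sub dump_cmd {
    my ($dbfile, $outfile) = @_;
    return "sqlite3 $dbfile .dump > $outfile";
}

# Hypothetical workflow: run old and new scripts from the same empty
# starting state, dump each result, then diff the two dumps.
# system("perl old_parser.pl --db ref.db logfile");
# system(dump_cmd('ref.db', 'ref.dump'));
# system("perl new_parser.pl --db new.db logfile");
# system(dump_cmd('new.db', 'new.dump'));
# system("diff ref.dump new.dump");
```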


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      I concur, with one proviso: you probably want to make sure that your dump script dumps records in a predictable order. Most of the dump scripts bundled with databases do not guarantee any particular order of results. You may need to write your own simple dump script.
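      A minimal hand-rolled dump along those lines might look like this (assuming a connected DBI $dbh; sorting the stringified rows makes the output deterministic regardless of insert order):

```perl
use strict;
use warnings;

# Turn an array-of-arrayrefs into sorted, tab-separated text lines,
# so the dump order is predictable however the rows went in.
sub rows_to_lines {
    my ($rows) = @_;
    return map { "$_\n" }
           sort
           map { join "\t", map { defined $_ ? $_ : 'NULL' } @$_ } @$rows;
}

# Dump one table in deterministic order via a connected DBI handle.
sub dump_table_sorted {
    my ($dbh, $table) = @_;
    return rows_to_lines( $dbh->selectall_arrayref("SELECT * FROM $table") );
}

# for my $t (qw(jobs results)) {   # hypothetical table names
#     print "-- $t\n", dump_table_sorted($dbh, $t);
# }
```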

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

        Though these dump utilities don't offer any particular ordering, in general, given the same versions, switches etc., they will produce the same ordering for the same source material. I.e., they don't have a random component.

        Whilst it is possible that two DBs arrived at via different routes -- ie. a different sequencing of inserts, updates and deletes -- may produce subtly different dumps due to that ordering, dumps that the OP might decide to consider "identical", it will require a conscious decision to consider them so.

        That is, whilst diff'ing the dumps may produce "false negatives", it will require a human's decision to make the determination that they are indeed false.

        There is also the possibility to post process the dumps to achieve some 'canonical' format -- eg. trim whitespace; sort etc. -- but that should rarely be necessary.
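        A sketch of such post-processing in plain Perl; the canonical form chosen here -- line endings unified, trailing whitespace trimmed, blank lines dropped, lines sorted -- is one arbitrary choice among many:

```perl
use strict;
use warnings;

# Normalise a dump into a canonical form so that two dumps of the
# same logical state compare equal.
sub canonicalise {
    my ($text) = @_;
    my @lines = split /\r\n|\r|\n/, $text;   # unify line endings
    s/\s+$// for @lines;                     # trim trailing whitespace
    return join "\n", sort grep { length } @lines;
}
```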

        For the purposes of checking refactoring, it is safe to assume that the OP would be targeting the same versions of the DBMS & utility for both the reference and check dumps; and any variations between them will be of interest, even if ultimately dismissed as irrelevant.



      Thanks for your advice.

      The system I decided on was to modify the script I am re-factoring to take the path of an SQLite database on the command line. I then wrote a wrapper .t test script that creates an empty DB with just the schema before each test run. After each test run the DB is dumped to text and compared with the expected output.

      My personal take is that I'd much rather start out looking at everything and explicitly choose what I can disregard -- probably through post processing the dumps -- than to prejudge the issues by preselecting what to compare.

      I am post processing the dumps as well. I discovered almost immediately that one table contains a timestamp column that defaults to the current time, so I had to replace that timestamp with a fixed string. I am also post processing the text output to suppress differences in newline chars and trailing whitespace.
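      The timestamp masking can be a simple substitution; the pattern below assumes an ISO-style 'YYYY-MM-DD HH:MM:SS' format and is only a guess at the real column format:

```perl
use strict;
use warnings;

# Replace ISO-style timestamps in a dump line with a fixed token so
# the defaults-to-now column no longer causes spurious diffs.
sub mask_timestamps {
    my ($line) = @_;
    $line =~ s/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/<TIMESTAMP>/g;
    return $line;
}
```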

      I dare say that I will soon have to pass the XML output through a tidy program to remove non-syntactic differences in spacing and newlines, or just load both the actual and expected output with XMLin (from XML::Simple) and pass the parsed data to is_deeply().
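      For whitespace-only XML differences, a regex normaliser is often enough; note the caveat that collapsing whitespace between tags is only safe when the documents have no significant mixed content:

```perl
use strict;
use warnings;

# Strip non-syntactic whitespace from an XML string: collapse runs of
# whitespace that sit between tags, and drop trailing whitespace.
# CAUTION: this assumes whitespace between tags is never significant.
sub normalise_xml {
    my ($xml) = @_;
    $xml =~ s/>\s+</></g;
    $xml =~ s/\s+$//;
    return $xml;
}

# The XML::Simple alternative mentioned above would be roughly:
# use XML::Simple qw(XMLin);
# is_deeply( XMLin($got_xml), XMLin($expected_xml), 'XML reports match' );
```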

        I dare say that I will soon have to pass the XML output

        XML? Did you choose that format? Looking at the help from my copy of sqlite3, XML isn't even an option:

        .mode MODE ?TABLE?     Set output mode where MODE is one of:
                                 csv      Comma-separated values
                                 column   Left-aligned columns.  (See .width)
                                 html     HTML <table> code
                                 insert   SQL insert statements for TABLE
                                 line     One value per line
                                 list     Values delimited by .separator string
                                 tabs     Tab-separated values
                                 tcl      TCL list elements

        Unless you chose HTML? Or did they add an XML option recently?

        Choosing XML over the .dump SQL format (IMO) is a big mistake. The SQL format is so simple to manipulate and compare:

        sqlite> .dump onegb
        PRAGMA foreign_keys=OFF;
        BEGIN TRANSACTION;
        CREATE TABLE onegb ( alpha varchar, no varchar, hex varchar, bin varchar );
        INSERT INTO "onegb" VALUES('nvbqurkn','05812224','42295f70','"MpI▒Spզ├cNB͹s-┐5<="');
        INSERT INTO "onegb" VALUES('xzpviekv','72456229','fd245120','".╣=╣j‗l╝▓~zo=fI┬Z"');
        INSERT INTO "onegb" VALUES('qrkhwqwl','43135665','316f7a36','"e╗└╬┴╚╔wf6lqc▄iZ┤}E⌂┤"');
        ...

        (Okay. That's not the most complex schema in the world; but one advantage of sqlite for this kind of thing is that it actually uses very few different datatypes.)

        Whereas comparing XML is a nightmare. Less so I suppose if you don't have to worry about the attributes/nested sub-items equivalence -- I assume that however you are generating the XML, it at least produces consistent XML.

        I think you are making hard work for yourself using XML, but as I said above somewhere, it is your choice. Good luck :)



Re: Testing database updates.
by Corion (Pope) on Apr 18, 2012 at 09:25 UTC

    Personally, I really like to use DBD::SQLite with a :memory: database for each test. For each test case, I load the database mostly using DBIx::RunSQL, then run the test modifications against it and then run query statements as tests that check the database status for the "interesting" results.

    A typical test file looks like this:

    AUFGABENBEDARFZEITRAUM Werte durch Trigger
    ---
    SELECT Aufgabe, Anfang, Ende, id as aufgabenbedarf,
           werktage, tagesbedarf, tagesbedarf_mak
      FROM t_aufgabenbedarf --AUFGABENBEDARFZEITRAUM_V
     order by aufgabe
    ---
    [ [ '301', '2011-10-01 00:00:00', '2011-10-31 00:00:00', '141', '20', '28080', '1.04545454545455' ],
      [ '302', '2011-10-01 00:00:00', '2011-10-31 00:00:00', '142', '20', '14040', '0.522727272727273' ],
      [ '303', '2011-10-01 00:00:00', '2011-10-31 00:00:00', '143', '20', '14040', '0.522727272727273' ] ]

    It is basically the test title, test query and the expected results, as I have to test many views that should remain constant.

    The driver for these tests is (abridged) the following program, with some Oracle-to-SQLite converter ripped out for brevity:

    #!perl -w
    use strict;
    use DBD::SQLite;
    use File::Glob qw(bsd_glob);
    use Getopt::Long;
    use DBIx::RunSQL;
    use Test::More;
    use Data::Dumper;

    GetOptions(
        'verbose|v'      => \my $verbose,
        'db:s'           => \my @ora,
        'commit-prepare' => \my $do_commit,
        'sqlite:s'       => \my $dbfile,
    );
    $dbfile ||= ':memory:';

    my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", undef, undef,
        { AutoCommit => 0, PrintError => 0, RaiseError => 1 }
    );
    $dbh->do('PRAGMA synchronous = OFF;');

    print "# Using DBD::SQLite $DBD::SQLite::VERSION on Perl $^V\n"
        if $verbose;

    my (@sql_files) = @ARGV;
    @sql_files = sort map { bsd_glob $_ } @sql_files;

    # First read the SQL files that set up this test case
    my @setup = grep /\.sql$/, @sql_files;
    push @setup, 'tests/compat-triggers.sql';

    # And separate out the test files that run against this test case
    my @tests = grep /\.t$/, @sql_files;

    # Also find some global invariant tests that should hold true for
    # any test case. These are mostly sanity tests that the database
    # was constructed correctly
    my @invarianten = glob "tests/*.invariante";

    if (@tests) {
        unshift @tests, @invarianten;
        plan tests => 0+@tests;
    };

    if (@ora) {
        load_oracle($_) for @ora;
        create_views();
        $dbh->commit;
    };

    for my $prepare (@setup) {
        DBIx::RunSQL->run_sql_file(
            dbh     => $dbh,
            sql     => $prepare,
            verbose => $verbose,
        );
        $dbh->commit if $do_commit;
    };

    for my $test (@tests) {
        open my $fh, '<', $test
            or die "$test: $!";
        local $/ = "---";
        chomp( my ($description, $sql, $expected) = <$fh> );
        $description =~ s/\s+$//;
        my $res = eval { $dbh->selectall_arrayref($sql) };
        if( $@ ) {
            die "$@ when running \n$sql";
        };
        if(! is_deeply($res, eval $expected, $description)) {
            diag Dumper eval $expected;
            diag Dumper $res;
        };
    }
Re: Testing database updates.
by sundialsvc4 (Monsignor) on Apr 18, 2012 at 11:29 UTC

    I would construct some kind of a test-script, e.g. using Test::More as its driver, which executes a series of queries with known results.

    I don’t think that a simple database-dump would be sufficient unless you have first determined that every table has a primary key and that the dump utility orders items by that key.   (And if it does have a primary key, testing for the behavior of the program in the case of a key-collision is beyond scope.)   Instead, and in any case, I would run a series of queries using ORDER BY and GROUP BY as necessary so that the results of each query are known to be comparable.
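    A self-contained sketch of that query-driven style, using an in-memory SQLite database (via DBD::SQLite, assumed installed) as a stand-in for the database the log script would have populated; the table and column names are hypothetical:

```perl
use strict;
use warnings;
use Test::More tests => 1;
use DBI;

# In real use, $dbh would point at the database the script under test
# just populated; an in-memory database keeps the example runnable.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', undef, undef,
                       { RaiseError => 1, PrintError => 0 });
$dbh->do('CREATE TABLE jobs (name TEXT, runs INTEGER)');
$dbh->do(q{INSERT INTO jobs VALUES ('build', 2)});
$dbh->do(q{INSERT INTO jobs VALUES ('test', 3)});

# ORDER BY pins down the row order, so the result is directly
# comparable against a known-good answer.
my $got = $dbh->selectall_arrayref(
    'SELECT name, runs FROM jobs ORDER BY name');
is_deeply($got, [ [ 'build', 2 ], [ 'test', 3 ] ],
          'jobs table has the expected contents');
```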

    My approach as described is basically “forensic testing.”   Or what has sometimes been called “discovery testing.”   Your goal is generally to determine what “the right answer” is, within a set of well-defined test cases chosen by you, and then to determine if both the old and the new code produce an identical outcome.   You can’t do that with “100% coverage,” by the way, and in any case you do need to approach the problem in a way that points fairly directly to targeted sections of the code (in both the old and the new programs) and that does so in a way that points to a diagnosis and/or solution.   This will be an ongoing tool, that the company never had before but that you can expect to be used from now on.   Plan and design accordingly.   (Hence my suggestion of using an existing test-framework as the driver.)

    It can certainly be awkward to discover ... and you probably will discover ... that the existing application is wrong in some way that materially affects the existing report outputs that the company has come to rely upon in a business (e.g. auditing) sense, such that producing a different, even correct, answer is itself objectionable.

Node Type: perlquestion [id://965673]
Approved by marto
Front-paged by Corion