Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

RFC: Databases made easy

by GrandFather (Saint)
on Mar 20, 2011 at 04:20 UTC ( [id://894254]=perlmeditation: print w/replies, xml ) Need Help??

A recent glance at the database related tutorials in the Tutorials section indicated a lack of a very basic introduction to using databases so I decided to have a go at writing something to meet that need.

The following is intended as a very basic introduction to getting up and going with databases using Perl and DBI. SQLite (through DBD::SQLite) is assumed for all the sample code below because it is very easy to get going (just install the module) and for many people will be the only database they need for quite a while.

It is usual in this sort of tutorial to pass over the error handling. Despite that I've generally included error handling in the samples below. I also use transacted processing from the get go. Although this tutorial is intended to introduce the very basics of databases, it is also intended to demonstrate safe and robust technique right from the start. As a relative neophyte in matters database, I expect I'll receive a little feed back relating to some of the code!

Ok, so lets dive straight in by creating a simple database containing a single table of two columns:

use strict; use warnings; use DBI; #1 my $dbh = DBI->connect ("dbi:SQLite:Test.sqlite"); die "connect failed: $DBI::errstr" if ! $dbh; #2 local $dbh->{AutoCommit} = 0; # Use transacted processing local $dbh->{RaiseError} = 1; # die on processing errors local $dbh->{PrintError} = 0; # Suppress printing errors #3 my $sql = qq{CREATE TABLE People (name VARCHAR(128), age INTEGER)}; #4 eval { $dbh->do ($sql); $dbh->commit (); }; #5 if ($@) { # An error occurred $dbh->rollback (); die $@; }

Assuming the script ran without errors you may be wondering if anything actually happened? As a first check look in the directory you ran the script from for a file called Test.sqlite. If the file exists the database has been created. If you can't find it most likely the current directory when you ran the script is not what you thought it was - I'll wait a moment while you figure it out.

If an error did occur (maybe you don't have write permission for the current directory) most likely you will see the error message twice - once generated by the DBI/DBD code, and once from the die $@; (JavaFan's tip about using dbh->{PrintError} = 0 fixes the double print issue). The file may have been generated in any case, but the table probably wasn't created. JavaFan's reply Re: RFC: Databases made easy shows much more robust error handling code than is shown in this sample. And as both JavaFan and mje point out, some databases don't provide roll back support for some operations including table creation.

With the file found you can use one of several tools to examine the database. An internet search for "SQLite Browser" should find a copy of SQLite Database Browser which will open the database file and allow you to examine and manipulate it. If you use Firefox then the SQLite Manager add-on may be the thing for you. In either case point the tool at Test.sqlite and confirm that it contains a People table with name and age columns. Such database management tools help enormously in checking that your database software is doing as you expect and in fixing the occasional processing glitch or performing adhoc queries.

You may have noticed that there is no mention of DBD::SQLite in the code. Not to worry, DBI figures out from the connect string (see #1 and discussion below) which database driver it should use.

The code following #1 connects to the database (creating the database file if it doesn't exist already). The connect string specifies the database driver type and provides information to the driver about the connection. For SQLite the driver uses the third part of the connection string as the file name for the database.

The #2 code then configures some error handling related stuff. Setting AutoCommit to 0 sets the database to a mode where changes to the database are effectively buffered until commit is called. This allows a related set of changes to the database to be effected in a single call or else easily undone by a call to rollback (if something failed during processing for example).

Setting RaiseError to 1 causes errors to be handled by using die when the error happens rather than setting an error flag and hoping that the calling code will notice the issue.

Taken together setting AutoCommit to 0 and RaiseError to 1 help make database processing very robust because errors get handled when they happen and the database doesn't get left in a corrupt state if processing fails in some fashion.

Now take a look at #3. The string is the SQL statement that is used to generate the People table. The table contains two columns of data named name and age. The name column is set to contain variable length strings up to 128 characters long. The age column is set to contain integer values. Note though that SQLite uses the type information provided in the create statement as a hint to how the data is expected to be used, but doesn't feel obliged to actually store it that way. In this regard SQLite is much more forgiving than many other database engines. Note too that SQL is case insensitive for key words, but by convention they are given in upper case.

Tucked away in the safety of an eval following #4 are the two statements that actually result in creating the table and updating the database: the do and commit lines. do is actually short hand for two statements: prepare and execute. It can be used when a statement handle is not returned (more about statement handles later) and where the SQL statement only needs to be executed once.

Let's move on. Time to add some data to our table:

use strict; use warnings; use DBI; my $dbh = DBI->connect ("dbi:SQLite:Test.sqlite"); die "connect failed: $DBI::errstr" if ! $dbh; $dbh->{AutoCommit} = 0; # Use transacted processing $dbh->{RaiseError} = 1; # die on processing errors my %people = ( Fred => 23, Jane => 22, Boyd => 27, Tania => 28, Julie => 27, Kyle => 21, ); #1 my $sql = qq{INSERT INTO People (name, age) VALUES (?, ?)}; my $entries = 0; eval { my $sth = $dbh->prepare ($sql); #2 while (my ($name, $age) = each %people) { #3 ++$entries if $sth->execute ($name, $age); #4 } $dbh->commit (); #5 return 1; } or do { my $err = $@ || "Unknown error inserting data"; eval {$dbh->rollback ()} or $err .= "\n Rollback processing fail +ed!"; die $err; }; print "Added $entries rows to the People table\n";

This code starts just the same as the previous script with connection to the database and setting up of the error handling. In later sample scripts this boilerplate code will be omitted, but you'll need to provide it for the scripts to run.

Again you can use your SQLite browser/manager to confirm that the script has done some work. Notice that there is now some data in the People table, in fact the data given in the %people hash in the script.

As in the first script a little SQL and a couple of function calls get the work done. The SQL statement at #1 is used to set up a statement handle that is used to insert data into the table. The VALUES part gives the values to be inserted. Notice that question marks (place holders) are used in place of the actual data. It is possible to supply the data in the SQL statement, but that should be avoided for several reasons (to avoid quoting issues and injection attacks for a start). The data to be inserted is supplied later in the execute (#4).

The prepare call generates a statement handle that is used to insert the data. The while loop (#3) iterates over the key/value pairs in the %people hash. For each iteration of the loop the execute (#4) inserts a name/age pair. Notice that two parameters ($name, $age) are passed in and match the two place holders (?, ?) in the SQL statement. When the loop completes commit (#5) is called to complete the transaction and commit the changes to the database.

Actually the whole insert loop can be replaced with:

$entries = $sth->execute_array ({}, [keys %people], [values %people]);

which is both more compact and very likely more efficient, but only works if you can efficiently build the parallel arrays of values. Notice that an array reference is passed in for each place holder in the SQL statement. For the moment you can ignore the empty hash reference passed as the first parameter - it's there so you can pass attributes in to execute, but much of the time it's not needed.

Thus far we have created a database, created a table and added some data to it. But what a database is really about is getting the data back out again. Most often we only want to get a subset of the data out of a database by searching through the data using some criteria for selecting the data we want. Say for example we wanted to search the data we've added to the database for people whose name ends in the letter 'e' and with an age less than 25?

#1 my $sql = qq{SELECT name, age FROM People WHERE name LIKE '%e' AND age + < 25}; eval { #2 my $sth = $dbh->prepare ($sql); $sth->execute (); #3 #4 while (my $row = $sth->fetchrow_hashref ()) { printf "%-10s %3d\n", @{$row}{'name', 'age'}; } };

This time the script prints the results rather than updating the database. The SQL (#1) is a select statement which will retrieve rows containing name and age columns from the People table where the name ends in the letter 'e' (% matches anything, rather like .* in a Perl regular expression) and age is less than 25. The execute (#3) fetches the data from the database and the while loop (#4) iterates over the rows that were fetched one at a time to print the results.

Ok, that's got a simple single table database created, some data added and a search made on it. Databases provide a vast amount more power than we have looked at so far, and a lot of that comes when several tables are involved in a search, but that is a tutorial for another day. Mean time the documentation for the flavour of SQL understood by SQLite at http://www.sqlite.org/lang.html is a good reference for SQL.

Updates: various updates suggested by replies incorporated. Thank you mje and JavaFan for your contributions!

True laziness is hard work

Replies are listed 'Best First'.
Re: RFC: Databases made easy (and hard database tasks made possible)
by Your Mother (Archbishop) on Mar 20, 2011 at 07:09 UTC

    Thanks *much* for doing this. This sort of beginning ramp in is what’s so hard to find (in quality examples anyway) when starting. You have have probably saved many strangers many tortured hours.

    I’m going to ride on your coat-tails a bit and mimic your example with advanced tools. To show how facile and non-threatening, I hope, these basics can be in DBIx::Class; which can come off as unapproachable, unfriendly, and pointless when DB stuff—on the surface!—is so easy in Perl already.

    (Update: converted Pod links to list.)

Re: RFC: Databases made easy
by mje (Curate) on Mar 21, 2011 at 09:46 UTC

    Nice tutorial GrandFather. I have a few comments:

    • You cannot use the existence of Test.sqlite to infer the script worked as if you simply issue your create table and roll it back you will still have a Test.sqlite file.
    • Anyone actually running the first example might get confused by the error being output twice if it fails - PrintError. e.g., if they run it from a dir they don't have write permission for.
    • Not all databases can rollback DDL changes - SQLite can but many others cannot so it might not be a good example.
    • rollback can also die.
    • In the second script execute can return true but not actually insert a row. Obviously it is not going to happen here as your SQL is inserting but it is a common mistake that beginners make to think because execute returns true the SQL actually did something e.g., an update which does not change anything usually returns 0 but true.
Re: RFC: Databases made easy
by JavaFan (Canon) on Mar 21, 2011 at 11:51 UTC
    Your first example creates a table inside a transaction, and actually tries to do a rollback on failure. Many databases don't support transactional support for table creation.

    Your second example however doesn't have an explicite rollback. If say, the third insert fails, the eval block leaves, but two rows have been inserted, still waiting for a commit or a rollback. If the program exits afterwards, there will be immplicite rollback, but it's more appropriate to do an explicite rollback on failure.

    Also, it quickly pays off of to group the insert into a single statement, especially if the database server is on a different machine. Each ->execute() adds twice the network latency between the machine your code runs on and the database server - on top of whatever needs to be done on the database server.

    I also prefer to write the boilerplate as:

    local $dbh->{RaiseError} = 1; local $dbh->{PrintError} = 0; eval { $dbh->begin_work; ... queries go here ... $dbh->commit; 1; } or do { my $err = $@ || "Unknown error"; eval { $dbh->rollback; 1; } or do { $err .= "[ROLLBACK FAILED: " . ($@ || "Unknown reasons") . "]" +; } die $err; }
    In particular, I rely on the return value of eval to determine whether it succeeded or failed, not $@.
Re: RFC: Databases made easy
by JavaFan (Canon) on Mar 22, 2011 at 11:52 UTC
    If an error did occur (maybe you don't have write permission for the current directory) most likely you will see the error message twice - once generated by the DBI/DBD code, and once from the die $@;
    That can easily be prevented by doing:
    $dbh->{PrintError} = 0;
    The DBI manual recommends this:
    If you turn "RaiseError" on then you'd normally turn "PrintError" off.
    my $entries = eval { my $entries = 0; my $sth = $dbh->prepare ($sql); #2 while (my ($name, $age) = each %people) { #3 ++$entries if $sth->execute ($name, $age); #4 } $dbh->commit (); #5 return $entries; } or do { my $err = $@; eval {$dbh->rollback ()} or $err .= "\n Rollback processing +failed!"; die $err; };
    That code will die if %people is empty. I'd write that as:
    my $entries; eval { ... stuff ... $entries++ if $sth->execute($name, $age); ... more stuff ... $dbh->commit; 1; } or do { ... rollback and die ... };
Re: RFC: Databases made easy
by sundialsvc4 (Abbot) on Mar 22, 2011 at 22:08 UTC

    A general comment that I would add to this (excellent!) tutorial is, simply:   “Always be mindful of transactions.”

    Different people have different preferences about how they want to deal with transactions, and DBI tries to accommodate everyone.   You can start and commit and roll-back the transactions explicitly, or you can let DBI issue some of these calls for you.   The important thing is simply to know what is going to happen in the code that you write.

    In the case of SQLite, transactions are particularly important!   When a transaction is in progress, SQLite will defer certain disk I/O operations until the transaction commits (if not sooner), but when a transaction is not in progress, SQLite will ensure that every disk write is known to be successfully finished before the statement completes.   Although this behavior is conservative and, undoubtedly, “technically correct,” this can have a dramatically negative impact on performance, even on a very fast machine.   (“Gotcha!!”)

Re: RFC: Databases made easy
by erix (Prior) on Mar 24, 2011 at 21:26 UTC

    PostgreSQL, compared to SQLite, is a little more difficult to get started with, but for those who have already got over postgres' initial connection gotchas: the SQLite connection string that is used here ("dbi:SQLite:Test.sqlite") can be replaced with a connection string for PostgreSQL, for instance: "dbi:Pg:port=6542;db=testdb", and the three parts in the OP (create table, insert rows, select rows) will run exactly the same on such a postgresql $dbh (data base handle).

    PostgreSQL flavour of SQL: sql-commands.html

    SQLite remains a toy-database; IMHO one might as well start out with the real thing.

Re: RFC: Databases made easy
by deMize (Monk) on Mar 23, 2011 at 04:16 UTC
    Great work. It also might be nice to note that you can set the dbh flags as a parameter during the connection. It's clear here, but when you are connecting to multiple databases and have a lot of db handles and statement handles flying around, consolidating the code makes things easier to read :)


    Demize
      This brings up the other notion of using connect_cached() instead of connect(), which new users may be unfamiliar with.


      Demize

        /me nods...

        One of the best things I ever did, was to sit down and actually read every bit of the documentation page on the CGI module, and then, every bit of the documentation about the drivers I was using.   This is deep material and there is a lot of it, and I read it through twice, resisting mightily the urge to “scan” (or simply, to let my eyes glaze over and Zzzzzz...).   Grab a triple-shot espresso and do the same.   (Fair warning:   it took several days.)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://894254]
Approved by Corion
Front-paged by planetscape
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (4)
As of 2024-03-19 11:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found