Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Database Record Order

by vbrtrmn (Pilgrim)
on Dec 30, 2003 at 22:01 UTC ( #317788=perlquestion: print w/replies, xml ) Need Help??

vbrtrmn has asked for the wisdom of the Perl Monks concerning the following question:

Okay, to start I'm dealing with the most poorly designed database I've ever seen. My goal is to cut certain records out of a 1.4 million record database, based on readability of the word. That part is easy!

Database Sampling:

| ID | WORD | FLAG | LWORD | dc5943 | bears | 0 | bears | mu0286 | best summer tunes | 0 | bestsummertunes | mu0286 | bestsummertunes | 1 | | muz334 | chk chk chk | 0 | chkchkchk | ofn00861 | chuck e cheese.com | 0 | chuckecheesecom | dcz013 | dc restaurants | 0 | dcrestaurants | dcz0013 | dc restaurants | 0 | dcrestaurants | dcz013 | dc resturants | 1 | | dcz0013 | dc resturants | 1 | | dcz013 | dc american dining | 0 | dcamericandining | dcz0003 | elmira wv | 0 | elmirawv | dc0160 | elmira, c. a. | 1 | | tw0056 | elmira, n. y. | 1 |

Okay, I'll just sort based on how the data comes in via the FLAG. Depending on what the last record was I can determine what to do with the current one... or so I thought.

I'm using DBI, binding the column names with $sth->bind_columns(...); and grabbing the rows with while($sth->fetch()) { ... }. The data, which perl outputs isn't in the same order as the database.

Output Sampling (first 10 records):

| ID | WORD | FLAG | LWORD | wq12351y059 | hawkinschemical | 1 | | wq12366y059 | healtheon corp. | 1 | | wq12367y059 | healthgatedata. | 1 | | wq12382y059 | helixtechnology | 1 | | wq12385y059 | helpathome,inc. | 1 | | wq12390y059 | henryschein,inc | 1 | | wq12390y059 | henryscheininc. | 1 | | wq12397y059 | hermanmillerinc | 1 | | wq12404y059 | hffinancialcorp | 1 | | wq12409y059 | hickorytechcorp | 1 |

Is there a way for me to get the data out of the DB as it appears OR will it return this way everytime (it always comes back this way).

Here's my code (exerpt), for reference:

$DSN = 'DBI:ODBC:SearchWords'; $USER= ''; $PASS= 'password'; $DBH = DBI->connect( $DSN, $USER, $USER ) || die "Connect Error: $DBI: +:errstr\n"; my $SQL = "SELECT * FROM [Production Words]"; my $sth = $DBH->prepare($SQL); $sth->execute || die "Could not execute SQL statement ... maybe invali +d?\n$!"; $sth->bind_columns(\$aid,\$wd,\$we,\$df,\$lw,\$sw); while($sth->fetch()) { print "$aid\t$wd\t$we\t$df\t$lw\t$sw\n"; } $DBH->disconnect();

System Info: Win32, Perl 5.8.2 on Cygwin, standard CPAN DBI module.

--
paul

Replies are listed 'Best First'.
Re: Database Record Order
by MidLifeXis (Monsignor) on Dec 30, 2003 at 22:13 UTC

    Unless you include an "ORDER BY" clause in your SQL statement, the order of the results is undefined. It does not even need to be stable between calls.

    For example, you may want to do....

    SELECT * FROM [Production Words] ... ORDER BY ID
    --MidLifeXis
      The problem in using ORDER BY is that each ID is somewhat random, rows 7, 8, 9 share the same ID in 7 and 9, but use a slightly different one in 8.

      --
      paul

        | dcz013   | dc  restaurants    | 0    | dcrestaurants
        | dcz0013  | dc  restaurants    | 0    | dcrestaurants
        | dcz013   | dc  resturants     | 1    |
        | dcz0013  | dc  resturants     | 1    |
        | dcz013   | dc american dining | 0    | dcamericandining
        
        Yeah ... that's going to cause problems alright. Since each ID is bogus, you will need to create a new one, but keep the old just in case. I suggest using an unsigned integer that is auto incremented by the RDMS, but PHB's tend to like ID's with letters in them (don't listen to 'em!).

        If i were in your shoes, i would create a new table and figure out a way to convert the rows in the old table into the new. Prune as you go ... some of those rows have to be redundant and incorrect. You will no doubt not get it right the first few attempts, so prepare for that by having your script first DROP the new table and CREATE it from scratch. Best of luck, this doesn't sound too fun ... :/

        UPDATE:
        OK, i think i might have a viable gameplan.

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
Re: Database Record Order
by jeffa (Bishop) on Dec 30, 2003 at 22:31 UTC
    This is totally unrelated to your question, but do yourself a big favor and start replacing:
    $DBH = DBI->connect( $DSN, $USER, $PASS ) || die "Connect Error: $DBI: +:errstr\n";
    With
    $DBH = DBI->connect( $DSN, $USER, $PASS, { RaiseError => 1} );
    and enjoy the luxury of never needing to worry about placing
    || die $DBH->errstr;
    after every $DBH method call again. ;)

    UPDATE:
    Thanks for catching that typo 3dan. :)

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      Good advice. But you might want to replace:

      $DBH = DBI->connect( $DSN, $USER, $PASS, { RaiserError => 1} );

      With this:

      $DBH = DBI->connect( $DSN, $USER, $PASS, { RaiseError => 1} );

      ;)

      --
      3dan

      I got the impression that you still had to check for errors on the connect, after which if the RaiseError attribute was set further errors on that database handle would be fatal. From the documentation (emphasis added):
      If the connect fails (see below), it returns "undef" and sets both $DBI::err and $DBI::errstr. (It does not set $!, etc.) You should generally test the return status of "connect" and "print $DBI::errstr" if it has failed.
      Don't get me wrong: I'd love it if DBI behaved in the manner in which you describe. However, I'm willing to do the error checking once if it does not. :-)

      thor

        Actually, if you try to connect() with bogus credentials with either RaiseError turned on or off, your script is going to die regardless of whether you add or die $DBI::errstr or not. I don't bother adding it, because i just don't have to. If the connect fails, my program will die (just run the debugger and see for yourself).

        In other words, DBI does behave in the manner in which i described. But you can still be explicitly redundant, if you want. ;)

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
      Thanks jeffa!!

      --
      paul

Re: Database Record Order
by edoc (Chaplain) on Dec 30, 2003 at 22:20 UTC

    The data, which perl outputs isn't in the same order as the database.

    The database has no inherent order. You must consider query results as being returned from the database in random order (much like keys from a hash) unless you specify an 'order by' clause.

    cheers,

    J

      That depends on the DBMS you are using. When the ORDER BY clause is not specified, some dBs return the data in the order it was inserted into the tables. You'll have to read your dB's documentation to be certain.
        If this is an MS-SQL server then the "default" order is likely to be the order of the clustered index. However, relying on that ordering when using an RDBMS is unwise, as it could change at any time, or be different if you move your script from one type of database server to another, or if other DDL changes (such as different indexes, different locking methods, etc) are applied.

        In short - never rely on the order of data in a SELECT without an ORDER BY clause.

        Michael

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://317788]
Approved by rob_au
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (4)
As of 2023-02-04 05:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    I prefer not to run the latest version of Perl because:







    Results (30 votes). Check out past polls.

    Notices?