Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Feedback Appreciated on text-parsing, SQL querying subroutine

by Angharad (Pilgrim)
on May 31, 2006 at 08:47 UTC ( #552715=perlquestion: print w/ replies, xml ) Need Help??
Angharad has asked for the wisdom of the Perl Monks concerning the following question:

Hi there
I've written the following perl subroutine that takes a text file, parses it, and then uses it to get data out of a database using an SQL query. It then prints off certain results from that SQL query depending on a set of criteria.
It works fine, but in the interest of improving my perl I was wondering if anyone would like to comment on this and suggest other/better ways of doing the same thing.
my $inputfile = "file.txt"; my $dbh = DBI->connect("dbi:Pg:dbname=db;host=dbhost", "user", "passwd", {AutoCommit => 1}); unless (open(INPUT, "$inputfile")) { print "ERROR: Can't open file for reading : $!\n"; } while(<INPUT>) { # splitting on tab character my @data = split("\t", $_); # assigning varables my $did1 = $data[0]; my $did2 = $data[1]; my $score1 = $data[2]; my $score2 = $data[3]; if(($score1 >=70) && ($score2 >=60)) { my $sth1 = $dbh->prepare("select d_id, c_id from d where d_id += '$did1';"); $sth1->execute(); my $sth2 = $dbh->prepare("select d_id, c_id from d where d_id += '$did2';"); $sth2->execute(); my $array_ref1 = $sth1->fetchall_arrayref(); my $array_ref2 = $sth2->fetchall_arrayref(); foreach my $row1(@$array_ref1) { my ($did1, $cid1) = @$row1; foreach my $row2(@$array_ref2) { my ($did2, $cid2) = @$row2; if($cid2 != $cid1) { print "$did1, $did2, $cid1, $cid2\n"; } } } } } close(INPUT); }
The text file looks like this
A34253 S43154 70 67 C31243 C31243 70 73 Y54231 W65313 70 71 U65242 B65231 70 70 Z23154 Z23154 70 65
It prints out the results of the SQL query only if score1 is at least 70, score2 is as least 60 and the 'cid1' for 'did1' is not the same as 'cid2' for 'did2' (the cid codes are retrieved from the SQL query).
Thanks a lot.

2006-06-01 Retitled by planetscape, as per Monastery guidelines

( keep:0 edit:28 reap:0 )

Original title: 'Feedback Appreciated'

Comment on Feedback Appreciated on text-parsing, SQL querying subroutine
Select or Download Code
Re: Feedback Appreciated on text-parsing, SQL querying subroutine
by jesuashok (Curate) on May 31, 2006 at 09:08 UTC
    Hi

    I would like to notify some points to you, from your code.

    unless (open(INPUT, "$inputfile")) { print "ERROR: Can't open file for reading : $!\n"; }
    If there is any error in opening the file Just you are printing the error message and continuing the code with
    while ( <INPUT> ).

    How can you do that ?
    Is that mandatory in your code? Just think it off on that.
    # splitting on tab character my @data = split("\t", $_); # assigning varables my $did1 = $data[0]; my $did2 = $data[1]; my $score1 = $data[2]; my $score2 = $data[3];
    The above code can be written as
    my ( $did1,$did2,$score1,$score2 ) = (split("\t", $_))[0..3];

    "Keep pouring your ideas"
      The above code can be written as my ( $did1,$did2,$score1,$score2 ) = (split("\t", $_))[0..3];
      As a curiousity, how come you are subscripting split? Split returns a list, so he should be safe with just...
      my ($did1, $did2, $score1, $score2) = split("\t", $_);
      Right?
        Hi

        I accept your point but if the split returns more than 4 values the last one will contain the count of remaining elements.Please try a sample program with my sample code. you will be able to understand clearly.

        :-)

        "Keep pouring your ideas"
Re: Feedback Appreciated on text-parsing, SQL querying subroutine
by friedo (Prior) on May 31, 2006 at 09:17 UTC
    I have a few suggestions.

    unless (open(INPUT, "$inputfile")) { print "ERROR: Can't open file for reading : $!\n"; }

    First, there's no need to quote $inputfile, just pass the scalar as the parameter to open. Second, although you check for an open error (good!) you don't do anything if it fails, aside from printing the error message. I would rewrite it using the following standard Perl idiom:

    open INPUT, $inputfile or die "Can't open $inputfile for reading: $!";

    Even better, you may want to use a more modern lexical filehandle instead of a global one, but this isn't a big deal for a small script, IMHO.

    my @data = split("\t", $_); # assigning varables my $did1 = $data[0]; my $did2 = $data[1]; my $score1 = $data[2]; my $score2 = $data[3];

    Here you put a bunch of stuff in @data just to take it out again. Instead, you can use a parallel assignment to make it more clear:

    my ( $did1, $did2, $score1, $score2 ) = split "\t";

    Note that I also ommitted the $_ from the split, since split works on $_ by default.

    my $sth1 = $dbh->prepare("select d_id, c_id from d where d_id = '$did1';"); $sth1->execute(); my $sth2 = $dbh->prepare("select d_id, c_id from d where d_id = '$did2';"); $sth2->execute();

    This is the biggest red flag for me. You're not using placeholders! What happens if someone sneaks some nasty SQL into $did1? Or if $did2 contains some characters that need escaping? You should always use DBI placeholders (see the DBI documentation for a complete explanation.)

    my $sth1 = $dbh->prepare("select d_id, c_id from d where d_id = ?"); $sth1->execute( $did1 ); my $sth2 = $dbh->prepare("select d_id, c_id from d where d_id = ?"); $sth2->execute( $did2 );

    Finally, I'm assuming you've got strict and warnings turned on but you just forgot to paste that, right? :) You are using lexical variables well and the code seems simple and straightforward.

Re: Feedback Appreciated on text-parsing, SQL querying subroutine
by davorg (Chancellor) on May 31, 2006 at 09:25 UTC

    A few picky comments:

    unless (open(INPUT, "$inputfile")) { print "ERROR: Can't open file for reading : $!\n"; }

    This would usually be written as:

    open INPUT, '<', $inputfile or die "ERROR: Can't open file for reading : $!\n";
    while(<INPUT>) {

    You probably want to call chomp each time round the loop

    my @data = split("\t", $_);

    The first argument to split is a regular expression. Also, the second argument is optional and defaults to $_. So I'd write that as:

    my @data = split /\t/;

    Actually, if you omit the first argument then it defaults to /\s+/ which looks like it would work in this case.

    my @data = split;

    Then we have:

    my $did1 = $data[0]; my $did2 = $data[1]; my $score1 = $data[2]; my $score2 = $data[3];

    I'm can't really see the point of splitting that array into separate variables. If you want to make the code more readable, then I'd put the data into a hash.

    my %rec; @rec{qw(did1 did2 score1 score2)} = @data;

    In fact I'd do away with the need for @data completely and capture the output from split directly into the hash.

    my %rec; @rec{qw(did1 did2 score1 score2)} = split;

    The biggest problem I can see is with your use of DBI. You prepare your statements each time around the loop. Far more efficient to prepare them once and just execute them each time round the loop. You do this by putting placeholders in your SQL and passing the values to the call to execute.

    # before the loop my $sth1 = $dbh->prepare("select d_id, c_id from d where d_id = ?"); my $sth2 = $dbh->prepare("select d_id, c_id from d where d_id = ?"); # Then within the loop $sth1->execute($rec{did1}); $sth2->execute($rec{did2});

    The other thing I was thinking is that you could probably do a lot more of the work in the SQL and not have to do so much processing in your Perl code. But I'm afraid I don't have time to look at that in much detail right now.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Feedback Appreciated on text-parsing, SQL querying subroutine
by Angharad (Pilgrim) on May 31, 2006 at 10:05 UTC
    Thanks a lot gys, much appreciated. What about the fetchall_arrayref option - might there be a better way of fetching and going though the data do you think or is this ok?
Re: Feedback Appreciated on text-parsing, SQL querying subroutine
by davidrw (Prior) on May 31, 2006 at 12:28 UTC
    You are basically just manually doing a SQL JOIN in the code -- you can simplify this by letting the database (in this case i see it's postgres) do the work:
    ... # open file; connect to $dbh my $sth = $dbh->prepare(qq{ SELECT d1.d_id as did1, d2.d_id as did2, d1.c_id as cid1, d2.c_id as cid2 FROM d as d1, d as d2 WHERE d1.d_id = ? AND d2.d_id = ? AND d1.c_id != d2.c_id }); while(<INPUT>){ my( $did1, $did2, $score1, $score2 ) = split "\t", $_; next unless $score1 >=70 && $score2 >=60; $sth->execute($did1,$did2); while( my $row = $sth->fetchrow_arrayref ){ print join(', ', @$row), "\n"; } } close INPUT;
Re: Feedback Appreciated on text-parsing, SQL querying subroutine
by ioannis (Priest) on May 31, 2006 at 14:21 UTC
    The $dbh->prepare is in a loop, with the same statement and handle attributes. You only need to prepare it once:
    my $sth2 = $dbh->prepare_cached("select d_id, c_id from d where d_id = ?"); $sth2->execute( $did2 ); # And since you need less clutter on the monitor (my $sth2 = $dbh->prepare_cached(<<"")) -> execute ($did2); select d_id, c_id from d where d_id = ? # line above intentionally left blank
      Actually, he's not using placeholders so in his case, he will have to put the prepare in the loop. Once he switches to placeholders, then preparing it once at the beginning would be much better.

      Also, your HEREDOC terminated with an empty line and a mandatory comment (otherwise, it will get broken over and over) seems a lot less clear and more cluttered than your first example.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://552715]
Approved by holli
Front-paged by Marza
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (8)
As of 2014-07-30 02:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (229 votes), past polls