Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

Naveen_learn:

I'd suggest learning SQL a little better so you can make the database do the part it's good at. In particular, start thinking about how to use join and think about data retrieval as a set operation rather than a procedural operation. If you do so, you could execute a single SQL statement and process it something like this (untested):

# Get all combinations of (seg_dtl_id,pan_id) for seg_id=1 $sql = <<EOSQL; select seg.seg_dtl_id, dtl.pan_id from segmentation seg join seg_dtl_panelist dtl on dtl.seg_dtl_id=seg.seg_dtl_id where seg.seg_id='1' EOSQL $rs = $db->query($sql); # Remember previous dtl_id so we can tell to start a new line my $prev_dtl_id; foreach (@$rs) { my ($dtl_id, $pan_id) = @$_; if ($dtl_id ne $prev_dtl_id) { # Only start a new line if we're not on first dtl_id print FILE "\n" if defined $prev_dtl_id; print FILE "$dtl_id: "; $prev_dtl_id = $dtl_id; } print FILE "$pan_id|"; }

You can get fancier and get the SQL server to give you counts, averages, max/min values, subtotals, etc. If you spend a bit of time learning SQL, you'll find a lot of these sorts of operations easier and faster.

Why is your code inefficient? Primarily because you're executing more queries than you need to, leading to:

  • The SQL server needs to parse a query for each data element
  • Perl has to communicate with the SQL server for each query and result set, leading to *multiple* round-trips over the network for each data element.
  • The SQL server which is optimized for the task of data retrieval and joining tables is being ignored, and you're doing the work, yourself, in perl.

Doing it as a single query gives you one round-trip to parse/execute the query, and since the query returns all your results, it can bundle multiple rows per network packet, greatly reducing the amount of network traffic. (That may be a benefit for your other systems, as well!)

Update: Just in case you want to see how much network traffic could potentially be saved, here are some fake numbers, loosely based in reality.

For this calculation, I'm assuming it takes one round-trip for executing the query, and one round-trip for fetching a single result. Since you're fetching back only a single value from the database each time, I'm going to further assume that each result set could hold values at a time.

 # Dtl IDs# PAN IDs/DtlIDYour I/OsMy I/Os
Case 1101020221
Case 1100102,011201
Case 1101002,002201
Case 110010020,0112,001

So with these assumptions, it looks like you'll have 10 times the network traffic with your code. The dominant factor is the number of data values returned per network transmission, which I conservatively put at 10. I expect that it would be greater than that, with a corresponding increase in efficiency between the two methods.

Someone may want to check the calculations and/or method. For the "My I/Os" column, I have 1 round trip for the query, and (# elements per row)*(# dtl IDs)*(# PAN IDs)/(# values per packet). For the "Your I/Os" column, I come up with 2 round trips round trips when # Dtl IDs is 10, and 11 when # Dtl IDs is 100 and I add 2*(# Dtl IDs)*(# PAN IDs) to it (where 2 is 1 round trip for the query, and one for getting the result).

...roboticus

When your only tool is a hammer, all problems look like your thumb.


In reply to Re: Need to Improve Performance by roboticus
in thread Need to Improve Performance by Naveen_learn

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others studying the Monastery: (14)
    As of 2014-09-30 16:49 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      How do you remember the number of days in each month?











      Results (378 votes), past polls