http://www.perlmonks.org?node_id=481479


in reply to Re^5: Database processing
in thread DBI::st=HASH output

I've tried the SQL in earlier versions of this script, and I had no luck using an AS statement. When I run the script now I get:
SQL ERROR: Bad set function before FROM clause. Execution ERROR: No command found!. $VAR1 = undef;
When I remove the AS statement:
my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @cols = @{$sth->{NAME}}; shift @cols; my $avgSQL = 'SELECT ' . join(', ', map { "avg($_) \n" } @cols ) . ' FROM results'; my $avgsHashRef = $dbh->selectrow_hashref($avgSQL);
I get
Execution ERROR: Can't use AVG on a string! called from ./runsql3.sql +at 24. DBD::CSV::db selectrow_hashref failed: Attempt to fetch row from a Non +-SELECT statement [for Statement "SELECT avg(ERWWCOMMUNITIESMemoryPag +essec) , avg(ERWWCOMMUNITIESNetworkInterfaceCompaqEthernetFastEthernetAdapter +ModuleBytesTotalPERsec) , avg(ERWWCOMMUNITIESNetworkInterfaceMSTCPLoopbackinterfaceBytesTotalP +ERsec) , avg(ERWWCOMMUNITIESNetworkInterfaceCompaqEthernetFastEthernetAdapter +ModuleCurrentBandwidth) , avg(ERWWCOMMUNITIESNetworkInterfaceMSTCPLoopbackinterfaceCurrentBand +width) , avg(ERWWCOMMUNITIESPhysicalDisk0CPCTDiskTime) , avg(ERWWCOMMUNITIESPhysicalDisk1DPCTDiskTime) , avg(ERWWCOMMUNITIESPhysicalDiskTotalPCTDiskTime) , avg(ERWWCOMMUNITIESPhysicalDiskTotalAvgDiskQueueLength) , avg(ERWWCOMMUNITIESProcessorTotalPCTProcessorTime) , avg(ERWWDC1MemoryPagesPERsec) , avg(ERWWDC1NetworkInterfaceIntelPRO100NetworkConnectionBytesTotalPER +sec) , avg(ERWWDC1NetworkInterfaceMSTCPLoopbackinterfaceBytesTotalPERsec) , avg(ERWWDC1NetworkInterfaceIntelPRO100NetworkConnectionCurrentBandwi +dth) , avg(ERWWDC1NetworkInterfaceMSTCPLoopbackinterfaceCurrentBandwidth) , avg(ERWWDC1PhysicalDisk0CPCTDiskTime) , avg(ERWWDC1PhysicalDiskTotalPCTDiskTime) , avg(ERWWDC1PhysicalDiskTotalAvgDiskQueueLength) , avg(ERWWDC1ProcessorTotalPCTProcessorTime) , avg(ERWWNABMemoryPagesPERsec) , avg(ERWWNABNetworkInterfaceIntel8255xbasedIntegratedFastEthernetByte +sTotalPERsec) , avg(ERWWNABNetworkInterfaceMSTCPLoopbackinterfaceBytesTotalPERsec) , avg(ERWWNABNetworkInterfaceIntel8255xbasedIntegratedFastEthernetCurr +entBandwidth) , avg(ERWWNABNetworkInterfaceMSTCPLoopbackinterfaceCurrentBandwidth) , avg(ERWWNABPhysicalDisk0CPCTDiskTime) , avg(ERWWNABPhysicalDiskTotalPCTDiskTime) , avg(ERWWNABPhysicalDiskTotalAvgDiskQueueLength) , avg(ERWWNABProcessorTotalPCTProcessorTime) , avg(ERWWSEARCHMemoryPagesPERsec) , avg(ERWWSEARCHNetworkInterfaceBroadcomNetXtremeGigabitEthernetDriver +BytesTotalPERsec) , avg(ERWWSEARCHNetworkInterfaceIntelPRO100NetworkConnectionBytesTotal +PERsec) , avg(ERWWSEARCHNetworkInterfaceMSTCPLoopbackinterfaceBytesTotalPERsec +) , avg(ERWWSEARCHNetworkInterfaceBroadcomNetXtremeGigabitEthernetDriver +CurrentBandwidth) , avg(ERWWSEARCHNetworkInterfaceIntelPRO100NetworkConnectionCurrentBan +dwidth) , avg(ERWWSEARCHNetworkInterfaceMSTCPLoopbackinterfaceCurrentBandwidth +) , avg(ERWWSEARCHPhysicalDisk0CDPCTDiskTime) , avg(ERWWSEARCHPhysicalDiskTotalPCTDiskTime) , avg(ERWWSEARCHPhysicalDiskTotalAvgDiskQueueLength) , avg(ERWWSEARCHProcessorTotalPCTProcessorTime) , avg(ERWWSQLWEBMemoryPagesPERsec) , avg(ERWWSQLWEBNetworkInterfaceBroadcomNetXtremeGigabitEthernetDriver +BytesTotalPERsec) , avg(ERWWSQLWEBNetworkInterfaceMSTCPLoopbackinterfaceBytesTotalPERsec +) , avg(ERWWSQLWEBNetworkInterfaceBroadcomNetXtremeGigabitEthernetDriver +CurrentBandwidth) , avg(ERWWSQLWEBNetworkInterfaceMSTCPLoopbackinterfaceCurrentBandwidth +) , avg(ERWWSQLWEBPhysicalDisk0CDPCTDiskTime) , avg(ERWWSQLWEBPhysicalDiskTotalPCTDiskTime) , avg(ERWWSQLWEBPhysicalDiskTotalAvgDiskQueueLength) , avg(ERWWSQLWEBProcessorTotalPCTProcessorTime) FROM results"] at ./runsql3.sql line 24. $VAR1 = undef;
Same error if I trim down the sql statement. I have also re-looked at my csv file and the columns data is numeric. I'm stumped!

Replies are listed 'Best First'.
Re^7: Database processing
by davidrw (Prior) on Aug 06, 2005 at 13:35 UTC
    Execution ERROR: Can't use AVG on a string! called from ./runsql3.sql at 24.
    Ah. that changes things -- DBD::CSV is treating the values as strings. Read the Data restrictions section of the docs where it says that DBD::CSV doesn't correctly handle data types, in particular NULLs. If you insert integers, it might happen, that fetch returns a string.

    You might want to try DBD::AnyData instead.. Or in the avg() make sure the value is casted to a number so that it's not treated as a string. or finally, you could just select * all the data (6000 rows) and use perl to loop over it once and average everything.
      I'm so close I can taste it!

      The Data Restrictions section had the answer. The NULL values were what was killing me. I'm gonna look into DBD:AnyData.

      I hate to keep asking questions, but I'm a Perl Novice. These files that I am working with could contain any number of NULL values. What would be the best way of handling these?

      --SCRATCH THAT--

      I think I got it. Now I just have 1 last question. How do I get it to output in csv format?
        How'd you handle the NULL's? I was about to say that options include converting the NULL's to 0's in your pre-processing step (maybe something like s/,(?=,)/,0/g, but doesn't work for last col) .. or you could run a bunch of "UPDATE results SET colname = 0 WHERE colname = ''" statements .. both of those, of course assume you want them treated as 0's and not as NULL's (will greatly affect the AVG value).

        As for output, perhaps something like this (depends on what you need in terms of cols) (note change in $dbh method used):
        my @avgs = $dbh->selectrow_array($avgSQL); print join(",", @cols) . "\n"; print join(",", @avgs) . "\n";