Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Re^4: Database processing

by DrAxeman (Beadle)
on Aug 06, 2005 at 02:48 UTC ( #481428=note: print w/ replies, xml ) Need Help??


in reply to Re^3: Database processing
in thread DBI::st=HASH output

I printed $avgSQL and got:

SELECT avg(PDH_CSV40_EasternDaylightTime_240_) as PDH_CSV40_EasternDay +lightTime_240_, avg(ERWWCOMMUNITIES_Memory_Pages_sec) as ERWWCOMMUNIT +IES_Memory_Pages_sec, avg(ERWWCOMMUNITIES_NetworkInterface_CompaqEthe +rnet_FastEthernetAdapter_Module_BytesTotalPERsec) as ERWWCOMMUNITIES_ +NetworkInterface_CompaqEthernet_FastEthernetAdapter_Module_BytesTotal +PERsec, avg(ERWWCOMMUNITIES_NetworkInterface_MSTCPLoopbackinterface_B +ytesTotalPERsec) as ERWWCOMMUNITIES_NetworkInterface_MSTCPLoopbackint +erface_BytesTotalPERsec, avg(ERWWCOMMUNITIES_NetworkInterface_CompaqE +thernet_FastEthernetAdapter_Module_CurrentBandwidth) as ERWWCOMMUNITI +ES_NetworkInterface_CompaqEthernet_FastEthernetAdapter_Module_Current +Bandwidth, avg(ERWWCOMMUNITIES_NetworkInterface_MSTCPLoopbackinterfac +e_CurrentBandwidth) as ERWWCOMMUNITIES_NetworkInterface_MSTCPLoopback +interface_CurrentBandwidth, avg(ERWWCOMMUNITIES_PhysicalDisk_0C_PCTDi +skTime) as ERWWCOMMUNITIES_PhysicalDisk_0C_PCTDiskTime, avg(ERWWCOMMU +NITIES_PhysicalDisk_1D_PCTDiskTime) as ERWWCOMMUNITIES_PhysicalDisk_1 +D_PCTDiskTime, avg(ERWWCOMMUNITIES_PhysicalDisk_Total_PCTDiskTime) as + ERWWCOMMUNITIES_PhysicalDisk_Total_PCTDiskTime, avg(ERWWCOMMUNITIES_ +PhysicalDisk_Total_AvgDiskQueueLength) as ERWWCOMMUNITIES_PhysicalDis +k_Total_AvgDiskQueueLength, avg(ERWWCOMMUNITIES_Processor_Total_PCTPr +ocessorTime) as ERWWCOMMUNITIES_Processor_Total_PCTProcessorTime, avg +(ERWWDC1_Memory_PagesPERsec) as ERWWDC1_Memory_PagesPERsec, avg(ERWWD +C1_NetworkInterface_IntelPRO_100NetworkConnection_BytesTotalPERsec) a +s ERWWDC1_NetworkInterface_IntelPRO_100NetworkConnection_BytesTotalPE +Rsec, avg(ERWWDC1_NetworkInterface_MSTCPLoopbackinterface_BytesTotalP +ERsec) as ERWWDC1_NetworkInterface_MSTCPLoopbackinterface_BytesTotalP +ERsec, avg(ERWWDC1_NetworkInterface_IntelPRO_100NetworkConnection_Cur +rentBandwidth) as ERWWDC1_NetworkInterface_IntelPRO_100NetworkConnect +ion_CurrentBandwidth, avg(ERWWDC1_NetworkInterface_MSTCPLoopbackinter +face_CurrentBandwidth) as ERWWDC1_NetworkInterface_MSTCPLoopbackinter +face_CurrentBandwidth, avg(ERWWDC1_PhysicalDisk_0C_PCTDiskTime) as ER +WWDC1_PhysicalDisk_0C_PCTDiskTime, avg(ERWWDC1_PhysicalDisk_Total_PCT +DiskTime) as ERWWDC1_PhysicalDisk_Total_PCTDiskTime, avg(ERWWDC1_Phys +icalDisk_Total_AvgDiskQueueLength) as ERWWDC1_PhysicalDisk_Total_AvgD +iskQueueLength, avg(ERWWDC1_Processor_Total_PCTProcessorTime) as ERWW +DC1_Processor_Total_PCTProcessorTime, avg(ERWWNAB_Memory_PagesPERsec) + as ERWWNAB_Memory_PagesPERsec, avg(ERWWNAB_NetworkInterface_Intel825 +5x_basedIntegratedFastEthernet_BytesTotalPERsec) as ERWWNAB_NetworkIn +terface_Intel8255x_basedIntegratedFastEthernet_BytesTotalPERsec, avg( +ERWWNAB_NetworkInterface_MSTCPLoopbackinterface_BytesTotalPERsec) as +ERWWNAB_NetworkInterface_MSTCPLoopbackinterface_BytesTotalPERsec, avg +(ERWWNAB_NetworkInterface_Intel8255x_basedIntegratedFastEthernet_Curr +entBandwidth) as ERWWNAB_NetworkInterface_Intel8255x_basedIntegratedF +astEthernet_CurrentBandwidth, avg(ERWWNAB_NetworkInterface_MSTCPLoopb +ackinterface_CurrentBandwidth) as ERWWNAB_NetworkInterface_MSTCPLoopb +ackinterface_CurrentBandwidth, avg(ERWWNAB_PhysicalDisk_0C_PCTDiskTim +e) as ERWWNAB_PhysicalDisk_0C_PCTDiskTime, avg(ERWWNAB_PhysicalDisk_T +otal_PCTDiskTime) as ERWWNAB_PhysicalDisk_Total_PCTDiskTime, avg(ERWW +NAB_PhysicalDisk_Total_AvgDiskQueueLength) as ERWWNAB_PhysicalDisk_To +tal_AvgDiskQueueLength, avg(ERWWNAB_Processor_Total_PCTProcessorTime) + as ERWWNAB_Processor_Total_PCTProcessorTime, avg(ERWWSEARCH_Memory_P +agesPERsec) as ERWWSEARCH_Memory_PagesPERsec, avg(ERWWSEARCH_NetworkI +nterface_BroadcomNetXtremeGigabitEthernetDriver_BytesTotalPERsec) as +ERWWSEARCH_NetworkInterface_BroadcomNetXtremeGigabitEthernetDriver_By +tesTotalPERsec, avg(ERWWSEARCH_NetworkInterface_IntelPRO_100NetworkCo +nnection_BytesTotalPERsec) as ERWWSEARCH_NetworkInterface_IntelPRO_10 +0NetworkConnection_BytesTotalPERsec, avg(ERWWSEARCH_NetworkInterface_ +MSTCPLoopbackinterface_BytesTotalPERsec) as ERWWSEARCH_NetworkInterfa +ce_MSTCPLoopbackinterface_BytesTotalPERsec, avg(ERWWSEARCH_NetworkInt +erface_BroadcomNetXtremeGigabitEthernetDriver_CurrentBandwidth) as ER +WWSEARCH_NetworkInterface_BroadcomNetXtremeGigabitEthernetDriver_Curr +entBandwidth, avg(ERWWSEARCH_NetworkInterface_IntelPRO_100NetworkConn +ection_CurrentBandwidth) as ERWWSEARCH_NetworkInterface_IntelPRO_100N +etworkConnection_CurrentBandwidth, avg(ERWWSEARCH_NetworkInterface_MS +TCPLoopbackinterface_CurrentBandwidth) as ERWWSEARCH_NetworkInterface +_MSTCPLoopbackinterface_CurrentBandwidth, avg(ERWWSEARCH_PhysicalDisk +_0CD_PCTDiskTime) as ERWWSEARCH_PhysicalDisk_0CD_PCTDiskTime, avg(ERW +WSEARCH_PhysicalDisk_Total_PCTDiskTime) as ERWWSEARCH_PhysicalDisk_To +tal_PCTDiskTime, avg(ERWWSEARCH_PhysicalDisk_Total_AvgDiskQueueLength +) as ERWWSEARCH_PhysicalDisk_Total_AvgDiskQueueLength, avg(ERWWSEARCH +_Processor_Total_PCTProcessorTime) as ERWWSEARCH_Processor_Total_PCTP +rocessorTime, avg(ERWWSQLWEB_Memory_PagesPERsec) as ERWWSQLWEB_Memory +_PagesPERsec, avg(ERWWSQLWEB_NetworkInterface_BroadcomNetXtremeGigabi +tEthernetDriver_BytesTotalPERsec) as ERWWSQLWEB_NetworkInterface_Broa +dcomNetXtremeGigabitEthernetDriver_BytesTotalPERsec, avg(ERWWSQLWEB_N +etworkInterface_MSTCPLoopbackinterface_BytesTotalPERsec) as ERWWSQLWE +B_NetworkInterface_MSTCPLoopbackinterface_BytesTotalPERsec, avg(ERWWS +QLWEB_NetworkInterface_BroadcomNetXtremeGigabitEthernetDriver_Current +Bandwidth) as ERWWSQLWEB_NetworkInterface_BroadcomNetXtremeGigabitEth +ernetDriver_CurrentBandwidth, avg(ERWWSQLWEB_NetworkInterface_MSTCPLo +opbackinterface_CurrentBandwidth) as ERWWSQLWEB_NetworkInterface_MSTC +PLoopbackinterface_CurrentBandwidth, avg(ERWWSQLWEB_PhysicalDisk_0CD_ +PCTDiskTime) as ERWWSQLWEB_PhysicalDisk_0CD_PCTDiskTime, avg(ERWWSQLW +EB_PhysicalDisk_Total_PCTDiskTime) as ERWWSQLWEB_PhysicalDisk_Total_P +CTDiskTime, avg(ERWWSQLWEB_PhysicalDisk_Total_AvgDiskQueueLength) as +ERWWSQLWEB_PhysicalDisk_Total_AvgDiskQueueLength, avg(ERWWSQLWEB_Proc +essor_Total_PCTProcessorTime) as ERWWSQLWEB_Processor_Total_PCTProces +sorTime FROM results
It looks OK to me, but then again, I'm a novice at SQL.
I do a lot of preprocessing to the csv files before I run them through the perl script. I remove spaces, , , (, ), etc... the list above has all column names in it. I'm wondering if this should be looped? These csv files will average about 30 columns and upto 6000 rows. Doing all the average calculations in one command may be a little taxing on my system.

PS. I don't need the data after the average is calculated. Just the column name and the average. Also, I don't need the first column, that can just go away.


Comment on Re^4: Database processing
Download Code
Re^5: Database processing
by davidrw (Prior) on Aug 06, 2005 at 03:19 UTC
    You can tweak the map to be map { "avg($_) as $_\n" } to make it a little more legible, and then browse for errors. Did you try running the SQL directly in your db? What about a portion (e.g. w/just a few columns) of it?

    If you want to exclude columns, simply grep them out of @cols, or to just ditch the first one, do a shift @cols;
    Doing all the average calculations in one command may be a little taxing on my system.
    No, doing all the average calculations in many commands is what will tax your system. If you do it in separate statements, you're making the database go through all 6000 rows 30 different times. But if you do it in one statement, it only needs one pass.
      I've tried the SQL in earlier versions of this script, and I had no luck using an AS statement. When I run the script now I get:
      SQL ERROR: Bad set function before FROM clause. Execution ERROR: No command found!. $VAR1 = undef;
      When I remove the AS statement:
      my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @cols = @{$sth->{NAME}}; shift @cols; my $avgSQL = 'SELECT ' . join(', ', map { "avg($_) \n" } @cols ) . ' FROM results'; my $avgsHashRef = $dbh->selectrow_hashref($avgSQL);
      I get
      Execution ERROR: Can't use AVG on a string! called from ./runsql3.sql +at 24. DBD::CSV::db selectrow_hashref failed: Attempt to fetch row from a Non +-SELECT statement [for Statement "SELECT avg(ERWWCOMMUNITIESMemoryPag +essec) , avg(ERWWCOMMUNITIESNetworkInterfaceCompaqEthernetFastEthernetAdapter +ModuleBytesTotalPERsec) , avg(ERWWCOMMUNITIESNetworkInterfaceMSTCPLoopbackinterfaceBytesTotalP +ERsec) , avg(ERWWCOMMUNITIESNetworkInterfaceCompaqEthernetFastEthernetAdapter +ModuleCurrentBandwidth) , avg(ERWWCOMMUNITIESNetworkInterfaceMSTCPLoopbackinterfaceCurrentBand +width) , avg(ERWWCOMMUNITIESPhysicalDisk0CPCTDiskTime) , avg(ERWWCOMMUNITIESPhysicalDisk1DPCTDiskTime) , avg(ERWWCOMMUNITIESPhysicalDiskTotalPCTDiskTime) , avg(ERWWCOMMUNITIESPhysicalDiskTotalAvgDiskQueueLength) , avg(ERWWCOMMUNITIESProcessorTotalPCTProcessorTime) , avg(ERWWDC1MemoryPagesPERsec) , avg(ERWWDC1NetworkInterfaceIntelPRO100NetworkConnectionBytesTotalPER +sec) , avg(ERWWDC1NetworkInterfaceMSTCPLoopbackinterfaceBytesTotalPERsec) , avg(ERWWDC1NetworkInterfaceIntelPRO100NetworkConnectionCurrentBandwi +dth) , avg(ERWWDC1NetworkInterfaceMSTCPLoopbackinterfaceCurrentBandwidth) , avg(ERWWDC1PhysicalDisk0CPCTDiskTime) , avg(ERWWDC1PhysicalDiskTotalPCTDiskTime) , avg(ERWWDC1PhysicalDiskTotalAvgDiskQueueLength) , avg(ERWWDC1ProcessorTotalPCTProcessorTime) , avg(ERWWNABMemoryPagesPERsec) , avg(ERWWNABNetworkInterfaceIntel8255xbasedIntegratedFastEthernetByte +sTotalPERsec) , avg(ERWWNABNetworkInterfaceMSTCPLoopbackinterfaceBytesTotalPERsec) , avg(ERWWNABNetworkInterfaceIntel8255xbasedIntegratedFastEthernetCurr +entBandwidth) , avg(ERWWNABNetworkInterfaceMSTCPLoopbackinterfaceCurrentBandwidth) , avg(ERWWNABPhysicalDisk0CPCTDiskTime) , avg(ERWWNABPhysicalDiskTotalPCTDiskTime) , avg(ERWWNABPhysicalDiskTotalAvgDiskQueueLength) , avg(ERWWNABProcessorTotalPCTProcessorTime) , avg(ERWWSEARCHMemoryPagesPERsec) , avg(ERWWSEARCHNetworkInterfaceBroadcomNetXtremeGigabitEthernetDriver +BytesTotalPERsec) , avg(ERWWSEARCHNetworkInterfaceIntelPRO100NetworkConnectionBytesTotal +PERsec) , avg(ERWWSEARCHNetworkInterfaceMSTCPLoopbackinterfaceBytesTotalPERsec +) , avg(ERWWSEARCHNetworkInterfaceBroadcomNetXtremeGigabitEthernetDriver +CurrentBandwidth) , avg(ERWWSEARCHNetworkInterfaceIntelPRO100NetworkConnectionCurrentBan +dwidth) , avg(ERWWSEARCHNetworkInterfaceMSTCPLoopbackinterfaceCurrentBandwidth +) , avg(ERWWSEARCHPhysicalDisk0CDPCTDiskTime) , avg(ERWWSEARCHPhysicalDiskTotalPCTDiskTime) , avg(ERWWSEARCHPhysicalDiskTotalAvgDiskQueueLength) , avg(ERWWSEARCHProcessorTotalPCTProcessorTime) , avg(ERWWSQLWEBMemoryPagesPERsec) , avg(ERWWSQLWEBNetworkInterfaceBroadcomNetXtremeGigabitEthernetDriver +BytesTotalPERsec) , avg(ERWWSQLWEBNetworkInterfaceMSTCPLoopbackinterfaceBytesTotalPERsec +) , avg(ERWWSQLWEBNetworkInterfaceBroadcomNetXtremeGigabitEthernetDriver +CurrentBandwidth) , avg(ERWWSQLWEBNetworkInterfaceMSTCPLoopbackinterfaceCurrentBandwidth +) , avg(ERWWSQLWEBPhysicalDisk0CDPCTDiskTime) , avg(ERWWSQLWEBPhysicalDiskTotalPCTDiskTime) , avg(ERWWSQLWEBPhysicalDiskTotalAvgDiskQueueLength) , avg(ERWWSQLWEBProcessorTotalPCTProcessorTime) FROM results"] at ./runsql3.sql line 24. $VAR1 = undef;
      Same error if I trim down the sql statement. I have also re-looked at my csv file and the columns data is numeric. I'm stumped!
        Execution ERROR: Can't use AVG on a string! called from ./runsql3.sql at 24.
        Ah. that changes things -- DBD::CSV is treating the values as strings. Read the Data restrictions section of the docs where it says that DBD::CSV doesn't correctly handle data types, in particular NULLs. If you insert integers, it might happen, that fetch returns a string.

        You might want to try DBD::AnyData instead.. Or in the avg() make sure the value is casted to a number so that it's not treated as a string. or finally, you could just select * all the data (6000 rows) and use perl to loop over it once and average everything.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://481428]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (7)
As of 2014-12-25 01:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (159 votes), past polls