
Re^2: Process and combine two CSV files into one

by DrAxeman (Scribe)
on Aug 09, 2005 at 21:51 UTC (#482443)

in reply to Re: Process and combine two CSV files into one
in thread Process and combine two CSV files into one

I'm just not getting this join right. If I do a
my @row = $dbh->selectall_arrayref("SELECT IP, ServerName, Domain, DaysUptime, OS, RAM, OSSP, InstallDate, CPUSpeed, CPUCount, CPUType FROM hosts LEFT JOIN info ON hosts.IP = info.IP");
My 2 tables get joined. The only issue is that I get a warning:
Execution ERROR: Ambiguous column name 'IP' called from /usr/lib/perl5/vendor_perl/5.8.6/i586-linux-thread-multi/ at 1557.

But if I try to change a column name, every field returned contains the "IP" value, regardless of what the true value should be.

Also, when I try to join a 3rd table it takes a long time. The 2 tables are done in less than 15 seconds. When I add the 3rd table it's just over 5 minutes.
my @row = $dbh->selectall_arrayref("SELECT IP, ServerName, Domain, DaysUptime, OS, RAM, OSSP, InstallDate, CPUSpeed, CPUCount, CPUType, PartitionFree FROM hosts, disks LEFT JOIN info ON hosts.IP = info.IP AND hosts.IP = disks.IP");
In the end, I want ALL data from hosts, and the relevant data from the other tables.
Once I have these 3 tables combined, I'd like to print them out in a CSV-type format. How do I do that?

Replies are listed 'Best First'.
Re^3: Process and combine two CSV files into one
by SimonClinch (Deacon) on Aug 15, 2005 at 08:02 UTC
    The ambiguity error occurs because the IP just after the SELECT needs to be qualified by its table name (or by an alias, had you used aliases). Just put the table name followed by a dot in front of that IP.

    Changing the column name (all other things being equal) simply forced the name to be interpreted as a literal, so you want to undo that change.
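A minimal sketch of the qualified query, assuming only what the posted JOIN condition shows: IP exists in both hosts and info. Only that column is prefixed, since the thread does not say which table holds the others. The sub name fetch_joined_rows is hypothetical, and $dbh is assumed to be an already-connected DBI handle.

```perl
use strict;
use warnings;

# Only IP is qualified: the JOIN condition shows it exists in both tables,
# so an unqualified "IP" in the select list is ambiguous.
my $sql = <<'SQL';
SELECT hosts.IP, ServerName, Domain, DaysUptime, OS, RAM, OSSP,
       InstallDate, CPUSpeed, CPUCount, CPUType
FROM hosts LEFT JOIN info ON hosts.IP = info.IP
SQL

# Hypothetical helper. selectall_arrayref returns ONE reference to an
# array of row references, so the result is kept in a scalar.
sub fetch_joined_rows {
    my ($dbh) = @_;
    my $rows = $dbh->selectall_arrayref($sql);
    return $rows;
}

# Usage, given a connected handle in $dbh (not created here):
#   for my $row (@{ fetch_joined_rows($dbh) }) {
#       # Naive comma join with no quoting; a CSV module is safer for
#       # values that may themselves contain commas or quotes.
#       print join(',', map { defined $_ ? $_ : '' } @$row), "\n";
#   }
```

Storing the result in a scalar rather than in @row avoids ending up with a one-element array whose only member is the reference to all rows.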

    It is normal for three tables to take much longer to join than two. The performance strategy for joining tables 1..n, where n > 2, is as follows:

    - Join the first two tables, placing the required columns in a temporary table.

    - Then join the third table with the temporary table, putting the results in a second temporary table, and drop the first temporary table.

    - Continue this iterative process, joining each results temporary table with the next real table. When the last real table is joined with the last temporary table, the final results can be obtained directly instead of being stored in another temporary table. This way no more than two tables are physically joined at once, however many tables have been logically joined.

    - If this query is intended to be re-used, it should be placed in a stored procedure rather than inside Perl code, to avoid unnecessary communication overhead during execution, especially now that it has been split up. For the same reason, and for other considerations besides performance, any re-used process of more than one SQL statement should ideally be placed inside a stored procedure.
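The two-at-a-time idea can also be tried out in plain Perl. The sketch below is not the temporary-table SQL described above: it swaps in-memory hashes in for the temporary tables, and the sample rows, the column subset, and the sub name left_join_on_ip are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample rows standing in for the three tables.
my @hosts = ( { IP => '10.0.0.1', ServerName => 'alpha' },
              { IP => '10.0.0.2', ServerName => 'beta'  } );
my @info  = ( { IP => '10.0.0.1', OS => 'Linux' } );
my @disks = ( { IP => '10.0.0.1', PartitionFree => '12G' },
              { IP => '10.0.0.2', PartitionFree => '3G'  } );

# Left join two row lists on IP: every left row is kept, and the columns
# of the matching right row are merged in when there is one.
# Simplification: if several right rows share an IP, only the last is used.
sub left_join_on_ip {
    my ($left, $right) = @_;
    my %by_ip;
    $by_ip{ $_->{IP} } = $_ for @$right;
    my @joined;
    for my $row (@$left) {
        my $match = $by_ip{ $row->{IP} } || {};
        push @joined, { %$match, %$row };   # left row's values win on clashes
    }
    return \@joined;
}

# Only two row sets are joined at any one time.
my $step1 = left_join_on_ip(\@hosts, \@info);    # plays the first temporary table
my $final = left_join_on_ip($step1,  \@disks);   # final result

# Print in a CSV-like form (no quoting; missing values become empty fields).
my @cols = qw(IP ServerName OS PartitionFree);
for my $row (@$final) {
    print join(',', map { defined $row->{$_} ? $row->{$_} : '' } @cols), "\n";
}
```

Every row of hosts survives both steps, which matches the stated goal of keeping all data from hosts and only the relevant data from the other tables.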

    Hope this helps!


    One world, one people
