PerlMonks  

Re^2: Sort CSV file within Excel based on specific column

by jaacmmason (Novice)
on Oct 12, 2012 at 13:48 UTC (#998682)


in reply to Re: Sort CSV file within Excel based on specific column
in thread Sort CSV file within Excel based on specific column

My file looks similar to the following:

Item,Call Start Time,Calling Number,Called Number,Call Length,Call Time,Billable,Dept,Time Zone
1,Sat Sep 1 08:13:00 2012,(815)444-4444,(815)626-6262,0.4,N/A,Inbound,N, HR,GMT-5
2,Sat Sep 1 08:13:00 2012,(815)626-6262,(815)950-0000,0.4,Intragroup,Outbound,N,HR,GMT-5
3,Sat Sep 1 09:04:00 2012,(224)555-9999,(815)626-6262,4.3,N/A,Inbound,N,HR,GMT-5
4,Sat Sep 1 09:04:00 2012,(815)626-6262,(815)950-0000,4.3,Intragroup,Outbound,N,HR,GMT-5
5,Sat Sep 1 09:54:00 2012,(815)441-8383,(815)626-6262,0.5,N/A,Inbound,N,HR,GMT-5
6,Sat Sep 1 09:54:00 2012,(815)626-6262,(815)950-0000,0.5,Intragroup,Outbound,N,HR,GMT-5

Then I want to take this data (24 separate files with as many as 7000 lines of data in each file, each month) and remove all the "outbound" Call Direction rows, as this is essentially a duplicate field. This will cut our file in half. At this point I need to do more manipulation in order to break up the Call Start Time into 3 separate fields, and to delete three other columns.

I thought I could manipulate the entire thing using Perl and Win32::OLE. Can you tell me whether all of this is possible using Text::CSV rather than Win32::OLE? This is the first time I have used Text::CSV, so I am not sure whether all the functionality I need can be achieved.

Recommendations on the easiest way for me to achieve what my end goal is would be great! I will have to research either way, as I still classify myself as a Perl newbie.


Re^3: Sort CSV file within Excel based on specific column
by Tanktalus (Canon) on Oct 12, 2012 at 14:17 UTC

    With Text::CSV_XS, you would do (roughly):

    1. Read the first line via Text::CSV_XS, ignore it, and write your new first line (with all the adjusted column names) to your output file, via a second Text::CSV_XS object (I think).
    2. Loop:
      1. Read next line via Text::CSV_XS.
      2. Discard "Called Number" (see splice). e.g., splice @row, 3, 1;
      3. Discard other three columns (more splice). (I can't tell you how to do this, I don't know which columns to remove - if they're all one after another, this could be one call, or if they're all separate, it may be multiple calls. If they're the three after called number, you could combine this with the above by splice @row, 3, 4;, but I don't expect that to be the case)
      4. Manipulate Call Start Time ($row[1]). Probably extract what you need via a regex or two. Use splice to put them back in place: splice @row, 1, 1, @new_call_start_time_columns; (this assumes you want the three new columns to be in the same place as the one old column)
      5. Write out the new @row to the output file via the second Text::CSV_XS object.
    3. ???
    4. Profit!
    This produces the output in the same order as the input, which is definitely the easiest. Alternatively, instead of writing out the new @row, just save it to another array, push @all_rows, \@row; (which means you must declare my @row inside the loop, not outside), and when you're done, sort them and then loop through that to spit everything out to your output file.
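The numbered steps above might look roughly like the sketch below. It is only an illustration under stated assumptions: the thread never says which three extra columns to drop or how Call Start Time should be split, so the code assumes the last three columns go away and that the timestamp becomes weekday, date, and time. The sub name `transform_row` and the new header names are hypothetical, and a plain `split` stands in for the regex mentioned in step 4. The sample rows are held in memory so the script runs on its own; a real script would open the input and output files instead.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Turn one parsed input row into the trimmed output row.
# Column positions follow the sample data shown in this thread.
sub transform_row {
    my ($in) = @_;
    my @row = @$in;        # copy, so the caller's row is left untouched

    splice @row, 3, 1;     # discard "Called Number"
    splice @row, -3;       # ASSUMPTION: the three other columns are the last three

    # ASSUMPTION: "Sat Sep 1 08:13:00 2012" becomes weekday, date, time
    my ($wday, $mon, $day, $time, $year) = split ' ', $row[1];
    splice @row, 1, 1, $wday, "$mon $day $year", $time;

    return \@row;
}

# Two sample rows from the thread, kept in memory for the sketch.
my $input = join "\n",
    'Item,Call Start Time,Calling Number,Called Number,Call Length,Call Time,Billable,Dept,Time Zone',
    '1,Sat Sep 1 08:13:00 2012,(815)444-4444,(815)626-6262,0.4,N/A,Inbound,N,HR,GMT-5',
    '3,Sat Sep 1 09:04:00 2012,(224)555-9999,(815)626-6262,4.3,N/A,Inbound,N,HR,GMT-5',
    '';
my $output = '';

open my $in_fh,  '<', \$input  or die "Cannot read input: $!";
open my $out_fh, '>', \$output or die "Cannot write output: $!";

# One object for reading, a second one (with eol set) for writing.
my $reader = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
my $writer = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" });

# Step 1: read and ignore the old header, write the new one.
# These header names are hypothetical.
$reader->getline($in_fh);
$writer->print($out_fh, [ 'Item', 'Weekday', 'Date', 'Time',
                          'Calling Number', 'Call Length',
                          'Call Type', 'Call Direction' ]);

# Step 2: read, adjust, and write each row in input order.
while ( my $row = $reader->getline($in_fh) ) {
    $writer->print($out_fh, transform_row($row));
}
close $out_fh;

print $output;
```

Keeping the per-row changes in one sub means the column choices can be corrected in a single place once the real requirements are known.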

    I also recommend that, if possible, and it isn't always possible, you have your input files and output files in separate directories. Makes it easier to wipe out all of the output files if there's a coding error and you want to modify and re-run.

    A second option is to use DBD::CSV as I said earlier. The challenge here is that you will be both reading and creating CSV files through a database interface. Definitely possible, but probably a bit more work to set up. As I mentioned in the linked-to article, I like this solution because it makes me think in SQL, where the "S" means "Structured". A side effect is that, as mentioned earlier, you can just ORDER BY on the initial query and let SQL::Statement and friends handle all the heavy work for you, and you can just deal with it at the other end.

Re^3: Sort CSV file within Excel based on specific column
by Kenosis (Priest) on Oct 12, 2012 at 15:52 UTC

    Tanktalus provided excellent directions on how to achieve your goal. With just a little more work, the script I've shown you can accomplish these items. For example:

    while ( my $row = $csv->getline($csvfh) ) {
        next if $row->[6] eq 'Outbound';
        push @csvLines, $row;
    }

    will skip all lines containing 'Outbound' in your shown data set. You can manipulate the row data right after next, to get what you need before pushing the line onto @csvLines.
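As a rough, self-contained illustration of that loop in context, the sketch below filters out the 'Outbound' rows, collects the rest, and then sorts them before writing. The sort column is an assumption (Calling Number, index 2, with Item as a tie-breaker), since the thread does not settle which column to sort on; `$sort_col` and the in-memory sample data are there only to make the example runnable.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Sample rows from the thread, kept in memory for the sketch.
my $data = join "\n",
    'Item,Call Start Time,Calling Number,Called Number,Call Length,Call Time,Billable,Dept,Time Zone',
    '1,Sat Sep 1 08:13:00 2012,(815)444-4444,(815)626-6262,0.4,N/A,Inbound,N,HR,GMT-5',
    '2,Sat Sep 1 08:13:00 2012,(815)626-6262,(815)950-0000,0.4,Intragroup,Outbound,N,HR,GMT-5',
    '3,Sat Sep 1 09:04:00 2012,(224)555-9999,(815)626-6262,4.3,N/A,Inbound,N,HR,GMT-5',
    '4,Sat Sep 1 09:04:00 2012,(815)626-6262,(815)950-0000,4.3,Intragroup,Outbound,N,HR,GMT-5',
    '5,Sat Sep 1 09:54:00 2012,(815)441-8383,(815)626-6262,0.5,N/A,Inbound,N,HR,GMT-5',
    '6,Sat Sep 1 09:54:00 2012,(815)626-6262,(815)950-0000,0.5,Intragroup,Outbound,N,HR,GMT-5',
    '';

open my $csvfh, '<', \$data or die "Cannot read sample data: $!";
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# Keep the header row aside so it is not sorted with the data.
my $header = $csv->getline($csvfh);

my @csvLines;
while ( my $row = $csv->getline($csvfh) ) {
    next if $row->[6] eq 'Outbound';
    # Any per-row manipulation (splice, splitting the timestamp) would go here.
    push @csvLines, $row;
}
close $csvfh;

# ASSUMPTION: sort on Calling Number (index 2), then on Item number.
my $sort_col = 2;
my @sorted = sort {
    $a->[$sort_col] cmp $b->[$sort_col]
        or $a->[0] <=> $b->[0]
} @csvLines;

# Write the header and the sorted rows with a second object that adds line endings.
my $out    = '';
my $writer = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" });
open my $out_fh, '>', \$out or die "Cannot write output: $!";
$writer->print($out_fh, $_) for $header, @sorted;
close $out_fh;

print $out;
```

For a numeric column such as Call Length, the comparison would use `<=>` instead of `cmp`.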

    How do you want to separate Call Start Time and which three columns do you want to delete?
