http://www.perlmonks.org?node_id=998690


in reply to Re^2: Sort CSV file within Excel based on specific column
in thread Sort CSV file within Excel based on specific column

With Text::CSV_XS, you would do (roughly):

  1. Read the first line via Text::CSV_XS, ignore it, and write your new first line (with all the adjusted column names) to your output file, via a second Text::CSV_XS object (I think).
  2. Loop:
    1. Read next line via Text::CSV_XS.
    2. Discard "Called Number" (see splice), e.g., splice @row, 3, 1;
    3. Discard the other three columns (more splice). I can't tell you exactly how, because I don't know which columns you want to remove. If they're all adjacent, this could be one call; if they're scattered, it may take several calls, and in that case remove the highest index first so the earlier removals don't shift the later ones. If they're the three right after Called Number, you could fold this into the previous step with splice @row, 3, 4;, but I don't expect that to be the case.
    4. Manipulate Call Start Time ($row[1]). Probably extract what you need via a regex or two. Use splice to put them back in place: splice @row, 1, 1, @new_call_start_time_columns; (this assumes you want the three new columns to be in the same place as the one old column)
    5. Write out the new @row to the output file via the second Text::CSV_XS object.
  3. ???
  4. Profit!
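
The steps above can be sketched roughly as follows. This is only a sketch under assumptions, not tested against your data: I'm guessing that Call Start Time looks like "YYYY-MM-DD HH:MM:SS", that you want it split into date, hour and minute, and that only column 3 needs dropping. The file names, @new_header, transform_row and convert_file are all names I made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# ASSUMPTION: input columns are Caller, Call Start Time, Duration, Called Number.
# Replace this with your real adjusted column names.
my @new_header = ('Caller', 'Call Date', 'Call Hour', 'Call Minute', 'Duration');

# ASSUMPTION: indexes of the columns to discard (3 = "Called Number").
# Add the other three once you know where they are.
my @drop = (3);

sub transform_row {
    my @row = @_;

    # Remove unwanted columns, highest index first, so that an earlier
    # removal doesn't shift the position of a later one.
    for my $i (sort { $b <=> $a } @drop) {
        splice @row, $i, 1;
    }

    # ASSUMPTION: Call Start Time looks like "2012-10-12 08:15:30".
    my $start = defined $row[1] ? $row[1] : '';
    my @parts = $start =~ /^(\d{4}-\d\d-\d\d)\s+(\d\d):(\d\d)/
        or die "Unrecognised Call Start Time: '$start'\n";

    # Put the three new fields where the one old field was.
    splice @row, 1, 1, @parts;
    return @row;
}

sub convert_file {
    my ($in_file, $out_file) = @_;

    my $in_csv  = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    my $out_csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" });

    open my $in,  '<', $in_file  or die "Can't read $in_file: $!";
    open my $out, '>', $out_file or die "Can't write $out_file: $!";

    $in_csv->getline($in);                 # read the old header and ignore it
    $out_csv->print($out, \@new_header);   # write the new header

    while (my $row = $in_csv->getline($in)) {
        $out_csv->print($out, [ transform_row(@$row) ]);
    }

    close $in;
    close $out or die "Can't finish writing $out_file: $!";
}

# Hypothetical usage: perl convert.pl input/calls.csv output/calls.csv
convert_file(@ARGV) if @ARGV == 2;
```

Keeping the per-row work in its own sub means you can check the splice logic on a single hand-written row before you let it loose on a whole file.
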
This produces the output in the same order as the input, which is definitely the easiest. Alternatively, instead of writing out the new @row, save a reference to it in another array with push @all_rows, \@row; (which means you must declare my @row inside the loop, not outside, or every saved reference will point at the same array). When you're done reading, sort @all_rows and then loop through the sorted list to write everything to your output file.
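
Here is a minimal sketch of the collect-then-sort variant. The sample data is invented, and I'm assuming the start time is in "YYYY-MM-DD HH:MM:SS" form, which conveniently sorts chronologically with a plain string comparison. It reads from an in-memory string and prints to STDOUT only to keep the example self-contained; you would use real file handles instead.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data standing in for the input file.
my $data = <<'CSV';
Caller,Call Start Time,Duration
Bob,2012-10-12 09:30:00,17
Alice,2012-10-11 14:05:00,42
Carol,2012-10-12 08:15:00,5
CSV

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" });
open my $in, '<', \$data or die "Can't open in-memory file: $!";

my $header = $csv->getline($in);

my @all_rows;
while (my $line = $csv->getline($in)) {
    my @row = @$line;       # declared inside the loop: a fresh array each time
    # ... splice and otherwise manipulate @row here ...
    push @all_rows, \@row;  # save a reference to this iteration's array
}
close $in;

# Sort on column 1 (Call Start Time) as a string.
my @sorted = sort { $a->[1] cmp $b->[1] } @all_rows;

# Write the header, then the rows in sorted order.
$csv->print(\*STDOUT, $_) for $header, @sorted;
```

If your timestamps are in some other format, the comparison inside sort is the part to change, for example by parsing each one into something that does compare sensibly.
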

I also recommend, where possible (it isn't always), keeping your input files and output files in separate directories. That makes it easier to wipe out all of the output files if there's a coding error and you want to fix it and re-run.

A second option is to use DBD::CSV as I said earlier. The challenge here is that you will be both reading and creating CSV files through a database interface. Definitely possible, but probably a bit more work to set up. As I mentioned in the linked-to article, I like this solution because it makes me think in SQL, where the "S" means "Structured". A side effect is that, as mentioned earlier, you can just ORDER BY on the initial query and let SQL::Statement and friends handle all the heavy work for you, and you can just deal with it at the other end.
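
A rough sketch of the reading half of the DBD::CSV approach, with the sort done by ORDER BY. Several things here are assumptions: the table name calls (i.e. a file called calls.csv), the sample data, and the column name Call_Start_Time, which relies on my understanding that DBD::CSV by default turns spaces in header names into underscores (see its raw_header attribute). The script writes its own sample file into a temporary directory so that it can run on its own; check the DBD::CSV documentation before relying on any of it.

```perl
use strict;
use warnings;
use DBI;
use File::Temp qw(tempdir);

# Hypothetical sample input, written to a temp dir so the example is runnable.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/calls.csv" or die "Can't write sample file: $!";
print {$fh} <<'CSV';
Caller,Call Start Time,Duration
Bob,2012-10-12 09:30:00,17
Alice,2012-10-11 14:05:00,42
Carol,2012-10-12 08:15:00,5
CSV
close $fh or die "Can't close sample file: $!";

# Each *.csv file in f_dir is treated as a table named after the file.
my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
    f_dir      => $dir,
    f_ext      => '.csv/r',
    RaiseError => 1,
});

# ASSUMPTION: the header "Call Start Time" is exposed as Call_Start_Time.
my $sth = $dbh->prepare('SELECT * FROM calls ORDER BY Call_Start_Time');
$sth->execute;

my @callers;
while (my $row = $sth->fetchrow_arrayref) {
    push @callers, $row->[0];
    print join('|', map { defined $_ ? $_ : '' } @$row), "\n";
}

$sth->finish;
$dbh->disconnect;
```

This only covers the query side. Creating the output CSV through the same interface would mean a CREATE TABLE and INSERT statements (or simply handing the fetched rows to Text::CSV_XS), which I've left out.
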