http://www.perlmonks.org?node_id=11152006

MoodyDreams999 has asked for the wisdom of the Perl Monks concerning the following question:

What's the best way to delete a column if you don't have keywords you are looking for? I have a spreadsheet I parse very often it has 0-35 columns and I want to automate this process so it only writes 0-9 columns. (They aren't in order) I tried messing with $col_num, I learned that's not what I need. I need to also sort by date, but thats not my main concern right now. I've adapted some of my excel sub routines that use Spreadsheet::ParseXLSX and Excel::Writer::XLSX and I've searched many forms related to this issue and they've had different format/problems and approaches to those problems especially since alot of them used WIN::OLE a module I'm not very familiar with. I just didnt want to write something completely from scratch with something i'm not used to if the modules I am familiar with will work. Please guide me wise monks.. I have a couple subroutine I could post too. The main one is one ive used on a few different excel projects, I've managed to narrow it down to 60 lines functionally, I realize that may not be ideal as far as SCCE.
use Spreadsheet::ParseXLSX; use Excel::Writer::XLSX; my $in_file = "Input.xlsx"; my $out_file = "Output.xlsx"; # Read in all rows from existing file my $rows = read_excel($in_file); my $count = 0; # Write the updated rows to new file my $col_num = 2; write_excel($out_file, $rows, $col_num); ++$count; sub read_excel { my ( $file, $sheet ) = @_; $sheet ||= 0; my $parser = Spreadsheet::ParseXLSX->new(); my $workbook = $parser->parse($file); if ( not defined $workbook ) { die $parser->error; } my $worksheet = $workbook->worksheet($sheet); my ( $row_min, $row_max ) = $worksheet->row_range(); my ( $col_min, $col_max ) = $worksheet->col_range(); my @rows; for my $row ( $row_min .. $row_max ) { my @cells; for my $col ( $col_min .. $col_max ) { my $cell = $worksheet->get_cell( $row, $col ); if (not $cell) { push(@cells,''); next; } my $value = $cell->value(); push(@cells,$value); } push(@rows,\@cells); } return \@rows; } sub write_excel { my ( $file, $rows, $col_max ) = @_; my $workbook = Excel::Writer::XLSX->new( $file ); if ( not defined $workbook ) { die "Could not open file: $!"; } my $worksheet = $workbook->add_worksheet(); my $row_num = 0; for my $row ( @$rows ) { for my $col (0 .. $col_max) { $worksheet->write( $row_num, $col, $row->[$col] ); } $row_num++; } $workbook->close(); return; }
Input:
program_name enrolled_date_cst Prospect Name Salesperson Client Name Street Address City State Zip Code program_status Termination Date Current Program Length Total Debt Enrolled Verified Debt Amount Unverified Debt Amount Original Debt Without Conditional Debt payment_frequency Original scheduled first draft date First Deposit Date Change Flag First Draft Date First Draft Status Second Draft Date Second Draft Status Third Draft Date Third Draft Status Fourth Draft Date Fourth Draft Status Most Recent Draft Failure Reason Bank Account Status Received in Program Loan Flag OBC Completed Within 2 Hrs Flag Service Entity Name Client Deposit Flag Client Deposit Month
0 BRP-255762 2023-03-21 00Q5d00002ECrhKEAT Tammy Smith Larry Buss Charlestown IN 47111 Enrolled 56 46734 46734 0 46734 Monthly 2023-03-31 N 2023-03-31 Completed 2023-04-15 Completed 2023-05-14 Scheduled 2023-06-14 Scheduled VERIFIED FALSE Y Beyond Finance Yes 2023-03-31
1 BRP-281722 2023-04-25 00Q5d00002FBoa6EAD Dan Tucker Linda Smith Morehead KY 40351 Enrolled 55 42724 42724 0 29439 Monthly 2023-04-27 N 2023-04-27 Completed 2023-05-26 Scheduled 2023-06-26 Scheduled 2023-07-26 Scheduled AUTO_VERIFIED FALSE Y Beyond Finance Yes 2023-04-30
output:
program_name enrolled_date_cst Salesperson Client Name Total Debt Enrolled Verified Debt Amount Unverified Debt Amount First Draft Date First Draft status
26 BRP-238848 2023-02-24 Tami Smith MORRIS CLEGG 18461 10315 8146 2023-02-28 Returned
44 BRP-240923 2023-02-28 Tami Smith Paul Brannon 22162 22162 0 2023-03-07 Completed

Replies are listed 'Best First'.
Re: Deleting a excel column
by atcroft (Abbot) on May 06, 2023 at 00:21 UTC

    I looked at the docs for Spreadsheet::ParseXLSX (and Spreadsheet::ParseExcel) and Excel::Writer::XLSX, and found several functions that can help (especially if you don't mind a bare-bones approach). Assuming you create $workbook_i by parsing the import file into an S:P object and a $workbook_o when creating the E:W:X object, $workbook_i provides you with:

    • $workbook_i->worksheets() method to get the worksheets present in the input file (which you can loop through with, for example, $worksheet_i)
    • $worksheet_i->get_name() which returns the name of the worksheet you are working with
    • $worksheet_i->row_range() and $worksheet_i->col_range(), which give you the min and max row and column indexes, respectively
    • $worksheet_i->get_cell() which can return a Cell object (eg. $cell) on which you can call $cell->unformatted() to the cell value
    • $workbook_o->add_worksheet() which creates a new worksheet in the E:W:X object (eg., $worksheet_o)
    • $worksheet_o->write() which writes to a particular cell of the worksheet

    With the functions above, I wrote a crude filter in 84 lines. (I'd love to see if someone else has a better approach, by the way!)

    It suffers from a number of limitations, including that it applies the list of values to keep to any worksheet present in the workbook (probably not what you want, but this was just a proof of concept) and it leaves blanks for excluded columns (again, just a proof of concept).

    Hope that helps. (Look forward to if others have better code to do the same.)

    Update: 2023-05-06

    Sample command
    ./test.pl --input input.xlsx --output.xlsx

    Sample input:

    Sample output:

      wow Thanks for that, that is definitely a great reference for what I'm trying to do, I appreciate it the proof of conecept.
Re: Deleting a excel column
by NERDVANA (Deacon) on May 10, 2023 at 00:25 UTC
    Since you said the columns aren't always the same or in order, I'd recommend Data::TableReader to do the work of extracting the ones you care about, and then you can use a variation of your write_excel to write them by name.

    Also take a look at Data::Table which makes writing and reading very convenient.