I looked at the docs for Spreadsheet::ParseXLSX (and Spreadsheet::ParseExcel) and Excel::Writer::XLSX, and found several functions that can help (especially if you don't mind a bare-bones approach). Assuming you create $workbook_i by parsing the import file into an S:P object and a $workbook_o when creating the E:W:X object, $workbook_i provides you with:
- $workbook_i->worksheets() method to get the worksheets present in the input file (which you can loop through with, for example, $worksheet_i)
- $worksheet_i->get_name() which returns the name of the worksheet you are working with
- $worksheet_i->row_range() and $worksheet_i->col_range(), which give you the min and max row and column indexes, respectively
- $worksheet_i->get_cell() which can return a Cell object (eg. $cell) on which you can call $cell->unformatted() to the cell value
- $workbook_o->add_worksheet() which creates a new worksheet in the E:W:X object (eg., $worksheet_o)
- $worksheet_o->write() which writes to a particular cell of the worksheet
With the functions above, I wrote a crude filter in 84 lines. (I'd love to see if someone else has a better approach, by the way!)
#!/usr/bin/env perl
use strict;
use warnings;
use Excel::Writer::XLSX;
use Getopt::Long;
use Spreadsheet::ParseXLSX;
my @desired = ( qw/ qux foobar fubar /, );
my $infile = undef;
my $outfile = undef;
{
my @temp_desired = ();
GetOptions(
q{keep:s} => \@temp_desired,
q{input:s} => \$infile,
q{output:s} => \$outfile,
q{help} => sub { &help; },
);
if ( scalar @temp_desired ) {
@desired = ( @temp_desired, );
}
&help
unless ( ( defined $infile and -e $infile )
and ( defined $outfile ) );
}
my $parser_i = Spreadsheet::ParseXLSX->new();
my $workbook_i = $parser_i->parse($infile)
or die qq{$infile}, q{: }, $parser_i->error(), qq{.\n};
my $workbook_o = Excel::Writer::XLSX->new($outfile)
or die qq{$outfile}, q{: },
qq{Problem creating new Excel file: }, $!, qq{\n};
foreach my $worksheet_i ( $workbook_i->worksheets() ) {
my $worksheet_o = $workbook_o->add_worksheet(
$worksheet_i->get_name() );
my %ignore = ();
my @row_range = $worksheet_i->row_range();
foreach my $i ( $row_range[0] .. $row_range[1] ) {
if ( not $i ) {
my @col_range = $worksheet_i->col_range();
foreach my $j ( $col_range[0] .. $col_range[1] )
{
my $cell = $worksheet_i->get_cell( $i, $j );
if ( defined $cell ) {
if (
not scalar
grep { $_ eq $cell->unformatted() }
@desired )
{
$ignore{$j}++;
next;
}
$worksheet_o->write( $i, $j,
$cell->unformatted(), );
}
}
}
else {
my @col_range = $worksheet_i->col_range();
foreach my $j ( $col_range[0] .. $col_range[1] )
{
next if ( defined $ignore{$j} );
my $cell = $worksheet_i->get_cell( $i, $j );
next unless ( defined $cell );
$worksheet_o->write( $i, $j,
$cell->unformatted(), );
}
}
}
}
$workbook_o->close();
sub help {
die
qq{$0 --input file --output file [--keep s1] [--keep s2] ...\n};
}
It suffers from a number of limitations, including that it applies the list of values to keep to any worksheet present in the workbook (probably not what you want, but this was just a proof of concept) and it leaves blanks for excluded columns (again, just a proof of concept).
Hope that helps. (Look forward to if others have better code to do the same.)
Update: 2023-05-06
Sample command
./test.pl --input input.xlsx --output.xlsx
Sample input:
input.xlsx:
Sheet1
bar | qux | foobar | fubar | corge | grault | thud | quux | xyzzy | waldo |
1 | Mrs. White | Kitchen | Candlestick | 440 | 751 | 394 | 311 | 35 | 348 |
2 | Ms. Scarlett | Lounge | Dagger | 750 | 539 | 473 | 652 | 920 | 900 |
3 | Col. Mustard | Library | Revolver | 691 | 422 | 139 | 690 | 67 | 715 |
Sheet2
yabba | dabba | doo | dah |
1 | A | B | C |
2 | D | E | F |
Sample output:
output.xlsx:
Sheet1
| qux | foobar | fubar | | | | | | |
| Mrs. White | Kitchen | Candlestick | | | | | | |
| Ms. Scarlett | Lounge | Dagger | | | | | | |
| Col. Mustard | Library | Revolver | | | | | | |
Sheet2