Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re: part - split up files according to column value

by Corion (Patriarch)
on Jul 14, 2009 at 12:30 UTC ( [id://779901]=note: print w/replies, xml ) Need Help??


in reply to part - split up files according to column value

As I constantly mislay the version of the program that reads and writes Excel files, here it is:

#!/usr/bin/perl -w use strict; use Getopt::Long; use vars qw($VERSION); use Spreadsheet::WriteExcel; use Spreadsheet::ParseExcel; use File::Glob qw(bsd_glob); use ExcelTools; $VERSION = '0.05'; # Try to load Pod::Usage and install a fallback if it doesn't exist eval { require Pod::Usage; Pod::Usage->import(); 1; } or do { *pod2usage = sub { die "Error in command line.\n"; }; }; GetOptions( 'out=s' => \my $tmpl, 'column=i' => \my $col, 'verbose' => \my $verbose, 'header-line:s' => \my $header, 'help' => \my $help, 'version' => \my $version, ) or pod2usage(2); pod2usage(1) if $help; if (defined $version) { print "$VERSION\n"; exit 0; }; pod2usage("$0: No files given.") if ((@ARGV == 0) && (-t STDIN)); if (! defined $tmpl) { # Let's hope we can guess from the first filename ($tmpl = $ARGV[0] || 'part.xls') =~ s/\.(\w+)$/-%s.$1/; }; $col ||= 0; $header ||= 0; my $header_cols; @ARGV = map { bsd_glob $_ } @ARGV; my %lines; for my $file (@ARGV) { my $wb = Spreadsheet::ParseExcel::Workbook->Parse($file); my $data = ExcelTools::sheet_data($wb->{Worksheet}->[0]); $header_cols = splice @$data, 0, $header; for my $c (@$data) { $lines{ $c->[$col]} ||= []; push @{ $lines{$c->[$col]} }, $c }; }; for my $key (sort keys %lines) { (my $clean = $key) =~ s/\s+$//ms; my $name = sprintf $tmpl, $clean; my $out = Spreadsheet::WriteExcel->new($name); my $sheet = $out->add_worksheet('Kandidaten'); unshift @{$lines{ $key }}, $header_cols if $header; $sheet->write_col( 'A1', $lines{ $key }); print "$name\n" #$filename_sep" if $verbose; }; __END__ =head1 NAME xlpart - split up an Excel file into multiple files according to a col +umn value =head1 SYNOPSIS part [OPTIONS] FILES =head1 OPTIONS =item B<--version> - print program version Outputs the program version. =item B<--help> - print this page Outputs this help text. =item B<--out> - set the output template If the output template is not given it is guessed from the name of the first input file or set to C<part-%s.txt>. The C<%s> will be replaced by the column value. =item B<--column> - set the column to part on This is the zero-based number of the column. =item B<--header-line> - output the first line into every file This defines the line as header line which is output into every file. If it is given an argument that string is output as header, otherwise the first line read will be repeated as the header. If the value is a number, that many lines will be read from the file and used as the header. This makes it impossible to use just a number as the header. =item B<--verbose> - output the generated filenames In normal operation, the program will be silent. If you need to know the generated filenames, the B<--verbose> option will output them. =head1 CAVEAT The program loads the whole input into RAM before writing the output. A future enhancement might be a C<uniq>-like option that tells the program to assume that the input will be grouped according to the parted column so it does not need to allocate memory. If your memory is not large enough, the following C<awk> one-liner might help you: # Example of parting on column 3 awk -F '{ print $0 > $3 }' FILE =head1 AUTHOR Max Maischein (C<< corion@cpan.org >>)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://779901]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (5)
As of 2024-04-25 11:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found