L_WC has asked for the wisdom of the Perl Monks concerning the following question:
It can't be wrote in threads in different worksheet, like " $workbookNodeList->sheets()-> 2->write(0,0, 'j8', $formatNodeList;" it does not work. could anyone help me? How to write worksheets on same time by threads? thanks.
the data total is about 60,000,000, and need write to 3 sheets from DB, maybe i can say like this the 3 tables in DB will import 3 worksheets in excel, and reduce time, so i think do this when the same time by these 3 worksheets on the same workbook.
my $nodeList = [
['AMS', 'a'],
['APJ', 'ap'],
['EMEA', 'e'],
];
my $workbookNodeList = Excel::Writer::XLSX->new("Node_List.xlsx");
$workbookNodeList->set_properties(
title => 'Node List',
author => 'L_WC demo',
comments => 'Node List',
);
my $formatNodeList = $workbookNodeList->add_format(bg_color => 44);
$formatNodeList->set_align('center');
my $worksheetNodeList = $workbookNodeList->add_worksheet('AMS');
my $worksheetNodeList1 = $workbookNodeList->add_worksheet('APJ');
my $worksheetNodeList2 = $workbookNodeList->add_worksheet('EMEA');
my @arr = ($nodeList->[0], $nodeList->[1], $nodeList->[2]);
my @ar = ('fff');
foreach (@arr) {
$_ = threads->new(\&doSomething, $_->[0], $_->[1], $workbookNodeLis
+t, $formatNodeList);
}
foreach(@arr){
$_->join();
}
$workbookNodeList->close();
print "Node List is Done.";
sub doSomething{
my ($region, $sql, $workbookNodeList, $formatNodeList) = @_;
if($region eq 'AMS'){
[$workbookNodeList->sheets()]->[0]->write(0,0, 'rrrrrr', $formatNo
+deList);
print "AMS -----DONE.\n";
}
if($region eq 'APJ'){
[$workbookNodeList->sheets()]->[1]->write(0,0, '1', $formatNodeLis
+t);
print "APJ -----DONE.\n";
}
if($region eq 'EMEA'){
[$workbookNodeList->sheets()]->[2]->write(0,0, 'j8', $formatNodeLi
+st);
print "EMEA -----DONE.\n";
}
}
Re: How to use threads to write worksheets of excel
by BrowserUk (Patriarch) on Jun 30, 2015 at 09:21 UTC
|
it does not work.
What happens?
| [reply] |
Re: How to use threads to write worksheets of excel
by marioroy (Prior) on Jul 01, 2015 at 01:38 UTC
|
Update: Added missing line after looping each region.
Update: This example runs nearly as fast as serial code and completes in 14.5 seconds without threads.
The following demo demonstrates multiple workers with a single writer running simultaneously. Locking between workers is handled automatically by MCE.
use strict;
use warnings;
use Excel::Writer::XLSX;
use MCE::Loop Sereal => 1;
my $nodeList = [
[ 'AMS' , 'a' ],
[ 'APJ' , 'ap' ],
[ 'EMEA', 'e' ],
];
my ($workbook, %worksheets, $format);
$workbook = Excel::Writer::XLSX->new("Node_List.xlsx");
$workbook->set_properties(
title => 'Node List',
author => 'L_WC demo',
comments => 'Node List',
);
$format = $workbook->add_format(bg_color => 44);
$format->set_align('center');
foreach (@{ $nodeList }) {
$worksheets{ $_->[0] } = $workbook->add_worksheet( $_->[0] );
}
MCE::Loop::init(
chunk_size => 1, max_workers => scalar(@{ $nodeList }),
gather => sub {
my $region = shift;
while (@{ $_[0] }) {
my @args = splice(@{ $_[0] }, 0, 3);
$worksheets{$region}->write(@args, $format)
}
},
);
mce_loop { doSomething($_->[0], $_->[1]) } $nodeList;
$workbook->close();
print "Node List is Done.\n";
sub doSomething {
my ($region, $sql) = @_;
if ($region eq 'AMS') {
my @data; my $n = $region . '_';
for (0..65534) {
push @data, $_, 0, $n . $_;
push @data, $_, 1, $_ + 4;
push @data, $_, 2, $_ + 3;
push @data, $_, 3, $_ + 2;
push @data, $_, 4, $_ + 1;
if ($_ % 4000 == 0) {
MCE->gather($region, \@data);
@data = ();
}
}
MCE->gather($region, \@data) if @data;
print "AMS -----DONE.\n";
}
elsif ($region eq 'APJ') {
my @data; my $n = $region . '_';
for (0..65534) {
push @data, $_, 0, $n . $_;
push @data, $_, 1, $_ + 1;
push @data, $_, 2, $_ + 3;
push @data, $_, 3, $_ + 4;
push @data, $_, 4, $_ + 2;
if ($_ % 4000 == 0) {
MCE->gather($region, \@data);
@data = ();
}
}
MCE->gather($region, \@data) if @data;
print "APJ -----DONE.\n";
}
elsif ($region eq 'EMEA') {
my @data; my $n = $region . '_';
for (0..65534) {
push @data, $_, 0, $n . $_;
push @data, $_, 1, $_ + 1;
push @data, $_, 2, $_ + 2;
push @data, $_, 3, $_ + 3;
push @data, $_, 4, $_ + 4;
if ($_ % 4000 == 0) {
MCE->gather($region, \@data);
@data = ();
}
}
MCE->gather($region, \@data) if @data;
print "EMEA -----DONE.\n";
}
return;
}
Kind regards, Mario | [reply] [d/l] |
|
hi, Mario
thanks for your replay. And i will check out your code. and i have update my question. could you give me more ideas about database base's tables import to worksheets and performance? thanks very much.
| [reply] |
|
| [reply] |
|
Re: How to use threads to write worksheets of excel
by marioroy (Prior) on Jul 01, 2015 at 09:22 UTC
|
July 22, 2015. The example was updated to work with MCE in trunk.
Update: This example runs slower than serial code utilizing 1 core. It was an interesting experiment and learned a lot from it. The serial time takes 14.5 seconds.
Update: Finalized caching/sharing bits. The threads is only needed here to not have workers call DESTROY inside Workbook.pm. I will add a new option ( posix_exit => 1 ) to MCE.
The following is a first attempt at writing to multiple worksheets simultaneously. It works for text and numbers and possibly dates. This is all the time I have but wanted to share anyway. I'm not sure if this will falter for millions of rows.
The following takes 16.541 seconds on my laptop to write about 1 million cells. The time for Numbers on my Mac is about 18 seconds to open the file.
MCE::Shared will be described at a later date after MCE 1.7 is released.
use strict;
use warnings;
# --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
package StrTable;
# String table object for sharing between workers
sub new {
my ($class, $self) = (shift, { table => {}, unique => 0 });
bless $self, $class;
}
sub table {
return $_[0]->{'table'};
}
sub unique {
return $_[0]->{'unique'};
}
sub value {
if (exists $_[0]->{'table'}->{ $_[1] }) {
$_[0]->{'table'}->{ $_[1] };
} else {
$_[0]->{'table'}->{ $_[1] } = $_[0]->{'unique'}++;
}
}
# --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
package main;
# This requires MCE 1.699 in trunk as MCE 1.700 is not yet released.
# The Sereal module may boost freezing/thawing if available.
use Excel::Writer::XLSX;
use MCE::Loop 1.699 Sereal => 1;
use MCE::Shared;
my $str_table = mce_share( new StrTable );
# override Excel::Writer::XLSX::Worksheet::_get_shared_string_index
{
no warnings 'redefine';
sub Excel::Writer::XLSX::Worksheet::_get_shared_string_index {
my $self = shift;
my $str = shift;
if ( not exists ${ $self->{_str_cache} }->{$str} ) {
${ $self->{_str_cache} }->{$str} = $str_table->value($str);
} else {
${ $self->{_str_cache} }->{$str};
}
}
}
# --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
my ($nodeList, $workbook, %worksheets, $format);
$nodeList = [
[ 'AMS' , 'a' ],
[ 'APJ' , 'ap' ],
[ 'EMEA', 'e' ],
];
$workbook = Excel::Writer::XLSX->new("Node_List.xlsx");
$workbook->set_properties(
title => 'Node List',
author => 'L_WC demo',
comments => 'Node List',
);
$format = $workbook->add_format(bg_color => 44);
$format->set_align('center');
foreach (@{ $nodeList }) {
$worksheets{ $_->[0] } = $workbook->add_worksheet( $_->[0] );
}
MCE::Loop::init(
chunk_size => 1, max_workers => scalar(@{ $nodeList }),
posix_exit => 1,
gather => sub {
my $ws = $workbook->{_worksheets};
# replace worksheet; link _str_total/_str_unique to workbook
$ws->[ $_[0] ] = $_[1];
$ws->[ $_[0] ]->{_str_total} = \$workbook->{_str_total};
$ws->[ $_[0] ]->{_str_unique} = \$workbook->{_str_unique};
},
);
# run parallel via MCE
mce_loop { doSomething($_->[0], $_->[1]) } $nodeList;
# replace _str_table; update _str_unique/_str_total
$workbook->{_str_table} = $str_table->table();
$workbook->{_str_unique} = $str_table->unique();
$workbook->{_str_total} = $str_table->unique();
$workbook->close();
print "Node List is Done.\n";
# --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
sub doSomething {
my ($region, $sql) = @_;
my $worksheet = $worksheets{$region};
if ($region eq 'AMS') {
my $n = $region . '_';
for (0..65534) {
$worksheet->write($_, 0, $n . $_);
$worksheet->write($_, 1, $_ + 4 );
$worksheet->write($_, 2, $_ + 3 );
$worksheet->write($_, 3, $_ + 2 );
$worksheet->write($_, 4, $_ + 1 );
}
# delete cache; send worksheet
delete $workbook->{_worksheets}->[0]->{_str_cache};
MCE->gather(0, $workbook->{_worksheets}->[0]);
print "AMS -----DONE.\n";
}
elsif ($region eq 'APJ') {
my $n = $region . '_';
for (0..65534) {
$worksheet->write($_, 0, $n . $_);
$worksheet->write($_, 1, $_ + 1 );
$worksheet->write($_, 2, $_ + 3 );
$worksheet->write($_, 3, $_ + 4 );
$worksheet->write($_, 4, $_ + 2 );
}
# delete cache; send worksheet
delete $workbook->{_worksheets}->[1]->{_str_cache};
MCE->gather(1, $workbook->{_worksheets}->[1]);
print "APJ -----DONE.\n";
}
elsif ($region eq 'EMEA') {
my $n = $region . '_';
for (0..65534) {
$worksheet->write($_, 0, $n . $_);
$worksheet->write($_, 1, $_ + 1 );
$worksheet->write($_, 2, $_ + 2 );
$worksheet->write($_, 3, $_ + 3 );
$worksheet->write($_, 4, $_ + 4 );
}
# delete cache; send worksheet
delete $workbook->{_worksheets}->[2]->{_str_cache};
MCE->gather(2, $workbook->{_worksheets}->[2]);
print "EMEA -----DONE.\n";
}
return;
}
| [reply] [d/l] |
Re: How to use threads to write worksheets of excel
by Anonymous Monk on Jun 30, 2015 at 09:09 UTC
|
for what purpose? What are you hoping to achieve with addition of threads? | [reply] |
|
it's about 60,000,000 data need write 3 sheets.
| [reply] |
|
#! perl -slw
use strict;
use threads;
use Thread::Queue;
use Excel::Writer::XLSX;;
sub worker {
my( $Q, $region, $sql ) = @_;
my $wb = Excel::Writer::XLSX->new("$region.xlsx");
$wb->set_properties( title => 'Node List', author => 'L_WC de
+mo', comments => 'Node List' );
$wb->set_optimization();
my $fmt = $wb->add_format( bg_color => 44 ); $fmt->set_align('cent
+er');
my $ws = $wb->add_worksheet( $region );
my $row = 0;
while( $Q->dequeue ) {
$ws->write( $row++, 0, $_ );
}
$wb->close;
}
my @nodeList = ( ['AMS', 'a'], ['APJ', 'ap'], ['EMEA', 'e'] );
my @Qs = map Thread::Queue->new, 1 .. 3;
my @threads = map threads->new( \&worker, $Qs[ $_ ], @{ $nodeList[ $_
+] } ), 0 .. 2;
for( 1 .. 1e6 ) {
my $sheet = int( rand 3 );
$Qs[ $sheet ]->enqueue( $nodeList[ $sheet ][1] . $_ );
}
$Qs[ $_ ]->enqueue( undef ) for 0 .. 2;
$_->join for @threads;
That writes 1 million items split between 3 workbooks in around 1 minute. Once the 3 files are produced, it should be simple to merge the single sheets from the 3 zipfiles into one workbook by using Excel itself. (I don't have Excel, so I couldn't test that hypothesis.)
| [reply] [d/l] |
|
|
Re: How to use threads to write worksheets of excel
by marioroy (Prior) on Jul 05, 2015 at 03:37 UTC
|
I have tried again and now able to write 1 million cells in less than 6 seconds. In addition, have benchmarked writing 10 million cells which takes 57 seconds on my 2.6 GHz laptop. For comparison, running serially takes 15 and 141 seconds for 1 and 10 million writes respectively.
I was curious and wanted to try again. Thus passing serial code with the same memory consumption, not 3x.
Kind regards, Mario
| [reply] |
A reply falls below the community's threshold of quality. You may see it by logging in. |
|
|