Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

How to use threads to write worksheets of excel

by L_WC (Initiate)
on Jun 30, 2015 at 08:23 UTC ( [id://1132584]=perlquestion: print w/replies, xml ) Need Help??

L_WC has asked for the wisdom of the Perl Monks concerning the following question:

It can't be wrote in threads in different worksheet, like "$workbookNodeList->sheets()->2->write(0,0, 'j8', $formatNodeList;" it does not work. could anyone help me? How to write worksheets on same time by threads? thanks. the data total is about 60,000,000, and need write to 3 sheets from DB, maybe i can say like this the 3 tables in DB will import 3 worksheets in excel, and reduce time, so i think do this when the same time by these 3 worksheets on the same workbook.
my $nodeList = [ ['AMS', 'a'], ['APJ', 'ap'], ['EMEA', 'e'], ]; my $workbookNodeList = Excel::Writer::XLSX->new("Node_List.xlsx"); $workbookNodeList->set_properties( title => 'Node List', author => 'L_WC demo', comments => 'Node List', ); my $formatNodeList = $workbookNodeList->add_format(bg_color => 44); $formatNodeList->set_align('center'); my $worksheetNodeList = $workbookNodeList->add_worksheet('AMS'); my $worksheetNodeList1 = $workbookNodeList->add_worksheet('APJ'); my $worksheetNodeList2 = $workbookNodeList->add_worksheet('EMEA'); my @arr = ($nodeList->[0], $nodeList->[1], $nodeList->[2]); my @ar = ('fff'); foreach (@arr) { $_ = threads->new(\&doSomething, $_->[0], $_->[1], $workbookNodeLis +t, $formatNodeList); } foreach(@arr){ $_->join(); } $workbookNodeList->close(); print "Node List is Done."; sub doSomething{ my ($region, $sql, $workbookNodeList, $formatNodeList) = @_; if($region eq 'AMS'){ [$workbookNodeList->sheets()]->[0]->write(0,0, 'rrrrrr', $formatNo +deList); print "AMS -----DONE.\n"; } if($region eq 'APJ'){ [$workbookNodeList->sheets()]->[1]->write(0,0, '1', $formatNodeLis +t); print "APJ -----DONE.\n"; } if($region eq 'EMEA'){ [$workbookNodeList->sheets()]->[2]->write(0,0, 'j8', $formatNodeLi +st); print "EMEA -----DONE.\n"; } }

Replies are listed 'Best First'.
Re: How to use threads to write worksheets of excel
by BrowserUk (Patriarch) on Jun 30, 2015 at 09:21 UTC
Re: How to use threads to write worksheets of excel
by marioroy (Prior) on Jul 01, 2015 at 01:38 UTC

    Update: Added missing line after looping each region.

    Update: This example runs nearly as fast as serial code and completes in 14.5 seconds without threads.

    The following demo demonstrates multiple workers with a single writer running simultaneously. Locking between workers is handled automatically by MCE.

    use strict; use warnings; use Excel::Writer::XLSX; use MCE::Loop Sereal => 1; my $nodeList = [ [ 'AMS' , 'a' ], [ 'APJ' , 'ap' ], [ 'EMEA', 'e' ], ]; my ($workbook, %worksheets, $format); $workbook = Excel::Writer::XLSX->new("Node_List.xlsx"); $workbook->set_properties( title => 'Node List', author => 'L_WC demo', comments => 'Node List', ); $format = $workbook->add_format(bg_color => 44); $format->set_align('center'); foreach (@{ $nodeList }) { $worksheets{ $_->[0] } = $workbook->add_worksheet( $_->[0] ); } MCE::Loop::init( chunk_size => 1, max_workers => scalar(@{ $nodeList }), gather => sub { my $region = shift; while (@{ $_[0] }) { my @args = splice(@{ $_[0] }, 0, 3); $worksheets{$region}->write(@args, $format) } }, ); mce_loop { doSomething($_->[0], $_->[1]) } $nodeList; $workbook->close(); print "Node List is Done.\n"; sub doSomething { my ($region, $sql) = @_; if ($region eq 'AMS') { my @data; my $n = $region . '_'; for (0..65534) { push @data, $_, 0, $n . $_; push @data, $_, 1, $_ + 4; push @data, $_, 2, $_ + 3; push @data, $_, 3, $_ + 2; push @data, $_, 4, $_ + 1; if ($_ % 4000 == 0) { MCE->gather($region, \@data); @data = (); } } MCE->gather($region, \@data) if @data; print "AMS -----DONE.\n"; } elsif ($region eq 'APJ') { my @data; my $n = $region . '_'; for (0..65534) { push @data, $_, 0, $n . $_; push @data, $_, 1, $_ + 1; push @data, $_, 2, $_ + 3; push @data, $_, 3, $_ + 4; push @data, $_, 4, $_ + 2; if ($_ % 4000 == 0) { MCE->gather($region, \@data); @data = (); } } MCE->gather($region, \@data) if @data; print "APJ -----DONE.\n"; } elsif ($region eq 'EMEA') { my @data; my $n = $region . '_'; for (0..65534) { push @data, $_, 0, $n . $_; push @data, $_, 1, $_ + 1; push @data, $_, 2, $_ + 2; push @data, $_, 3, $_ + 3; push @data, $_, 4, $_ + 4; if ($_ % 4000 == 0) { MCE->gather($region, \@data); @data = (); } } MCE->gather($region, \@data) if @data; print "EMEA -----DONE.\n"; } return; }

    Kind regards, Mario

      hi, Mario thanks for your replay. And i will check out your code. and i have update my question. could you give me more ideas about database base's tables import to worksheets and performance? thanks very much.
Re: How to use threads to write worksheets of excel
by marioroy (Prior) on Jul 01, 2015 at 09:22 UTC

    July 22, 2015. The example was updated to work with MCE in trunk.

    Update: This example runs slower than serial code utilizing 1 core. It was an interesting experiment and learned a lot from it. The serial time takes 14.5 seconds.

    Update: Finalized caching/sharing bits. The threads is only needed here to not have workers call DESTROY inside Workbook.pm. I will add a new option ( posix_exit => 1 ) to MCE.

    The following is a first attempt at writing to multiple worksheets simultaneously. It works for text and numbers and possibly dates. This is all the time I have but wanted to share anyway. I'm not sure if this will falter for millions of rows.

    The following takes 16.541 seconds on my laptop to write about 1 million cells. The time for Numbers on my Mac is about 18 seconds to open the file.

    MCE::Shared will be described at a later date after MCE 1.7 is released.

    use strict; use warnings; # --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- package StrTable; # String table object for sharing between workers sub new { my ($class, $self) = (shift, { table => {}, unique => 0 }); bless $self, $class; } sub table { return $_[0]->{'table'}; } sub unique { return $_[0]->{'unique'}; } sub value { if (exists $_[0]->{'table'}->{ $_[1] }) { $_[0]->{'table'}->{ $_[1] }; } else { $_[0]->{'table'}->{ $_[1] } = $_[0]->{'unique'}++; } } # --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- package main; # This requires MCE 1.699 in trunk as MCE 1.700 is not yet released. # The Sereal module may boost freezing/thawing if available. use Excel::Writer::XLSX; use MCE::Loop 1.699 Sereal => 1; use MCE::Shared; my $str_table = mce_share( new StrTable ); # override Excel::Writer::XLSX::Worksheet::_get_shared_string_index { no warnings 'redefine'; sub Excel::Writer::XLSX::Worksheet::_get_shared_string_index { my $self = shift; my $str = shift; if ( not exists ${ $self->{_str_cache} }->{$str} ) { ${ $self->{_str_cache} }->{$str} = $str_table->value($str); } else { ${ $self->{_str_cache} }->{$str}; } } } # --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- my ($nodeList, $workbook, %worksheets, $format); $nodeList = [ [ 'AMS' , 'a' ], [ 'APJ' , 'ap' ], [ 'EMEA', 'e' ], ]; $workbook = Excel::Writer::XLSX->new("Node_List.xlsx"); $workbook->set_properties( title => 'Node List', author => 'L_WC demo', comments => 'Node List', ); $format = $workbook->add_format(bg_color => 44); $format->set_align('center'); foreach (@{ $nodeList }) { $worksheets{ $_->[0] } = $workbook->add_worksheet( $_->[0] ); } MCE::Loop::init( chunk_size => 1, max_workers => scalar(@{ $nodeList }), posix_exit => 1, gather => sub { my $ws = $workbook->{_worksheets}; # replace worksheet; link _str_total/_str_unique to workbook $ws->[ $_[0] ] = $_[1]; $ws->[ $_[0] ]->{_str_total} = \$workbook->{_str_total}; $ws->[ $_[0] ]->{_str_unique} = \$workbook->{_str_unique}; }, ); # run parallel via MCE mce_loop { doSomething($_->[0], $_->[1]) } $nodeList; # replace _str_table; update _str_unique/_str_total $workbook->{_str_table} = $str_table->table(); $workbook->{_str_unique} = $str_table->unique(); $workbook->{_str_total} = $str_table->unique(); $workbook->close(); print "Node List is Done.\n"; # --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- sub doSomething { my ($region, $sql) = @_; my $worksheet = $worksheets{$region}; if ($region eq 'AMS') { my $n = $region . '_'; for (0..65534) { $worksheet->write($_, 0, $n . $_); $worksheet->write($_, 1, $_ + 4 ); $worksheet->write($_, 2, $_ + 3 ); $worksheet->write($_, 3, $_ + 2 ); $worksheet->write($_, 4, $_ + 1 ); } # delete cache; send worksheet delete $workbook->{_worksheets}->[0]->{_str_cache}; MCE->gather(0, $workbook->{_worksheets}->[0]); print "AMS -----DONE.\n"; } elsif ($region eq 'APJ') { my $n = $region . '_'; for (0..65534) { $worksheet->write($_, 0, $n . $_); $worksheet->write($_, 1, $_ + 1 ); $worksheet->write($_, 2, $_ + 3 ); $worksheet->write($_, 3, $_ + 4 ); $worksheet->write($_, 4, $_ + 2 ); } # delete cache; send worksheet delete $workbook->{_worksheets}->[1]->{_str_cache}; MCE->gather(1, $workbook->{_worksheets}->[1]); print "APJ -----DONE.\n"; } elsif ($region eq 'EMEA') { my $n = $region . '_'; for (0..65534) { $worksheet->write($_, 0, $n . $_); $worksheet->write($_, 1, $_ + 1 ); $worksheet->write($_, 2, $_ + 2 ); $worksheet->write($_, 3, $_ + 3 ); $worksheet->write($_, 4, $_ + 4 ); } # delete cache; send worksheet delete $workbook->{_worksheets}->[2]->{_str_cache}; MCE->gather(2, $workbook->{_worksheets}->[2]); print "EMEA -----DONE.\n"; } return; }
Re: How to use threads to write worksheets of excel
by Anonymous Monk on Jun 30, 2015 at 09:09 UTC
    for what purpose? What are you hoping to achieve with addition of threads?
      it's about 60,000,000 data need write 3 sheets.

        Excel::Writer::XLSX isn't thread-safe, so I'd tackle the problem this way:

        #! perl -slw use strict; use threads; use Thread::Queue; use Excel::Writer::XLSX;; sub worker { my( $Q, $region, $sql ) = @_; my $wb = Excel::Writer::XLSX->new("$region.xlsx"); $wb->set_properties( title => 'Node List', author => 'L_WC de +mo', comments => 'Node List' ); $wb->set_optimization(); my $fmt = $wb->add_format( bg_color => 44 ); $fmt->set_align('cent +er'); my $ws = $wb->add_worksheet( $region ); my $row = 0; while( $Q->dequeue ) { $ws->write( $row++, 0, $_ ); } $wb->close; } my @nodeList = ( ['AMS', 'a'], ['APJ', 'ap'], ['EMEA', 'e'] ); my @Qs = map Thread::Queue->new, 1 .. 3; my @threads = map threads->new( \&worker, $Qs[ $_ ], @{ $nodeList[ $_ +] } ), 0 .. 2; for( 1 .. 1e6 ) { my $sheet = int( rand 3 ); $Qs[ $sheet ]->enqueue( $nodeList[ $sheet ][1] . $_ ); } $Qs[ $_ ]->enqueue( undef ) for 0 .. 2; $_->join for @threads;

        That writes 1 million items split between 3 workbooks in around 1 minute. Once the 3 files are produced, it should be simple to merge the single sheets from the 3 zipfiles into one workbook by using Excel itself. (I don't have Excel, so I couldn't test that hypothesis.)


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
Re: How to use threads to write worksheets of excel
by marioroy (Prior) on Jul 05, 2015 at 03:37 UTC

    I have tried again and now able to write 1 million cells in less than 6 seconds. In addition, have benchmarked writing 10 million cells which takes 57 seconds on my 2.6 GHz laptop. For comparison, running serially takes 15 and 141 seconds for 1 and 10 million writes respectively.

    I was curious and wanted to try again. Thus passing serial code with the same memory consumption, not 3x.

    Kind regards, Mario

A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1132584]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (3)
As of 2024-04-26 00:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found