PerlMonks  

Hi Perl gurus, I ran into a problem with a multithreaded Perl script. The script reads two lists of files (for example, list1 has 7 files and list2 has 5 files) and collects some information from them into a hash. If there are 12 files, I create 12 threads, one per file, and they all write into the same shared hash. Each file has ~21 million lines, so I wanted to use multithreading to speed things up. However, the multithreaded version runs much slower than the single-threaded script. I know there must be a problem somewhere in my script, but I can't find it. I hope you can help me. Thanks very much. Here is my script:
#!/usr/bin/perl -w
use strict;
use warnings;
use threads;
use threads::shared;
use Statistics::R;    # using R to do t.test

if(@ARGV != 3) {
    print STDERR "Usage: test_mt.pl input_list1 input_list2 output\n";
    exit(0);
}
my ($inf1, $inf2, $outf)=@ARGV;
my @inf1=`ls $inf1`;    # get all of files in list1
my @inf2=`ls $inf2`;    # get all of files in list2
my %hash :shared;

# Read one file and store column 3 under the key "col1_col2" in the shared hash
sub read_file{
    my $inf=shift;
    $hash{$inf} = &share({});
    open(IN, $inf) or die "cannot open $inf\n";
    while(<IN>){
        if($_=~/\w/){
            chomp;
            my @info=split(/\t/, $_);
            # lock(%hash);
            $hash{$inf}{$info[0]."_".$info[1]}=$info[2];
        }
    }
    close IN;
}

# One thread per input file
my @threads;
my $thread_count=0;
for(my $i=0; $i<=$#inf1; $i++){
    my $t = threads->create(\&read_file, $inf1[$i]);
    push(@threads, $t);
    $thread_count++;
}
for(my $i=0; $i<=$#inf2; $i++){
    my $t = threads->create(\&read_file, $inf2[$i]);
    push(@threads, $t);
    $thread_count++;
}
print STDERR "Total threads to read the files: $thread_count\n";
$_->join foreach @threads;
sleep 1;

open(OUT, ">$outf") or die "cannot open $outf\n";
# Create a communication bridge with R and start R
my $R = Statistics::R->new();

### below is to do some R related calculation based on the hash; the problem should exist above :)
open(IN, $inf1[0]) or die "cannot open $inf1[0]\n";
while(<IN>){
    if($_=~/\w/){
        my @info=split(/\t/, $_);
        my $cpg_score;
        my $total1;
        my $total2;
        my @list1;
        my @list2;
        foreach my $sample (@inf1){
            $cpg_score.="\t".$hash{$sample}{$info[0]."_".$info[1]};
            $total1+=$hash{$sample}{$info[0]."_".$info[1]};
            push(@list1, $hash{$sample}{$info[0]."_".$info[1]});
        }
        foreach my $sample (@inf2){
            $cpg_score.="\t".$hash{$sample}{$info[0]."_".$info[1]};
            $total2+=$hash{$sample}{$info[0]."_".$info[1]};
            push(@list2, $hash{$sample}{$info[0]."_".$info[1]});
        }
        # Compare the group means (the original post divided by undeclared @sample1/@sample2)
        if(abs($total1/@inf1 - $total2/@inf2)>=0.2){
            my $mean1=sprintf("%.2f", $total1/@inf1);
            my $mean2=sprintf("%.2f", $total2/@inf2);
            my $list1=join",", @list1;
            my $list2=join",", @list2;
            ### Run R commands
            $R->run(qq`x <- t.test(c($list1), c($list2))`);
            my $p_value= $R -> get('x$p.value');
            print OUT "$info[0]\t$info[1]$cpg_score\t$mean1\t$mean2\t$p_value\n";
        }
    }
}
$R->stop();
close IN;
close OUT;

In reply to problem of my multithreading perl script by qingfengzealot
