PerlMonks  

Re^4: Comparing and getting information from two large files and appending it in a new file

by perlkhan77 (Acolyte)
on Apr 01, 2012 at 01:32 UTC (#962819)


in reply to Re^3: Comparing and getting information from two large files and appending it in a new file
in thread Comparing and getting information from two large files and appending it in a new file

Hi Graff, one more thing about the result file being produced: the number of lines with the Gm10 assembly in Methylation.gtf is 26637, so the final result should have the same number of lines, with a 0 value for those genes that have no CG, CHG or CHH count. Right now it only prints 8626 lines, including the header. Sorry to trouble you about this, but could you let me know what changes I should make in the code to make that possible?

Thanks again

Re^5: Comparing and getting information from two large files and appending it in a new file
by graff (Chancellor) on Apr 01, 2012 at 02:19 UTC
    To get one line of output for every line in your first input file, there are only a few changes:
    ...
    my %methrange;
    my %methhash;  # this line had been further down in my prev. version, just move it up
    ...
        if ( /^\s*Gm10/ ) {
            my ( $bgn, $end, $methcount ) = (split)[3,4,10];
            $methrange{$bgn}{$end}{$_} = undef;
            $methhash{$_}{methcount} = $methcount;  # moved up from below
        }
    ...
            for my $end ( keys %{$methrange{$bgn}} ) {
                if ( $position <= $end ) {
                    for my $match ( keys %{$methrange{$bgn}{$end}} ) {
                        $methhash{$match}{$class}++;  # methcount was moved from here
                    }
                }
            }
    ...
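For illustration, here is a minimal, self-contained sketch of why seeding %methhash while reading the first file gives one output row per Gm10 line, with 0 for any class that was never counted. The sample records, the @hits list, the way $bgn is selected, and the tab-separated output layout are all assumptions made up for this sketch; they are not taken from the real data or from the elided parts of the script.

```perl
use strict;
use warnings;
use 5.010;    # for the defined-or operator (//)

# Hypothetical stand-ins for lines of Methylation.gtf (whitespace-separated;
# fields 3, 4 and 10 hold start, end and the methylation count).
my @gtf_lines = (
    "Gm10 src gene 100 200 . + . id g1 7",
    "Gm10 src gene 300 400 . + . id g2 3",
    "Gm10 src gene 500 600 . + . id g3 0",
);

# Hypothetical (position, class) observations from the second file.
my @hits = ( [ 150, 'CG' ], [ 160, 'CG' ], [ 350, 'CHH' ] );

my ( %methrange, %methhash );
for my $line (@gtf_lines) {
    next unless $line =~ /^\s*Gm10/;
    my ( $bgn, $end, $methcount ) = ( split ' ', $line )[ 3, 4, 10 ];
    $methrange{$bgn}{$end}{$line} = undef;
    # Seeding %methhash here is what guarantees a row for every input line,
    # even when no position ever falls inside its range.
    $methhash{$line}{methcount} = $methcount;
}

for my $hit (@hits) {
    my ( $position, $class ) = @$hit;
    # Assumed range test: start <= position <= end.
    for my $bgn ( grep { $_ <= $position } keys %methrange ) {
        for my $end ( grep { $position <= $_ } keys %{ $methrange{$bgn} } ) {
            $methhash{$_}{$class}++ for keys %{ $methrange{$bgn}{$end} };
        }
    }
}

# Build one row per seeded line; classes that were never seen default to 0.
my @rows;
for my $line ( grep { exists $methhash{$_} } @gtf_lines ) {
    my $rec = $methhash{$line};
    push @rows, join "\t", $line, map { $rec->{$_} // 0 } qw(CG CHG CHH);
}
print "$_\n" for @rows;
```

With this sample data all three input lines appear in the output: the g1 line gets a CG count of 2, the g2 line a CHH count of 1, and the g3 line is printed with zeros in every column instead of being dropped.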
    As for the benchmark, you said that your OP version "took forever". Was "forever" more than an hour and a half? (Did my version yield any improvement at all?) Do you have specific constraints about how much time can be taken up by a single run? If not, I'd say focus more on making sure the output is correct, rather than how long it takes to produce the output.
