http://www.perlmonks.org?node_id=1030917

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Re: How can one get the correct results of all combinations with bigger input files?
by BrowserUk (Patriarch) on Apr 27, 2013 at 06:19 UTC
    suggest me necessary corrections in the script for getting all possible combinations.

    You cannot.

    Firstly, you cannot with the code you've posted, because glob (including bsd_glob) builds its entire result list internally before handing it back. Given the size of your input files, you will run out of memory -- no matter how much you have -- long before all the combinations have been generated.
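    To make the memory point concrete, here is a minimal sketch of how glob's brace expansion behaves. The two brace groups are made-up stand-ins for two tiny input files; the original script is not shown in this thread, so this only illustrates the general mechanism, not the poster's actual code.

```perl
use strict;
use warnings;

# glob expands every brace alternative and returns the complete list at once;
# nothing is handed back until the whole list has been built.
# The two brace groups are hypothetical stand-ins for two tiny input files.
my @all = glob("{A,B}{1,2,3}");

print scalar(@all), " strings: @all\n";
```

    With real files the list grows as the product of the line counts, and all of it has to exist in memory before the first element can be used.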

    But even if you switched to a proper iterating generator that constructs your combinations one at a time, and so avoids memory exhaustion, the size of your input files and the combinatorial explosion they represent mean it would take years (if not decades) for your script to complete.
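    A rough back-of-the-envelope sketch of that growth follows. The figures are assumptions for illustration only: 864 lines per file is the count reported later in this thread, while the processing rate of one million combinations per second and the range of file counts are hypothetical. `years_needed` is a helper name invented for this example.

```perl
use strict;
use warnings;

# Estimated years needed to enumerate every combination, assuming each of
# $files files has $lines lines and $rate combinations are handled per second.
sub years_needed {
    my ($lines, $files, $rate) = @_;
    my $combinations     = $lines**$files;        # becomes floating point when large
    my $seconds_per_year = 365.25 * 24 * 60 * 60;
    return $combinations / $rate / $seconds_per_year;
}

# Hypothetical figures: 864 lines per file, one million combinations per second.
for my $files (3 .. 8) {
    printf "%d files: %.3g years\n", $files, years_needed(864, $files, 1_000_000);
}
```

    Each additional file of that size multiplies the estimate by 864, which is why a faster machine or a better iterator does not rescue a brute-force approach.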

    The bottom line is, you are going to have to learn that using brute force algorithms with genomic datasets is rarely feasible.

    I request the perl monks to go through the script

    Quite frankly, I think you are taking the piss. This place is not a code writing service.

    Every time you post, you simply re-post the code given to you as a reply to your last question, and "request" other people to make changes that you are apparently too lazy to try and work out yourself. And you seem to have been pursuing this same strategy for all of the 2+ years you've been coming here.

    I don't think this tactic will work for much longer.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Hi BrowserUK,

      Thanks for your comments.

      With regards,

Re: How can one get the correct results of all combinations with bigger input files?
by Corion (Patriarch) on Apr 27, 2013 at 08:27 UTC

    You need to first learn and understand why your current approach fails. Combinatorial Explosion and Cartesian Product are most likely good candidates for reading.

    After that, Higher Order Perl explains the "odometer" technique for implementing a way to enumerate combinations without needing much memory. In fact, if you already know how to program, the whole book is quite good reading.

    After that, you should be able to understand Algorithm::Loops and should be able to use it.
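    For readers who want to see the shape of the "odometer" idea before reading further, here is a minimal pure-Perl sketch. It is not the code from Higher Order Perl and does not use Algorithm::Loops; `make_odometer` is a name invented for this example, and the two short lists are placeholders for real input.

```perl
use strict;
use warnings;

# Returns an iterator (a closure) over the Cartesian product of the given
# array references. Only one index per list is kept between calls.
sub make_odometer {
    my @lists = @_;
    my @idx   = (0) x @lists;    # one "wheel" per list
    # Nothing to enumerate if there are no lists or any list is empty.
    my $done  = (!@lists || grep { !@$_ } @lists) ? 1 : 0;

    return sub {
        return if $done;
        my @combo = map { $lists[$_][ $idx[$_] ] } 0 .. $#lists;

        # Advance the rightmost wheel; on rollover, reset it and carry left.
        my $pos = $#idx;
        while ($pos >= 0) {
            last if ++$idx[$pos] < @{ $lists[$pos] };
            $idx[$pos] = 0;
            $pos--;
        }
        $done = 1 if $pos < 0;   # every wheel rolled over: product exhausted

        return \@combo;
    };
}

# Placeholder data; real use would pass the lines of each input file.
my $next = make_odometer([qw(A B)], [qw(1 2 3)]);
while (my $combo = $next->()) {
    print join('', @$combo), "\n";
}
```

    Memory use stays proportional to the number of lists rather than the number of combinations, but the running time is still the full product of the list sizes.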

      Hi Corion,

      Thanks for the constructive suggestions. I shall read the material you suggested.

      With regards,

Re: How can one get the correct results of all combinations with bigger input files?
by davido (Cardinal) on Apr 27, 2013 at 06:26 UTC

    Have you put a pencil and slide rule to calculating how many combinations your glob might be generating with these large files as input? Brute force often fails to scale well.


    Dave

Re: How can one get the correct results of all combinations with bigger input files?
by hdb (Monsignor) on Apr 27, 2013 at 08:10 UTC

    davido's advice leads to the following exercise: write a script that takes a list of file names, counts the lines in each file and calculates the number of possible combinations (without creating the combinations, of course). (Nothing against pencil and slide rule, but this is a Perl community...) Once you have the result, we can continue the discussion.

      Hi hdb,

      As per davido's advice and your suggestions, I have written a script, c.pl, that takes a list of file names, counts the lines in each text file, and finally calculates the total number of combinations. I have observed that the number of combinations is very large, i.e. 644972544 for the files s1.txt, s2.txt and s3.txt (each with 864 lines). I have also written a line to remove white space and empty lines, if any, at the end of each text file, i.e. $fh=~s/\s+$//g;. I am not sure whether it works or not.

      Here is c.pl:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $entry;
      my @a;
      do {
          print "\n Press 1 to enter a File or 2 to count Total Combinations: ";
          $entry = <STDIN>;
          chomp($entry);
          if ($entry == 1) {
              print "\n\n Enter File Name to count number of LINES (.txt): ";
              my $filename = <STDIN>;
              chomp $filename;
              open my $fh, "<", $filename or die "Cannot open $filename.\n";
              $fh =~ s/\s+$//g;    # To remove white space & empty
                                   # lines after end of each text file
              my $count = 0;
              while (<$fh>) { $count++; }
              print "\n Lines in File $filename: $count\n\n";
              push @a, $count;
          }
      } until ($entry == 2);

      my $product = 1;
      $product *= $_ foreach @a;
      print "\n\n Total Combinations: $product\n";
      exit;

      I have got the following results:

      C:\Users\x\Desktop>c.pl
       Press 1 to enter a File or 2 to count Total Combinations: 1
       Enter File Name to count number of LINES (.txt): s1.txt
       Lines in File s1.txt: 864
       Press 1 to enter a File or 2 to count Total Combinations: 1
       Enter File Name to count number of LINES (.txt): s2.txt
       Lines in File s2.txt: 864
       Press 1 to enter a File or 2 to count Total Combinations: 1
       Enter File Name to count number of LINES (.txt): s3.txt
       Lines in File s3.txt: 864
       Press 1 to enter a File or 2 to count Total Combinations: 2
       Total Combinations: 644972544
      C:\Users\x\Desktop>
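      On the doubt about `$fh=~s/\s+$//g;`: a substitution applied to a filehandle works on the handle's string form, not on the file's contents, so it cannot strip trailing blank lines. One possible alternative, offered as a sketch rather than a definitive fix, is to skip whitespace-only lines while counting. The version below also takes file names from the command line instead of prompting, and uses the core module Math::BigInt so that large products stay exact. Both choices, and the sub names `count_nonblank_lines` and `total_combinations`, are assumptions of this example and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Math::BigInt;

# Count the lines of a file that contain at least one non-whitespace character.
sub count_nonblank_lines {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Cannot open $filename: $!\n";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /\S/;    # ignore empty and whitespace-only lines
    }
    close $fh;
    return $count;
}

# Multiply the per-file counts; Math::BigInt keeps the product exact.
sub total_combinations {
    my $product = Math::BigInt->new(1);
    $product *= $_ for @_;
    return $product;
}

# Example invocation (script name is hypothetical):
#   perl count.pl s1.txt s2.txt s3.txt
if (@ARGV) {
    my @counts = map { count_nonblank_lines($_) } @ARGV;
    print "$ARGV[$_]: $counts[$_] lines\n" for 0 .. $#counts;
    print "Total combinations: ", total_combinations(@counts), "\n";
}
```

      Three files of 864 non-blank lines each give 864 * 864 * 864 = 644972544, matching the total reported above.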