PerlMonks  

Re: Use of Uninitialized in Concatenation or String Error?

by 2teez (Priest)
on Aug 09, 2013 at 02:32 UTC (#1048674=note)


in reply to Use of Uninitialized in Concatenation or String Error?

Hi ccelt09,
From your description, I suppose it all comes down to sorting your data by the 5th column and then printing the rows to different files (please correct me if I'm wrong) according to which range of that same column they fall in (between 1 and 1,000,000, and so on, with 1 inclusive, making each range 1,000,000 wide).
So, if my assumption about what you want to do is correct, a slightly modified Schwartzian transform should work, like so:

use warnings;
use strict;
use Data::Dumper;

# Schwartzian transform, read bottom-up:
#   1. pair each input line with its 5th column,
#   2. sort numerically on that column,
#   3. keep [bucket, line], where bucket = int(5th column / 1,000,000).
my @array =
    map  { [ int( $_->[1] / 1_000_000 ), $_->[0] ] }
    sort { $a->[1] <=> $b->[1] }
    map  { [ $_, ( split /\s+/, $_ )[4] ] }
    <DATA>;

print Dumper \@array;

__DATA__
0 50 4 46 723430 0 2 1 2 1 1 1 1 3 1
0 50 4 46 5533723430 0 2 1 2 1 1 1 1 3 1
0 50 4 46 33723430 0 2 1 2 1 1 1 1 3 1
0 50 2 48 654732 0 1 1 1 0 2 3 2 1 3
Produces ...
$VAR1 = [
          [
            0,
            '0 50 2 48 654732 0 1 1 1 0 2 3 2 1 3
'
          ],
          [
            0,
            '0 50 4 46 723430 0 2 1 2 1 1 1 1 3 1
'
          ],
          [
            33,
            '0 50 4 46 33723430 0 2 1 2 1 1 1 1 3 1
'
          ],
          [
            5533,
            '0 50 4 46 5533723430 0 2 1 2 1 1 1 1 3 1
'
          ]
        ];
So printing to different files is just a matter of placement: the first element of each inner array tells you which file that line should be saved to. Bear in mind, though, that I don't know how large your data is.
I'm using the data as posted by Loops, and in fact his solution might be better.
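The placement step can be sketched as follows. This is a minimal illustration, not code from the thread: it assumes the `[bucket, line]` pairs produced by the transform above, and the `write_buckets` helper and the `chrX_bucket_N.txt` file names are hypothetical. It keeps one output handle per bucket, so each file is opened only once however the lines are ordered.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: write each [bucket, line] pair to "chrX_bucket_<bucket>.txt"
# in $dir, opening one filehandle per bucket the first time that bucket is seen.
# Returns the bucket numbers written, sorted numerically.
sub write_buckets {
    my ( $dir, @pairs ) = @_;
    my %fh;    # bucket number => output filehandle
    for my $pair (@pairs) {
        my ( $bucket, $line ) = @$pair;
        unless ( $fh{$bucket} ) {
            my $path = File::Spec->catfile( $dir, "chrX_bucket_$bucket.txt" );
            open $fh{$bucket}, '>', $path or die "Can't open $path: $!";
        }
        print { $fh{$bucket} } $line;    # lines read from <DATA> still end in "\n"
    }
    for my $bucket ( keys %fh ) {
        close $fh{$bucket} or die "Can't close bucket $bucket: $!";
    }
    return sort { $a <=> $b } keys %fh;
}

# Demo: [bucket, line] pairs in the shape the transform above produces.
my $dir   = tempdir( CLEANUP => 1 );
my @array = (
    [ 0,  "0 50 2 48 654732 0 1 1 1 0 2 3 2 1 3\n" ],
    [ 0,  "0 50 4 46 723430 0 2 1 2 1 1 1 1 3 1\n" ],
    [ 33, "0 50 4 46 33723430 0 2 1 2 1 1 1 1 3 1\n" ],
);
my @written = write_buckets( $dir, @array );
print "wrote buckets: @written\n";    # wrote buckets: 0 33
```

With 1 Mb buckets over a ~155 Mb chromosome that is roughly 155 open handles; if that approaches your system's open-file limit, rely on the sort order instead and keep only the current bucket's file open.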

If you tell me, I'll forget.
If you show me, I'll remember.
If you involve me, I'll understand.
--- Author unknown to me


Re^2: Use of Uninitialized in Concatenation or String Error?
by Loops (Hermit) on Aug 09, 2013 at 02:40 UTC
    Hi 2teez,

    If you ripped out my use of DBI and put your code in its place, it would be a better solution. The SQL stuff is really overkill for this use, but I had all the boilerplate sitting in front of me, so I reused it for a quick post... Thanks for posting a saner idea.

Re^2: Use of Uninitialized in Concatenation or String Error?
by ccelt09 (Sexton) on Aug 09, 2013 at 07:12 UTC

    Thank you for the consideration; I truly appreciate it! My data spans 155,000,000 nucleotides. In my original code I opened the interval file (filehandle INTERVAL), which stores the windows of varying sizes I'd like to use for sorting:

    chrX 1 1000000
    chrX 1000001 2000001
    chrX 2000001 3000001
    ...etc.
    The largest window is 1 Mb (1,000,000). I read each line into the scalar variable $interval, chomp it, split it on tabs into an array, and take $start and $end from elements 1 and 2 of that array. I don't want to hard-code my range because I have multiple window sizes to work with:
    open (INTERVAL, "/Users/logancurtis-whitchurch/Dropbox/thesis_folder/galaxy_chrX_data/chrX_1Mbwindow_nonoverlapping.interval") or die "can't open file\n";
    while (my $interval = <INTERVAL>){
        chomp($interval);
        my @find_interval = split(/\t/, $interval);
        my $start = $find_interval[1];
        my $end = $find_interval[2];
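Since the window sizes vary, one option is to read the whole interval file once, up front, into a list of `[start, end]` pairs. A minimal sketch, assuming the tab-separated `chrom start end` layout shown above; `read_windows` is a hypothetical helper, and the demo reads from an in-memory string instead of the real `.interval` file.

```perl
use strict;
use warnings;

# Hypothetical helper: read "chrom<TAB>start<TAB>end" lines from a filehandle
# and return a list of [start, end] array refs, skipping blank lines.
sub read_windows {
    my ($fh) = @_;
    my @windows;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;
        my ( undef, $start, $end ) = split /\t/, $line;    # drop the chrom column
        push @windows, [ $start, $end ];
    }
    return @windows;
}

# Demo: an in-memory filehandle stands in for the real .interval file.
my $text = "chrX\t1\t1000000\nchrX\t1000001\t2000001\nchrX\t2000001\t3000001\n";
open my $interval_fh, '<', \$text or die "Can't open in-memory file: $!";
my @windows = read_windows($interval_fh);
close $interval_fh;

printf "%d windows, first is %d-%d\n", scalar @windows, @{ $windows[0] };
# 3 windows, first is 1-1000000
```

Swapping in a different window file then changes nothing but the path, which keeps the range out of the code.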
    From here I used an arbitrary switch variable ($switch) to control when printing to one output file should stop and printing to a new file should begin.

    My condition: while $switch == 1, I open my data input file, then name and open my output file:

    my $switch = 1;
    while ($switch == 1) {
        open (CG, "/Users/logancurtis-whitchurch/Dropbox/thesis_folder/CompleteGenomics/28_males_inAll/CGS.inall.28.chr.23.txt") or die "can't open CG file\n";
        my $output_file = "/Users/logancurtis-whitchurch/Desktop/temp_$count.txt";
        open(OUT, ">$output_file");

    Then, with these three lines, I read the whole input file into an array (@SNPs), split the current line ($SNPs[$placeholder]) into an array of fields (@get_SNP), and create a position variable ($position) that should advance as the data is read via my placeholder variable ($placeholder):

    my @SNPs = <CG>;
    my @get_SNP = split(/\t/, $SNPs[$placeholder]);
    my $position = $get_SNP[3];
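Note that `my @SNPs = <CG>;` slurps the entire file, and in the fragments shown the file is reopened and reread for every window. A hedged alternative is to read it once and parse every line into a `[position, line]` pair, skipping lines that lack a fourth field. This sketch assumes tab-separated fields with the position at index 3, as in `$get_SNP[3]`; `read_snps` is a hypothetical helper and the demo lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: read tab-separated SNP lines once and return
# [position, line] pairs, taking the position from field index 3
# (the same field the original code reads with $get_SNP[3]).
sub read_snps {
    my ($fh) = @_;
    my @snps;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @fields = split /\t/, $line;
        # Skip blank or short lines rather than passing undef along,
        # since interpolating undef triggers "Use of uninitialized value".
        next unless defined $fields[3];
        push @snps, [ $fields[3], $line ];
    }
    return @snps;
}

# Demo: made-up four-column lines (plus one blank line) standing in for the CG file.
my $text = "a\tb\tc\t150\n" . "\n" . "a\tb\tc\t999999\n" . "a\tb\tc\t1000001\n";
open my $cg_fh, '<', \$text or die "Can't open in-memory file: $!";
my @snps = read_snps($cg_fh);
close $cg_fh;
printf "%d SNPs read; last position %d\n", scalar @snps, $snps[-1][0];
# 3 SNPs read; last position 1000001
```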

    I then use an if statement: if my position is less than the end and greater than or equal to the start, I print the $SNPs[$placeholder] string (one data line) and increment $placeholder, repeating the loop until the condition is false. Otherwise, the else branch sets $switch = 0, ending the while loop. At that point I also increment a global variable, $count, so that open creates a new output file, since I interpolated $count into the output file name earlier:

    my $switch = 1;
    while ($switch == 1) {
        if (($position < $end) && ($position >= $start)) {
            print OUT "$SNPs[$placeholder]\n";
            $placeholder++;
        }
        else {
            $switch = 0;
            $count++;
        }
    }
    }
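In the fragments above, `$position` is computed once, before the inner `while` loop, and never recomputed, so the loop condition cannot change while `$placeholder` keeps growing. Once `$placeholder` passes the last index of `@SNPs`, `"$SNPs[$placeholder]\n"` interpolates `undef`, which is a likely source of the "Use of uninitialized value in concatenation (.) or string" warning. Below is a restructured sketch, assuming the windows are sorted by start and non-overlapping (as the `_nonoverlapping.interval` file name suggests): sort the SNPs by position once, then walk both lists together, opening one `temp_N.txt` file per window. The `start <= position < end` test mirrors the original `if`; the `split_into_windows` helper, numbering from 1, and the temporary output directory are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: write each SNP line to temp_<n>.txt in $dir, n = 1, 2, ...
# $windows: [start, end] pairs, sorted by start and non-overlapping.
# $snps:    [position, line] pairs. A SNP belongs to a window when
#           start <= position < end; SNPs outside every window are skipped.
# Returns the number of lines written for each window.
sub split_into_windows {
    my ( $dir, $windows, $snps ) = @_;
    my @sorted = sort { $a->[0] <=> $b->[0] } @$snps;
    my @counts;
    my $i     = 0;    # index of the next SNP in @sorted not yet placed or skipped
    my $count = 0;    # window number, used in the output file name
    for my $w (@$windows) {
        my ( $start, $end ) = @$w;
        $count++;
        my $path = File::Spec->catfile( $dir, "temp_$count.txt" );
        open my $out, '>', $path or die "Can't open $path: $!";
        # Skip SNPs that fall before this window (e.g. in a gap between windows).
        $i++ while $i < @sorted && $sorted[$i][0] < $start;
        my $n = 0;
        # The bound check on $i keeps us from reading past the end of @sorted.
        while ( $i < @sorted && $sorted[$i][0] < $end ) {
            print {$out} $sorted[$i][1], "\n";
            $i++;
            $n++;
        }
        close $out or die "Can't close $path: $!";
        push @counts, $n;
    }
    return @counts;
}

# Demo with made-up positions; 5,000,000 lies beyond both windows.
my $dir     = tempdir( CLEANUP => 1 );
my @windows = ( [ 1, 1_000_000 ], [ 1_000_001, 2_000_001 ] );
my @snps    = map { [ $_, "snp_at_$_" ] } 999_999, 150, 1_000_001, 5_000_000;
my @counts  = split_into_windows( $dir, \@windows, \@snps );
print "lines per window: @counts\n";    # lines per window: 2 1
```

Because `$i` only moves forward, each SNP is examined once across all windows, instead of the SNP file being reread for every window.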
