Creating 2D array / Matrix

by kash650 (Novice)
on Mar 18, 2013 at 19:12 UTC
kash650 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to read from two text files, the .bed files, and combine them into one file. They have to be arranged in order though, so I can't just append one file to the other. I'm trying to create a 2D array, but I'm obviously doing it wrong.

    $infile1 = "test1.bed";
    $infile2 = "test2.bed";
    $outfile = "merged.bed";

    open ("IN1", "<$infile1") or die "Can't open $file: $!\n";
    open ("IN2", "<$infile2") or die "Can't open $file: $!\n";
    #open ("OUT", "<$outfile") or die "Can't open $file: $!\n";

    my @array1;
    my $row = 0;
    my $col = 0;

    while (<IN1>) {
        chomp;
        my @info = split("\t");
        @array1[$row]=@info;
        $row++;
    }
    print $array1[0][1];

Re: Creating 2D array / Matrix
by aitap (Deacon) on Mar 18, 2013 at 19:27 UTC
    I'm not quite sure what you want to do with the files, but reading the file in a two-dimensional array can be done this way:
        use warnings;
        use strict;

        open my $filehandle, "<", "filename.bed" or die "open: $!\n";
        my @data;
        while (<$filehandle>) {
            chomp;                      # remove the line ending character
            push @data, [ split /\t/ ]; # split the rest and append it to our data
        };
        print $data[0]->[0]; # access the first element of the array
                             # it's a reference, "->" dereferences it
                             # at last, access the first element of the dereferenced array

    Here the [] brackets create an anonymous array reference (not just an array) filled with the return value of split and this reference (which is a scalar value, so it can be stored in the array) is pushed to the array.
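    As a rough illustration of the difference, the sketch below contrasts the slice assignment from the question with storing a reference. The sample row is made up, and the variable names @flat and @rows are only for this example.

```perl
use strict;
use warnings;

my @info = ("chr1", 10, 50);    # hypothetical row, as split would return it

# A one-element slice assignment, as in the question: only the first
# value of @info is stored, the remaining columns are dropped.
my @flat;
{
    no warnings 'syntax';       # silence "better written as $flat[0]"
    @flat[0] = @info;
}

# Storing a reference to a copy of the row keeps every column.
my @rows;
$rows[0] = [ @info ];

print "flat: $flat[0]\n";       # only the first field survived
print "rows: $rows[0][1]\n";    # second column of the first row
```

    With the reference in place, $rows[0][1] is shorthand for $rows[0]->[1].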

    You can always use Data::Dumper to print the contents of your data structure. Read perlreftut and perldsc for more information on references and complex data structures.
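    A minimal sketch of the Data::Dumper suggestion, using two made-up rows in place of lines read from a real .bed file:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical tab-separated lines standing in for file contents.
my @lines = ("chr1\t10\t50", "chr1\t12\t40");

# Same idea as the loop above: one array reference per line.
my @data = map { [ split /\t/ ] } @lines;

$Data::Dumper::Indent = 1;      # slightly more compact output

# Pass a reference so the whole structure is dumped as a single value.
my $dump = Dumper(\@data);
print $dump;
```

    The dump shows one nested array per input line, which makes it easy to check whether the rows were stored as references or flattened.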

    Edit: forgot to chomp
    Sorry if my advice was wrong.
Re: Creating 2D array / Matrix
by McA (Priest) on Mar 18, 2013 at 19:28 UTC

    UPDATE: My solution deleted as the solution of aitap is much nicer and explains the building blocks better. A ++ for the nice explanation.

    McA

Re: Creating 2D array / Matrix
by Cristoforo (Deacon) on Mar 18, 2013 at 19:35 UTC
    This post finds overlapping .bed files, if that helps.

    The link above will find overlapping ranges. If you just want to list both .bed files in sorted order, then the following will do it.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use 5.014;

        @ARGV = qw/ 148N.txt 162N.txt 174N.txt 175N.txt /;

        my @data = map {[split]}
                   map {$_->[0]}
                   sort {$a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] || $a->[3] <=> $b->[3]}
                   map {[ $_, /\d+/g]} <>;

        say "@$_" for @data;
    Output from the 4 files (in sorted order):
        C:\Old_Data\perlp>perl t6.pl
        chr1 10 50
        chr1 12 40
        chr1 20 45
        chr1 25 30
        chr1 25 50
        chr1 41 45
        chr1 48 80
        chr1 60 80
        chr1 100 500
        chr10 10 20

    Update: changed the sort and made simpler and still correct. .bed file definition from here.

Re: Creating 2D array / Matrix
by bioinformatics (Friar) on Mar 18, 2013 at 20:29 UTC
    If you are looking to put them in order, then you are better off using a hash of hashes (with an array :) ). For instance:

        #!/usr/bin/perl
        use warnings;
        use strict;

        my $usage = "merge_bed.pl <input1> <input2> <output>";
        my $input_1 = shift or die $usage;
        my $input_2 = shift or die $usage;
        my $output = shift or die $usage;

        open my $in1, "<", "$input_1" or die "Cannot open $input_1: $!\n";
        open my $in2, "<", "$input_2" or die "Cannot open $input_2: $!\n";
        open my $out, ">", "$output" or die "Cannot open $output: $!\n";

        my %bed_files = ();

        while ( <$in1> ) {
            chomp;
            my ($chrom, $start, $end, undef, undef, $strand) = split "\t";
            # you can save multiple values as an array, so both the end
            # and strand and anything else you want
            $bed_files{$chrom}{$start}[0] = $end;
            $bed_files{$chrom}{$start}[1] = $strand;
        }

        while ( <$in2> ) {
            chomp;
            my ($chrom, $start, $end, undef, undef, $strand) = split "\t";
            $bed_files{$chrom}{$start}[0] = $end;
            $bed_files{$chrom}{$start}[1] = $strand;
        }

        for my $chrom (sort keys %bed_files) {
            for my $start (sort {$a <=> $b} keys %{$bed_files{$chrom}}) {
                # print out the results sorted by chromosome (or scaffold) and start site
                print $out "$chrom\t$start\t$bed_files{$chrom}{$start}[0]\t$bed_files{$chrom}{$start}[1]\n";
            }
        }

        close $in1;
        close $in2;
        close $out;
        exit;


    This assumes that your lists a) do not share any start sites on the same chromosome and b) contain no overlaps. If you want to find overlaps, you can do the same thing but with two separate data structures; you can find the overlaps, and then output the unique regions and one copy of the overlapping regions. There are ways to do this using an index (so it's faster and you only make one pass instead of looping through the entire bed file multiple times).
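    To make the first assumption concrete, here is a small sketch with two made-up records that share a chromosome and start site. The hash names %overwrite and %keep_all are hypothetical; the second one shows one possible way to keep both records, not the approach used in the script above.

```perl
use strict;
use warnings;

# Hypothetical records: same chromosome and start, different end and strand.
my @lines = ("chr1\t10\t50\tn1\t0\t+", "chr1\t10\t80\tn2\t0\t-");

my (%overwrite, %keep_all);
for my $line (@lines) {
    my ($chrom, $start, $end, undef, undef, $strand) = split /\t/, $line;

    # Same layout as the script above: one slot per chromosome and start,
    # so the later record replaces the earlier one.
    $overwrite{$chrom}{$start} = [ $end, $strand ];

    # Alternative: keep a list of [end, strand] pairs per start site.
    push @{ $keep_all{$chrom}{$start} }, [ $end, $strand ];
}

print "overwrite keeps end $overwrite{chr1}{10}[0]\n";
print "keep_all holds ", scalar @{ $keep_all{chr1}{10} }, " records\n";
```

    With the list-per-start layout, the output loop would need one extra inner loop over the stored pairs.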

    You could create a function for the code that generates the main data structure, but I left it as is so it's easier to read and see what I'm doing. Have fun!
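    One way such a function might look is sketched below. The name read_bed is hypothetical, and an in-memory filehandle with made-up records is used so the example runs without any files.

```perl
use strict;
use warnings;

# read_bed is a hypothetical helper: it reads one open filehandle and
# adds its records to the hash passed in by reference.
sub read_bed {
    my ($fh, $bed) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        my ($chrom, $start, $end, undef, undef, $strand) = split /\t/, $line;
        $bed->{$chrom}{$start} = [ $end, $strand ];
    }
    return $bed;
}

# Made-up sample data read through an in-memory filehandle.
my $sample = "chr1\t10\t50\tx\t0\t+\nchr1\t5\t9\ty\t0\t-\n";
open my $fh, "<", \$sample or die "Cannot open in-memory file: $!\n";

my %bed_files;
read_bed($fh, \%bed_files);
close $fh;

# Start sites on chr1, sorted numerically.
my $starts = join ",", sort { $a <=> $b } keys %{ $bed_files{chr1} };
print "$starts\n";
```

    In the full script the two while loops would then become two calls, one per input filehandle.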

    EDIT: Fixed a couple of typos!

    Bioinformatics
      Can you explain the $usage and "shift or die" part?
        shift here shifts from the @ARGV array. If it fails (that is, if fewer than the 3 required file names were supplied on the command line), the program dies (quits) and prints the string contained in $usage ("merge_bed.pl <input1> <input2> <output>"). This message tells the user that the program, merge_bed.pl or whatever name you choose for it, requires 2 input filenames and 1 output filename.
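        A small sketch of that behaviour, under the assumption that only two of the three arguments were given. @ARGV is set by hand and the file names are made up. shift(@ARGV) is written out in full here; at the top level of a script a bare shift means the same thing. The eval is only there to catch the die so the message can be shown.

```perl
use strict;
use warnings;

my $usage = "merge_bed.pl <input1> <input2> <output>\n";

# Pretend the user supplied only two file names on the command line.
@ARGV = ("a.bed", "b.bed");

my $input_1 = shift(@ARGV) or die $usage;   # gets "a.bed"
my $input_2 = shift(@ARGV) or die $usage;   # gets "b.bed"

# @ARGV is now empty, so shift returns undef, which is false,
# and the right-hand side of "or" runs: the program would die here.
my $output = eval { shift(@ARGV) or die $usage };
my $error  = $@;

print "died with: $error" unless defined $output;
```

        Note that the test is on truth, not on definedness, so an argument of "0" would also trigger the die.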

        Can you explain the $usage and "shift or die" part?

        If you employ Basic debugging checklist (deparse,print) you can figure it out pretty quick :)
