Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Compare hash with arrays and print

by ad23 (Acolyte)
on Jul 12, 2010 at 18:18 UTC ( [id://849071]=perlquestion: print w/replies, xml ) Need Help??

ad23 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have created a hash (from a given number of files), which has the following keys : values associated with it:

aw1:10 qs2:20 dd3:30 de4:10 hg5:30 dfd6:20 gf4:20 hgh5:30 hgy3:10 and so on...

There are only 3 values (10,20,30), for this hash. Furthermore, I have a file - "Sample.fa" (with header information) like:

>aw1 ATGCTAGATGCTAGCTAGCTAGCACTGAT CGATGCTAGCGTAGTCAGCTGATGCTGTA CGATGCTAGTCGTACG >qs2 CGAGCTAGTCGTAGTCGTGATGCTGATTA CGATGCTAGTCGTAGCTAGCTGATGCTGC CGATGCTAGTCGTAGTC >dd3 CGTAGTCGTAGTCGTAGTCGATGCTGATG GCTAGTCGATGCTAGCTAGTCGATGCTGG CGATGCTGAT >de4 CGTAGTCGTAGTCGTACGTAGTCGTGAGT CGATTATTTAGGAGGGACAAGGATAGTA >hg5 CGTAGTCGTAGTCTAGTCGTGATGCTAGA >dfd6 CGATGCTACGTACGTAGTCAGTCGTGATG AATTAGAGCAGATAGAGGGGGAAAGGGTT AAACCCC >gf4 CGTAGTCAGTCTAGCTGATGTCGATGCTG >hgh5 CATGCTAGTCGTAGTCGTAGTCGATGCTT TTTTAAGGGAACCCCC >hgy3 CCCCGGGTTTGGGAAAAGGGGGGGGATAG

I want to write the corresponding header and seq from Sample.fa, to 3 files (10.txt, 20.txt, 30.txt -- according to the hash key and value). Following is a snippet from my code :

open (FILE1, ">10.txt"); open (FILE2, ">20.txt"); open (FILE3, ">30.txt"); my @files = glob('Data/*.fa'); my %hash = &readFile(); foreach my $f(@files){ open FILE, $f or die "Cannot open : $!\n"; while (my $line = <FILE>){ chomp $line; #print "$line\n"; if($line =~ /^>/){ my @head = split (/ /, $line); my $name = substr($line, 1); foreach my $key(keys %hash){ if($key eq $name){ if($hash{$key} == 10){ select FILE1; } if($hash{$key} == 20){ select FILE2; } if($hash{$key} == 30){ select FILE3; } } } } } }

Can someone point me in the right direction?

Thanks!

Replies are listed 'Best First'.
Re: Compare hash with arrays and print
by almut (Canon) on Jul 12, 2010 at 19:05 UTC

    By setting $/ appropriately, you could read in entire records, which would make your life somewhat easier.

    Also, there is no need to iterate over all keys of the hash to then just pick the one where $key eq $name:

    foreach my $key(keys %hash){ if($key eq $name){ if($hash{$key} == 10){ ...

    A direct lookup $hash{$name} would serve the same purpose.

    Here's a simplified demo using __DATA__:

    #!/usr/bin/perl open (FILE1, ">10.txt") or die $!; open (FILE2, ">20.txt") or die $!; open (FILE3, ">30.txt") or die $!; my %hash = (aw1 => 10, qs2 => 20, dd3 => 30, de4 => 10, hg5 => 30, dfd6 => 20, gf4 => 20, hgh5 => 30, hgy3 => 10); { local $/ = "\n>"; # input record separator while (<DATA>){ s/^>//mg; my ($name) = /^([^\n]+)/; if($hash{$name} == 10){ select FILE1; } if($hash{$name} == 20){ select FILE2; } if($hash{$name} == 30){ select FILE3; } print ">$_"; } } __DATA__ >aw1 ATGCTAGATGCTAGCTAGCTAGCACTGAT CGATGCTAGCGTAGTCAGCTGATGCTGTA CGATGCTAGTCGTACG >qs2 CGAGCTAGTCGTAGTCGTGATGCTGATTA CGATGCTAGTCGTAGCTAGCTGATGCTGC CGATGCTAGTCGTAGTC >dd3 CGTAGTCGTAGTCGTAGTCGATGCTGATG GCTAGTCGATGCTAGCTAGTCGATGCTGG CGATGCTGAT >de4 CGTAGTCGTAGTCGTACGTAGTCGTGAGT CGATTATTTAGGAGGGACAAGGATAGTA >hg5 CGTAGTCGTAGTCTAGTCGTGATGCTAGA >dfd6 CGATGCTACGTACGTAGTCAGTCGTGATG AATTAGAGCAGATAGAGGGGGAAAGGGTT AAACCCC >gf4 CGTAGTCAGTCTAGCTGATGTCGATGCTG >hgh5 CATGCTAGTCGTAGTCGTAGTCGATGCTT TTTTAAGGGAACCCCC >hgy3 CCCCGGGTTTGGGAAAAGGGGGGGGATAG

    (just re-add the looping over your @files)

      Thanks for your reply!

      I am still not able to print anything to my files.

      my %hash = &readFile(); my @fasta = glob('Data_Test/*.fa'); foreach my $f (@fasta){ open FILE, $f or die "Cannot open $fastaname for reading: $!\n"; local $/ = "\n>"; while (<FILE>) { s/^>//mg; my ($name) = /^(\w+)/; if($hash{$name} == 10){ select FILE1; } if($hash{$name} == 20){ select FILE2; } if($hash{$name} == 30){ select FILE3; } print ">$_"; } } close(FILE1); close(FILE2); close(FILE3);

        Add some debugging prints (print STDERR ... preferably, as you're manipulating the default output file handle) to figure out what's different from the sample I've given (which does work fine for me).

        For example, print the records ($_), $name, the corresponding hash values $hash{$name}, etc.  Are they what you'd expect?

        update: oops I do see that you do have a print statement. and looks like it should work.

        This "select FILE3;" statement by itself does nothing useful. I actually wouldn't use select at all in this situation.

        The Perl print statement is a "smart" critter. If the first arg of print is the file handle of some open file, Perl will print to that file handle. print FILE3 "abc"; will print "abc" to FILE3. The Perl default is essentially "print stdout "abc";". Select makes the default print go to wherever you want (instead of stdout), but here it appears easier to just put the file handle in the print statement.

        print FILE3 $_; or similar will work fine. A plain "print;" sends $_ to the default file handle. With a file handle specified, I think you have to explicitly say $_ for the same effect.

Re: Compare hash with arrays and print
by biohisham (Priest) on Jul 12, 2010 at 19:42 UTC
    Using Bio::SeqIO from BioPerl provides a powerful way of dealing with I/O operations on biological data files having the FastA format or any other format for that matter. The input file object is created as well as three output file objects, each one of these objects has information about the format to read from or write into and the file name to read from or direct the output to, in case the output file doesn't exist, it is created for you too.

    Using an appropriate BioPerl interface will eliminate the need to construct regexes to detect sequence identifiers and strings, it will also allow you to flexibly migrate among different biological data formats on the go. That will add up to saving time focusing on the data manipulation tasks rather than coding techniques implementation...

    #!/usr/local/bin/perl #title "Compare hash with arrays and print" use strict; use warnings; use Bio::SeqIO; my %hash = ( aw1=>10, qs2=>20, dd3=>30, de4=>10, hg5=>30, dfd6=>20, gf4=>20, hgh5=>30, hgy3=>10, ); my $file = "Sample.fa"; my $file10 = "10.fa"; my $file20 = "20.fa"; my $file30 = "30.fa"; my $seq = Bio::SeqIO->new(-file => "<$file", -format=>'fasta'); # inpu +t object #output objects my $seqOut10 = Bio::SeqIO->new(-file => ">$file10", -format=>'fasta'); my $seqOut20 = Bio::SeqIO->new(-file => ">$file20", -format=>'fasta'); my $seqOut30 = Bio::SeqIO->new(-file => ">$file30", -format=>'fasta'); while(my $seqIn = $seq->next_seq()){ for my $key (keys %hash){ if($seqIn->id eq $key && $hash{$key}==10){ $seqOut10->write_seq($seqIn); }elsif($seqIn->id eq $key && $hash{$key} == 20 +){ $seqOut20->write_seq($seqIn); }elsif($seqIn->id eq $key && $hash{$ke +y} == 30){ $seqOut30->write_seq($seqIn); } } }


    Excellence is an Endeavor of Persistence. A Year-Old Monk :D .

      I'd just like to second this solution.

      If you are working with sequence data a lot then it's definitely worth investing some time learning the ways of BioPerl.

      In addition to its ability to handle a huge variety of data formats, it provides modules to interact with and run numerous sequence analysis packages such as BLAST, EMBOSS and RepeatMasker

      Learning the potential of BioPerl combined with the Ensembl Perl API has stood me in very good stead over the years.

      Best of luck with your project

Re: Compare hash with arrays and print
by toolic (Bishop) on Jul 12, 2010 at 18:34 UTC
    Can you clarify what problem you are having?

    If the problem is that your 3 output files are empty, then you need to print something to them. The only print I see in your code has been commented out.

      Doesn't select command selects the file and print the contents?

      I used it as below in my code before:

      my $in = $ARGV[0]; my $x = $ARGV[1]; my $y = $ARGV[2]; open IN , "<$in" or die $!; open X, ">$x" or die $!; open Y, ">$y" or die $!; while (<IN>) { if (/^>/) { if (/infor_present:/) { select X; } else{ select Y; } } print; }

      In the above code, there is a "print" statement after it comes out of loop. So in order to print the contents in multiple files, how can we use print in this manner?

      Also, I tried making a hash for the Sample.fa files with header as key and seq as values. But since I am reading multiple files, that approach is not working (as I run out of memory!). Kindly help.

      Thanks!

        Doesn't select command selects the file and print the contents?
        No, select alone will not print to a file. You must also use print, as you have done in your new code example.
Re: Compare hash with arrays and print
by jwkrahn (Abbot) on Jul 12, 2010 at 19:02 UTC

    Perhaps something like this will do what you require UNTESTED:

    my %hash = ( aw1 => 10, qs2 => 20, dd3 => 30, de4 => 10, hg5 => 30, dfd6 => 20, gf4 => 20, hgh5 => 30, hgy3 => 10, ); { # exchange number values for filehandles open my $FILE1, '>', '10.txt' or die "Cannot open '10.txt' $!"; open my $FILE2, '>', '20.txt' or die "Cannot open '20.txt' $!"; open my $FILE3, '>', '30.txt' or die "Cannot open '30.txt' $!"; for my $val ( values %hash ) { $val = $FILE1 if $val == 10; $val = $FILE2 if $val == 20; $val = $FILE3 if $val == 30; } } local @ARGV = glob 'Data/*.fa'; my $name; while ( my $line = <> ) { if ( $line =~ /^>(.+)/ ) { $name = $1; } print { $hash{ $name } } $_ if exists $hash{ $name }; }
Re: Compare hash with arrays and print
by Marshall (Canon) on Jul 12, 2010 at 21:36 UTC
    I reversed the hash to get a unique list of the "numbers". Then opened that many files and put the file handles in another hash. This eliminates some of the "if" logic. Using different record separator does help, but a slight bit of fiddling is required.

    update: See how different filehandles are used in the print below. Also if what reverse is unclear add a print Dumper(\%rhash); statement.

    #!/usr/bin/perl -w use strict; use Data::Dumper; my %hash = (aw1 => 10, qs2 => 20, dd3 => 30, de4 => 10, hg5 => 30, dfd6 => 20, gf4 => 20, hgh5 => 30, hgy3 => 10); my %rhash = reverse %hash; my %fh; foreach my $num (keys %rhash) { open ($fh{"$num.txt"} , '>', "$num.txt") or die ; } $/ = '>'; while (<DATA>) { my $tag = (m/^(\w+)/)[0]; #"blank" rec at beginning next unless defined $tag; s/>$//; my $filename = "$hash{$tag}.txt"; my $handle = $fh{$filename}; print $handle ">$_"; } __DATA__ >aw1 ATGCTAGATGCTAGCTAGCTAGCACTGAT CGATGCTAGCGTAGTCAGCTGATGCTGTA CGATGCTAGTCGTACG >qs2 CGAGCTAGTCGTAGTCGTGATGCTGATTA CGATGCTAGTCGTAGCTAGCTGATGCTGC CGATGCTAGTCGTAGTC >dd3 CGTAGTCGTAGTCGTAGTCGATGCTGATG GCTAGTCGATGCTAGCTAGTCGATGCTGG CGATGCTGAT >de4 CGTAGTCGTAGTCGTACGTAGTCGTGAGT CGATTATTTAGGAGGGACAAGGATAGTA >hg5 CGTAGTCGTAGTCTAGTCGTGATGCTAGA >dfd6 CGATGCTACGTACGTAGTCAGTCGTGATG AATTAGAGCAGATAGAGGGGGAAAGGGTT AAACCCC >gf4 CGTAGTCAGTCTAGCTGATGTCGATGCTG >hgh5 CATGCTAGTCGTAGTCGTAGTCGATGCTT TTTTAAGGGAACCCCC >hgy3 CCCCGGGTTTGGGAAAAGGGGGGGGATAG
      my %rhash = reverse %hash; my %fh; foreach my $num (keys %rhash)

      Since you only need the unique keys from %hash you only really need an array:

      my @keys = keys %{{ reverse %hash }}; my %fh; foreach my $num ( @keys )
        Correct. The code is basically equivalent. We both reverse the hash and then find the keys of that reversed hash. Making a named array @keys is actually an extra step. I suppose if we wanted to, the whole right hand side of the "my @keys" line could go into the foreach() statement but, that could be obtuse.

        I was trying to show the workings of the reverse in the most straightforward way possible (simple syntax). I updated post to show the Op how to easily use Dumper to print the reversed hash, which is little more obvious how to do when it has an assigned name (rhash).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://849071]
Approved by Corion
Front-paged by biohisham
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (1)
As of 2025-03-21 06:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    When you first encountered Perl, which feature amazed you the most?










    Results (63 votes). Check out past polls.