Re: writing to arrays
by pg (Canon) on Dec 25, 2002 at 18:57 UTC
Try basing your code on the code below. A couple of things:
- Your data may come in out of sequence (DATA SET 10 might come before DATA SET 9), so you have to determine the index before you can add the string to the array.
- Some data may also be missing; for example, there may be no DATA SET 7 between DATA SET 6 and 8. You need to take this into consideration, which again makes it necessary to determine the index on the fly.
- Is it possible for the same DATA SET number to appear more than once? For example, you see ">DATA SET 10" and later ">DATA SET 10" again. If so, you have to decide how to handle it based on your requirements. My code simply appends the new lines to the old ones.
- My code does not chomp away the newlines. If you want to remove them, uncomment the chomp line.
- My code always leaves the array element at index 0 undef. If you want to use it, just subtract 1 from the number you read from the file.
- In your m//, the backslash before > is not needed, although it does no harm.
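use strict;
use warnings;
use Data::Dumper;
my @data;        # $data[N] holds the text of DATA SET N
my $cur_index;   # number of the DATA SET currently being read
open(my $fh, "<", "data.txt") or die "Cannot open data.txt: $!";
while (<$fh>) {
    #chomp;      # uncomment to strip the newlines
    if (m/^>DATA SET (\d+)/) {
        $cur_index = $1;             # header line: switch to set N
    } elsif (defined $cur_index) {
        $data[$cur_index] .= $_;     # append; a repeated set number accumulates
    }
}
close($fh);
print Dumper(\@data);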
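To check the loop against the cases in the list above (set 10 before set 9, set 7 missing, set 6 repeated), here is a minimal sketch that runs the same logic over a made-up in-memory sample instead of data.txt. The sample text and the printing loop are illustrative assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical sample: set 10 before set 9, no set 7, set 6 appears twice.
my $text = join '',
    ">DATA SET 6\n",  "aaa\n",
    ">DATA SET 10\n", "jjj\n",
    ">DATA SET 9\n",  "iii\n",
    ">DATA SET 6\n",  "bbb\n";

open( my $fh, '<', \$text ) or die "Cannot open in-memory file: $!";
my @data;
my $cur_index;
while (<$fh>) {
    if (m/^>DATA SET (\d+)/) {
        $cur_index = $1;                 # the header decides where lines go
    } elsif ( defined $cur_index ) {
        $data[$cur_index] .= $_;         # repeated set: new lines are appended
    }
}
close($fh);

# Unused slots (0..5, 7, 8) are simply undef.
for my $n ( 0 .. $#data ) {
    printf "%2d: %s\n", $n,
        defined $data[$n] ? join( '|', split /\n/, $data[$n] ) : '(missing)';
}
```

Because the index comes from the header line itself, the order of the sets in the file does not matter, and gaps just leave undef slots in @data.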
(UPDATE: When the question was first posted, the data part was not formatted as well as it is now, and my first solution assumed that the real data came right after "DATA SET n", without a line break. Obviously that solution was wrong.
Thanks to gjb, who sent me a message pointing out that my solution didn't make sense to him. I then checked the original question and realized that the format had changed once tags were added.
I really appreciate not just gjb's technical point but, more importantly, the way he handled it, which clearly shows his pleasant personality.)
|
Thanks for your reply, but I think what you are suggesting is a bit more complex than what I actually need. Each line in my file is a $string, and what I need to do is start at ">" and read every string I find from then on into an @array until I find another ">".
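For the simpler reading of this request (every line after a ">" goes into its own array until the next ">"), a minimal sketch that builds an array of arrays in file order. It ignores the set number entirely, so $sets[0] is simply the first block in the file; that only fits if the sets arrive in order. The sample text is made up.

```perl
use strict;
use warnings;

# Hypothetical sample input; in practice you would open your own file.
my $text = ">DATA SET 1\nAAA\nCCC\n>DATA SET 2\nGGG\nTTT\n";
open( my $fh, '<', \$text ) or die "Cannot open in-memory file: $!";

my @sets;   # $sets[0] = [ lines after the first '>' ], $sets[1] = [ ... ], ...
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^>/ ) {
        push @sets, [];               # a '>' line starts a new, empty array
    }
    elsif (@sets) {
        push @{ $sets[-1] }, $line;   # any other line goes into the latest array
    }
}
close $fh;

for my $i ( 0 .. $#sets ) {
    print "set $i: @{ $sets[$i] }\n";
}
```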
Re: writing to arrays
by Arien (Pilgrim) on Dec 25, 2002 at 20:45 UTC
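use strict;
use warnings;
my @data;   # $data[N] is a reference to an array of the lines of DATA SET N
my $curr;   # number of the current DATA SET
while (<DATA>) {
    $curr = $1, next if /^>DATA\s+SET\s+(\d+)/;   # header: remember N, skip the line
    next unless defined $curr;                     # ignore anything before the first header
    push @{$data[$curr]}, $_;                      # autovivifies the inner array
}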
use Data::Dumper if you are unsure about the structure of @data.
— Arien
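Once the loop has run, each defined $data[N] is a reference to an array of lines. A minimal sketch (made-up sample, lines chomped only to keep the output short) showing how to walk that structure next to the Dumper output:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample: set 2 appears before set 1.
my $text = ">DATA SET 2\nGGG\nTTT\n>DATA SET 1\nAAA\n";
open( my $fh, '<', \$text ) or die "Cannot open in-memory file: $!";

my @data;
my $curr;
while (<$fh>) {
    chomp;                                         # only to keep the output short
    $curr = $1, next if /^>DATA\s+SET\s+(\d+)/;
    push @{ $data[$curr] }, $_ if defined $curr;
}
close $fh;

for my $n ( 0 .. $#data ) {
    next unless defined $data[$n];                 # index 0 is never filled
    my $lines = $data[$n];                         # an array reference
    printf "DATA SET %d: %d line(s), first is %s\n", $n, scalar @$lines, $lines->[0];
}
print Dumper( \@data );
```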
Re: writing to arrays
by John M. Dlugosz (Monsignor) on Dec 25, 2002 at 20:22 UTC
If I understand the question, you want a separate array for each DATA SET, and each line is one array entry in the proper set.
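my @array_set;   # $array_set[N] is a reference to the array for DATA SET N
my $array;       # reference to the array currently being filled
while (<DATA>) {
    if (/^>DATA SET (\d+)/) {
        $array = $array_set[$1] ||= [];   # create the set's array if needed, then point at it
    }
    else {
        push @$array, $_;
    }
}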
Something like that; I may have typos and such. It works like this: when a >DATA SET xxx line is seen, $array is set to point to the proper array. Otherwise, the line is added to the current array.
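Whether `push @$array, $_` works depends on what $array refers to: a reference to the array *slot* (`\$array_set[$1]`) is a SCALAR reference, while a reference stored *in* the slot is an ARRAY reference. A minimal sketch contrasting the two (the variable names are illustrative):

```perl
use strict;
use warnings;

my @array_set;

# A reference to the slot itself is a SCALAR reference, so treating it
# as an array dies with "Not an ARRAY reference".
my $slot_ref = \$array_set[1];
my $ok    = eval { push @$slot_ref, 'x'; 1 };
my $error = $@;

# Keeping an array reference *in* the slot works: $array and
# $array_set[2] now refer to the same anonymous array.
my $array = $array_set[2] ||= [];
push @$array, 'line a', 'line b';

print $ok ? "slot ref worked\n" : "slot ref failed: $error";
print "set 2: @{ $array_set[2] }\n";
```

Dereferencing one level further, as in `push @{$$slot_ref}, $_`, should also work, because the undef slot is autovivified into an array reference.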
Re: writing to arrays
by snafu (Chaplain) on Dec 25, 2002 at 21:20 UTC
Using the flip-flop operator:
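#!/usr/bin/perl -w
use strict;
my @darray;   # $darray[N] collects the lines of DATA SET N
my $set;      # number of the DATA SET currently being read
my $eo;       # flip-flop result
while ( <DATA> ) {
    chomp();
    # With '..' both operands are tested on the same line, so on a
    # header line the range opens and closes at once and returns "1E0".
    $eo = ( />DATA SET/ .. />DATA SET/ );
    if ( /DATA SET (\d+)/ ) {     # only use $1 after a successful match
        $set = $1;
    }
    if ( $eo =~ /E0/ ) {          # header line: keep it at the top of the set
        $darray[$set] .= $_."\n";
        next;
    }
    $darray[$set] .= $_."\n" if ( defined $set );
}
for ( my $c = 0 ; $c <= $#darray ; $c++ ) {
    next unless defined $darray[$c];   # skip unused slots such as index 0
    print "array element: $c\n";
    print "$darray[$c]\n";
}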
__DATA__
>DATA SET 1
HSAJDHSDHSADHDSALHDASLDHSALDH
HGDKJSHDSADHSALDHLHLDHASDLSAH
HKJAHCADHALIDHALSDHLSADHALHDA
>DATA SET 2
HSAJDHSDHSADHDSALHDASLDHSALDH
HGDKJSHDSADHSALDHLHLDHASDLSAH
HKJAHCADHALIDHALSDHLSADHALHDA
>DATA SET 3
HSAJDHSDHSADHDSALHDASLDHSALDH
HGDKJSHDSADHSALDHLHLDHASDLSAH
HKJAHCADHALIDHALSDHLSADHALHDA
>DATA SET 4
HSAJDHSDHSADHDSALHDASLDHSALDH
HGDKJSHDSADHSALDHLHLDHASDLSAH
HKJAHCADHALIDHALSDHLSADHALHDA
>DATA SET 5
HSAJDHSDHSADHDSALHDASLDHSALDH
HGDKJSHDSADHSALDHLHLDHASDLSAH
HKJAHCADHALIDHALSDHLSADHALHDA
>DATA SET 6
HSAJDHSDHSADHDSALHDASLDHSALDH
HGDKJSHDSADHSALDHLHLDHASDLSAH
HKJAHCADHALIDHALSDHLSADHALHDA
__________
- Jim
Insert clever comment here...
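The difference between the `..` and `...` forms of the flip-flop is easy to check on a few lines. A minimal sketch (the sample lines are made up) that records what each form returns for every line:

```perl
use strict;
use warnings;

my @lines = ( '>DATA SET 1', 'AAAA', 'CCCC', '>DATA SET 2', 'GGGG' );

my ( @two, @three );
for (@lines) {
    # '..' tests the right operand on the same line that opened the range.
    my $r2 = ( /^>DATA SET/ .. /^>DATA SET/ );
    push @two, $r2;
}
for (@lines) {
    # '...' waits until the next line before testing the right operand.
    my $r3 = ( /^>DATA SET/ ... /^>DATA SET/ );
    push @three, $r3;
}
print "..  : ", join( '|', @two ),   "\n";
print "... : ", join( '|', @three ), "\n";
```

With `..` the range is true only on the header lines themselves (each returns "1E0"). With `...` it runs from the first header to the next one, but the header that closes the range does not reopen it, so the second set's data is not covered. That is why the `$set` bookkeeping, not the flip-flop, does the real grouping work in the code above.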
Re: writing to arrays
by Wonko the sane (Deacon) on Dec 26, 2002 at 13:56 UTC
Another way to do this, using the DATA SET number as the array index:
#!/usr/local/bin/perl -w
use strict;
use Data::Dumper;
my @records;
{
local $/ = '>'; # record separator.
while ( <DATA> )
{
push( @{$records[$1]}, split(/(?:\n+|>)/) )
if ( s/DATA SET ([0-9]+)\n+// );
}
}
print Dumper( \@records );
__DATA__
>DATA SET 1
1aHSAJDHSDHSADHDSALHDASLDHSALDH
1bHGDKJSHDSADHSALDHLHLDHASDLSAH
1cHKJAHCADHALIDHALSDHLSADHALHDA
>DATA SET 2
2aHSAJDHSDHSADHDSALHDASLDHSALDH
2bHGDKJSHDSADHSALDHLHLDHASDLSAH
2cHKJAHCADHALIDHALSDHLSADHALHDA
>DATA SET 3
3aHSAJDHSDHSADHDSALHDASLDHSALDH
3bHGDKJSHDSADHSALDHLHLDHASDLSAH
3cHKJAHCADHALIDHALSDHLSADHALHDA
Some extra juggling (the substitution and the split) is done to get clean output. This is what it looks like:
:!./test.pl
$VAR1 = [
undef,
[
'1aHSAJDHSDHSADHDSALHDASLDHSALDH',
'1bHGDKJSHDSADHSALDHLHLDHASDLSAH',
'1cHKJAHCADHALIDHALSDHLSADHALHDA'
],
[
'2aHSAJDHSDHSADHDSALHDASLDHSALDH',
'2bHGDKJSHDSADHSALDHLHLDHASDLSAH',
'2cHKJAHCADHALIDHALSDHLSADHALHDA'
],
[
'3aHSAJDHSDHSADHDSALHDASLDHSALDH',
'3bHGDKJSHDSADHSALDHLHLDHASDLSAH',
'3cHKJAHCADHALIDHALSDHLSADHALHDA'
]
];
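To see what `local $/ = '>'` actually hands to the loop body, here is a minimal sketch on a small made-up sample; @raw keeps each record exactly as it was read, before the substitution and split clean it up.

```perl
use strict;
use warnings;

# Hypothetical sample in the same layout as the thread's data.
my $text = ">DATA SET 1\nAAA\nCCC\n>DATA SET 2\nGGG\n";
open( my $fh, '<', \$text ) or die "Cannot open in-memory file: $!";

my @raw;       # each record exactly as <$fh> returns it
my @records;   # $records[N] = [ lines of DATA SET N ]
{
    local $/ = '>';    # a "line" now ends just after the next '>'
    while ( my $rec = <$fh> ) {
        push @raw, $rec;
        # The first record is the lone '>' that opens the file; it has no header.
        next unless $rec =~ s/DATA SET ([0-9]+)\n+//;
        my $n = $1;
        # Split on runs of newlines or the trailing '>'; split drops trailing empty fields.
        push @{ $records[$n] }, split /(?:\n+|>)/, $rec;
    }
}
close $fh;
```

Each record (except the last) ends with the '>' of the next header, which is why the split pattern includes '>'.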
Best Regards,
Wonko
Re: writing to arrays
by tandemrepeat (Initiate) on Dec 26, 2002 at 16:39 UTC
This looks like a multi-FASTA file holding DNA or protein sequence data (with the sequence ID after the >). I use one of the following two ways to get this info into an array before piping it into BLAST or other sequence manipulations.
The first is a loop from some hand-me-down code that works quite well (though any comments on optimization are very welcome).
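use strict;
use warnings;
open( my $fastafile, '<', $ARGV[0] ) or die "Cannot open $ARGV[0]: $!";
my @sequences;    # one element per record: header line plus its sequence lines
my $fasta = '';   # record currently being built
my $seqflag = 0;  # true once the first header has been seen
while (<$fastafile>) {
    if ( /^>/ && $seqflag ) {      # new header: save the previous record
        push( @sequences, $fasta );
        $fasta = $_;
    }
    elsif (/^>/) {                 # very first header
        $fasta   = $_;
        $seqflag = 1;
    }
    else {
        $fasta .= $_;
    }
}
push( @sequences, $fasta ) if $seqflag;   # save the last record
close($fastafile);
# then iterate over @sequences to run BLAST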
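On the optimization question: a possible simplification, offered as a sketch rather than a drop-in replacement, is to start a new array element on each header and append every other line to the last element. That removes the flag and the final push after the loop; for files that begin with a header line it should produce the same @sequences. The sample records are made up.

```perl
use strict;
use warnings;

# Hypothetical two-record sample standing in for the FASTA file.
my $text = ">seq1 first\nACGT\nTTGA\n>seq2 second\nGGCC\n";
open( my $fh, '<', \$text ) or die "Cannot open in-memory file: $!";

my @sequences;   # each element: header line plus its sequence lines
while ( my $line = <$fh> ) {
    if ( $line =~ /^>/ ) {
        push @sequences, $line;    # a header starts a new record
    }
    elsif (@sequences) {
        $sequences[-1] .= $line;   # anything else extends the latest record
    }                              # (lines before the first header are dropped)
}
close $fh;
```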
The other (better?) way is to use the very nice BioPerl modules, which have methods specifically for handling multi-FASTA flat files. Also check out EMBOSS, a sequence analysis suite that interfaces with BioPerl; EMBOSS plus BioPerl makes life so much easier.
From the BioPerl tutorial:
# script 1: create the index
use Bio::Index::Fasta; # using fasta file format
$Index_File_Name = shift;
$inx = Bio::Index::Fasta->new(
-filename => $Index_File_Name,
-write_flag => 1);
$inx->make_index(@ARGV);
# script 2: retrieve some files
use Bio::Index::Fasta;
$Index_File_Name = shift;
$inx = Bio::Index::Fasta->new($Index_File_Name);
foreach $id (@ARGV) {
$seq = $inx->fetch($id); # Returns Bio::Seq object
# do something with the sequence
}
Hope this helps,
tandemrepeat
|
T.R
Thanks for your comments. Indeed, the file I am playing with is a FASTA file, which will eventually be put through BLAST to generate some output.
Thanks a lot for your help!
No more answers needed for this question, monks... thanks to everyone who replied!
Re: writing to arrays
by Anonymous Monk on Dec 26, 2002 at 07:31 UTC
while (<FILE>){
chomp;
next if $_ =~/\>DATA SET/;
push @array,$_;
}
Everyone seems to be making this more complicated than it needs to be.
Superman
Make sure you qualify your data if possible.
Checking string length may be one option.
It's always good to check incoming data you may not have control over.
|
All this code does is push the whole file into a single array, having removed every line that contains the text ">DATA SET". Your code does not even ensure that this text is found at the start of the line.
What you end up with is an array of everything munged together, all grouping information lost with no way to recover it.
It's hard to believe that this will meet the OP's requirements.
Examine what is said, not who speaks.
|
Yep, I know. I did not realize he/she needed separate arrays for each dataset; I realized it after posting. Please ignore my previous post.
Sorry, hand officially slapped.