Where is my output stream to a file going?

lomSpace has asked for the wisdom of the Perl Monks concerning the following question:

Hello PerlMonks!
I have a weird problem. I am able to read the file, process it, and print to a file only
when I include the data in the script using __DATA__. When I read from the IN
filehandle and write to an out file, I get a blank file. Any ideas?
The following is the code:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

#open file for reading
open(IN,"/Users/me/Desktop/CCDS.current.txt") or die " Can't open file: $!";
#open out file for writing
open(OUT, ">/Users/me/Desktop/withdrawndata.txt");
#remove the header
my $firstline= <IN>;
chomp $firstline;
while(<IN>){
    chomp; # remove the newline character
    my @fields = split/\t/; # split the file into columns where eachline in it
    #populates and array for that column.
    if($fields[5] =~ m/Withdrawn/){ # print eachline that has "Withdrawn" in field[5]
        print  OUT "$_\n"; # print to file 
    }
}
close(IN);
close(OUT); 
__DATA__
#chromosome    nc_accession    gene    gene_id    ccds_id    ccds_status    cds_strand    cds_from    cds_to
1    NC_000001.8    NCRNA00115    79854    CCDS1.1    Withdrawn    -    801942    802433
1    NC_000001.10    SAMD11    148398    CCDS2.2    Public    +    861321    879532
1    NC_000001.10    NOC2L    26155    CCDS3.1    Public    -    880073    894619
1    NC_000001.10    PLEKHN1    84069    CCDS4.1    Public    +    901911    909954
1    NC_000001.10    HES4    57801    CCDS5.1    Public    -    934438    935352
1    NC_000001.10    ISG15    9636    CCDS6.1    Public    +    948953    949857
1    NC_000001.10    C1orf159    54991    CCDS7.2    Public    -    1018272    1026922
1    NC_000001.10    TTLL10    254173    CCDS8.1    Public    +    1115433    1120521
1    NC_000001.10    TNFRSF18    8784    CCDS9.1    Public    -    1138970    1141950
1    NC_000001.10    TNFRSF18    8784    CCDS10.1    Public    -    1139223    1141950
1    NC_000001.10    TNFRSF4    7293    CCDS11.1    Public    -    1146934    1149506

Your wisdom is appreciated!
LomSpace

Replies are listed 'Best First'.
Re: Where is my output stream to a file going? by davido (Cardinal) on Mar 30, 2011 at 05:56 UTC
If you're getting a blank outfile, that means either print is failing (you could check by putting "or die $!" after your print statement), or your `if( $fields[5] =~ m/Withdrawn/ ) {...` is never matching. I suspect the latter. So why would your match fail if the data is coming from a file, but not if it is coming from a __DATA__ block? You'll have to investigate that yourself. But here are a few possibilities:

- There's an extra tab in there somewhere, making 'Withdrawn' actually live one column to the right.
- There aren't enough tabs, making 'Withdrawn' live to the left of where you're expecting it.
- Your tabs in the file are actually a fixed number of spaces rather than an actual tab character.
- Your data is mistyped.

One of those is the likely culprit.

Dave
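One way to narrow down the possibilities above is to count the literal tabs on a line and see which field, if any, contains 'Withdrawn'. The following is only a sketch: `describe_line` is a hypothetical helper name, not something from the original script, and the sample line is built with explicit `\t` characters as an assumption about what the real file should look like.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): summarize how one line is delimited.
sub describe_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                 # drop an LF or CRLF line ending
    my $tabs   = () = $line =~ /\t/g;     # number of literal tab characters
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    # index of the first field containing "Withdrawn", or undef if none does
    my ($hit)  = grep { $fields[$_] =~ /Withdrawn/ } 0 .. $#fields;
    return { tabs => $tabs, fields => scalar @fields, withdrawn_at => $hit };
}

my $info = describe_line(
    "1\tNC_000001.8\tNCRNA00115\t79854\tCCDS1.1\tWithdrawn\t-\t801942\t802433\n"
);
printf "tabs=%d fields=%d withdrawn_at=%s\n",
    $info->{tabs}, $info->{fields}, $info->{withdrawn_at} // 'none';
# prints: tabs=8 fields=9 withdrawn_at=5
```

If the same call on a line read from the real file reports `tabs=0` and `withdrawn_at=0`, the file is probably space-delimited and everything is landing in `$fields[0]`.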
Re: Where is my output stream to a file going? by biohisham (Priest) on Mar 30, 2011 at 07:45 UTC
Do you intend to use the other `@fields` array elements in some way? Or are you only splitting in order to access `$fields[5]` to check whether it is Withdrawn or Public? In both cases, you can get an edge if you filter your lines first and then work with the elements that make up each line. That way you keep only the interesting lines (the ones that have 'Withdrawn') and reduce your data file to a more manageable chunk.

The code posted below is a more direct approach: I read the file a line at a time, use a look-ahead regular expression to check for 'Withdrawn' (regardless of case) anywhere in the line, and then write that line to a new file that has the same header as the source file.

my $path = "C:/Documents and Settings/aldaihi/Desktop/Monks";
open (my $fh, '<',"$path/genes.txt") or die ("could not open file $!\n");
open (my $rfh,'>',"$path/results.txt") or die ("could not open file $!\n");
my $firstLine = <$fh>;
print $rfh $firstLine;
while(<$fh>){
    chomp;
    if(/(?=Withdrawn)/i){
        #U can do things to the line in here
        #.....
        # split or rearrange ..
        #
        print $rfh $_,"\n";
    }
}

A module like Text::Table can give you control over how your data is placed in columns without breaking your head on spacing issues.

Excellence is an Endeavor of Persistence. A Year-Old Monk :D
Re^2: Where is my output stream to a file going? by lomSpace (Scribe) on Mar 30, 2011 at 15:08 UTC
biohisham, Thanks for the advice! LomSpace
Re^3: Where is my output stream to a file going? by davido (Cardinal) on Mar 30, 2011 at 16:40 UTC
While it is probably courteous to thank people who have put some effort into helping you with a problem, I myself would prefer hearing how the problem was resolved, and what the issue was discovered to be. Knowing that one of our suggestions hit pay-dirt would be worth ten thank-yous to me. So what turned out to be the problem? How did you end up resolving it? Dave
Re: Where is my output stream to a file going? by wind (Priest) on Mar 30, 2011 at 05:28 UTC
Don't know what your exact problem is, but I suggest you use the 3 parameter form of open and lexical file handles as well.

my $infile = "/Users/me/Desktop/CCDS.current.txt";
my $outfile = "/Users/me/Desktop/withdrawndata.txt";

open my $infh, '<', $infile or die "$infile: $!";
open my $outfh, '>', $outfile or die "$outfile: $!";

#remove the header
my $firstline= <$infh>;
chomp $firstline;

while (<$infh>){
    chomp;
    my @fields = split /\t/;
    if($fields[5] =~ m/Withdrawn/){
        print $outfh "$_\n";
    }
}

close $infh;
close $outfh;
Re: Where is my output stream to a file going? by samarzone (Pilgrim) on Mar 30, 2011 at 07:08 UTC
The best tool for debugging your problem is the Perl debugger (perl -d), and the simplest is `print`. Did you check whether `$_` or `$fields[5]` contain any value? I found one problem and have one suspicion. Here they are:

- You do not have a space between `split` and its argument.
- There is a good chance that while copying and pasting the text you unintentionally converted the tabs into spaces, so everything goes into `$fields[0]`.

I hope this helps.

-- Regards - Samar
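For the "print" style of debugging suggested above, a plain print will not show whether a gap is a tab or a run of spaces. A minimal sketch, assuming the core Data::Dumper module with its `Useqq` option turned on; `visible` is a hypothetical helper name, and the sample string is made up for illustration:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not from the thread): render a string with
# invisible characters shown as escapes.
sub visible {
    my ($text) = @_;
    local $Data::Dumper::Useqq  = 1;   # double-quoted output with \t, \r, \n escapes
    local $Data::Dumper::Terse  = 1;   # omit the "$VAR1 = " prefix
    local $Data::Dumper::Indent = 0;   # keep the output on one line
    return Dumper($text);
}

# A tab and a Windows-style line ending should show up as \t and \r\n.
print visible("CCDS1.1\tWithdrawn\r\n"), "\n";
```

Calling `print visible($_), "\n";` as the first statement inside the `while` loop, before `chomp`, would show what each input line really contains.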
Re: Where is my output stream to a file going? by Marshall (Canon) on Mar 30, 2011 at 23:29 UTC
I suspect that davido is on the right track, re: spaces instead of tabs. The default split (`split ' '`) is on any sequence of one or more whitespace characters, like `/\s+/` except that leading whitespace is skipped. There are five of them: `space,\r,\n,\t,\f`.

From your data, I see no reason to limit the split to just \t, because you have plain whitespace-separated tokens (no spaces within the desired tokens). Perl is designed to work great with that format! Splitting on one particular type of whitespace (of the total of five that you cannot see on the screen) is usually a bad idea. The default split is usually a good idea unless you have a clear reason why it's not. Also note that chomp() is not needed for the split because \n is one of the whitespace characters. But chomp() is fast, so this is a nit.

Perl has an amazing thing, the list slice, which is not used often enough. There is no need to create an @fields array. Use a list slice to get what you want, i.e. `my $ccds_status = (split)[5];`. One of the good things about this is that we've documented what the heck field 5 is! And now we can refer to it as $ccds_status instead of #5. In larger programs, this is a significant advantage.

Another variant of "stuff we can't see" happens when there is, say, a blank line in the file that we can't see easily. When I did the cut and paste, I wound up with a trailing line with one space in it. There are a number of ways of dealing with this. Often my code will have `next if /^\s+$/;`, meaning skip blank lines. Below, as another way, I check whether $ccds_status is defined to prevent an error message when that trailing blank line is encountered.

Also note that $ccds_status eq 'Withdrawn' instead of a regex would be OK too. When dealing with files generated by other computer programs, allowing for case often is not necessary, but that's also a minor nit.

My two main points are:

1. Don't over-restrict the whitespace split.
2. Use a list slice to get variables with human-readable names instead of using field numbers further on in the program.
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $firstline= <DATA>;  #actually optional in this case
while(<DATA>)
{
    my $ccds_status = (split)[5];
    #see discussion re: defined()
    if(defined($ccds_status) and $ccds_status =~ m/Withdrawn/)
    {
        print $_;  # lines with Withdrawn ($_ still ends in "\n" because there is no chomp)
    }
}
#prints:
#1 NC_000001.8 NCRNA00115 79854 CCDS1.1 Withdrawn - 801942 802433
__DATA__
#chromosome nc_accession gene gene_id ccds_id ccds_status cds_strand cds_from cds_to
1 NC_000001.8 NCRNA00115 79854 CCDS1.1 Withdrawn - 801942 802433
1 NC_000001.10 SAMD11 148398 CCDS2.2 Public + 861321 879532
1 NC_000001.10 NOC2L 26155 CCDS3.1 Public - 880073 894619
1 NC_000001.10 PLEKHN1 84069 CCDS4.1 Public + 901911 909954
1 NC_000001.10 HES4 57801 CCDS5.1 Public - 934438 935352
1 NC_000001.10 ISG15 9636 CCDS6.1 Public + 948953 949857
1 NC_000001.10 C1orf159 54991 CCDS7.2 Public - 1018272 1026922
1 NC_000001.10 TTLL10 254173 CCDS8.1 Public + 1115433 1120521
1 NC_000001.10 TNFRSF18 8784 CCDS9.1 Public - 1138970 1141950
1 NC_000001.10 TNFRSF18 8784 CCDS10.1 Public - 1139223 1141950
1 NC_000001.10 TNFRSF4 7293 CCDS11.1 Public - 1146934 1149506
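The list slice idea extends to several columns at once. A minimal sketch, assuming the column order from the header line in the question (gene at index 2, ccds_id at 4, ccds_status at 5); `parse_record` is a hypothetical helper name, and the sample line deliberately mixes tabs and spaces:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): pull three named columns
# out of one whitespace-separated record.
sub parse_record {
    my ($line) = @_;
    # split ' ' is the default split mode: it skips leading whitespace
    # and splits on runs of any whitespace, tabs and spaces alike.
    my ($gene, $ccds_id, $ccds_status) = (split ' ', $line)[2, 4, 5];
    return { gene => $gene, ccds_id => $ccds_id, ccds_status => $ccds_status };
}

my $rec = parse_record("1    NC_000001.8\tNCRNA00115  79854 CCDS1.1\tWithdrawn    -\n");
print "$rec->{gene} $rec->{ccds_id} is $rec->{ccds_status}\n"
    if defined $rec->{ccds_status} && $rec->{ccds_status} eq 'Withdrawn';
# prints: NCRNA00115 CCDS1.1 is Withdrawn
```

One caveat with this design: a whitespace split only lines up with the header when no field is empty and no field contains a space. If the real file can have empty tab-delimited fields, the columns shift left and splitting on `/\t/` is the safer choice.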

