PerlMonks  

Where is my output stream to a file going?

by lomSpace (Scribe)
on Mar 30, 2011 at 05:09 UTC ( [id://896316]=perlquestion )

lomSpace has asked for the wisdom of the Perl Monks concerning the following question:

Hello PerlMonks!
I have a weird problem. I am able to read the data, process it, and print to a file only
when I include the data in the script using __DATA__. When I read from the IN
filehandle and write to an output file, I get a blank file. Any ideas?
The following is the code:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

#open file for reading
open(IN,"/Users/me/Desktop/CCDS.current.txt") or die " Can't open file: $!";
#open out file for writing
open(OUT, ">/Users/me/Desktop/withdrawndata.txt");

#remove the header
my $firstline= <IN>;
chomp $firstline;

while(<IN>){
    chomp; # remove the newline character
    my @fields = split/\t/; # split the file into columns where each line in it
                            # populates an array for that column.
    if($fields[5] =~ m/Withdrawn/){ # print each line that has "Withdrawn" in $fields[5]
        print OUT "$_\n"; # print to file
    }
}
close(IN);
close(OUT);

__DATA__
#chromosome	nc_accession	gene	gene_id	ccds_id	ccds_status	cds_strand	cds_from	cds_to
1	NC_000001.8	NCRNA00115	79854	CCDS1.1	Withdrawn	-	801942	802433
1	NC_000001.10	SAMD11	148398	CCDS2.2	Public	+	861321	879532
1	NC_000001.10	NOC2L	26155	CCDS3.1	Public	-	880073	894619
1	NC_000001.10	PLEKHN1	84069	CCDS4.1	Public	+	901911	909954
1	NC_000001.10	HES4	57801	CCDS5.1	Public	-	934438	935352
1	NC_000001.10	ISG15	9636	CCDS6.1	Public	+	948953	949857
1	NC_000001.10	C1orf159	54991	CCDS7.2	Public	-	1018272	1026922
1	NC_000001.10	TTLL10	254173	CCDS8.1	Public	+	1115433	1120521
1	NC_000001.10	TNFRSF18	8784	CCDS9.1	Public	-	1138970	1141950
1	NC_000001.10	TNFRSF18	8784	CCDS10.1	Public	-	1139223	1141950
1	NC_000001.10	TNFRSF4	7293	CCDS11.1	Public	-	1146934	1149506

Your wisdom is appreciated!
LomSpace

Re: Where is my output stream to a file going?
by davido (Cardinal) on Mar 30, 2011 at 05:56 UTC

    If you're getting a blank outfile, that either means print is failing (you could check by putting "or die $!" after your print statement), or your if( $fields[5] =~ m/Withdrawn/ ) {... is never matching.
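To rule out the first possibility, here is a minimal sketch (an editorial illustration, not part of davido's reply) that checks every I/O call with `or die`, including `close`. Output is buffered, so a write error is often only reported when the handle is closed. The helper name `write_lines` and the use of File::Temp for a scratch file are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write @lines to $path, dying on any I/O failure.
# Output is buffered, so a failed write may only be reported by close().
sub write_lines {
    my ($path, @lines) = @_;
    open my $out, '>', $path or die "Can't open $path for writing: $!";
    for my $line (@lines) {
        print {$out} "$line\n" or die "Can't write to $path: $!";
    }
    close $out or die "Can't close $path: $!";
    return scalar @lines;
}

# Usage: write two lines to a temporary file, then read them back.
my (undef, $tmp) = tempfile(UNLINK => 1);
my $written = write_lines($tmp, 'line one', 'line two');

open my $in, '<', $tmp or die "Can't read $tmp: $!";
my @back = <$in>;
close $in;
print "wrote $written lines, read back ", scalar(@back), "\n";
```

If nothing dies and the file is still empty, the problem is the match, not the output.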

    I suspect the latter. So why would your match fail if the data is coming from a file, but not if it's coming from a __DATA__ block? You'll have to investigate that yourself, but here are a few possibilities:

    1. There's an extra tab in there somewhere, making 'Withdrawn' actually live one column to the right.
    2. There aren't enough tabs, making 'Withdrawn' live to the left of where you're expecting it.
    3. Your tabs in the file are actually a fixed number of spaces rather than an actual tab character.
    4. Your data is mistyped.

    One of those is likely the culprit.
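One way to investigate the possibilities listed above is to make the invisible characters visible. This sketch uses a hypothetical `describe_line` helper and Data::Dumper with `$Data::Dumper::Useqq` set, which prints tabs as `\t`; it compares one tab-separated line with a space-separated copy of the same record.

```perl
use strict;
use warnings;
use Data::Dumper;

# Make Data::Dumper print tabs as \t instead of invisible whitespace.
$Data::Dumper::Useqq = 1;
$Data::Dumper::Terse = 1;

# Hypothetical helper: summarise how a line splits on tabs.
sub describe_line {
    my ($line) = @_;
    my @fields = split /\t/, $line;
    my $tabs   = ($line =~ tr/\t//);   # count tab characters
    return sprintf "tabs=%d fields=%d field5=%s",
        $tabs, scalar @fields,
        defined $fields[5] ? $fields[5] : '(undef)';
}

my @record = qw(1 NC_000001.8 NCRNA00115 79854 CCDS1.1 Withdrawn - 801942 802433);
my $tabbed = join "\t", @record;
my $spaced = join ' ',  @record;

for my $line ($tabbed, $spaced) {
    print Dumper($line);              # shows "\t" wherever a real tab is
    print describe_line($line), "\n";
}
```

A real data line that reports `tabs=0 fields=1` is a strong hint that the file contains spaces where the code expects tabs.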


    Dave

Re: Where is my output stream to a file going?
by biohisham (Priest) on Mar 30, 2011 at 07:45 UTC
    Do you intend to use the other @fields array elements in some way, or are you only splitting in order to access $fields[5] to check whether it is Withdrawn or Public? In either case, you can gain an edge by filtering your lines first and then working with the elements of each line. This way you have obtained the interesting lines (the ones that contain 'Withdrawn') and reduced your data file to a more manageable chunk. The code posted below takes a more direct approach: it reads the file a line at a time, uses a look-ahead regular expression to check for an occurrence of 'Withdrawn' (regardless of case), and then transfers that line to a new file that has the same header line as the source file, so that only entries where 'Withdrawn' appears in the line are kept.

    my $path = "C:/Documents and Settings/aldaihi/Desktop/Monks";
    open (my $fh, '<',"$path/genes.txt") or die ("could not open file $!\n");
    open (my $rfh,'>',"$path/results.txt") or die ("could not open file $!\n");
    my $firstLine = <$fh>;
    print $rfh $firstLine;
    while(<$fh>){
        chomp;
        if(/(?=Withdrawn)/i){
            #U can do things to the line in here
            #.....
            # split or rearrange ..
            #
            print $rfh $_,"\n";
        }
    }
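A minimal sketch of the filter-then-split idea, assuming the data sits in an in-memory string rather than genes.txt so that it runs anywhere. It uses `grep` to keep only the lines mentioning Withdrawn and splits just those; `@genes` and the sample rows are illustrative.

```perl
use strict;
use warnings;

# Sample rows from the question, turned into a tab-separated in-memory file.
my $text = join '', map { join("\t", split ' ', $_) . "\n" }
    '#chromosome nc_accession gene gene_id ccds_id ccds_status cds_strand cds_from cds_to',
    '1 NC_000001.8 NCRNA00115 79854 CCDS1.1 Withdrawn - 801942 802433',
    '1 NC_000001.10 SAMD11 148398 CCDS2.2 Public + 861321 879532';

open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my $header = <$fh>;

# Filter first: keep only lines mentioning Withdrawn (any case) ...
my @withdrawn = grep { /Withdrawn/i } <$fh>;
close $fh;

# ... then split just those lines when you need individual columns.
my @genes;
for my $line (@withdrawn) {
    chomp $line;
    my @fields = split /\t/, $line;
    push @genes, $fields[2];    # field 2 is the gene column
}
print "Withdrawn genes: @genes\n";
```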
    A module like Text::Table can give you control over how your data is placed in columns without breaking your head over spacing issues.
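Text::Table is a CPAN module, so as a swapped-in, core-only alternative, here is a rough sketch of column alignment with `sprintf` and List::Util's `max`. The `format_columns` helper is hypothetical and only left-aligns cells; it is not Text::Table's API.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: left-align rows of fields into columns using only
# core Perl (sprintf), as a stand-in for what a module like Text::Table does.
sub format_columns {
    my @rows  = @_;
    my $ncols = max map { scalar @$_ } @rows;
    my @width;
    for my $c (0 .. $ncols - 1) {
        $width[$c] = max map { length($_->[$c] // '') } @rows;
    }
    my @out;
    for my $row (@rows) {
        my @cells = map { sprintf('%-*s', $width[$_], $row->[$_] // '') } 0 .. $ncols - 1;
        (my $line = join '  ', @cells) =~ s/\s+$//;   # drop trailing padding
        push @out, $line;
    }
    return @out;
}

my @table = format_columns(
    [qw(gene ccds_id ccds_status)],
    [qw(NCRNA00115 CCDS1.1 Withdrawn)],
    [qw(SAMD11 CCDS2.2 Public)],
);
print "$_\n" for @table;
```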

    Excellence is an Endeavor of Persistence. A Year-Old Monk :D .
      biohisham,
      Thanks for the advice!
      LomSpace

        While it is probably courteous to thank people who have put some effort into helping you with a problem, I myself would prefer hearing how the problem was resolved, and what the issue was discovered to be. Knowing that one of our suggestions hit paydirt would be worth ten thank-yous to me.

        So what turned out to be the problem? How did you end up resolving it?


        Dave

Re: Where is my output stream to a file going?
by wind (Priest) on Mar 30, 2011 at 05:28 UTC
    Don't know what your exact problem is, but I suggest you use the 3 parameter form of open and lexical file handles as well.
    my $infile  = "/Users/me/Desktop/CCDS.current.txt";
    my $outfile = "/Users/me/Desktop/withdrawndata.txt";
    open my $infh,  '<', $infile  or die "$infile: $!";
    open my $outfh, '>', $outfile or die "$outfile: $!";
    #remove the header
    my $firstline = <$infh>;
    chomp $firstline;
    while (<$infh>){
        chomp;
        my @fields = split /\t/;
        if($fields[5] =~ m/Withdrawn/){
            print $outfh "$_\n";
        }
    }
    close $infh;
    close $outfh or die "$outfile: $!";
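To illustrate why the 3-argument form is safer, this sketch opens the same string both ways. It assumes a hypothetical file name that begins with '>' (for example one taken from user input) and uses File::Temp's `tempdir` so nothing outside a scratch directory is touched. With two arguments the '>' is treated as a mode, so an intended read silently creates or truncates a file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/clobbered.txt";

# Hypothetical "filename" that happens to begin with '>'.
my $name = ">$path";

# 3-arg open: '<' is the mode and $name is used literally as a file name,
# so this simply fails because no such file exists.
my $three_arg_ok = open(my $fh3, '<', $name) ? 1 : 0;

# 2-arg open: the leading '>' is parsed as a write mode, so this
# "read" creates (or would truncate) $path.
my $two_arg_ok = open(my $fh2, $name) ? 1 : 0;
close $fh2 if $two_arg_ok;

printf "3-arg opened: %d, 2-arg opened: %d, file exists: %d\n",
    $three_arg_ok, $two_arg_ok, (-e $path ? 1 : 0);
```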
Re: Where is my output stream to a file going?
by samarzone (Pilgrim) on Mar 30, 2011 at 07:08 UTC

    The best tool for debugging your problem is the Perl debugger (perl -d), and the simplest is print. Did you check whether $_ or $fields[5] contains any value? I found one style point and one suspicion:

    1. You do not have a space between split and its argument (split/\t/). Perl parses this the same as split /\t/, so it hurts readability but is not the cause of the blank file.
    2. There is a good chance that, while copying and pasting the text, you unintentionally converted the tabs into spaces, so the whole line ends up in $fields[0].
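Both points can be checked with a little print (here, `warn`) debugging. This sketch assumes two hypothetical input lines read from an in-memory string, the second with its tabs turned into spaces; it confirms that `split/\t/` and `split /\t/` behave identically and reports the line where `$fields[5]` is missing.

```perl
use strict;
use warnings;

# The second line has had its tabs turned into spaces, as can happen
# with copy and paste.
my $text = "1\tNC_000001.8\tNCRNA00115\t79854\tCCDS1.1\tWithdrawn\t-\t801942\t802433\n"
         . "1 NC_000001.10 SAMD11 148398 CCDS2.2 Public + 861321 879532\n";
open my $in, '<', \$text or die "Can't open in-memory file: $!";

my @problems;
while (my $line = <$in>) {
    chomp $line;
    my @with_space    = split /\t/, $line;
    my @without_space = split/\t/, $line;   # parses exactly the same way
    die "split forms differ" unless "@with_space" eq "@without_space";

    # Simple print debugging: report lines where field 5 is missing.
    unless (defined $with_space[5]) {
        warn "line $.: only ", scalar(@with_space),
             " field(s); \$fields[0] = '$with_space[0]'\n";
        push @problems, $.;
    }
}
close $in;
```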

    I hope this helps

    --
    Regards
    - Samar
Re: Where is my output stream to a file going?
by Marshall (Canon) on Mar 30, 2011 at 23:29 UTC
    I suspect that davido is on the right track, re: spaces instead of tabs.

    The default split splits on any run of one or more whitespace characters, much like /\s+/ (and it also discards leading whitespace); the common whitespace characters are space, \r, \n, \t and \f. From your data, I see no reason to limit the split to just \t, because you have plain whitespace-separated tokens (no spaces within the desired tokens). Perl is designed to work great with that format! Splitting on one particular type of whitespace (out of several that you cannot see on the screen) is usually a bad idea; the default split is usually a good idea unless you have a clear reason why it's not. Also note that chomp() is not needed, because \n is one of the whitespace characters. But chomp() is fast, so this is a nit.
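A small sketch of how the three splits differ on a hypothetical line with a leading space, mixed tabs and spaces, and a trailing newline. Note that `split /\t/` leaves the newline on the last field, which is why chomp matters there but not for the default split.

```perl
use strict;
use warnings;

# Leading space, tabs and spaces mixed, trailing newline.
my $line = " 1\tNC_000001.8  NCRNA00115\t79854\n";

my @default = split ' ', $line;    # the special ' ' pattern: skips leading whitespace
my @ws      = split /\s+/, $line;  # like ' ' but keeps a leading empty field
my @tab     = split /\t/, $line;   # only real tabs separate fields

$_ = $line;
my @bare = split;                  # a bare split is split ' ', $_

print scalar(@default), ' ', scalar(@ws), ' ', scalar(@tab), "\n";   # prints "4 5 3"
```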

    Perl has an amazing feature, the list slice, which is not used often enough. There is no need to create an @fields array. Use a list slice to get what you want, i.e. my $ccds_status = (split)[5];. One of the good things about this is that we've documented what the heck field 5 is! And now we can refer to it as $ccds_status instead of field #5. In larger programs, this is a significant advantage.
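A sketch of the list-slice idea, pulling several named columns at once. The two sample lines come from the data in the question, and the indices follow its header (gene is field 2, ccds_id field 4, ccds_status field 5).

```perl
use strict;
use warnings;

my @lines = (
    "1 NC_000001.8 NCRNA00115 79854 CCDS1.1 Withdrawn - 801942 802433\n",
    "1 NC_000001.10 SAMD11 148398 CCDS2.2 Public + 861321 879532\n",
);

my @withdrawn;
for (@lines) {
    # A list slice pulls named values straight out of split's result,
    # without building an intermediate @fields array.
    my ($gene, $ccds_id, $ccds_status) = (split)[2, 4, 5];
    push @withdrawn, "$gene ($ccds_id)" if $ccds_status eq 'Withdrawn';
}
print "$_\n" for @withdrawn;   # prints "NCRNA00115 (CCDS1.1)"
```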

    Another variant of "stuff we can't see" happens when there is, say, a blank line in the file that we can't easily see. When I did the cut and paste, I wound up with a trailing line with one space in it. There are a number of ways of dealing with this. Often my code will have next if /^\s+$/;, meaning skip blank lines. In the code at the end of this post I used another way: I checked whether $ccds_status was defined, to prevent an "uninitialized value" warning when that trailing blank line is encountered.
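A sketch of the `next if /^\s+$/` approach on a hypothetical in-memory input that ends with an empty line and a line holding a single space. Lines that survive the filter are then split, and the `defined()` check stays as a second safety net.

```perl
use strict;
use warnings;

my $text = "1 NC_000001.8 NCRNA00115 79854 CCDS1.1 Withdrawn - 801942 802433\n"
         . "\n"      # empty line
         . " \n";    # line holding a single space
open my $in, '<', \$text or die "Can't open in-memory file: $!";

my ($kept, $skipped) = (0, 0);
while (<$in>) {
    if (/^\s+$/) {   # whitespace-only line (a lone "\n" counts too)
        $skipped++;
        next;
    }
    my $ccds_status = (split)[5];
    $kept++ if defined $ccds_status;
}
close $in;
print "kept $kept, skipped $skipped\n";   # prints "kept 1, skipped 2"
```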

    Also note that $ccds_status eq 'Withdrawn' would be OK instead of a regex. When dealing with files generated by other computer programs, allowing for differences in letter case is often not necessary, but that's also a minor nit.
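A quick comparison of exact and pattern matching on some hypothetical status strings (only 'Withdrawn' and 'Public' come from the posted data). An unanchored `m/Withdrawn/` also matches any longer string containing the word, and `/i` additionally ignores case, whereas `eq` accepts only the exact value.

```perl
use strict;
use warnings;

# Hypothetical status values, for comparison only.
my @statuses = ('Withdrawn', 'withdrawn', 'Withdrawn, reviewed', 'Public');

my %count = (eq => 0, re => 0, re_i => 0);
for my $s (@statuses) {
    $count{eq}++   if $s eq 'Withdrawn';     # exact value only
    $count{re}++   if $s =~ m/Withdrawn/;    # substring, case-sensitive
    $count{re_i}++ if $s =~ m/Withdrawn/i;   # substring, any case
}
print "eq: $count{eq}, m//: $count{re}, m//i: $count{re_i}\n";   # prints "eq: 1, m//: 2, m//i: 3"
```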

    My two main points are:
    1. Don't over-restrict the whitespace split.
    2. Use a list slice to put values into variables with human-readable names instead of using field numbers further on in the program.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my $firstline= <DATA>; #actually optional in this case
    while(<DATA>) {
        my $ccds_status = (split)[5]; #see discussion re: defined()
        if(defined($ccds_status) and $ccds_status =~ m/Withdrawn/) {
            print; # lines with Withdrawn ($_ still ends with its "\n")
        }
    }
    #prints:
    #1	NC_000001.8	NCRNA00115	79854	CCDS1.1	Withdrawn	-	801942	802433

    __DATA__
    #chromosome	nc_accession	gene	gene_id	ccds_id	ccds_status	cds_strand	cds_from	cds_to
    1	NC_000001.8	NCRNA00115	79854	CCDS1.1	Withdrawn	-	801942	802433
    1	NC_000001.10	SAMD11	148398	CCDS2.2	Public	+	861321	879532
    1	NC_000001.10	NOC2L	26155	CCDS3.1	Public	-	880073	894619
    1	NC_000001.10	PLEKHN1	84069	CCDS4.1	Public	+	901911	909954
    1	NC_000001.10	HES4	57801	CCDS5.1	Public	-	934438	935352
    1	NC_000001.10	ISG15	9636	CCDS6.1	Public	+	948953	949857
    1	NC_000001.10	C1orf159	54991	CCDS7.2	Public	-	1018272	1026922
    1	NC_000001.10	TTLL10	254173	CCDS8.1	Public	+	1115433	1120521
    1	NC_000001.10	TNFRSF18	8784	CCDS9.1	Public	-	1138970	1141950
    1	NC_000001.10	TNFRSF18	8784	CCDS10.1	Public	-	1139223	1141950
    1	NC_000001.10	TNFRSF4	7293	CCDS11.1	Public	-	1146934	1149506

Approved by davido