
-s testing for empty file; works on local, but not on remote

by Jeri (Scribe)
on Sep 26, 2011 at 20:52 UTC (#927946=perlquestion)
Jeri has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've checked, double-checked, and triple-checked this with the EXACT files. My code works completely on my local computer but not on my remote computer (it compiles on both). I used several checks to identify the problem, and it seems that the -s (found in the sub MuscleHMMERsearch) is not allowing the code on the remote server to proceed through the 'if' statement. The remote computer gives false, while my local computer gives true. Or maybe I just need a little guidance? Please, please, please, I beg you, do not say it's my files. I tried them, and tried them, and came back again and tried them. I'm certain the -s in the 'if' statement is causing all my woes.
#!/usr/bin/perl5.8.8
use strict;
use warnings;

my $familyName; #stores the family name

open (my $OUTFILE,">","FamDATA") || die $!; #opens outfile that will contain the protein families and all their associated sequences with an e-value <= 0.001
print $OUTFILE "family\ttarget,e-value,bit-score"; #header for the outfile

tempProteinFamFileCreator(); #creates a temp file with protein family, sends it to MUSCLE and HMMER, and then sends table output to sub TableParser (see the comment above the sub)

close ($OUTFILE); #closes the outfile

#creates a temp file with protein family, sends it to MUSCLE and HMMER,
#and then sends table output to sub TableParser to extract necessary data
#(this includes the protein families and all their associated sequences
#with an e-value <= 0.001, the actual e-value, and the bit-score)
sub tempProteinFamFileCreator
{
    my $infile = $ARGV[0]."_ProFam"; #captures the name of the infile (Protein Family File) as determined by the first command-line argument
    open (my $INFILE,"<", $infile) || die $!; #opens the infile (Protein Family File)
    open (my $TEMP,">",'temp') || die $!; #opens the temp file which will store a single protein family at a time.

    while(<$INFILE>) #entering the infile
    {
        my $line = $_; #sets the default iterator to a value
        if ($line =~/^##(UniRef90_[\w\d]+) Protein Family/)
        {
            close ($TEMP); #closes temp file
            MuscleHMMERsearch(); #enters subroutine MuscleHMMERsearch
            open ($TEMP,">",'temp'); #opens temp file
            seek ($TEMP, 0 ,0); #used to overwrite the temp file each time, by returning to the beginning of the file
            $familyName = $1; #storing the protein family name
        }
        if ($line =~/^>UniRef90_[\w\d]+\.UniRef100_[\w\d]+/ || $line =~/^[\w\d]+/)
        {
            chomp $_; #removes the trailing newline from the default iterator
            print $TEMP "$_\n"; #prints to temp file
        }
        if ($line =~/^>File/)
        {
            close ($TEMP); #closes temp file
            MuscleHMMERsearch(); #enters subroutine MuscleHMMERsearch
        }
    } #end of while loop
    close ($INFILE); #closes the $INFILE handle
} #end of subroutine tempProteinFamFileCreator

#temp file (single protein family) is sent to MUSCLE, hmmbuild, hmmsearch and creates a table.
sub MuscleHMMERsearch
{
    open (my $TEMP,"<",'temp') || die $!; #opens temp file (single protein family)
    if (-s "/home/vcg/Documents/Trial/temp") #checks to see if file is not empty
    {
        # MUSCLE creating a multiple alignment file
        my $Fprot = "temp";
        my $MAF = "MuscleAlignmentFile";
        my $cmd = "/home/vcg/Documents/Trial/muscle -in $Fprot -out $MAF";
        print $cmd. "\n";
        system ($cmd);
        if ($?) {die "command $cmd failed\n"};

        # HMMER creating a profile HMM from a multiple alignment file
        my $HMMbf = "HMMbuildfile";
        $cmd = "/home/vcg/Documents/Trial/hmmbuild --informat afa $HMMbf $MAF";
        print $cmd. "\n";
        system ($cmd);
        if ($?) {die "command $cmd failed\n"};
        $cmd = "rm /home/vcg/Documents/Trial/$MAF";
        system ($cmd);

        # Searching the sequence database with hmmsearch
        my $results = "HMMsearch_results";
        $cmd = "/home/vcg/Documents/Trial/hmmsearch --F1 .002 --F2 .001 --F3 .00001 --tblout HMMtable.tbl --cpu 1 -E .001 $HMMbf uniref100.fasta >> $results";
        print $cmd. "\n";
        system ($cmd);
        if ($?) {die "command $cmd failed\n"};
        $cmd = "rm /home/vcg/Documents/Trial/$HMMbf";
        system ($cmd);

        TableParser(); #enters subroutine TableParser
    }
} #end of subroutine MuscleHMMERsearch

#table output created in sub MuscleHMMERsearch is parsed, and the
#necessary data is extracted and sent/appended to a resulting file. This
#includes the protein families and all their associated sequences with
#an e-value <= 0.001, the actual e-value, and the bit-score.
sub TableParser
{
    open (my $TABLE,"<","HMMtable.tbl") || die $!; #opens table output created in sub MuscleHMMERsearch
    print $OUTFILE "\n$familyName\t"; #prints the protein family name
    while(<$TABLE>) #space delimited file; enters the table file
    {
        if ($_=~/^#/){next;} #bypasses headers
        my @data = split(" ", $_); #the table file is space delimited. All the data is divided by a space and stored in an array.
        if ($_=~/^UniRef100/)
        {
            my $TargetName = $data[0]; #stores target name/sequences with e-value <= 0.001
            my $E_value = $data[4]; #stores the e-value of that sequence
            my $BitScore = $data[5]; #stores the bit-score of that sequence
            print $OUTFILE "$TargetName,$E_value,$BitScore\t"; #prints all this data to a resulting file
        }
    }
    close ($TABLE); #closes $TABLE filehandle
} # end of subroutine TableParser

Thank you, thank you, thank you

Re: -s testing for empty file; works on local, but not on remote
by onelesd (Pilgrim) on Sep 26, 2011 at 21:04 UTC
    If it works locally but not remotely, and you are sure the files and directories are indeed there, then it's most likely a permissions issue, i.e. the remote user does not have the correct permissions to read that file or directory. This is simple to test: as the user running the script, run this from a terminal:
    $ ls -l /home/vcg/Documents/Trial/temp
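The same check can also be made from inside Perl, which confirms what the script's own user sees. This is only a minimal sketch: `describe_path` is a hypothetical helper name, not something from the original script. It also prints the working directory, since the posted script writes `temp` through a relative path but tests it with `-s` through an absolute one.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical helper: report what the current user can see at $path.
sub describe_path {
    my ($path) = @_;
    # -e does the stat; on failure $! says why (e.g. no such file, permission denied)
    return "missing or unreachable: $!" unless -e $path;
    my @flags;
    # "_" reuses the stat result from the -e test above
    push @flags, (-r _) ? 'readable' : 'not readable';
    push @flags, (-w _) ? 'writable' : 'not writable';
    my $size = -s _;
    return join(', ', @flags) . ', ' . ($size || 0) . ' bytes';
}

print "cwd:  ", getcwd(), "\n";
print "temp: ", describe_path('/home/vcg/Documents/Trial/temp'), "\n";
```

If the printed working directory is not /home/vcg/Documents/Trial on the remote machine, the relative and absolute names refer to different files.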

      Thanks for the advice. However, I don't think that is the case, since I'm creating the temp file in the program. I'm going to look into it just to be sure.

Re: -s testing for empty file; works on local, but not on remote
by ikegami (Pope) on Sep 26, 2011 at 21:10 UTC

    It doesn't take three screenfuls of code to demonstrate a problem with -s. Furthermore, it appears that you didn't even show the code that fails! I'm just going to offer a couple of general tips.

    First, what exactly does -s return? If it's zero, that means it thinks the file exists and that it's empty. But I suspect it's returning undef, in which case $! contains an error message. Is -s returning undef, and if so, what's $!?

    my $size = -s "/home/vcg/Documents/Trial/temp";
    die("-s: $!") if !defined($size);
    if ($size) {

    A common mistake is to pass paths with trailing spaces or newlines. Or improper escaping of \. Make sure the variable contains what you think it does.
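One way to see whether a path variable carries a stray space or newline is to print it with every non-printing character spelled out. The following is only a sketch; `visible` is a hypothetical helper name, and the un-chomped path is made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: show spaces, newlines and other non-printing
# characters as \x{..} escapes, wrapped in brackets.
sub visible {
    my ($str) = @_;
    (my $copy = $str) =~ s/([^\x21-\x7e])/sprintf('\\x{%02x}', ord $1)/ge;
    return "[$copy]";
}

my $path = "/home/vcg/Documents/Trial/temp\n";   # deliberately not chomped
print "path is ", visible($path), "\n";
```

Here the trailing newline shows up as `\x{0a}` just before the closing bracket, which is easy to miss when the string is printed as-is.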

      This is the code that fails! Well, it doesn't really fail. The same code that runs on the local machine runs slightly differently (because of the -s, I think) on the remote one. I'll look into what it's returning. Thanks for the heads up.

        You're saying /home/vcg/Documents/Trial/temp is a path to a remote file? Sorry, it didn't look like it to me.

      Also, I pasted in the entire code because I'm new at Perl and I'm just guessing the -s is the problem, based on the checking I've done with the code. I could be wrong. Maybe someone else could catch my error? That was the idea anyway.

      Okay, just checked. There is no error message, and the temp file is definitely created.

        Everything is working smoothly. Thanks for your patience and wisdom.

      I'm not exactly certain, but when I started using -z, the remote program began catching my empty files like the local one.

        There is one difference:

        If you don't check for errors, an error returned by -s (including for a non-existent file) will be mistaken for an empty file.

        If you don't check for errors, an error returned by -z (including for a non-existent file) will be mistaken for a non-empty file.
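A short sketch of that difference, assuming a made-up path that does not exist. `file_state` is a hypothetical helper that keeps the error case separate from both "empty" and "non-empty".

```perl
use strict;
use warnings;

# Hypothetical helper: classify a path as 'empty', 'non-empty' or 'error: ...'.
sub file_state {
    my ($path) = @_;
    my $size = -s $path;
    return "error: $!" unless defined $size;   # stat failed; this is not "empty"
    return $size ? 'non-empty' : 'empty';
}

my $missing = '/no/such/dir/temp';   # assumed not to exist on this machine

# Without an error check, the two operators disagree about a missing file:
printf "-s says %s, -z says %s, file_state says %s\n",
    ((-s $missing) ? 'non-empty' : 'empty'),
    ((-z $missing) ? 'empty'     : 'non-empty'),
    file_state($missing);
```

Both tests return undef for the missing file, so the unchecked `-s` branch reads it as empty while the unchecked `-z` branch reads it as non-empty.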

Re: -s testing for empty file; works on local, but not on remote
by pvaldes (Chaplain) on Sep 26, 2011 at 23:13 UTC

    Four notes about your code, probably wrong and surely unrelated to your problem, but...

    1 - You could use chdir and/or put "/home/vcg/Documents/Trial" in a $dirname var, saving a lot of typing. This will also protect you from mistakes like this:

    $cmd = "rm /home/vcg/Documents/Trial/ $MAF"; # this extra space could delete the whole file $MAF AND the whole parent dir (at least under some circumstances) when you pass this string to "system" in the next line.

    (Also, consider using unlink instead of rm to delete a file; it is more portable.)

    2 - You should review all these changes to the value of the $cmd var. I'm especially worried by things like this:

    my $cmd = "/home/vcg/Documents/Trial/muscle -in $Fprot -out $MAF";
    system ($cmd);

    See the trouble here? You're missing a "." sign. If you want to run the local executable named muscle in the shell you should write "./muscle", not simply "muscle". Even if your local system knows that muscle IS an executable, the remote system could know nothing about it. This could also be ambiguous to the remote shell (which will search in vain for the file on its own hard disk and return false instead of running the command).

    3 - If you "open my $filehandle..." you can treat $filehandle as any other scalar variable. You could probably use simply while ($filehandle) instead of while (<$FILEHANDLE>), unless you have a reason to do this (that I'm probably missing).

    4 - There is still room for improving your code. You could probably put "use autodie;" in your header, drop a lot of unnecessary "()" here and there, avoid the need for creating some vars, etc.
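Points 1 and 2 could be sketched roughly as follows. `in_dir` and `remove_file` are hypothetical helper names, and the directory is the one from the original script. The `system` call is left commented out because muscle may not be installed where this runs.

```perl
use strict;
use warnings;
use File::Spec;

my $dir = '/home/vcg/Documents/Trial';   # base directory from the original script

# Hypothetical helper: build a path under $dir without hand-typed slashes.
sub in_dir {
    my ($name) = @_;
    return File::Spec->catfile($dir, $name);
}

# Hypothetical helper: delete a file with unlink instead of shelling out to rm.
sub remove_file {
    my ($path) = @_;
    return 1 unless -e $path;                       # nothing to delete
    unlink $path or die "cannot unlink $path: $!";
    return 1;
}

# The list form of system bypasses the shell, so a stray space inside one
# argument cannot split it into two.
my @cmd = (in_dir('muscle'), '-in', in_dir('temp'), '-out', in_dir('MuscleAlignmentFile'));
print "would run: @cmd\n";
# system(@cmd) == 0 or die "command @cmd failed: $?";
# remove_file(in_dir('MuscleAlignmentFile'));
```

Because the command names the executable by its absolute path, it does not depend on the shell's search path or on the current directory.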

      3 - If you "open my $filehandle..." you can treat $filehandle as any other scalar variable. You could probably use simply while ($filehandle) instead of while (<$FILEHANDLE>), unless you have a reason to do this (that I'm probably missing).

      while ($filehandle) loops while the variable $filehandle contains a TRUE value.

      while (<$FILEHANDLE>) is short for while ( defined( $_ = readline $FILEHANDLE ) ) and it loops until readline returns undef.
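A small sketch of the spelled-out form, reading from an in-memory string so that it needs no file on disk. `count_lines` is a hypothetical helper name. The last line is a bare "0", which is why the loop tests defined-ness and not truth.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines using the long form of while (<$fh>).
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    while ( defined( my $line = readline $fh ) ) {
        $count++;
    }
    return $count;
}

my $data = "a\nb\n0";                  # final line is "0" with no newline
open my $fh, '<', \$data or die $!;    # in-memory filehandle
print count_lines($fh), "\n";          # prints 3
close $fh;
```

By contrast, while ($fh) never reads anything: an open handle is always true, so that loop would not end on its own.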

        Aha, now it's clear to me. Thanks for the explanation.

      Thanks, I'll tidy it up.

Re: -s testing for empty file; works on local, but not on remote
by Anonymous Monk on Sep 27, 2011 at 01:09 UTC

    This

    tempProteinFamFileCreator();
    ...
    sub tempProteinFamFileCreator
    {
    my $infile = $ARGV[0]."_ProFam";

    is more generically written as

    ...
    tempProteinFamFileCreator( @ARGV );
    ...
    sub tempProteinFamFileCreator
    {
    my $infile = $_[0] .'_ProFam';
    # or
    # my $infile = shift .'_ProFam';
    ...

    Now sub tempProteinFamFileCreator doesn't depend on @ARGV :) after all, you wouldn't write

    @ARGV = ( 1 , 2 , 3 );
    print @ARGV;
    instead of
    print 1,2,3;
      Thanks, I'll def fix it

Node Type: perlquestion [id://927946]
Approved by ikegami