PerlMonks  

Hello, I'm new to programming; I need help

by fahadm89 (Initiate)
on Feb 09, 2013 at 22:05 UTC (node #1017993)
fahadm89 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm new to programming and I need help. What I am trying to do is compare a set of coordinates from one file with sets of coordinates from many different files. I am having trouble doing this; can someone please help me? Here is what I have done so far, which compares two sets. I want to be able to change the first part so that it cycles through all the files I have in a directory.

#!/usr/bin/perl -w
#$dir = "/net/klab2/u2/home/fmohammad/GFP2/run1";
#opendir (DIR, $dir) or die $!;
#@one = readdir (DIR);
#foreach $file (@one)
#{
$initialfile2 = "UM_802_R203E112_1GFL_clean_gfp_5.pdb";
open FILETWO, "$initialfile2" or die "cannot open $initialfile2 for read\n";
while ($line2 = <FILETWO>) {
    chomp $line2;
    @one=split(/\s+/, $line2);
    if ($one[0]=~m/^HETATM/) {
        if ($one[2] eq "C4")  { $xC4=$one[6]; $yC4=$one[7]; $zC4=$one[8]; }
        if ($one[2] eq "C1")  { $xC1=$one[6]; $yC1=$one[7]; $zC1=$one[8]; }
        if ($one[2] eq "C13") { $xC13=$one[6]; $yC13=$one[7]; $zC13=$one[8]; }
    }
}
close FILETWO;
$initialfile = "1GFL.pdb";
open FILEONE, "$initialfile" or die "cannot open $initialfile for read\n";
while ($line = <FILEONE>) {
    chomp $line;
    @two=split(/\s+/, $line);
    if ($two[0]=~m/^ATOM/) {
        if ($two[1] eq "479") { $xS=$two[6]; $yS=$two[7]; $zS=$two[8]; }
        if ($two[1] eq "484") { $xY=$two[6]; $yY=$two[7]; $zY=$two[8]; }
        if ($two[1] eq "496") { $xG=$two[6]; $yG=$two[7]; $zG=$two[8]; }
    }
}
close FILEONE;
$part1 = ((($xC1 - $xS)**2) + (($yC1 - $yS)**2) + (($zC1 - $zS)**2));
$part2 = ((($xC4 - $xY)**2) + (($yC4 - $yY)**2) + (($zC4 - $zY)**2));
$part3 = ((($xC13 - $xG)**2) + (($yC13 - $yG)**2) + (($zC13 - $zG)**2));
$sum = $part1 + $part2 + $part3;
$sum1 = $sum / 3;
$rmsd = sqrt ($sum1);
print"$rmsd\n";
#}
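The arithmetic at the end of the script is a root-mean-square deviation over three atom pairs (C1 with atom 479, C4 with 484, C13 with 496): RMSD = sqrt((d1² + d2² + d3²) / 3), where each d is the distance between a paired atom and its partner. A minimal sketch of the same formula for any number of pairs follows; the sub name rmsd and the storage of each point as an [x, y, z] array reference are assumptions made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Root-mean-square deviation between two equally long lists of points.
# Each point is an array reference [x, y, z] (a layout assumed for this sketch).
sub rmsd {
    my ( $first, $second ) = @_;
    die "Point lists differ in length\n" unless @$first == @$second;
    die "No points given\n" unless @$first;
    my $sum = 0;
    for my $i ( 0 .. $#$first ) {
        for my $axis ( 0 .. 2 ) {
            $sum += ( $first->[$i][$axis] - $second->[$i][$axis] )**2;
        }
    }
    return sqrt( $sum / @$first );    # @$first in numeric context = number of pairs
}

# Made-up coordinates: every pair is exactly 2.0 apart, so the RMSD is 2.
my @ligand    = ( [ 0, 0, 0 ], [ 1, 1, 1 ], [ 5, 5, 5 ] );
my @reference = ( [ 2, 0, 0 ], [ 1, 3, 1 ], [ 5, 5, 7 ] );
print rmsd( \@ligand, \@reference ), "\n";    # prints 2
```

With three pairs this reproduces $part1 + $part2 + $part3 divided by 3 under the square root; the length check makes a missing atom fail loudly instead of being treated as zero.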

Re: Hello, I'm new to programming; I need help
by Anonymous Monk on Feb 09, 2013 at 23:54 UTC
      open FILETWO, "$initialfile2" or die "cannot open $initialfile2 for read\n";

      In addition:

      • You should use the 3-arg form of open().
      • You should not use bareword filehandles.
      • You should print out the system error that caused the die().

      Like this:

      open my $INFILE, '<', $fname or die "Couldn't open $fname: $!";
Re: Hello, I'm new to programming; I need help
by jethro (Monsignor) on Feb 10, 2013 at 00:54 UTC

    You don't say exactly what's wrong with it. I didn't see any obvious bug in the code you commented out, and I don't want to spend time testing your code when you could simply tell us what exactly is wrong with it.

    But I can tell you that you won't get great performance if you reread the one file once for every one of the many files, as you seem to be doing.

    If that one file fits into memory, it is better to read it first (and only once) and keep it in an array or hash.
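    A minimal sketch of that advice, assuming the fixed-width PDB column layout (serial number in columns 7-11; x, y and z in columns 31-38, 39-46 and 47-54). The helper name load_reference is hypothetical: it reads the reference file once into a hash keyed by atom serial number, and each of the many files can then be compared against that hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the reference structure once and keep only the ATOM records whose
# serial numbers were asked for, as serial => [x, y, z].
sub load_reference {
    my ( $fname, @serials ) = @_;
    my %wanted = map { $_ => 1 } @serials;
    my %coords;
    open my $in, '<', $fname or die "Couldn't open $fname: $!";
    while ( my $line = <$in> ) {
        next unless $line =~ /^ATOM/;
        my $serial = substr( $line, 6, 5 ) + 0;    # PDB columns 7-11
        next unless $wanted{$serial};
        # x, y, z: PDB columns 31-38, 39-46, 47-54 (0-based offsets 30, 38, 46)
        $coords{$serial} = [ map { substr( $line, $_, 8 ) + 0 } 30, 38, 46 ];
    }
    close $in;
    return \%coords;
}

# Usage (file name and serials taken from the question):
#   my $ref = load_reference( '1GFL.pdb', 479, 484, 496 );   # read once
#   for my $file (@pdb_files) {
#       # read the HETATM coordinates of $file, compare with $ref->{479} etc.
#   }
```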

    If it doesn't fit, you could create a hash on disk (for example with the help of a module like DBM::Deep) with a key made from the comma-separated concatenation of x, y and z.
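    The key described above can be built with join. Below is an in-memory sketch only; DBM::Deep could hold the same kind of hash on disk, but that is not shown here, and the helper name coord_key is hypothetical. Formatting each number with sprintf first is an assumption worth keeping: hash keys are compared as strings, so 1.5 and 1.500 would otherwise be two different keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash key from a coordinate triple. Each value is normalised to
# three decimals (the precision PDB coordinates are written with) so that
# "1.5" and "1.500" produce the same key.
sub coord_key {
    my ( $x, $y, $z ) = @_;
    return join ',', map { sprintf '%.3f', $_ } $x, $y, $z;
}

# Made-up reference coordinates, stored once.
my %seen;
$seen{ coord_key( 1.5, -2.25, 10.125 ) } = 'reference atom';

# Look up the same point written differently.
my $hit = $seen{ coord_key( '1.500', '-2.250', '10.125' ) };
print defined $hit ? "found: $hit\n" : "not found\n";    # prints "found: reference atom"
```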

      Here is what I just tried:

      #!/usr/bin/perl -w
      $dir = "/net/klab2/u2/home/fmohammad/GFP2/run1";
      opendir (DIR, $dir) or die $!;
      @one = readdir (DIR);
      foreach $file (@one) {
          $initialfile2 = "$file";
          open FILETWO, "$initialfile2" or die "cannot open $initialfile2 for read\n";
          while ($line2 = <FILETWO>) {
              chomp $line2;
              @one=split(/\s+/, $line2);
              if ($one[0]=~m/^HETATM/) {
                  if ($one[2] eq "C4")  { $xC4=$one[6]; $yC4=$one[7]; $zC4=$one[8]; }
                  if ($one[2] eq "C1")  { $xC1=$one[6]; $yC1=$one[7]; $zC1=$one[8]; }
                  if ($one[2] eq "C13") { $xC13=$one[6]; $yC13=$one[7]; $zC13=$one[8]; }
              }
          }
          close FILETWO;
          $initialfile = "1GFL.pdb";
          open FILEONE, "$initialfile" or die "cannot open $initialfile for read\n";
          while ($line = <FILEONE>) {
              chomp $line;
              @two=split(/\s+/, $line);
              if ($two[0]=~m/^ATOM/) {
                  if ($two[1] eq "479") { $xS=$two[6]; $yS=$two[7]; $zS=$two[8]; }
                  if ($two[1] eq "484") { $xY=$two[6]; $yY=$two[7]; $zY=$two[8]; }
                  if ($two[1] eq "496") { $xG=$two[6]; $yG=$two[7]; $zG=$two[8]; }
              }
          }
          close FILEONE;
          $part1 = ((($xC1 - $xS)**2) + (($yC1 - $yS)**2) + (($zC1 - $zS)**2));
          $part2 = ((($xC4 - $xY)**2) + (($yC4 - $yY)**2) + (($zC4 - $zY)**2));
          $part3 = ((($xC13 - $xG)**2) + (($yC13 - $yG)**2) + (($zC13 - $zG)**2));
          $sum = $part1 + $part2 + $part3;
          $sum1 = $sum / 3;
          $rmsd = sqrt ($sum1);
          print"$rmsd\n";
      }

      And here is the error:

      Use of uninitialized value $xC1 in subtraction (-) at getrmsd2.pl line 80.
      Use of uninitialized value $yC1 in subtraction (-) at getrmsd2.pl line 80.
      Use of uninitialized value $zC1 in subtraction (-) at getrmsd2.pl line 80.
      Use of uninitialized value $xC4 in subtraction (-) at getrmsd2.pl line 81.
      Use of uninitialized value $yC4 in subtraction (-) at getrmsd2.pl line 81.
      Use of uninitialized value $zC4 in subtraction (-) at getrmsd2.pl line 81.
      Use of uninitialized value $xC13 in subtraction (-) at getrmsd2.pl line 82.
      Use of uninitialized value $yC13 in subtraction (-) at getrmsd2.pl line 82.
      Use of uninitialized value $zC13 in subtraction (-) at getrmsd2.pl line 82.
      70.8509364158301
      Use of uninitialized value $xC1 in subtraction (-) at getrmsd2.pl line 80.
      Use of uninitialized value $yC1 in subtraction (-) at getrmsd2.pl line 80.
      Use of uninitialized value $zC1 in subtraction (-) at getrmsd2.pl line 80.
      Use of uninitialized value $xC4 in subtraction (-) at getrmsd2.pl line 81.
      Use of uninitialized value $yC4 in subtraction (-) at getrmsd2.pl line 81.
      Use of uninitialized value $zC4 in subtraction (-) at getrmsd2.pl line 81.
      Use of uninitialized value $xC13 in subtraction (-) at getrmsd2.pl line 82.
      Use of uninitialized value $yC13 in subtraction (-) at getrmsd2.pl line 82.
      Use of uninitialized value $zC13 in subtraction (-) at getrmsd2.pl line 82.
      70.8509364158301
      1.68309110468408

        Well, this means that, for example, the variable $xC1 didn't get set to any value before you use it in line 80. Now let's look at the lines where you want to set the variable. They are in a part that is executed only when two if-clauses are true.

        The warnings indicate that all variables that depend on the outer if-clause "if ($one[0]=~m/^HETATM/)" have the same problem. So this if-clause is probably never true for at least the first two executions of the outermost loop, i.e. for the first two files (and it seems from your test output that there is a third file where the if-clause is successfully run through). Because of that, the variables stay undefined and you get the warnings.

        So you have to change your script so that, when the information you seek is not in the file, the rest of the loop is skipped.

        For example:

        foreach $file (@one) {
            my $allfound = 0;
            ...
                    if ($one[2] eq "C4")  { $allfound |= 1; $xC4=$one[6]; $yC4=$one[7]; $zC4=$one[8]; }
                    if ($one[2] eq "C1")  { $allfound |= 2; $xC1=$one[6]; $yC1=$one[7]; $zC1=$one[8]; }
                    if ($one[2] eq "C13") { $allfound |= 4; $xC13=$one[6]; $yC13=$one[7]; $zC13=$one[8]; }
                }
            }
            close FILETWO;
            next if ($allfound != 7);

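        I'm using the first three bits of the variable $allfound to record whether each of the three inner if-clauses was run through at least once for this file; only when all three bits are set (1 | 2 | 4 = 7) is the rest of the loop executed.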
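        Two details of the revised script are worth checking alongside this. readdir returns bare file names (without $dir in front) and also the special entries . and .., which cannot yield any HETATM lines; that is a likely explanation for the first two passes described above. The script also reuses @one both for the directory listing and for the split fields, so the array that foreach is walking gets overwritten inside the loop, which perlsyn warns against. A minimal sketch of a safer listing, using a hypothetical helper list_pdb_files and assuming the structure files end in .pdb:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return full paths of the *.pdb files in $dir, sorted by name.
# readdir gives bare names and includes '.' and '..', so filter to plain
# files with the .pdb extension and prefix the directory before opening.
sub list_pdb_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Couldn't open directory $dir: $!";
    my @names = grep { /\.pdb\z/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } sort @names;
}

# Usage: keep the file list and the per-line fields in separate variables.
#   for my $path ( list_pdb_files('/net/klab2/u2/home/fmohammad/GFP2/run1') ) {
#       open my $in, '<', $path or die "Couldn't open $path: $!";
#       while ( my $line = <$in> ) {
#           # split or unpack $line into its own lexical array here
#       }
#       close $in;
#   }
```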

Re: Hello, I'm new to programming; I need help
by 7stud (Deacon) on Feb 10, 2013 at 01:23 UTC
    @one = readdir (DIR);

    If anyone else in the world is going to read your code, e.g. if you post your code on a programming forum, then you have to use descriptive variable names. If I told you that I had an array named @ten, and I asked you to guess what each element of the array was, what would be your guess? What are the odds you would be correct?

    Your @one array contains file names, so how about naming it @file_names or @fnames?

    In your opinion, what is the difference between these two lines of code:

    $initialfile2 = $file;
    $initialfile2 = "$file";
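    For an ordinary string the two assignments give the same result; the quotes only build a new string by interpolation. The difference shows up when the value is not a plain string, for example a reference, which becomes text such as ARRAY(0x...) and can no longer be dereferenced. A short demonstration (the variable names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file   = 'UM_802_R203E112_1GFL_clean_gfp_5.pdb';
my $plain  = $file;      # copies the value
my $quoted = "$file";    # interpolates into a new string: same text here
print $plain eq $quoted ? "same string\n" : "different\n";    # prints "same string"

# With a reference, the quotes stringify it and the reference is lost.
my $coords = [ 1.5, -2.25, 10.125 ];
my $copy   = $coords;
my $text   = "$coords";
print ref($copy) ? "copy is still a reference\n" : "copy is plain text\n";    # still a reference
print ref($text) ? "text is still a reference\n" : "text is plain text\n";    # plain text
```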
      Sorry, I'm really new at this. I was getting help at school, but now I'm at home and don't know what to do. Both @one and @two are grabbing data from a .pdb file, and then I'm isolating the x, y and z coordinates. I don't know if I'm explaining this correctly.
Re: Hello, I'm new to programming; I need help
by 7stud (Deacon) on Feb 10, 2013 at 01:40 UTC

    Okay, the first thing you should do is construct a very small example. Your example should read two files only, and each file should only have two lines in it. Then you need to post those two files and try to describe what you are doing. For instance, "Given these two files, I want to produce this output....".

    But really, when you are starting out programming, none of your programs should be longer than 10 lines of code until you learn the basics. Your code also needs to be readable, and yours isn't, because sometimes you indent your code and other times you don't. In addition, Perl indenting is 4 spaces, so you should configure your programming editor to indent 4 spaces. If you are not using a programming editor, that is your first mistake. There are many free programming editors which you can download; they will automatically indent your code and provide syntax highlighting, etc., which makes typing, editing, and debugging your code much easier.

    This error:

    Use of uninitialized value $xC1 in subtraction (-) at getrmsd2.pl line 80.
    is pretty easy to understand. It is saying that you never assigned a value to $xC1.
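    One way to act on that message, as an alternative to the bit flags shown earlier in the thread: keep the coordinates in a hash keyed by atom name, list the names that never got a value, and skip the file. missing_atoms is a hypothetical helper for this sketch, and the name => [x, y, z] layout is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the atom names that never received coordinates, so the caller can
# skip a file instead of doing arithmetic on undefined values.
# $coords is a hash reference: atom name => [x, y, z] (layout assumed here).
sub missing_atoms {
    my ( $coords, @names ) = @_;
    return grep { !defined $coords->{$_} } @names;
}

# Made-up example: C13 was never seen in this file.
my %coords  = ( C1 => [ 1.0, 2.0, 3.0 ], C4 => [ 4.0, 5.0, 6.0 ] );
my @missing = missing_atoms( \%coords, qw(C1 C4 C13) );
print "skipping file: missing @missing\n" if @missing;    # prints "skipping file: missing C13"

# Inside the per-file loop this would become:
#   next if missing_atoms( \%coords, qw(C1 C4 C13) );
```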
Re: Hello, I'm new to programming; I need help
by Kenosis (Priest) on Feb 10, 2013 at 05:51 UTC

    You've been given some excellent suggestions. One major issue is that your PDB records are fixed-width (see Coordinate File Description (PDB Format)), so split shouldn't be used to obtain the fields.
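    To see why whitespace splitting is fragile, consider a record whose y coordinate needs all eight of its columns. The sketch below builds one such HETATM line from made-up values with sprintf (the widths follow the PDB column layout up to the z coordinate) and compares split with fixed-column substr:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up HETATM record laid out in PDB columns. The y coordinate
# (-100.123) uses all 8 of its columns, so nothing separates it from x.
my $line = sprintf "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f\n",
    'HETATM', 1234, ' C4', '', 'GFP', 'A', 66, '', 12.345, -100.123, 5.5;

# Whitespace splitting: field 6 is x and y glued together.
my @fields = split ' ', $line;
print "split field 6: $fields[6]\n";    # prints "split field 6: 12.345-100.123"

# Fixed columns: x, y, z are PDB columns 31-38, 39-46, 47-54
# (substr offsets are 0-based, hence 30, 38, 46).
my ( $x, $y, $z ) = map { substr( $line, $_, 8 ) + 0 } 30, 38, 46;
print "columns: $x $y $z\n";            # prints "columns: 12.345 -100.123 5.5"
```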

    The code below has a subroutine that uses unpack to get the fields you want. You'll also see use strict; use warnings; at the top: always have these in your programs. Lexically scoped variables (my) are also used:

    #!/usr/bin/perl -w
    use strict;
    use warnings;
    my ($xC4, $yC4, $zC4, $xC1, $yC1, $zC1, $xC13, $yC13, $zC13);
    my ($xS, $yS, $zS, $xY, $yY, $zY, $xG, $yG, $zG);
    #$dir = "/net/klab2/u2/home/fmohammad/GFP2/run1";
    #opendir (DIR, $dir) or die $!;
    #@one = readdir (DIR);
    #foreach $file (@one)
    #{
    my $initialfile2 = "UM_802_R203E112_1GFL_clean_gfp_5.pdb";
    open my $FILETWO, '<', $initialfile2
      or die "Cannot open $initialfile2 for read: $!\n";
    while ( my $line = <$FILETWO> ) {
        next unless $line =~ m/^HETATM/;
        my ( $serial, $atom, $xCoord, $yCoord, $zCoord ) = getRecData($line);
        if ( $atom eq "C4" ) {
            $xC4 = $xCoord; $yC4 = $yCoord; $zC4 = $zCoord;
        }
        if ( $atom eq "C1" ) {
            $xC1 = $xCoord; $yC1 = $yCoord; $zC1 = $zCoord;
        }
        if ( $atom eq "C13" ) {
            $xC13 = $xCoord; $yC13 = $yCoord; $zC13 = $zCoord;
        }
    }
    close $FILETWO;
    my $initialfile = "1GFL.pdb";
    open my $FILEONE, '<', $initialfile
      or die "Cannot open $initialfile for read: $!\n";
    while ( my $line = <$FILEONE> ) {
        next unless $line =~ m/^ATOM/;
        my ( $serial, $atom, $xCoord, $yCoord, $zCoord ) = getRecData($line);
        if ( $serial == 479 ) {
            $xS = $xCoord; $yS = $yCoord; $zS = $zCoord;
        }
        if ( $serial == 484 ) {
            $xY = $xCoord; $yY = $yCoord; $zY = $zCoord;
        }
        if ( $serial == 496 ) {
            $xG = $xCoord; $yG = $yCoord; $zG = $zCoord;
        }
    }
    close $FILEONE;
    my $part1 = ( ( ( $xC1 - $xS )**2 ) + ( ( $yC1 - $yS )**2 ) + ( ( $zC1 - $zS )**2 ) );
    my $part2 = ( ( ( $xC4 - $xY )**2 ) + ( ( $yC4 - $yY )**2 ) + ( ( $zC4 - $zY )**2 ) );
    my $part3 = ( ( ( $xC13 - $xG )**2 ) + ( ( $yC13 - $yG )**2 ) + ( ( $zC13 - $zG )**2 ) );
    my $sum  = $part1 + $part2 + $part3;
    my $sum1 = $sum / 3;
    my $rmsd = sqrt($sum1);
    print "$rmsd\n";
    #}
    sub getRecData {
        return map { s/\s//g; $_ }
          ( unpack 'a6 a5 a1 a4 a14 a8 a8 a8', $_[0] )[ 1, 3, 5, 6, 7 ];
    }

    I don't have the data sets you're working with, so this hasn't been tested with files. However, the subroutine has been tested on both ATOM and HETATM records, and does return the fields you want.
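    To repeat that check on your own machine, getRecData can be run on a single record. The sub below is copied from the script above; the sample HETATM line is built with sprintf from made-up values and is not from the original data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# getRecData as posted above: returns serial, atom name, x, y, z
# with all whitespace removed from each field.
sub getRecData {
    return map { s/\s//g; $_ }
      ( unpack 'a6 a5 a1 a4 a14 a8 a8 a8', $_[0] )[ 1, 3, 5, 6, 7 ];
}

# Made-up HETATM record built in PDB column layout.
my $line = sprintf "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f\n",
    'HETATM', 1234, ' C4', '', 'GFP', 'A', 66, '', 1.5, -2.25, 10.125;

my @fields = getRecData($line);
print join( '|', @fields ), "\n";    # prints 1234|C4|1.500|-2.250|10.125
```

    The a14 field skips columns 17-30 (alternate location, residue name, chain ID, residue number and insertion code), which is why the slice picks elements 1, 3, 5, 6 and 7.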

    I hope this gets you closer to a fully working script.

    Update: Bio::PDB::Structure is a module you can use to parse PDB files. However, when I used it, the field values it returned needed to be cleaned, as if it just returns each entire field at its fixed width. There isn't a lot 'out there' yet on using this module (it's still young). Given this, I don't think the module provides an advantage in your case.

Re: Hello, I'm new to programming; I need help
by sundialsvc4 (Monsignor) on Feb 10, 2013 at 19:42 UTC

    If you are “new to programming,” then the first and perhaps most important tool that I would use is a number-two pencil and a legal pad of paper. Write down, in longhand, what your input file looks like, what decisions need to be made by the program, what the math is, and generally, in your own words, what this program needs to do. Write it as though you were giving detailed instructions to another person, and try hard to leave no detail unmentioned. Next, walk through the procedure, exactly as you have written it, to be sure that each of the cases is thoroughly and correctly described. A person who knows nothing at all about what he is doing, who knows nothing about the context of the problem being solved, should be able to sit down in a kitchen with your recipe and produce an acceptable dish.

    Now ... take this pad of paper and go into the kitchen. The tools in that kitchen (Perl...) are obviously powerful and serviceable, but they are unfamiliar to you (of course). However, now you are thoroughly prepared: there is no question remaining in your mind of what to do; the only challenge now is to figure out how to make this magical, thoroughly automatic kitchen produce the dish. You know that you will be bumping into lots of unfamiliar areas concerning how to make the equipment work, but you know exactly what the goal is and exactly how the kitchen will go about doing it.

    These are two separate concerns, and actually, the first one (not the second!) is by far the more significant to someone who is “new to programming.”

Approved by Paladin