http://www.perlmonks.org?node_id=1018026


in reply to Hello im new to programming i need help

You've been given some excellent suggestions. One major issue is that your PDB records are fixed-width (see Coordinate File Description (PDB Format), so splitting shouldn't be used to obtain the fields.

The code below has a subroutine that uses unpack to get the fields you want. You'll also see use strict; use warnings;;always have these in your programs. Lexically-scoped variables (my) are also used:

#!/usr/bin/perl -w use strict; use warnings; my ($xC4, $yC4, $zC4, $xC1, $yC1, $zC1, $xC13, $yC13, $zC13); my ($xS, $yS, $zS, $xY, $yY, $zY, $xG, $yG, $zG); #$dir = "/net/klab2/u2/home/fmohammad/GFP2/run1"; #opendir (DIR, $dir) or die $!; #@one = readdir (DIR); #foreach $file (@one) #{ my $initialfile2 = "UM_802_R203E112_1GFL_clean_gfp_5.pdb"; open my $FILETWO, '<', $initialfile2 or die "Cannot open $initialfile2 for read: $!\n"; while ( my $line = <$FILETWO> ) { next unless $line =~ m/^HETATM/; my ( $serial, $atom, $xCoord, $yCoord, $zCoord ) = getRecData($lin +e); if ( $atom eq "C4" ) { $xC4 = $xCoord; $yC4 = $yCoord; $zC4 = $zCoord; } if ( $atom eq "C1" ) { $xC1 = $xCoord; $yC1 = $yCoord; $zC1 = $zCoord; } if ( $atom eq "C13" ) { $xC13 = $xCoord; $yC13 = $yCoord; $zC13 = $zCoord; } } close $FILETWO; my $initialfile = "1GFL.pdb"; open my $FILEONE, '<', $initialfile or die "Cannot open $initialfile for read: $!\n"; while ( my $line = <$FILEONE> ) { next unless $line =~ m/^ATOM/; my ( $serial, $atom, $xCoord, $yCoord, $zCoord ) = getRecData($lin +e); if ( $serial == 479 ) { $xS = $xCoord; $yS = $yCoord; $zS = $zCoord; } if ( $serial == 484 ) { $xY = $xCoord; $yY = $yCoord; $zY = $zCoord; } if ( $serial == 496 ) { $xG = $xCoord; $yG = $yCoord; $zG = $zCoord; } } close $FILEONE; my $part1 = ( ( ( $xC1 - $xS )**2 ) + ( ( $yC1 - $yS )**2 ) + ( ( $zC1 - $zS )** +2 ) ); my $part2 = ( ( ( $xC4 - $xY )**2 ) + ( ( $yC4 - $yY )**2 ) + ( ( $zC4 - $zY )** +2 ) ); my $part3 = ( ( ( $xC13 - $xG )**2 ) + ( ( $yC13 - $yG )**2 ) + ( ( $zC13 - $zG +)**2 ) ); my $sum = $part1 + $part2 + $part3; my $sum1 = $sum / 3; my $rmsd = sqrt($sum1); print "$rmsd\n"; #} sub getRecData { return map { s/\s//g; $_ } ( unpack 'a6 a5 a1 a4 a14 a8 a8 a8', $_[0] )[ 1, 3, 5, 6, 7 ]; }

I don't have the data sets your working with, so this hasn't been tested with files. However, the subroutine has been tested on both ATOM and HETATM records, and does return the fields you want.

I hope this gets you closer to a fully working script.

Update: Bio::PDB::Structure is a module you can use to parse the pdb files. However, when using it, field values were returned that needed to be cleaned--as if it just returns an entire field at its fixed-width. Not a lot 'out there' yet on using this module (it's still young). Given this, I don't think the module provides an advantage, in your case.