http://www.perlmonks.org?node_id=890461

wanttoprogram has asked for the wisdom of the Perl Monks concerning the following question:

Hi, this is my second Perl program, and I do not know all the basics yet. I would like some help with the following program; any help is appreciated. I want to open two text files and grab a substring from each of them. I also want to grab another value from the first file. Then I want to print the value from the first file repeatedly, as long as the substrings from the two files match. I wrote something like the code below, but it gives only one value from MYFILE and prints it repeatedly in the new file.
#!/usr/bin/perl -w
use strict;
#crappy
open (MYFILE, "2hgs_d00_internal_nrg_e.dat");
our $nrgval = " ";
our $chn = " ";
our $count;
#our $chn[$count];
our $count2;
#our $chn2[$count2]
our @nrg;
open (NEWF, "2HGS_bio_conv-min_p.pdb");
our $toprint = " ";
our $chn2 = " ";
while (<MYFILE>) {
    chomp; # avoid \n at the end of each line
    if ($_ =~/ENERGY/){
        for($count=1;$count<=1;$count++){
            $chn = substr $_, 20, 3;
            $nrgval = substr $_, 35, 8;
            while (<NEWF>) {
                chomp; # avoid \n at the end of each line
                our $j = 0;
                our $i = 0;
                if ($_ =~/ATOM/){
                    for($count2=1;$count2<=1;$count2++){
                        $chn2 = substr $_, 23, 3;
                        $toprint = substr $_, 0, 65;
                        for($chn=1;$chn<=$chn2;$chn++){
                            if ($chn==$chn2){
                                print " $toprint $nrgval \n";
                            }
                        }
                    }
                }
            }
        }
    }
}
close (MYFILE);
close (NEWF);

Re: incrementing already existing file
by roboticus (Chancellor) on Feb 28, 2011 at 00:42 UTC

    wanttoprogram:

    If you keep your indentation and other whitespace consistent and clean, it's easier to see errors in your code. Additionally, I like to declare variables where I need them, and I don't create variables I don't use. ;^) (In other words, I deleted a few variables that you weren't using.) So I altered your code a bit, like this:

    #!/usr/bin/perl -w
    use strict;

    open (MYFILE, "2hgs_d00_internal_nrg_e.dat");
    open (NEWF, "2HGS_bio_conv-min_p.pdb");
    while (<MYFILE>) {
        chomp; # avoid \n at the end of each line
        if ($_ =~/ENERGY/) {
            for(my $count=1;$count<=1;$count++){
                my $chn = substr $_, 20, 3;
                my $nrgval = substr $_, 35, 8;
                while (<NEWF>) {
                    chomp; # avoid \n at the end of each line
                    if ($_ =~/ATOM/){
                        for(my $count2=1;$count2<=1;$count2++){
                            my $chn2 = substr $_, 23, 3;
                            my $toprint = substr $_, 0, 65;
                            for($chn=1;$chn<=$chn2;$chn++){
                                if ($chn==$chn2){
                                    print " $toprint $nrgval \n";
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    Having done that, it's a bit easier to see why you get only one value from MYFILE. You read from it in the outermost loop, and then process the entire NEWF file inside that loop. When it's time to read the second record from MYFILE, NEWF has already been read to end-of-file, so the inner loop is skipped from then on.
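    To see this effect in isolation, here is a small sketch using in-memory filehandles (the sample strings and sub names are made up for illustration). The first sub reproduces the nested-read pattern; the second rewinds the inner handle with seek before each pass, which is one possible fix, although re-reading a whole file for every outer record gets slow on large inputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two files.
my $outer_text = "A\nB\nC\n";
my $inner_text = "x\ny\n";

# Nested reads with no rewind: the inner handle is exhausted after the
# first outer record, so later outer records pair with nothing.
sub pairs_without_rewind {
    open my $outer, '<', \$outer_text or die "cannot open outer: $!";
    open my $inner, '<', \$inner_text or die "cannot open inner: $!";
    my @pairs;
    while (my $o = <$outer>) {
        chomp $o;
        while (my $i = <$inner>) {
            chomp $i;
            push @pairs, "$o$i";
        }
    }
    return @pairs;
}

# Same loops, but rewind the inner handle to its start before each pass.
sub pairs_with_rewind {
    open my $outer, '<', \$outer_text or die "cannot open outer: $!";
    open my $inner, '<', \$inner_text or die "cannot open inner: $!";
    my @pairs;
    while (my $o = <$outer>) {
        chomp $o;
        seek $inner, 0, 0 or die "cannot rewind: $!";
        while (my $i = <$inner>) {
            chomp $i;
            push @pairs, "$o$i";
        }
    }
    return @pairs;
}

print join(' ', pairs_without_rewind()), "\n";   # Ax Ay
print join(' ', pairs_with_rewind()), "\n";      # Ax Ay Bx By Cx Cy
```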

    Generally, if your code just creeps rightward like this, it's indicative of a problem of some sort. I'm normally uncomfortable with more than, say, four levels of indentation. Beyond that, I tend to either change my logic, or pull out some subroutines to simplify things.
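    One common way to keep a loop from creeping rightward is a guard clause: skip uninteresting lines early with `next` instead of wrapping the rest of the loop body in an `if`. A small sketch (the sub name and sample lines are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first 65 characters of each ATOM line, skipping everything else.
# Instead of "if (/ATOM/) { ...body... }", the guard "next unless ..."
# keeps the body at the loop's own indentation level.
sub atom_prefixes {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^ATOM/;    # guard clause: skip non-ATOM lines
        push @out, substr($line, 0, 65);
    }
    return @out;
}

my $sample = "REMARK header\nATOM 1 N MET 1\nEND\nATOM 2 CA MET 1\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
print "$_\n" for atom_prefixes($fh);
close $fh;
```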

    One last thing: You have some strange loops in the form:

    for($count2=1;$count2<=1;$count2++){
        #stuff
    }

    You know that the loop should execute only one time, right? I would normally assume you meant something else and just keyed in the wrong thing. But since you have it repeated I thought I'd point it out to you. For example, try this program out:

    #!/usr/bin/perl
    for (my $count2=1; $count2<=1; $count2++) {
        print "Count2: $count2\n";
    }
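    For comparison, here is the same idea written with Perl's foreach over a range, which makes the number of passes obvious at a glance (a minimal illustration, not code from the thread; the `visited` sub is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A range 1..1 has exactly one element, so this body runs once,
# just like the C-style loop above.
foreach my $count2 (1 .. 1) {
    print "Count2: $count2\n";
}

# Collect the values a range loop visits, e.g. residue numbers 1 to 3.
sub visited {
    my ($from, $to) = @_;
    my @seen;
    foreach my $n ($from .. $to) {
        push @seen, $n;
    }
    return @seen;
}

print join(',', visited(1, 3)), "\n";   # prints 1,2,3
```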

    You should review how loops work, and then change the logic to do more of what you want. If you're wanting to work through both files in parallel, you might want to check out the logic in Re: How to deal with Huge data and/or Re: parallel reading.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      I am getting only one value, repeated on every line, as shown below. I want it to repeat only while the column stays the same, and then give a different number. It should look like the second listing. The output I am getting is:
      ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 0.0 28.8216
      ATOM 2 HT1 MET 1 4.524 25.860 -15.614 1.00 0.0 28.8216
      ATOM 3 HT2 MET 1 4.109 26.952 -14.383 1.00 0.0 28.8216
      ATOM 4 HT3 MET 1 3.729 25.316 -14.228 1.00 0.0 28.8216
      ATOM 5 CA MET 1 5.708 25.747 -13.831 1.00 0.0 28.8216
      ATOM 6 HA MET 1 6.008 24.725 -14.011 1.00 0.0 28.8216
      ATOM 7 CB MET 1 6.792 26.728 -14.367 1.00 0.0 28.8216
      ATOM 8 HB1 MET 1 6.812 26.659 -15.475 1.00 0.0 28.8216
      ATOM 9 HB2 MET 1 6.502 27.769 -14.109 1.00 0.0 28.8216
      ATOM 10 CG MET 1 8.241 26.517 -13.880 1.00 0.0 28.8216
      ATOM 11 HG1 MET 1 8.298 26.709 -12.787 1.00 0.0 28.8216
      ATOM 12 HG2 MET 1 8.547 25.462 -14.049 1.00 0.0 28.8216
      ATOM 13 SD MET 1 9.409 27.618 -14.738 1.00 0.0 28.8216
      ATOM 14 CE MET 1 10.824 27.291 -13.650 1.00 0.0 28.8216
      ATOM 15 HE1 MET 1 11.740 27.794 -14.026 1.00 0.0 28.8216
      ATOM 16 HE2 MET 1 10.631 27.662 -12.621 1.00 0.0 28.8216
      ATOM 17 HE3 MET 1 11.042 26.203 -13.587 1.00 0.0 28.8216
      ATOM 18 C MET 1 5.446 25.905 -12.332 1.00 0.0 28.8216
      ATOM 19 O MET 1 4.414 26.443 -11.925 1.00 0.0 28.8216
      ATOM 20 N ALA 2 6.330 25.384 -11.469 1.00 0.0 28.8216
      ATOM 21 HN ALA 2 7.105 24.825 -11.751 1.00 0.0 28.8216
      ATOM 22 CA ALA 2 6.383 25.717 -10.067 1.00 0.0 28.8216
      ATOM 23 HA ALA 2 6.344 26.791 -9.955 1.00 0.0 28.8216
      ATOM 24 CB ALA 2 5.300 25.034 -9.205 1.00 0.0 28.8216
      ATOM 25 HB1 ALA 2 4.288 25.319 -9.565 1.00 0.0 28.8216
      ATOM 26 HB2 ALA 2 5.394 23.928 -9.255 1.00 0.0 28.8216
      ATOM 27 HB3 ALA 2 5.396 25.346 -8.143 1.00 0.0 28.8216
      ATOM 28 C ALA 2 7.753 25.238 -9.659 1.00 0.0 28.8216
      ATOM 29 O ALA 2 8.299 24.357 -10.317 1.00 0.0 28.8216
      ATOM 30 N THR 3 8.353 25.813 -8.605 1.00 86.2 28.8216
      ATOM 31 HN THR 3 7.908 26.533 -8.079 1.00 0.0 28.8216
      ATOM 32 CA THR 3 9.687 25.408 -8.176 1.00 88.8 28.8216
      ATOM 33 HA THR 3 9.829 24.356 -8.373 1.00 0.0 28.8216
      ATOM 34 CB THR 3 10.847 26.194 -8.810 1.00 91.6 28.8216
      ATOM 35 HB THR 3 11.790 25.982 -8.261 1.00 0.0 28.8216
      ATOM 36 OG1 THR 3 10.614 27.598 -8.833 1.00 93.2 28.8216

      I should get this output:

      ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 0.0 28.8216
      ATOM 2 HT1 MET 1 4.524 25.860 -15.614 1.00 0.0 28.8216
      ATOM 3 HT2 MET 1 4.109 26.952 -14.383 1.00 0.0 28.8216
      ATOM 4 HT3 MET 1 3.729 25.316 -14.228 1.00 0.0 28.8216
      ATOM 5 CA MET 1 5.708 25.747 -13.831 1.00 0.0 28.8216
      ATOM 6 HA MET 1 6.008 24.725 -14.011 1.00 0.0 28.8216
      ATOM 7 CB MET 1 6.792 26.728 -14.367 1.00 0.0 28.8216
      ATOM 8 HB1 MET 1 6.812 26.659 -15.475 1.00 0.0 28.8216
      ATOM 9 HB2 MET 1 6.502 27.769 -14.109 1.00 0.0 28.8216
      ATOM 10 CG MET 1 8.241 26.517 -13.880 1.00 0.0 28.8216
      ATOM 11 HG1 MET 1 8.298 26.709 -12.787 1.00 0.0 28.8216
      ATOM 12 HG2 MET 1 8.547 25.462 -14.049 1.00 0.0 28.8216
      ATOM 13 SD MET 1 9.409 27.618 -14.738 1.00 0.0 28.8216
      ATOM 14 CE MET 1 10.824 27.291 -13.650 1.00 0.0 28.8216
      ATOM 15 HE1 MET 1 11.740 27.794 -14.026 1.00 0.0 28.8216
      ATOM 16 HE2 MET 1 10.631 27.662 -12.621 1.00 0.0 28.8216
      ATOM 17 HE3 MET 1 11.042 26.203 -13.587 1.00 0.0 28.8216
      ATOM 18 C MET 1 5.446 25.905 -12.332 1.00 0.0 28.8216
      ATOM 19 O MET 1 4.414 26.443 -11.925 1.00 0.0 28.8216
      ATOM 20 N ALA 2 6.330 25.384 -11.469 1.00 0.0 24.9274
      ATOM 21 HN ALA 2 7.105 24.825 -11.751 1.00 0.0 24.9274
      ATOM 22 CA ALA 2 6.383 25.717 -10.067 1.00 0.0 24.9274
      ATOM 23 HA ALA 2 6.344 26.791 -9.955 1.00 0.0 24.9274
      ATOM 24 CB ALA 2 5.300 25.034 -9.205 1.00 0.0 24.9274
      ATOM 25 HB1 ALA 2 4.288 25.319 -9.565 1.00 0.0 24.9274
      ATOM 26 HB2 ALA 2 5.394 23.928 -9.255 1.00 0.0 24.9274
      ATOM 27 HB3 ALA 2 5.396 25.346 -8.143 1.00 0.0 24.9274
      ATOM 28 C ALA 2 7.753 25.238 -9.659 1.00 0.0 24.9274
      ATOM 29 O ALA 2 8.299 24.357 -10.317 1.00 0.0 24.9274
      ATOM 30 N THR 3 8.353 25.813 -8.605 1.00 86.2 19.0884
      ATOM 31 HN THR 3 7.908 26.533 -8.079 1.00 0.0 19.0884
      ATOM 32 CA THR 3 9.687 25.408 -8.176 1.00 88.8 19.0884
      ATOM 33 HA THR 3 9.829 24.356 -8.373 1.00 0.0 19.0884
      ATOM 34 CB THR 3 10.847 26.194 -8.810 1.00 91.6 19.0884
      ATOM 35 HB THR 3 11.790 25.982 -8.261 1.00 0.0 19.0884
      ATOM 36 OG1 THR 3 10.614 27.598 -8.833 1.00 93.2 19.0884
        It looks like both files use column 5 as a "key" of sorts to connect the two files. I would approach this by reading all of the first file (the one you open as MYFILE), collecting the values from the last column along the way. Since you only need to collect one value from each line, I save them in an array as I read the file. This will work fine even for fairly large files. Once the first file has been processed, read from the second file (the one you open as NEWF) and do the substitutions line by line, writing the output as you go.
        #!/usr/bin/env perl
        use strict;
        use warnings;

        my $file1 = "pm-890461-in1.txt";
        my $file2 = "pm-890461-in2.txt";

        open( MYFILE, '<', $file1 ) or die "cannot open $file1: $!";
        open( NEWF,   '<', $file2 ) or die "cannot open $file2: $!";

        my @in_values;
        while ( <MYFILE> ) {
            chomp;
            my( $index, $value ) = ( split /\s+/ )[4, -1];
            # above line does same thing as next three
            # my @fields = ( split /\s+/ );
            # my $index = $fields[4];
            # my $value = $fields[-1];
            $in_values[ $index ] = $value;
        }
        close MYFILE;

        while ( <NEWF> ) {
            chomp;
            my @fields = ( split /\s+/ );
            my $index = $fields[4];
            $fields[-1] = $in_values[ $index ];
            my $output = join "\t", @fields;
            print "$output\n";
        }
        close NEWF;
        Note that I use split (not substr) to get the fields of interest from each line (same approach for both files). For the output, I join the fields with a tab character. You should change that to something else (e.g., a fixed number of space characters) if you need the output formatted differently. And of course this writes to STDOUT, so you will need to redirect the output on the command line or add to this code to open an output file and print to that.
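        As a sketch of that last point, the second loop can print to an output filehandle instead of STDOUT, joining fields with single spaces rather than tabs. The sub name and the output filename in the comment are made-up placeholders, and the sub takes already-open handles so it can be tried on in-memory strings. Unlike the code above, it leaves the field alone when no value was stored for that index.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Copy lines from $in to $out, replacing the last field with the value
# stored in @$values at the index found in field 4 (same field logic as above).
sub write_merged {
    my ($in, $out, $values) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\s+/, $line;
        my $index  = $fields[4];
        # keep the original field if nothing was stored for this index
        $fields[-1] = $values->[$index] if defined $values->[$index];
        print {$out} join(' ', @fields), "\n";
    }
}

# Usage on in-memory data; for a real file you might instead write
#   open my $out, '>', 'merged.pdb' or die "cannot open merged.pdb: $!";
my @in_values;
$in_values[1] = '28.8216';
my $input  = "ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 0.00\n";
my $result = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$result or die "cannot open output: $!";
write_merged($in, $out, \@in_values);
close $out;
print $result;   # ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 28.8216
```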

        When you are more comfortable with Perl, you will find that some of this is actually on the "verbose" side. Using Perl idioms would make some of my code more compact, but also a bit harder to follow until you have more experience.

        wanttoprogram:

        OK, are both files sorted with respect to the key fields? If so, then you don't really want nested loops. You want a single loop and you can decide which file to read depending on what the current condition is. Something like:

        # "Prime the pump"
        my $rec1 = <FILE1>;
        my $rec2 = <FILE2>;

        # Keep looping as long as file2 still has a record to process
        # (testing eof() here would skip the last record already read)
        while (defined $rec2) {
            # Figure out what keys you have (numeric residue numbers here,
            # so compare with == and < rather than eq and lt)
            my $key1 = defined $rec1 ? get_key_1($rec1) : undef;
            my $key2 = get_key_2($rec2);

            if (defined $key1 and $key1 == $key2) {
                # They're the same, so create an output record, and read
                # next record from file2
                print build_record($rec1, $rec2);
                $rec2 = <FILE2>;
            }
            elsif (defined $key1 and $key1 < $key2) {
                # First file has a key we don't need, just ignore
                # it and read the next record
                $rec1 = <FILE1>;
            }
            else {
                # Hmmm ... first file seemed to skip the key we need
                # (or ran out). Print a partial record and advance to
                # next file2 record
                print partial_record($rec2);
                $rec2 = <FILE2>;
            }
        }
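        To make the merge outline concrete, here is one way the hypothetical helpers might look for this thread's two formats, assuming both files are sorted by residue number and that the key is field 4 when each line is split on whitespace. It also assumes the value should overwrite the last-but-one field, as the OP describes further down. In-memory handles let it run as-is; `build_record` and `partial_record` are illustrative stand-ins for whatever output you actually need.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: both formats carry the residue number in field 4.
sub get_key_1 { my @f = split ' ', $_[0]; return $f[4] }   # "IN CHAIN A RESIDUE 1 ..."
sub get_key_2 { my @f = split ' ', $_[0]; return $f[4] }   # "ATOM 1 N MET 1 ..."

sub build_record {
    my ($rec1, $rec2) = @_;
    my @e = split ' ', $rec1;
    my @f = split ' ', $rec2;
    $f[-2] = $e[-1];                # put the energy in the last-but-one field
    return join(' ', @f) . "\n";
}

sub partial_record { return $_[0] }   # no energy found: pass the line through

# Merge two key-sorted handles in a single loop.
sub merge {
    my ($fh1, $fh2) = @_;
    my $out  = '';
    my $rec1 = <$fh1>;
    my $rec2 = <$fh2>;
    while (defined $rec2) {
        my $key2 = get_key_2($rec2);
        if (defined $rec1 and get_key_1($rec1) == $key2) {
            $out .= build_record($rec1, $rec2);
            $rec2 = <$fh2>;
        }
        elsif (defined $rec1 and get_key_1($rec1) < $key2) {
            $rec1 = <$fh1>;         # file1 key not needed: skip it
        }
        else {
            $out .= partial_record($rec2);
            $rec2 = <$fh2>;
        }
    }
    return $out;
}

my $energies = "IN CHAIN A RESIDUE 1 HAS ENERGY 28.8216\n"
             . "IN CHAIN A RESIDUE 2 HAS ENERGY 24.9274\n";
my $atoms    = "ATOM 1 N MET 1 0.00 A\nATOM 2 CA MET 1 0.00 A\nATOM 20 N ALA 2 0.00 A\n";
open my $fh1, '<', \$energies or die "cannot open energies: $!";
open my $fh2, '<', \$atoms    or die "cannot open atoms: $!";
my $merged = merge($fh1, $fh2);
print $merged;
```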

        Of course, if either file isn't sorted on the keys, then that won't work. You'll either have to sort them, or try something like a hash table. For the hash table, you simply read the first file into a hash based on the key field(s). Then you scan through the second file, looking up values from the hash as you need them. Something like:

        # Read dictionary
        my %abbreviations;
        while (my $line = <DATA>) {
            chomp $line;
            my ($abbrev,$longname) = split /:/, $line, 2;
            $abbreviations{$abbrev}=$longname;
        }

        # Process file
        open my $FH, '<', 'the_file' or die "cannot open the_file: $!";
        while (my $line = <$FH>) {
            chomp $line;
            my ($field1, $field2, $key, $field3) = split /\t/, $line;
            if (exists $abbreviations{$key}) {
                # key was abbreviated, replace with full value
                $key = $abbreviations{$key};
            }
            print "$key: ($field1, $field2, $field3)\n";
        }
        close $FH;

        __DATA__
        perl:pathologically eclectic rubbish lister
        lisp:lots of irritating silly parenthesis
        python:all your space are belong to us
        ruby:a quack language
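        Applied to this thread's data, the hash-table version might look like the sketch below: read the energy file into a hash keyed by residue number, then look up each ATOM line as it streams past. The field positions are assumptions based on the sample lines the OP posts later in the thread, the sub names are hypothetical, and residues with no stored energy are left unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build residue number => energy from lines like
#   "IN CHAIN A RESIDUE 1 HAS ENERGY 28.8216"
sub load_energies {
    my ($fh) = @_;
    my %energy;
    while (my $line = <$fh>) {
        my @f = split ' ', $line;
        next unless @f >= 8 and $f[6] eq 'ENERGY';
        $energy{ $f[4] } = $f[-1];
    }
    return \%energy;
}

# Replace the last-but-one field of an ATOM line with its residue's energy.
sub annotate {
    my ($line, $energy) = @_;
    my @f = split ' ', $line;
    return $line unless @f >= 6 and $f[0] eq 'ATOM';
    my $res = $f[4];
    $f[-2] = $energy->{$res} if exists $energy->{$res};
    return join(' ', @f);
}

my $text = "IN CHAIN A RESIDUE 1 HAS ENERGY 28.8216\n"
         . "IN CHAIN A RESIDUE 3 HAS ENERGY 19.0884\n";
open my $efh, '<', \$text or die "cannot open energies: $!";
my $energy = load_energies($efh);
close $efh;

print annotate("ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 0.00 A", $energy), "\n";
print annotate("ATOM 20 N ALA 2 6.330 25.384 -11.469 1.00 0.00 A", $energy), "\n";
```

        Because lookups go through the hash, neither file has to be sorted, at the cost of holding the (small) energy file in memory.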

        ...roboticus


Re: incrementing already existing file
by broomduster (Priest) on Feb 28, 2011 at 00:31 UTC
    You gave a nice abstract description of what you want to accomplish (in terms of substrings from files). We really need a more concrete description that includes short samples of each of the files (just a few lines from each). Explain what you want to happen, e.g., read a line from file1, do thus-and-so to it; then read a line from file2 and do something to it; then make such-and-such comparison; then print (or not) depending on the result. Chances are that once you explain it here well enough for us to understand, you will have a better idea of how your code is falling short of your expectations.

    As for the code you posted, here are some general comments, but they won't fix the problem(s) you tried to describe in your post.

    Firstly, I reformatted your code (nicer indentation, eliminate lots of extra white space) to make it more readable. Here it is (see below for some specific comments):

    #!/usr/bin/perl -w
    use strict;
    #crappy
    open (MYFILE, "2hgs_d00_internal_nrg_e.dat");
    my $nrgval = " ";
    my $chn = " ";
    my $count;
    #my $chn[$count];
    my $count2;
    #my $chn2[$count2]
    my @nrg;
    open (NEWF, "2HGS_bio_conv-min_p.pdb");
    my $toprint = " ";
    my $chn2 = " ";
    while (<MYFILE>) {
        chomp; # avoid \n at the end of each line
        if ($_ =~/ENERGY/) {
            for ($count=1;$count<=1;$count++) {
                $chn = substr $_, 20, 3;
                $nrgval = substr $_, 35, 8;
                while (<NEWF>) {
                    chomp; # avoid \n at the end of each line
                    my $j = 0;
                    my $i = 0;
                    if ($_ =~/ATOM/) {
                        for ($count2=1;$count2<=1;$count2++) {
                            $chn2 = substr $_, 23, 3;
                            $toprint = substr $_, 0, 65;
                            for ($chn=1;$chn<=$chn2;$chn++) {
                                if ($chn==$chn2) {
                                    print " $toprint $nrgval \n";
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    close (MYFILE);
    close (NEWF);
    Comments:
    I changed all of your uses of 'our' to 'my'. They are not interchangeable, and the details of when you really want 'our' are probably more than you need to worry about right now. Until you get more experience, stick to 'my'. Other than those changes and reformatting, the code is yours, but here are some things you should do:
    • Re-read the documentation for open. You need to specify whether files are for reading, writing, or appending. You got the behavior you wanted, but you should get in the habit of saying explicitly what you want.
    • Get in the habit of checking whether open succeeded:
      open( .... ) or die "could not open file: $!";
    • You will want to learn to use Perl's for / foreach rather than the C-style for loops that you have here. Once it's clear what you're trying to do, it will also be clear how to make things more Perl-ish. (I'm sure you are aware that two of your loops make only one pass, but I suspect you did that intentionally for debugging purposes.)
    • You have these two lines (commented out):
      #my $chn[$count];
      #my $chn2[$count2]
      If you really need arrays, they would be declared like so:
      my( @chn, @chn2 );
      We'll know if you need them when we see some data and a description of what should happen.
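    Putting the first two bullets together, a three-argument open with an explicit mode and an error check might look like the sketch below. The helper sub names and the output filename are made-up placeholders; the input filenames are the OP's. Lexical filehandles such as `$fh` are closed automatically when they go out of scope, though closing them explicitly is still good practice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open a file for reading, dying with a useful message if that fails.
sub open_for_reading {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "could not open '$path' for reading: $!";
    return $fh;
}

# Open a file for writing ('>' truncates; use '>>' to append instead).
sub open_for_writing {
    my ($path) = @_;
    open(my $fh, '>', $path) or die "could not open '$path' for writing: $!";
    return $fh;
}

# Example (commented out so the sketch runs without the OP's data files):
#   my $energies = open_for_reading('2hgs_d00_internal_nrg_e.dat');
#   my $pdb      = open_for_reading('2HGS_bio_conv-min_p.pdb');
#   my $out      = open_for_writing('2hgs_with_energy.pdb');

# Demonstrate the error path with a file that should not exist.
my $ok  = eval { open_for_reading('no/such/file.dat'); 1 };
my $msg = $ok ? "opened\n" : "error: $@";
print $msg;
```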

    Those are comments about good programming practice. None of them are likely to solve your problem. Post some sample data for the two files. Include a description of what you want your code to do (both in terms of program steps and what you want the output to look like). If, in the process of doing that, you get it working on your own, so much the better. If you're still having problems, at least we will then have some data to test with, which will help us help you solve your remaining problems.

      <MYFILE> has columns similar to this. I want to save the column # and the last column's value when I open this file. I tried to save them as substrings.
      IN CHAIN A RESIDUE 1 HAS ENERGY 28.8216
      IN CHAIN A RESIDUE 2 HAS ENERGY 24.9274
      IN CHAIN A RESIDUE 3 HAS ENERGY 19.0884
      IN CHAIN A RESIDUE 4 HAS ENERGY -27.6978
      IN CHAIN A RESIDUE 5 HAS ENERGY 34.8558
      IN CHAIN A RESIDUE 6 HAS ENERGY 17.9725
      IN CHAIN A RESIDUE 7 HAS ENERGY 29.0379
      IN CHAIN A RESIDUE 8 HAS ENERGY 13.7192
      IN CHAIN A RESIDUE 9 HAS ENERGY 15.3481
      IN CHAIN A RESIDUE 10 HAS ENERGY -7.55393
      IN CHAIN A RESIDUE 11 HAS ENERGY -5.87837
      IN CHAIN A RESIDUE 12 HAS ENERGY 40.7543
      IN CHAIN A RESIDUE 13 HAS ENERGY -11.5488
      IN CHAIN A RESIDUE 14 HAS ENERGY -11.7673

      <NEWF> has columns similar to this. I want to save the column # and compare it with the previous MYFILE value. When they match, I want to print the value that I stored from the previous file in place of the last-but-one column (the one I had in bold).

      ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 0.00 A
      ATOM 2 HT1 MET 1 4.524 25.860 -15.614 1.00 0.00 A
      ATOM 3 HT2 MET 1 4.109 26.952 -14.383 1.00 0.00 A
      ATOM 4 HT3 MET 1 3.729 25.316 -14.228 1.00 0.00 A
      ATOM 5 CA MET 1 5.708 25.747 -13.831 1.00 0.00 A
      ATOM 6 HA MET 1 6.008 24.725 -14.011 1.00 0.00 A
      ATOM 7 CB MET 1 6.792 26.728 -14.367 1.00 0.00 A
      ATOM 8 HB1 MET 1 6.812 26.659 -15.475 1.00 0.00 A
      ATOM 9 HB2 MET 1 6.502 27.769 -14.109 1.00 0.00 A
      ATOM 10 CG MET 1 8.241 26.517 -13.880 1.00 0.00 A
      ATOM 11 HG1 MET 1 8.298 26.709 -12.787 1.00 0.00 A
      ATOM 12 HG2 MET 1 8.547 25.462 -14.049 1.00 0.00 A
      ATOM 20 N ALA 2 6.330 25.384 -11.469 1.00 0.00 A
      ATOM 21 HN ALA 2 7.105 24.825 -11.751 1.00 0.00 A
      ATOM 22 CA ALA 2 6.383 25.717 -10.067 1.00 0.00 A
      ATOM 23 HA ALA 2 6.344 26.791 -9.955 1.00 0.00 A
      ATOM 24 CB ALA 2 5.300 25.034 -9.205 1.00 0.00 A
      ATOM 25 HB1 ALA 2 4.288 25.319 -9.565 1.00 0.00 A
      ATOM 26 HB2 ALA 2 5.394 23.928 -9.255 1.00 0.00 A
      ATOM 27 HB3 ALA 2 5.396 25.346 -8.143 1.00 0.00 A
      ATOM 28 C ALA 2 7.753 25.238 -9.659 1.00 0.00 A
      ATOM 29 O ALA 2 8.299 24.357 -10.317 1.00 0.00 A
      ATOM 30 N THR 3 8.353 25.813 -8.605 1.00 86.23 A
      ATOM 31 HN THR 3 7.908 26.533 -8.079 1.00 0.00 A
      ATOM 32 CA THR 3 9.687 25.408 -8.176 1.00 88.87 A
      ATOM 33 HA THR 3 9.829 24.356 -8.373 1.00 0.00 A
      ATOM 34 CB THR 3 10.847 26.194 -8.810 1.00 91.62 A
      ATOM 35 HB THR 3 11.790 25.982 -8.261 1.00 0.00 A
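        Rather than fixed substr offsets, which would drift if, as these samples suggest, fields are separated by single spaces (so "RESIDUE 10" is one character longer than "RESIDUE 9"), each sample format can be matched with a regular expression that captures the residue number. A sketch, assuming every line looks like the samples above, with no chain letter between the residue name and number; the sub names are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "IN CHAIN A RESIDUE 1 HAS ENERGY 28.8216" -> (1, 28.8216); no match -> ()
sub parse_energy_line {
    my ($line) = @_;
    return $line =~ /RESIDUE\s+(\d+)\s+HAS\s+ENERGY\s+(\S+)/ ? ($1, $2) : ();
}

# "ATOM 1 N MET 1 4.440 ..." -> residue number (fifth field); no match -> undef
sub parse_atom_residue {
    my ($line) = @_;
    return $line =~ /^ATOM\s+\d+\s+\S+\s+\S+\s+(\d+)/ ? $1 : undef;
}

my ($res, $e) = parse_energy_line('IN CHAIN A RESIDUE 10 HAS ENERGY -7.55393');
print "residue $res has energy $e\n";   # residue 10 has energy -7.55393
print parse_atom_residue('ATOM 20 N ALA 2 6.330 25.384 -11.469 1.00 0.00 A'), "\n";   # 2
```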
        OK so far, I think. But you need to be a bit clearer.
        • When you say "column #" what do you mean exactly?
        • Given the first few lines of each file (say five lines), what should the output look like? Not just a description, but please show the actual output you want to create.
        It looks like you are trying to add the energy value of each residue from one file to a PDB file. Is this what you are trying to accomplish?
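        If so, a common approach is to write the energies into the temperature-factor (B-factor) field, which is the 0.00 column just before the chain letter in the samples, so that a molecular viewer can colour by it. In the standard fixed-column PDB format the residue sequence number occupies columns 23-26 and the temperature factor columns 61-66 (1-based), so a sketch can use substr at 0-based offsets 22 and 60. This assumes the real files follow those column positions, which the line-wrapped samples in this thread cannot confirm; the sub name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rewrite the temperature-factor field (columns 61-66, 1-based) of an
# ATOM line with the energy for its residue (columns 23-26).
sub set_bfactor {
    my ($line, $energy) = @_;
    return $line unless $line =~ /^ATOM/ and length($line) >= 66;
    my $res = substr($line, 22, 4);
    $res =~ s/\s+//g;                       # "   1" -> "1"
    return $line unless exists $energy->{$res};
    substr($line, 60, 6) = sprintf('%6.2f', $energy->{$res});
    return $line;
}

my %energy = (1 => 28.8216, 2 => 24.9274);
# A fixed-width ATOM line built to PDB column positions for illustration.
my $atom = sprintf('%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f',
                   'ATOM', 1, ' N', '', 'MET', 'A', 1, '',
                   4.440, 25.987, -14.585, 1.00, 0.00);
print set_bfactor($atom, \%energy), "\n";
```

        Writing the value with a fixed `%6.2f` width keeps every later column in place, which split-and-join does not guarantee.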