Re^2: incrementing already existing file

The output I am getting is:

    ...216
    ATOM      2  HT1 MET     1       4.524  25.860 -15.614  1.00   0.0 28.8216
    ATOM      3  HT2 MET     1       4.109  26.952 -14.383  1.00   0.0 28.8216
    ATOM      4  HT3 MET     1       3.729  25.316 -14.228  1.00   0.0 28.8216
    ATOM      5  CA  MET     1       5.708  25.747 -13.831  1.00   0.0 28.8216
    ATOM      6  HA  MET     1       6.008  24.725 -14.011  1.00   0.0 28.8216
    ATOM      7  CB  MET     1       6.792  26.728 -14.367  1.00   0.0 28.8216
    ATOM      8  HB1 MET     1       6.812  26.659 -15.475  1.00   0.0 28.8216
    ATOM      9  HB2 MET     1       6.502  27.769 -14.109  1.00   0.0 28.8216
    ATOM     10  CG  MET     1       8.241  26.517 -13.880  1.00   0.0 28.8216
    ATOM     11  HG1 MET     1       8.298  26.709 -12.787  1.00   0.0 28.8216
    ATOM     12  HG2 MET     1       8.547  25.462 -14.049  1.00   0.0 28.8216
    ATOM     13  SD  MET     1       9.409  27.618 -14.738  1.00   0.0 28.8216
    ATOM     14  CE  MET     1      10.824  27.291 -13.650  1.00   0.0 28.8216
    ATOM     15  HE1 MET     1      11.740  27.794 -14.026  1.00   0.0 28.8216
    ATOM     16  HE2 MET     1      10.631  27.662 -12.621  1.00   0.0 28.8216
    ATOM     17  HE3 MET     1      11.042  26.203 -13.587  1.00   0.0 28.8216
    ATOM     18  C   MET     1       5.446  25.905 -12.332  1.00   0.0 28.8216
    ATOM     19  O   MET     1       4.414  26.443 -11.925  1.00   0.0 28.8216
    ATOM     20  N   ALA     2       6.330  25.384 -11.469  1.00   0.0 28.8216
    ATOM     21  HN  ALA     2       7.105  24.825 -11.751  1.00   0.0 28.8216
    ATOM     22  CA  ALA     2       6.383  25.717 -10.067  1.00   0.0 28.8216
    ATOM     23  HA  ALA     2       6.344  26.791  -9.955  1.00   0.0 28.8216
    ATOM     24  CB  ALA     2       5.300  25.034  -9.205  1.00   0.0 28.8216
    ATOM     25  HB1 ALA     2       4.288  25.319  -9.565  1.00   0.0 28.8216
    ATOM     26  HB2 ALA     2       5.394  23.928  -9.255  1.00   0.0 28.8216
    ATOM     27  HB3 ALA     2       5.396  25.346  -8.143  1.00   0.0 28.8216
    ATOM     28  C   ALA     2       7.753  25.238  -9.659  1.00   0.0 28.8216
    ATOM     29  O   ALA     2       8.299  24.357 -10.317  1.00   0.0 28.8216
    ATOM     30  N   THR     3       8.353  25.813  -8.605  1.00  86.2 28.8216
    ATOM     31  HN  THR     3       7.908  26.533  -8.079  1.00   0.0 28.8216
    ATOM     32  CA  THR     3       9.687  25.408  -8.176  1.00  88.8 28.8216
    ATOM     33  HA  THR     3       9.829  24.356  -8.373  1.00   0.0 28.8216
    ATOM     34  CB  THR     3      10.847  26.194  -8.810  1.00  91.6 28.8216
    ATOM     35  HB  THR     3      11.790  25.982  -8.261  1.00   0.0 28.8216
    ATOM     36  OG1 THR     3      10.614  27.598  -8.833  1.00  93.2 28.8216

The output I should get is:

    ATOM      1  N   MET     1       4.440  25.987 -14.585  1.00   0.0 28.8216
    ATOM      2  HT1 MET     1       4.524  25.860 -15.614  1.00   0.0 28.8216
    ATOM      3  HT2 MET     1       4.109  26.952 -14.383  1.00   0.0 28.8216
    ATOM      4  HT3 MET     1       3.729  25.316 -14.228  1.00   0.0 28.8216
    ATOM      5  CA  MET     1       5.708  25.747 -13.831  1.00   0.0 28.8216
    ATOM      6  HA  MET     1       6.008  24.725 -14.011  1.00   0.0 28.8216
    ATOM      7  CB  MET     1       6.792  26.728 -14.367  1.00   0.0 28.8216
    ATOM      8  HB1 MET     1       6.812  26.659 -15.475  1.00   0.0 28.8216
    ATOM      9  HB2 MET     1       6.502  27.769 -14.109  1.00   0.0 28.8216
    ATOM     10  CG  MET     1       8.241  26.517 -13.880  1.00   0.0 28.8216
    ATOM     11  HG1 MET     1       8.298  26.709 -12.787  1.00   0.0 28.8216
    ATOM     12  HG2 MET     1       8.547  25.462 -14.049  1.00   0.0 28.8216
    ATOM     13  SD  MET     1       9.409  27.618 -14.738  1.00   0.0 28.8216
    ATOM     14  CE  MET     1      10.824  27.291 -13.650  1.00   0.0 28.8216
    ATOM     15  HE1 MET     1      11.740  27.794 -14.026  1.00   0.0 28.8216
    ATOM     16  HE2 MET     1      10.631  27.662 -12.621  1.00   0.0 28.8216
    ATOM     17  HE3 MET     1      11.042  26.203 -13.587  1.00   0.0 28.8216
    ATOM     18  C   MET     1       5.446  25.905 -12.332  1.00   0.0 28.8216
    ATOM     19  O   MET     1       4.414  26.443 -11.925  1.00   0.0 28.8216
    ATOM     20  N   ALA     2       6.330  25.384 -11.469  1.00   0.0 24.9274
    ATOM     21  HN  ALA     2       7.105  24.825 -11.751  1.00   0.0 24.9274
    ATOM     22  CA  ALA     2       6.383  25.717 -10.067  1.00   0.0 24.9274
    ATOM     23  HA  ALA     2       6.344  26.791  -9.955  1.00   0.0 24.9274
    ATOM     24  CB  ALA     2       5.300  25.034  -9.205  1.00   0.0 24.9274
    ATOM     25  HB1 ALA     2       4.288  25.319  -9.565  1.00   0.0 24.9274
    ATOM     26  HB2 ALA     2       5.394  23.928  -9.255  1.00   0.0 24.9274
    ATOM     27  HB3 ALA     2       5.396  25.346  -8.143  1.00   0.0 24.9274
    ATOM     28  C   ALA     2       7.753  25.238  -9.659  1.00   0.0 24.9274
    ATOM     29  O   ALA     2       8.299  24.357 -10.317  1.00   0.0 24.9274
    ATOM     30  N   THR     3       8.353  25.813  -8.605  1.00  86.2 19.0884
    ATOM     31  HN  THR     3       7.908  26.533  -8.079  1.00   0.0 19.0884
    ATOM     32  CA  THR     3       9.687  25.408  -8.176  1.00  88.8 19.0884
    ATOM     33  HA  THR     3       9.829  24.356  -8.373  1.00   0.0 19.0884
    ATOM     34  CB  THR     3      10.847  26.194  -8.810  1.00  91.6 19.0884
    ATOM     35  HB  THR     3      11.790  25.982  -8.261  1.00   0.0 19.0884
    ATOM     36  OG1 THR     3      10.614  27.598  -8.833  1.00  93.2 19.0884


Replies are listed 'Best First'.
Re^3: incrementing already existing file by broomduster (Priest) on Feb 28, 2011 at 03:20 UTC
It looks like both files use column 5 as a "key" of sorts to connect the two files. I would approach this by reading all of the first file (the one you open as MYFILE), collecting the values from the last column along the way. Since you only need to collect one value from each line, I save them in an array as I read the file. This will work fine even for fairly large files. When the first file is processed, read from the second file (the one you open as NEWF) and do the substitutions (line by line), writing the output as we go.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $file1 = "pm-890461-in1.txt";
    my $file2 = "pm-890461-in2.txt";

    open( MYFILE, '<', $file1 ) or die "cannot open $file1: $!";
    open( NEWF, '<', $file2 ) or die "cannot open $file2: $!";

    my @in_values;
    while ( <MYFILE> ) {
        chomp;
        my( $index, $value ) = ( split /\s+/ )[4, -1];
        # above line does same thing as next three
        # my @fields = ( split /\s+/ );
        # my $index = $fields[4];
        # my $value = $fields[-1];
        $in_values[ $index ] = $value;
    }
    close MYFILE;

    while ( <NEWF> ) {
        chomp;
        my @fields = ( split /\s+/ );
        my $index = $fields[4];
        $fields[-1] = $in_values[ $index ];
        my $output = join "\t", @fields;
        print "$output\n";
    }
    close NEWF;

Note that I use split (not substr) to get the fields of interest from each line (same approach for both files). For the output, I join the fields with a tab character. You should change that to something else (e.g., a fixed number of space characters) if you need the output formatted differently. And of course this writes to STDOUT, so you will need to redirect the output on the command line or add to this code to open an output file and print to that.

When you are more comfortable with Perl, you will find that some of this is actually on the "verbose" side. Using Perl idioms would make some of my code more compact, but also a bit harder to follow until you have more experience.
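If the fixed-width layout of the input needs to survive, one option (not from the reply above, just a minimal sketch) is to swap out only the last field with a substitution instead of splitting and re-joining the whole line. The helper name `replace_last_field`, the sample record, and the lookup values are all made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Replace the last whitespace-delimited field of $line with $new_value,
# leaving every other column (and its spacing) untouched.
# Hypothetical helper, not part of the original script.
sub replace_last_field {
    my ( $line, $new_value ) = @_;
    $line =~ s/\S+\s*\z/$new_value/;
    return $line;
}

# Illustrative lookup table (index = key from column 5) and sample record.
my @in_values = ( undef, '28.8216', '24.9274' );
my $line = "ATOM     20  N   ALA     2       6.330  25.384 -11.469  1.00   0.0 28.8216";

my $index = ( split ' ', $line )[4];    # key in column 5
print replace_last_field( $line, $in_values[$index] ), "\n";
```

This only keeps the columns aligned when the new value is as wide as the old one; for values of varying width, `sprintf` with explicit field widths would be the safer choice.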
Re^4: incrementing already existing file by wanttoprogram (Novice) on Feb 28, 2011 at 22:57 UTC
Thank you very much. The code worked perfectly well. But I have one last issue. There are two sets of values, 'A' and 'B'. The code you gave me considers only the 'B' set of values. Is there any way I can ask it to look for A first and then move to B? Thank you again. It was very helpful.
Re^5: incrementing already existing file by broomduster (Priest) on Mar 01, 2011 at 00:04 UTC
"There are two sets of values 'A' and 'B'. The code you gave me is considering B set values only. Is there any way I can ask it to look for A first and then move to B."

Almost certainly can. But you need to explain better what 'A' and 'B' are and show some examples of the input files.

It sounds as if your data from NEWF have 'A' and 'B' somewhere on each line. 'A' should be replaced by a value from your MYFILE and 'B' should be replaced by a value from SOME_OTHER_FILE. This will be an easy modification if 'A' and 'B' are the last two columns in NEWF. If that is correct, here's how to proceed:

Read from SOME_OTHER_FILE exactly as I do from MYFILE, collecting the 'B' values in an array. This assumes that SOME_OTHER_FILE and MYFILE are formatted similarly.

Change the `while ( <NEWF> )` loop to replace the last two fields with the values you collected from the other two files. HINT: you will use the indices `-2` and `-1` to access those elements.

I think you should try to write this code yourself. If I made some wrong assumptions, post back with clarification and (most important) samples of the input files and a sample of what the output should look like. I'm quite happy to help you get your work done, but you will learn best if you give it a go on your own and then ask for help if something doesn't work the way you want.
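For anyone who wants to check their own attempt afterwards, here is a rough sketch of those two steps. It rests on the same unconfirmed assumptions as the reply: 'A' and 'B' are the last two columns of NEWF, and both lookup files carry the key in column 5 and the value in the last column. The sub names and the sample data are invented for illustration, and in-memory filehandles stand in for the real files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collect the last column of each line, indexed by the key in column 5.
sub read_values {
    my ($fh) = @_;
    my @values;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $index, $value ) = ( split ' ', $line )[ 4, -1 ];
        $values[$index] = $value;
    }
    return \@values;
}

# Replace the last two fields with the 'A' and 'B' values for this key.
sub replace_last_two {
    my ( $line, $a_values, $b_values ) = @_;
    my @fields = split ' ', $line;
    my $index  = $fields[4];
    $fields[-2] = $a_values->[$index];
    $fields[-1] = $b_values->[$index];
    return join "\t", @fields;
}

# Made-up sample data standing in for MYFILE ('A') and SOME_OTHER_FILE ('B').
my $a_data = "x x x x 1 11.1\nx x x x 2 22.2\n";
my $b_data = "x x x x 1 0.5\nx x x x 2 0.7\n";
open my $a_fh, '<', \$a_data or die "cannot open in-memory A data: $!";
open my $b_fh, '<', \$b_data or die "cannot open in-memory B data: $!";
my $a_values = read_values($a_fh);
my $b_values = read_values($b_fh);

# One made-up NEWF record whose last two columns are placeholders.
print replace_last_two( "ATOM 20 N ALA 2 A B", $a_values, $b_values ), "\n";
```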
Re^6: incrementing already existing file by wanttoprogram (Novice) on Mar 01, 2011 at 21:25 UTC
Re^7: incrementing already existing file by broomduster (Priest) on Mar 01, 2011 at 23:35 UTC
Some notes below your chosen depth have not been shown here
Re^3: incrementing already existing file by roboticus (Chancellor) on Feb 28, 2011 at 02:57 UTC
wanttoprogram:

OK, are both files sorted with respect to the key fields? If so, then you don't really want nested loops. You want a single loop and you can decide which file to read depending on what the current condition is. Something like:

    # "Prime the pump"
    my $rec1 = <FILE1>;
    my $rec2 = <FILE2>;

    # Keep looping as long as file2 still has records to output
    while (defined $rec2) {
        # Figure out what keys you have (file1 may already be exhausted)
        my $key1 = defined $rec1 ? get_key_1($rec1) : undef;
        my $key2 = get_key_2($rec2);

        if (defined $key1 and $key1 eq $key2) {
            # They're the same, so create an output record, and read
            # next record from file2
            print build_record($rec1, $rec2);
            $rec2 = <FILE2>;
        }
        elsif (defined $key1 and $key1 lt $key2) {
            # First file has a key we don't need, just ignore
            # it and read the next record
            $rec1 = <FILE1>;
        }
        else {
            # Hmmm ... first file seemed to skip the key we need.
            # print a partial record and advance to next file2 record
            print partial_record($rec2);
            $rec2 = <FILE2>;
        }
    }

Of course, if either of the files isn't sorted on the keys, then that won't work. You'll either have to sort them, or try something like a hash table. For the hash table, you simply read the first file into a hash based on the key field(s). Then you scan through the second file, looking up values from the hash as you need them. Something like:

    # Read dictionary
    my %abbreviations;
    while (my $line = <DATA>) {
        chomp $line;
        my ($abbrev, $longname) = split /:/, $line;
        $abbreviations{$abbrev} = $longname;
    }

    # Process file
    open my $FH, '<', 'the_file' or die "cannot open the_file: $!";
    while (my $line = <$FH>) {
        chomp $line;
        my ($field1, $field2, $key, $field3) = split /\t/, $line;
        if (exists $abbreviations{$key}) {
            # key was abbreviated, replace with full value
            $key = $abbreviations{$key};
        }
        print "$key: ($field1, $field2, $field3)\n";
    }
    close $FH;

    __DATA__
    perl:pathologically eclectic rubbish lister
    lisp:lots of irritating silly parenthesis
    python:all your space are belong to us
    ruby:a quack language

...roboticus

When your only tool is a hammer, all problems look like your thumb.
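To make the sorted-merge idea concrete, here is a small self-contained sketch that works on arrays of lines instead of filehandles. The record layout ("key value", whitespace separated), the sub name `merge_sorted`, and the sample data are assumptions made for illustration. One deliberate difference from the pseudocode above: keys are compared numerically (`<` and `==`), because string comparison with `lt` would sort "10" before "9".

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Merge two lists of "key value" records, both sorted by numeric key.
# Returns one output line per record of the second list: the record plus
# the matching value from the first list, or the bare record (a "partial
# record") when the first list has no such key.
sub merge_sorted {
    my ( $recs1, $recs2 ) = @_;
    my $i = 0;
    my @out;
    for my $rec2 (@$recs2) {
        my ( $key2, $rest2 ) = split ' ', $rec2, 2;

        # Skip list-1 records whose key is too small to be needed.
        $i++ while $i < @$recs1 and ( split ' ', $recs1->[$i] )[0] < $key2;

        if ( $i < @$recs1 and ( split ' ', $recs1->[$i] )[0] == $key2 ) {
            my $value = ( split ' ', $recs1->[$i] )[1];
            push @out, "$key2 $rest2 $value";
        }
        else {
            push @out, "$key2 $rest2";    # partial record
        }
    }
    return @out;
}

# Made-up sample data: list 1 has no record for key 2.
my @file1 = ( "1 a", "3 c" );
my @file2 = ( "1 x", "1 y", "2 z", "3 w" );
print "$_\n" for merge_sorted( \@file1, \@file2 );
```

With the sample data this prints `1 x a`, `1 y a`, `2 z`, and `3 w c`, one per line. Each list is walked only once, so memory use stays flat even for large inputs, which is the main reason to prefer it over the hash when both files are already sorted.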
	PerlMonks