http://www.perlmonks.org?node_id=890496


in reply to Re: incrementing already existing file
in thread incrementing already existing file

I am getting only one value, repeated on every line, as shown below. I want the value to repeat only while the column stays the same, and then change to a different number. It should look like the second output below. The output I am getting is:
ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 0.0 28.8216
ATOM 2 HT1 MET 1 4.524 25.860 -15.614 1.00 0.0 28.8216
ATOM 3 HT2 MET 1 4.109 26.952 -14.383 1.00 0.0 28.8216
ATOM 4 HT3 MET 1 3.729 25.316 -14.228 1.00 0.0 28.8216
ATOM 5 CA MET 1 5.708 25.747 -13.831 1.00 0.0 28.8216
ATOM 6 HA MET 1 6.008 24.725 -14.011 1.00 0.0 28.8216
ATOM 7 CB MET 1 6.792 26.728 -14.367 1.00 0.0 28.8216
ATOM 8 HB1 MET 1 6.812 26.659 -15.475 1.00 0.0 28.8216
ATOM 9 HB2 MET 1 6.502 27.769 -14.109 1.00 0.0 28.8216
ATOM 10 CG MET 1 8.241 26.517 -13.880 1.00 0.0 28.8216
ATOM 11 HG1 MET 1 8.298 26.709 -12.787 1.00 0.0 28.8216
ATOM 12 HG2 MET 1 8.547 25.462 -14.049 1.00 0.0 28.8216
ATOM 13 SD MET 1 9.409 27.618 -14.738 1.00 0.0 28.8216
ATOM 14 CE MET 1 10.824 27.291 -13.650 1.00 0.0 28.8216
ATOM 15 HE1 MET 1 11.740 27.794 -14.026 1.00 0.0 28.8216
ATOM 16 HE2 MET 1 10.631 27.662 -12.621 1.00 0.0 28.8216
ATOM 17 HE3 MET 1 11.042 26.203 -13.587 1.00 0.0 28.8216
ATOM 18 C MET 1 5.446 25.905 -12.332 1.00 0.0 28.8216
ATOM 19 O MET 1 4.414 26.443 -11.925 1.00 0.0 28.8216
ATOM 20 N ALA 2 6.330 25.384 -11.469 1.00 0.0 28.8216
ATOM 21 HN ALA 2 7.105 24.825 -11.751 1.00 0.0 28.8216
ATOM 22 CA ALA 2 6.383 25.717 -10.067 1.00 0.0 28.8216
ATOM 23 HA ALA 2 6.344 26.791 -9.955 1.00 0.0 28.8216
ATOM 24 CB ALA 2 5.300 25.034 -9.205 1.00 0.0 28.8216
ATOM 25 HB1 ALA 2 4.288 25.319 -9.565 1.00 0.0 28.8216
ATOM 26 HB2 ALA 2 5.394 23.928 -9.255 1.00 0.0 28.8216
ATOM 27 HB3 ALA 2 5.396 25.346 -8.143 1.00 0.0 28.8216
ATOM 28 C ALA 2 7.753 25.238 -9.659 1.00 0.0 28.8216
ATOM 29 O ALA 2 8.299 24.357 -10.317 1.00 0.0 28.8216
ATOM 30 N THR 3 8.353 25.813 -8.605 1.00 86.2 28.8216
ATOM 31 HN THR 3 7.908 26.533 -8.079 1.00 0.0 28.8216
ATOM 32 CA THR 3 9.687 25.408 -8.176 1.00 88.8 28.8216
ATOM 33 HA THR 3 9.829 24.356 -8.373 1.00 0.0 28.8216
ATOM 34 CB THR 3 10.847 26.194 -8.810 1.00 91.6 28.8216
ATOM 35 HB THR 3 11.790 25.982 -8.261 1.00 0.0 28.8216
ATOM 36 OG1 THR 3 10.614 27.598 -8.833 1.00 93.2 28.8216

I should get this output:

ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 0.0 28.8216
ATOM 2 HT1 MET 1 4.524 25.860 -15.614 1.00 0.0 28.8216
ATOM 3 HT2 MET 1 4.109 26.952 -14.383 1.00 0.0 28.8216
ATOM 4 HT3 MET 1 3.729 25.316 -14.228 1.00 0.0 28.8216
ATOM 5 CA MET 1 5.708 25.747 -13.831 1.00 0.0 28.8216
ATOM 6 HA MET 1 6.008 24.725 -14.011 1.00 0.0 28.8216
ATOM 7 CB MET 1 6.792 26.728 -14.367 1.00 0.0 28.8216
ATOM 8 HB1 MET 1 6.812 26.659 -15.475 1.00 0.0 28.8216
ATOM 9 HB2 MET 1 6.502 27.769 -14.109 1.00 0.0 28.8216
ATOM 10 CG MET 1 8.241 26.517 -13.880 1.00 0.0 28.8216
ATOM 11 HG1 MET 1 8.298 26.709 -12.787 1.00 0.0 28.8216
ATOM 12 HG2 MET 1 8.547 25.462 -14.049 1.00 0.0 28.8216
ATOM 13 SD MET 1 9.409 27.618 -14.738 1.00 0.0 28.8216
ATOM 14 CE MET 1 10.824 27.291 -13.650 1.00 0.0 28.8216
ATOM 15 HE1 MET 1 11.740 27.794 -14.026 1.00 0.0 28.8216
ATOM 16 HE2 MET 1 10.631 27.662 -12.621 1.00 0.0 28.8216
ATOM 17 HE3 MET 1 11.042 26.203 -13.587 1.00 0.0 28.8216
ATOM 18 C MET 1 5.446 25.905 -12.332 1.00 0.0 28.8216
ATOM 19 O MET 1 4.414 26.443 -11.925 1.00 0.0 28.8216
ATOM 20 N ALA 2 6.330 25.384 -11.469 1.00 0.0 24.9274
ATOM 21 HN ALA 2 7.105 24.825 -11.751 1.00 0.0 24.9274
ATOM 22 CA ALA 2 6.383 25.717 -10.067 1.00 0.0 24.9274
ATOM 23 HA ALA 2 6.344 26.791 -9.955 1.00 0.0 24.9274
ATOM 24 CB ALA 2 5.300 25.034 -9.205 1.00 0.0 24.9274
ATOM 25 HB1 ALA 2 4.288 25.319 -9.565 1.00 0.0 24.9274
ATOM 26 HB2 ALA 2 5.394 23.928 -9.255 1.00 0.0 24.9274
ATOM 27 HB3 ALA 2 5.396 25.346 -8.143 1.00 0.0 24.9274
ATOM 28 C ALA 2 7.753 25.238 -9.659 1.00 0.0 24.9274
ATOM 29 O ALA 2 8.299 24.357 -10.317 1.00 0.0 24.9274
ATOM 30 N THR 3 8.353 25.813 -8.605 1.00 86.2 19.0884
ATOM 31 HN THR 3 7.908 26.533 -8.079 1.00 0.0 19.0884
ATOM 32 CA THR 3 9.687 25.408 -8.176 1.00 88.8 19.0884
ATOM 33 HA THR 3 9.829 24.356 -8.373 1.00 0.0 19.0884
ATOM 34 CB THR 3 10.847 26.194 -8.810 1.00 91.6 19.0884
ATOM 35 HB THR 3 11.790 25.982 -8.261 1.00 0.0 19.0884
ATOM 36 OG1 THR 3 10.614 27.598 -8.833 1.00 93.2 19.0884

Re^3: incrementing already existing file
by broomduster (Priest) on Feb 28, 2011 at 03:20 UTC
    It looks like both files use column 5 as a "key" of sorts to connect the two files. I would approach this by reading all of the first file (the one you open as MYFILE), collecting the values from the last column along the way. Since you only need to collect one value from each line, I save them in an array as I read the file. This will work fine even for fairly large files. When the first file is processed, read from the second file (the one you open as NEWF) and do the substitutions (line by line), writing the output as we go.
    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $file1 = "pm-890461-in1.txt";
    my $file2 = "pm-890461-in2.txt";

    open( MYFILE, '<', $file1 ) or die "cannot open $file1: $!";
    open( NEWF, '<', $file2 ) or die "cannot open $file2: $!";

    my @in_values;
    while ( <MYFILE> ) {
        chomp;
        my( $index, $value ) = ( split /\s+/ )[4, -1];
        # above line does same thing as next three
        # my @fields = ( split /\s+/ );
        # my $index = $fields[4];
        # my $value = $fields[-1];
        $in_values[ $index ] = $value;
    }
    close MYFILE;

    while ( <NEWF> ) {
        chomp;
        my @fields = ( split /\s+/ );
        my $index = $fields[4];
        $fields[-1] = $in_values[ $index ];
        my $output = join "\t", @fields;
        print "$output\n";
    }
    close NEWF;
    Note that I use split (not substr) to get the fields of interest from each line (same approach for both files). For the output, I join the fields with a tab character. You should change that to something else (e.g., a fixed number of space characters) if you need the output formatted differently. And of course this writes to STDOUT, so you will need to redirect the output on the command line or add to this code to open an output file and print to that.
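Here is a minimal sketch of those two changes: fixed-width output instead of tabs, and writing to a file instead of STDOUT. The helper name format_record, the column width of 9, and taking the output file name from the command line are all assumptions made for illustration. Adjust them to whatever layout you actually need.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Right-justify every field in a column of the given width.
# (Hypothetical helper; the width is an illustrative choice.)
sub format_record {
    my ( $width, @fields ) = @_;
    return join( '', map { sprintf "%${width}s", $_ } @fields ) . "\n";
}

# Write to the file named on the command line, or to STDOUT if none is given.
my $out_name = shift @ARGV;
my $out;
if ( defined $out_name ) {
    open( $out, '>', $out_name ) or die "cannot open $out_name: $!";
}
else {
    $out = \*STDOUT;
}

# One sample record, split the same way as in the main loop.
my @fields = split ' ', 'ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 0.0 28.8216';
print {$out} format_record( 9, @fields );

if ( defined $out_name ) {
    close $out or die "cannot close $out_name: $!";
}
```

In the real script, the print inside the while ( <NEWF> ) loop would become the print {$out} line.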

    When you are more comfortable with Perl, you will find that some of this is actually on the "verbose" side. Using Perl idioms would make some of my code more compact, but also a bit harder to follow until you have more experience.

      Thank you very much. The code worked perfectly well. But I have one last issue. There are two sets of values 'A' and 'B'. The code you gave me is considering B set values only. Is there any way I can ask it to look for A first and then move to B. Thank you again. It was very helpful.
        There are two sets of values 'A' and 'B'. The code you gave me is considering B set values only. Is there any way I can ask it to look for A first and then move to B.
        You almost certainly can. But you need to explain better what 'A' and 'B' are and show some examples of the input files. It sounds as if your data from NEWF have 'A' and 'B' somewhere on each line. 'A' should be replaced by a value from your MYFILE and 'B' should be replaced by a value from SOME_OTHER_FILE. This will be an easy modification if 'A' and 'B' are the last two columns in NEWF. If that is correct, here's how to proceed:

        • Read from SOME_OTHER_FILE exactly as I do from MYFILE, collecting the 'B' values in an array. This assumes that SOME_OTHER_FILE and MYFILE are formatted similarly.
        • Change the while ( <NEWF> ) loop to replace the last two fields with the values you collected from the other two files. HINT: you will use the indices -2 and -1 to access those elements.

        I think you should try to write this code yourself. If I made some wrong assumptions, post back with clarification and (most important) samples of the input files and a sample of what the output should look like. I'm quite happy to help you get your work done, but you will learn best if you give it a go on your own and then ask for help if something doesn't work the way you want.
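For comparison once you have had a go, here is one possible sketch of the two steps listed above. It rests on the same unconfirmed assumptions: 'A' and 'B' are the last two columns, and both lookup files are keyed on column 5 with the value in the last column. The sub names and the sample data are made up, and the data is held in strings so the sketch runs on its own.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collect the last-column value from each line, indexed by column 5.
# split ' ' is used so that leading whitespace does not shift the fields.
sub read_values {
    my ($fh) = @_;
    my @values;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $index, $value ) = ( split ' ', $line )[ 4, -1 ];
        $values[$index] = $value;
    }
    return \@values;
}

# Replace the last two fields of one line with the looked-up values.
sub substitute_last_two {
    my ( $line, $a_values, $b_values ) = @_;
    my @fields = split ' ', $line;
    my $index  = $fields[4];
    $fields[-2] = $a_values->[$index];
    $fields[-1] = $b_values->[$index];
    return join "\t", @fields;
}

# Made-up stand-ins for MYFILE and SOME_OTHER_FILE.
my $a_data = "ATOM 1 N MET 1 0.0 28.8216\nATOM 20 N ALA 2 0.0 24.9274\n";
my $b_data = "ATOM 1 N MET 1 0.0 11.5\nATOM 20 N ALA 2 0.0 12.5\n";
open( my $a_fh, '<', \$a_data ) or die "cannot open in-memory data: $!";
open( my $b_fh, '<', \$b_data ) or die "cannot open in-memory data: $!";
my $a_values = read_values($a_fh);
my $b_values = read_values($b_fh);

print substitute_last_two( 'ATOM 20 N ALA 2 6.330 A B', $a_values, $b_values ), "\n";
```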

Re^3: incrementing already existing file
by roboticus (Chancellor) on Feb 28, 2011 at 02:57 UTC

    wanttoprogram:

    OK, are both files sorted with respect to the key fields? If so, then you don't really want nested loops. You want a single loop and you can decide which file to read depending on what the current condition is. Something like:

    # "Prime the pump"
    my $rec1 = <FILE1>;
    my $rec2 = <FILE2>;

    # Keep looping as long as both files have a current record
    while (defined $rec1 and defined $rec2) {
        # Figure out what keys you have
        my $key1 = get_key_1($rec1);
        my $key2 = get_key_2($rec2);

        if ($key1 eq $key2) {
            # They're the same, so create an output record, and read
            # next record from file2
            print build_record($rec1, $rec2);
            $rec2 = <FILE2>;
        }
        elsif ($key1 lt $key2) {
            # First file has a key we don't need, just ignore
            # it and read the next record
            $rec1 = <FILE1>;
        }
        else {
            # Hmmm ... first file seemed to skip the key we need.
            # print a partial record and advance to next file2 record
            print partial_record($rec2);
            $rec2 = <FILE2>;
        }
    }

    # File 1 ran out first: whatever is left in file 2 has no match
    while (defined $rec2) {
        print partial_record($rec2);
        $rec2 = <FILE2>;
    }
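To make the single-pass idea concrete, here is a small self-contained sketch with the helper routines filled in. Everything specific in it is an assumption: the key is taken to be the fifth whitespace-separated field, keys are taken to be numbers (so they are compared with == and <, since string comparison would put 10 before 9), and an unmatched record gets a "?" placeholder. The names get_key and merge_sorted are made up for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical key extractor: fifth whitespace-separated field.
sub get_key {
    my ($rec) = @_;
    return ( split ' ', $rec )[4];
}

# Walk two key-sorted filehandles once and return the output lines.
sub merge_sorted {
    my ( $fh1, $fh2 ) = @_;
    my @out;
    my $rec1 = <$fh1>;
    my $rec2 = <$fh2>;
    while ( defined $rec2 ) {
        chomp $rec2;
        my $key2 = get_key($rec2);

        # Skip file-1 records whose key is smaller than the one we need.
        while ( defined $rec1 and get_key($rec1) < $key2 ) {
            $rec1 = <$fh1>;
        }

        if ( defined $rec1 and get_key($rec1) == $key2 ) {
            # Keys match: append the last field of the file-1 record.
            my $value = ( split ' ', $rec1 )[-1];
            push @out, "$rec2 $value";
        }
        else {
            # File 1 has no record for this key.
            push @out, "$rec2 ?";
        }
        $rec2 = <$fh2>;
    }
    return @out;
}

# Made-up sample data, both sorted on the key column.
my $data1 = "ATOM 1 N MET 1 28.8216\nATOM 20 N ALA 2 24.9274\n";
my $data2 = "ATOM 1 N MET 1\nATOM 2 HT1 MET 1\nATOM 20 N ALA 2\nATOM 30 N THR 4\n";
open( my $fh1, '<', \$data1 ) or die "cannot open in-memory data: $!";
open( my $fh2, '<', \$data2 ) or die "cannot open in-memory data: $!";
print "$_\n" for merge_sorted( $fh1, $fh2 );
```

The file-1 record is not advanced on a match, so several file-2 records can share the same key.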

    Of course, if either of the files isn't sorted on the keys, then that won't work. You'll have to either sort them or try something like a hash table. For the hash table, you simply read the first file into a hash based on the key field(s). Then you scan through the second file, looking up values from the hash as you need them. Something like:

    # Read dictionary
    my %abbreviations;
    while (my $line = <DATA>) {
        chomp $line;    # keep the newline out of the long name
        my ($abbrev,$longname) = split /:/, $line;
        $abbreviations{$abbrev}=$longname;
    }

    # Process file
    open my $FH, '<', 'the_file' or die "cannot open the_file: $!";
    while (my $line = <$FH>) {
        chomp $line;    # keep the newline out of the last field
        my ($field1, $field2, $key, $field3) = split /\t/, $line;
        if (exists $abbreviations{$key}) {
            # key was abbreviated, replace with full value
            $key = $abbreviations{$key};
        }
        print "$key: ($field1, $field2, $field3)\n";
    }
    close $FH;

    __DATA__
    perl:pathologically eclectic rubbish lister
    lisp:lots of irritating silly parenthesis
    python:all your space are belong to us
    ruby:a quack language
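Applied to data shaped like the ATOM records in this thread, the same hash idea might look like the sketch below. It assumes the key is the fifth whitespace-separated field and the value to copy is the last field. The sub names build_lookup and apply_lookup and the sample lines are made up. Because it uses a hash, the first file does not need to be sorted and the keys do not need to be small integers.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build a lookup table: key (assumed to be field 5) => last field.
sub build_lookup {
    my ($fh) = @_;
    my %value_for;
    while ( my $line = <$fh> ) {
        my @fields = split ' ', $line;
        next unless @fields > 4;    # skip blank or short lines
        $value_for{ $fields[4] } = $fields[-1];
    }
    return \%value_for;
}

# Replace the last field when the key is known; otherwise return the line unchanged.
sub apply_lookup {
    my ( $line, $value_for ) = @_;
    my @fields = split ' ', $line;
    return $line unless @fields > 4 and exists $value_for->{ $fields[4] };
    $fields[-1] = $value_for->{ $fields[4] };
    return join ' ', @fields;
}

# Made-up, deliberately unsorted stand-in for the first file.
my $data = "ATOM 30 N THR 3 19.0884\nATOM 1 N MET 1 28.8216\n";
open( my $fh, '<', \$data ) or die "cannot open in-memory data: $!";
my $value_for = build_lookup($fh);

print apply_lookup( 'ATOM 31 HN THR 3 0.0', $value_for ), "\n";
```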

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.