Count similar characters in a row

$new_guy has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have been using a script that you helped develop:

http://perlmonks.org/index.pl?node_id=864169

The script has been brilliant. However, I am faced with a new task. So far the script has been counting every row in my data in which all the selected columns hold exactly the same character. The data file is here:

http://perlmonks.org/index.pl?abspart=1;displaytype=displaycode;node_id=864179;part=4

This was done in a random fashion using the file:

http://perlmonks.org/index.pl?abspart=1;displaytype=displaycode;node_id=864179;part=3

How do I modify the script so that a row is counted only once, i.e. it is skipped when a previous row with the same integer/number in front of the row characters (that is, the z's) has already been counted?

The script I am currently using is below. It takes the random file as "random.txt" and the data file as "re-organized.txt".

jgg.pl

#!/usr/bin/perl
#
use strict;
use warnings;

use 5.010;

# Results are appended to cg_size.txt, so remove any previous copy first
my $cgs = "cg_size.txt";
if (unlink($cgs) == 1) {
    print "Existing \"cg_size.txt\" file was removed\n";
}

# Now open a fresh file for the core genome sizes output
my $output_cgs = "cg_size.txt";

if (! open(CGS, ">>$output_cgs") ) {
    print "Cannot open file \"$output_cgs\" to write to!!\n\n";
    exit;
}

my @tests = do {
   my $randomFile = q{random.txt};
   open my $randomFH, q{<}, $randomFile
      or die qq{open: < $randomFile: $!\n};
   map [ split ],
   <$randomFH>;
   };

my $columnFile = q{re-organized.txt};
open my $columnFH, q{<}, $columnFile
   or die qq{open: < $columnFile: $!\n};

my @results;
while ( <$columnFH> )
{
    my @cols = split;
    foreach my $idx ( 0 .. $#tests )
    {
        foreach my $subidx ( 0 .. $#{ $tests[ $idx ] } )
        {
            my @posns = split m{,}, $tests[ $idx ]->[ $subidx ];
            $results[ $idx ]->[ $subidx ] ++
                if scalar @posns == grep { q{z} eq $cols[ $_ ] } @posns;
        }
    }
}

close $columnFH
   or die qq{close: < $columnFile: $!\n};

say CGS qq{@$_} for @results;

Replies are listed 'Best First'.
Re: Count similar characters in a row - only once by Anonymous Monk on Jun 28, 2011 at 10:33 UTC
"How do I modify the script so that a row is counted only once, i.e. it is skipped when a previous row with the same integer/number in front of the row characters (that is, the z's) has already been counted?"

How do you think you should modify it? Now is the time to grab a pencil and paper and draw a little diagram of the steps your program would take to solve this problem.
Re^2: Count similar characters in a row - only once by $new_guy (Acolyte) on Jun 28, 2011 at 13:14 UTC
Hi Anonymous Monk,

I have given it quite some thought, and I think the first thing I would do is reduce complexity by removing all duplicated lines that have the same integer in front of them. Is this right? I tried running the script below and then ran the script above (jgg.pl), but the results are not good. Any ideas or suggestions on how I could marry the two in a sensible fashion?

my $file = 'my_data_file.txt';
my %seen = ();
{
    local @ARGV = ($file);
    #local $^I = '2.txt';
    while (<>) {
        $seen{$_}++;
        next if $seen{$_} > 1;
        print;
    }
}
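One thing worth checking in the snippet above: %seen is keyed on the whole line ($_), so two lines are treated as duplicates only when they are identical character for character, not when they merely share the same leading integer. A minimal sketch of keying on the integer instead is below. It assumes, which the thread does not confirm, that every data line starts with an integer followed by whitespace (for example "12 z z a z"); first_per_leading_integer is a hypothetical helper name and the sample lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first line seen for each leading integer.
# ASSUMPTION: each line begins with an integer followed by whitespace.
sub first_per_leading_integer {
    my @lines = @_;
    my %seen;
    my @kept;
    for my $line (@lines) {
        # Capture the leading integer; lines without one are passed through.
        my ($key) = $line =~ /^\s*(\d+)\b/;
        next if defined $key && $seen{$key}++;
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample data, not taken from the real data file.
my @sample = ( "1 z z a", "1 z z z", "2 a z z", "1 z z a", "2 z z z" );
print "$_\n" for first_per_leading_integer(@sample);
```

With the sample above, only the first line for integer 1 and the first line for integer 2 are printed. Whether the first occurrence is the right one to keep depends on the data, so treat that choice as an assumption too.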
Re^3: Count similar characters in a row - only once by Anonymous Monk on Jun 28, 2011 at 14:06 UTC
"I have given it quite some thought, and I think the first thing I would do is reduce complexity by removing all duplicated lines that have the same integer in front of them. Is this right?"

Based on your descriptions it sounds right.

"I tried running the script below and then ran the script above (jgg.pl), but the results are not good."

What is wrong with the results?

"Any ideas or suggestions on how I could marry the two in a sensible fashion?"

First get your code to do what you want, then think about marriage.
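For illustration only, one way the two steps might eventually be combined is to skip a row inside the counting loop as soon as its integer has been seen, rather than filtering the file in a separate pass. The sketch below makes two assumptions the thread does not confirm: the integer is the first whitespace-separated field of each row, and the position lists index into the split row exactly as they do in jgg.pl. count_unique_rows is a hypothetical name, and the inputs are made up.

```perl
use strict;
use warnings;

# Sketch only. ASSUMPTIONS:
#   * the first whitespace-separated field of each row is the integer;
#   * position lists index into the split row as in jgg.pl.
sub count_unique_rows {
    my ( $tests, $rows ) = @_;    # array refs: parsed random file, raw data lines
    my ( @results, %seen );
    for my $row (@$rows) {
        my @cols = split ' ', $row;
        next unless @cols;              # ignore blank lines
        next if $seen{ $cols[0] }++;    # integer already counted: skip the row
        for my $idx ( 0 .. $#$tests ) {
            for my $subidx ( 0 .. $#{ $tests->[$idx] } ) {
                my @posns = split /,/, $tests->[$idx][$subidx];
                # Count the row when every listed position holds a 'z'.
                $results[$idx][$subidx]++
                    if @posns == grep { defined $cols[$_] && $cols[$_] eq 'z' } @posns;
            }
        }
    }
    return \@results;
}

# Made-up inputs: one line of the random file with two position lists,
# and three data rows, two of which share the integer 7.
my $tests = [ [ '1,2', '1,3' ] ];
my $rows  = [ '7 z z a', '7 z z z', '8 z a z' ];
my $res   = count_unique_rows( $tests, $rows );
print join( ' ', map { $_ // 0 } @{ $res->[0] } ), "\n";    # prints "1 1"
```

Without the %seen check the second row for integer 7 would also be counted and the same input would give "2 2", which is the difference the original question is after.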
Re^4: Count similar characters in a row - only once by $new_guy (Acolyte) on Jun 29, 2011 at 10:47 UTC
Re^5: Count similar characters in a row - only once by Anonymous Monk on Jun 29, 2011 at 13:11 UTC
