http://www.perlmonks.org?node_id=911726

$new_guy has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have been using a script that you helped develop:

http://perlmonks.org/index.pl?node_id=864169

The script has been brilliant. However, I am faced with a new task. So far the script has been counting all the rows in my data in which every selected column holds the same character (a z). The data file is here:

http://perlmonks.org/index.pl?abspart=1;displaytype=displaycode;node_id=864179;part=4

The columns to check were picked at random, using the file:

http://perlmonks.org/index.pl?abspart=1;displaytype=displaycode;node_id=864179;part=3

How do I modify the script so that it counts a row only once, that is, so that it skips a row when it has already counted an earlier row with the same integer in front of the row's characters (the z's)?

The script I am currently using is below. It takes the random file as "random.txt" and the data file as "re-organized.txt".

jgg.pl

    #!/usr/bin/perl
    #
    use strict;
    use warnings;
    use 5.010;

    ##you will print results but first remove any previous files
    my $cgs = "cg_size.txt";
    if (unlink($cgs) == 1) {
        print "Existing \"cg_size.txt\" file was removed\n";
    }

    #now make a file for the core genome sizes output
    my $output_cgs = "cg_size.txt";
    if (! open(CGS, ">>$output_cgs") ) {
        print "Cannot open file \"$output_cgs\" to write to!!\n\n";
        exit;
    }

    my @tests = do {
        my $randomFile = q{random.txt};
        open my $randomFH, q{<}, $randomFile
            or die qq{open: < $randomFile: $!\n};
        map [ split ], <$randomFH>;
    };

    my $columnFile = q{re-organized.txt};
    open my $columnFH, q{<}, $columnFile
        or die qq{open: < $columnFile: $!\n};

    my @results;
    while ( <$columnFH> ) {
        my @cols = split;
        foreach my $idx ( 0 .. $#tests ) {
            foreach my $subidx ( 0 .. $#{ $tests[ $idx ] } ) {
                my @posns = split m{,}, $tests[ $idx ]->[ $subidx ];
                $results[ $idx ]->[ $subidx ] ++
                    if scalar @posns == grep { q{z} eq $cols[ $_ ] } @posns;
            }
        }
    }

    close $columnFH
        or die qq{close: < $columnFile: $!\n};

    say CGS qq{@$_} for @results;

Replies are listed 'Best First'.
Re: Count similar characters in a row - only once
by Anonymous Monk on Jun 28, 2011 at 10:33 UTC

    How do I modify the script so that it counts a row only once, that is, so that it skips a row when it has already counted an earlier row with the same integer in front of the row's characters (the z's)?

    How do you think you should modify it?

    Now is the time to grab a pencil and paper, and draw a little diagram of the steps your program would take to solve this problem.

      Hi Anonymous Monk, I have given it quite some thought, and I think the first thing I would do is reduce complexity by removing all duplicated lines that have the same integer in front of them. Is this right? I tried running the script below and then running the script above (jgg.pl), but the results are not good. Any ideas or suggestions on how I could marry the two in a sensible fashion?

      my $file = 'my_data_file.txt';
      my %seen = ();
      {
          local @ARGV = ($file);
          #local $^I = '2.txt';
          while(<>){
              $seen{$_}++;
              next if $seen{$_} > 1;
              print;
          }
      }

        Hi Anonymous Monk, I have given it quite some thought, and I think the first thing I would do is reduce complexity by removing all duplicated lines that have the same integer in front of them. Is this right?

        Based on your descriptions it sounds right.
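
        One thing worth checking: a %seen hash keyed on the whole line ($_) only drops lines that are identical from start to finish, so two rows that share an integer but differ in their characters both survive. A minimal sketch of keying on the leading integer instead follows. It assumes the integer is the first whitespace-separated field of each row (the data file is not reproduced in this thread); first_line_per_key is a made-up helper name and the sample rows are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only the first line seen for each leading field.
# ASSUMPTION: the integer is the first whitespace-separated field of a row.
sub first_line_per_key {
    my @lines = @_;
    my ( %seen, @kept );
    for my $line (@lines) {
        my ($key) = split ' ', $line;    # first field only
        next unless defined $key;        # skip blank lines
        next if $seen{$key}++;           # this integer was already kept
        push @kept, $line;
    }
    return @kept;
}

# Invented sample rows, not taken from the real data file.
my @sample = ( "1 z z a\n", "1 z a z\n", "2 z z z\n", "1 z z a\n" );
print first_line_per_key(@sample);
```

        On the invented sample this keeps the first "1" row and the "2" row. Whether the first row per integer is the right one to keep depends on what the counts are supposed to mean.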

        I tried running the script below and then running the script above (jgg.pl), but the results are not good.

        What is wrong with the results?

        Any ideas or suggestions on how I could marry the two in a sensible fashion?

        First get your code to do what you want, then think about marriage.
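
        For when that point is reached, here is one possible way the two steps might be combined: a sketch only, not a tested replacement for jgg.pl. It keeps the counting loop from jgg.pl but remembers, per test, which leading integers have already been counted. The assumptions are that the integer is the first whitespace-separated field ($cols[0]) and that the column positions in random.txt index into the same split; count_once_per_integer and the sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

# Sketch: for each test, count at most one matching row per leading integer.
# $tests has the shape of @tests in jgg.pl: one array ref per line of
# random.txt, each element a comma-separated list of column positions.
# $rows is a list of data lines.
# ASSUMPTION: the leading integer is the first field, $cols[0].
sub count_once_per_integer {
    my ( $tests, $rows ) = @_;
    my ( @results, @counted );
    for my $row (@$rows) {
        my @cols = split ' ', $row;
        next unless @cols;               # skip blank lines
        my $key = $cols[0];
        for my $idx ( 0 .. $#$tests ) {
            for my $subidx ( 0 .. $#{ $tests->[$idx] } ) {
                # Start at 0 so tests with no match still print a number
                # (jgg.pl leaves such entries undefined).
                $results[$idx][$subidx] //= 0;
                my @posns = split m{,}, $tests->[$idx][$subidx];
                # Same test as jgg.pl: every listed column must be a z.
                next unless @posns == grep {
                    defined $cols[$_] && $cols[$_] eq 'z'
                } @posns;
                # Skip if a row with this integer was already counted here.
                next if $counted[$idx][$subidx]{$key}++;
                $results[$idx][$subidx]++;
            }
        }
    }
    return \@results;
}

# Invented sample: one line of tests, four data rows.
my @tests = ( [ '1,2', '3' ] );
my @rows  = ( '7 z z a', '7 z z z', '8 z z z', '7 a z z' );
say "@$_" for @{ count_once_per_integer( \@tests, \@rows ) };
```

        On this sample the sketch prints "2 2". Removing duplicate integers from the file first and then running jgg.pl unchanged would give "2 1" instead, because the first row for integer 7 has no z in column 3. The two approaches answer slightly different questions, so it is worth deciding which one is meant before combining anything.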