Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Trouble with Transliterate Function

by ccelt09 (Sexton)
on Aug 28, 2013 at 22:49 UTC ( #1051326=perlquestion: print w/ replies, xml ) Need Help??
ccelt09 has asked for the wisdom of the Perl Monks concerning the following question:

A few days ago I was very kindly helped with writing a program to replace certain values in a long string of characters, as seen here here. I've attempted to modify and to use the program with short test files as shown below to change al characters outside my intervals to 'N', but the values changed are not consistent with the program's goal. Can anyone see an error in the code?

INPUT FILE: 0000000000 ------------- INTERVALS: chrX 1 3 chrX 5 6 chrX 8 9 ------------ DESIRED OUTPUT FILE: 000N00N00N ----------------- OUTPUT FILE: N00NNNNNNN ---------------- CODE: #!/usr/bin/perl -w use strict; use warnings; my $population = "test"; open( INPUT, "</Users/logancurtis-whitchurch/Desktop/test.mask.txt") o +r die "can't open masked file\n"; open( OUT, ">/Users/logancurtis-whitchurch/Desktop/filtered.test.mask. +txt") or die "can't open output file\n"; my $mask_input = <INPUT>; close INPUT; my $filtered_sites = "/Users/logancurtis-whitchurch/Desktop/test.inter +val"; open (INTERVAL, "<$filtered_sites") or die "can't open $filtered_sites +"; my $lastEnd = 1; while ( <INTERVAL> ) { my (undef, $start, $end) = split '\s', $_; ## change everything from the end of the last range ## to the start of this range to 'N' substr( $mask_input, $lastEnd, $start ) =~ tr[\x00-\xff][N]; $lastEnd = $end; } close INTERVAL; ## change everything from the end of the last range to the end of stri +ng to 'N' substr( $mask_input, $lastEnd, length( $mask_input ) ) =~ tr[\x00-\xff +][N]; print OUT "$mask_input";

Comment on Trouble with Transliterate Function
Download Code
Re: Trouble with Transliterate Function
by choroba (Abbot) on Aug 28, 2013 at 23:12 UTC
    Seems like an off by one error. I tried to fix it:
    #!/usr/bin/perl use warnings; use strict; my $mask_input = '0000000000'; my $lastEnd = 0; while (<DATA>) { my (undef, $start, $end) = split '\s+'; my ($from, $length) = ($lastEnd, $start - $lastEnd); $length-- if $length > 0; substr($mask_input, $from, $length) =~ tr[\x00-\xff][N]; $lastEnd = $end; } substr($mask_input, $lastEnd) =~ tr[\x00-\xff][N]; print "# Desired: 000N00N00N\n"; print "# Obtained: $mask_input\n"; print $mask_input eq '000N00N00N' ? "OK" : "Wrong", "\n"; __DATA__ chrX 1 3 chrX 5 6 chrX 8 9

    Updated: fixed the initial value of $lastEnd. Thanks abualiga.

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Initially I ++voted your optimization but then noticed that if the first interval range does not start with 1, the first character will not be replaced with N. I think $lastEnd=0 is the correct initial value, as in BrowserUK's solution.

Re: Trouble with Transliterate Function
by jwkrahn (Monsignor) on Aug 29, 2013 at 03:00 UTC
    substr( $mask_input, $lastEnd, $start ) =~ tr[\x00-\xff][N];

    You could write that tr expression more simply as:

    substr( $mask_input, $lastEnd, $start ) =~ tr//N/c;
Re: Trouble with Transliterate Function
by hdb (Prior) on Aug 29, 2013 at 08:34 UTC

    The main issue really was that you provided an end position to substr while it expects a length. This has already been corrected by choroba above. I would like to propose to directly construct the intervals for the Ns under the assumption that your input intervals are sorted and non-overlapping. So instead of looking at 1 => 3, 5 => 6, etc I propose to add a zero at the beginning and the length of the string at the end to get 0 => 1, 3 => 5, 6 => ... which give you directly the intervals to put Ns into.

    use warnings; use strict; my $mask_input = '0000000000'; my %intervals = ( 0, ( do { local $/; <DATA> } =~ /(\d+)/g ), 1+length + $mask_input ); while( my( $start, $end ) = each %intervals ) { substr( $mask_input, $start, $end-$start-1 ) =~ tr/N/N/c; } print "$mask_input\n"; __DATA__ chrX 1 3 chrX 5 6 chrX 8 9
Re: Trouble with Transliterate Function
by jwkrahn (Monsignor) on Aug 29, 2013 at 09:31 UTC
    my $mask_input = <INPUT>; ... my $lastEnd = 1; while ( <INTERVAL> ) { my (undef, $start, $end) = split '\s', $_; ## change everything from the end of the last range ## to the start of this range to 'N' substr( $mask_input, $lastEnd, $start ) =~ tr[\x00-\xff][N]; $lastEnd = $end; } ... print OUT "$mask_input";

    Another way to do that:

    chomp( my $mask_input = <INPUT> ); my $new_mask = 'N' x length $mask_input; ... while ( <INTERVAL> ) { my ( undef, $start, $end ) = split; --$start; substr $new_mask, $start, $end - $start, substr $mask_input, $star +t, $end - $start; } ... print OUT "$new_mask\n";
Re: Trouble with Transliterate Function
by QM (Vicar) on Aug 29, 2013 at 11:20 UTC
    substr( $mask_input, $lastEnd, $start ) =~ tr[\x00-\xff][N];

    Why use tr at all? Just use substr:

    # Assuming $lastEnd is initialized to 0 above my $length = $start - $lastEnd; if ($length) { my $replacement = 'N' x $length; substr( $mask_input, $lastEnd, $length, $replacement ); }

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: Trouble with Transliterate Function
by jwkrahn (Monsignor) on Aug 29, 2013 at 21:28 UTC

    This appears to work correctly:

    my @ranges = [ 0 ]; while ( <INTERVAL> ) { my ( undef, $start, $end ) = split; push @{ $ranges[ -1 ] }, $start - 1; push @ranges, [ $end ]; } push @{ $ranges[ -1 ] }, length $mask; for my $range ( @ranges ) { my ( $start, $end ) = @$range; next if $start == $end; substr( $mask, $start, $end - $start ) =~ tr//N/c; } print "$mask\n";

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1051326]
Approved by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (7)
As of 2014-11-25 23:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My preferred Perl binaries come from:














    Results (160 votes), past polls