Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot

Adjacent numbers

by melissa_randel (Initiate)
on Nov 19, 2015 at 21:43 UTC ( #1148176=perlquestion: print w/replies, xml ) Need Help??

melissa_randel has asked for the wisdom of the Perl Monks concerning the following question:

I am working with a document listing genome location numbers and want to select only those listings that have adjacent (greater or smaller) numerical values.

The dataset is formatted like this:

2L_33 2L_34 3L_45 3L_87 X_202 X_203 X_204

And I want this as an output (omits values with no numerically adjacent listings):

2L_33 2L_34 X_202 X_203 X_204

How can I write a script to accomplish this?

Replies are listed 'Best First'.
Re: Adjacent numbers
by choroba (Archbishop) on Nov 19, 2015 at 21:52 UTC
    Remember the previous line and the previous number. If they are adjacent, print the previous one, and remember to print the next one even if the next pair is not adjacent.
    #!/usr/bin/perl use warnings; use strict; my $previous_num = -1; my $previous_line; my $print_next = 0; sub output { my $num = shift; my $adjacent = abs($previous_num - $num) == 1; if ($print_next || $adjacent) { print $previous_line; $print_next = $adjacent; } } while (<>) { my ($num) = /([0-9]+$)/; output($num); $previous_num = $num; $previous_line = $_; } output($previous_num); # Process the last line.

    Update: Fixed to catch decreasing numbers, too. Thanks GotToBTru.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      Not included in the sample data, but in the description:

      if ($previous_num + 1 == $num || $previous_num - 1 == $num) {
      Dum Spiro Spero

        It'll take a little more than that.

        It will need to remember the direction in which the numbers are running, otherwise 1,2,1,2,3,2,3,4,3... will be seen as a run.

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Adjacent numbers - the plain way
by Discipulus (Abbot) on Nov 20, 2015 at 10:41 UTC
    Hello melissa_randel and welcome to the monastery and to the wonderful world of Perl

    as a tip for your next posts i suggest to include some code you tried: you show more effort and the help can be better targeted at your level of wisdom: infact you had got good and very good replies to your question, but how many of them you understand completely?

    Me too I dont understand the smart Anonymous's almost oneliner: i would need to refill it with a lot of print statements before understanding it.

    Because of this i think the best approch is what the wise choroba presented you as first reply: think about your problem in words and then translate into Perl. I've started learning Perl with no programming nor scientific backgroud and after a decade of Perl i'm start thinking that the compiler is happier with plain basic code. Me too nowadays I tend to write 'smart' code but i think is often a matter of self exstimation more that a matter of quality.
    So the code I present you will be easy and commented for a full understanding.
    # always use stric and warnings (till the moment you know when is safe + disabling them) use strict; use warnings; # we use an array to grab DATA. array preserves order, if order in the + output is needed my @arr; # <> is something like an iterator: # $next_line = <DATA> retrieve next line # for <DATA> process all lines # we chomp all lines to remove \n at the end and then we push the @arr + with the line chomp $_ and push @arr,$_ for <DATA>; # hashes provides uniqueness of keys, and we need uniqueness because.. +. my %adj; # .. in the loop from 0 to the last index of @arr # (pay attention when using $#arr: @arr in scalar context return num o +f elements, # while $#arr is the last index of the array starting from 0 # so scalar @arr == $#arr + 1) # in the loop we process two value at time (sliding window?) checking +if the # numerical part is adjacent to the next element's numerical part for (0..$#arr){ # exit condition go EVERYTIME at the beginning of loops # so we will exit the loop if is the last element (yet processed p +reviously) last if $_ == $#arr; # grab the numerical part of interest # $1 is what inside the first matched () group. (capturing parenth +eses) my $cur_num = $1 if $arr[$_] =~/\d*[A-Z]_(\d+)$/; my $next_num = $1 if $arr[$_ + 1] =~/\d*[A-Z]_(\d+)$/; # if current is adjacent to next if ($cur_num == $next_num - 1){ # we populate the hash with nevermind values $adj{$arr[$_]} = undef; $adj{$arr[$_ + 1]} = undef; # if we had used $adj{$arr[$_]}++ (autoincrement) # you would notice the X_203 with value of 2 # because is inserted twice: as next_num while process +ing X_202 # and as current_num while X_204 } } # if the order of the data must be preserved we still have the array: # if the data was alphabetically ordered would be simpler (and the arr +ay unuseful) # simple as print "$_\n" for sor keys %adj foreach (@arr){ print "$_\n" if exists $adj{ $_ }; } __DATA__ 2L_33 2L_34 3L_45 3L_87 X_202 X_203 X_204
    Obviously concise code is a good thing. But someone here at PerlMonks once said:Dont code at your best. Being to debug twice difficult then write code, you'll not be able to debug, by definition
    so in the above code:
    my $cur_num = $1 if $arr[$_] =~/\d*[A-Z]_(\d+)$/; my $next_num = $1 if $arr[$_ + 1] =~/\d*[A-Z]_(\d+)$/;
    can be shortned (imagine a long list to process) into
    my ($cur_num,$next_num) = map {$1 if $_ =~/\d*[A-Z]_(\d+)$/} $arr +[$_],$arr[$_+1];
    But i suspect is not faster nor more efficient: is just more concise and uneasier to debug: the plain, kid version is the easiest to debug (because you'll get the exact line number of the statement producing the error!):
    if ( $arr[$_] =~/\d*[A-Z]_(\d+)$/ ){ $cur_num = $1; }

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      a nice one
Re: Adjacent numbers
by jeffa (Bishop) on Nov 20, 2015 at 00:47 UTC

    I had to take a stab at this. I am not proud of having to insert the final loop to filter out repeats, but maybe someone else can fix that. ;)

    This solution splits each data element by underscore and sorts by both the left and right, which hopefully will line up any stray adjacent elements and prevent having to look behind.

    use strict; use warnings; my %ordered; for (<DATA>) { chomp; my ($left,$right) = split /_/, $_, 2; push @{ $ordered{$left} }, $right; } my @matches; for my $key (sort keys %ordered) { my $last = 0; for (sort @{ $ordered{$key} } ) { push @matches, "${key}_$last","${key}_$_" if $_ - $last == 1; $last = $_; } } my %seen; for (@matches) { print "$_\n" unless $seen{$_}++; } __DATA__ X_203 2L_33 3L_45 X_202 2L_34 X_204 2L_32 3L_87 3L_88


    (the triplet paradiddle with high-hat)
Re: Adjacent numbers
by Lennotoecom (Pilgrim) on Nov 20, 2015 at 06:45 UTC
    /^(.*)_(\d+)$/ and undef $h{$1}{$2} for <DATA>; for(keys %h){ for $x (sort keys %{$h{$_}}){ if(exists $h{$_}{$x + 1} || exists $h{$_}{$x - 1}){ print "$_ : $x\n"; } } } __DATA__ 2L_33 2L_34 3L_45 3L_87 X_202 X_203 X_204
Re: Adjacent numbers
by hdb (Monsignor) on Nov 20, 2015 at 06:16 UTC

    If you have 2L_33 and X_34 in your dataset, do you want them listed?

Re: Adjacent numbers - plain way second
by Discipulus (Abbot) on Nov 20, 2015 at 11:29 UTC
    Ah!! it seems that nobody has noticed the wise hdb's advice!
    This is a very important thing while coding, Know your data! or the other face of the coin: Bad data ruins your day

    Compare the time and genius needed to modify all previous answer to NOT consider, let's say, AX_1 and  ZZ_2 as adjacent.

    With plain code add a feaure to the code is normally a trivial task: just grab two parts from regular expression and one control more in the if loop.
    use strict; use warnings; my @arr; chomp $_ and push @arr,$_ for <DATA>; my %adj; for (0..$#arr){ last if $_ == $#arr; my ($cur_num,$cur_code)= ($2,$1) if $arr[$_] =~/(\d*[A-Z])_(\d+)$/ +; my ($next_num,$next_code)= ($2,$1) if $arr[$_ + 1] =~/(\d*[A-Z])_( +\d+)$/; if (($cur_num == $next_num - 1) and ($cur_code eq $next_code) ){ $adj{$arr[$_]} = undef; $adj{$arr[$_ + 1]} = undef; } } foreach (@arr){ print "$_\n" if exists $adj{ $_ }; } __DATA__ AX_1 ZZ_2 2L_33 2L_34 3L_45 3L_87 X_202 X_203 X_204

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Thank you, Disciplus! I appreciated your explanation of the code, as I am brand new to perl, and it works wonderfully.
Re: Adjacent numbers
by Anonymous Monk on Nov 20, 2015 at 07:45 UTC

    It's almost a one-liner :)

    #!/usr/bin/perl # use strict; use warnings; my %used; print grep /_/ && !$used{$_}++, (join '', <DATA>) =~ /^(.*?(\d+)\n)(?=(.*?(\d+)\n))(??{1 != abs($2 - $4)&&'(*F)'})/gm; __DATA__ 2L_33 2L_34 3L_45 3L_87 X_202 X_203 X_204
Re: Adjacent numbers
by Anonymous Monk on Nov 19, 2015 at 22:37 UTC

    A crucial bit of information is missing. The solution will depend on whether the locations ("listings") appear in sorted order in your list ("dataset").

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1148176]
Approved by talexb
Front-paged by Discipulus
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2022-05-25 07:10 GMT
Find Nodes?
    Voting Booth?
    Do you prefer to work remotely?

    Results (84 votes). Check out past polls.