oxydeepu has asked for the
wisdom of the Perl Monks concerning the following question:
Hi all perl monks,
I have a basic question. I have a file of numbers with two columns
##############
109026 3
109027 28
109028 30
116958 15
116960 35
116961 39
116962 70
116963 72
147184 2
147588 1
153087 32
#############
like this. I have to format them so that only lines with a value greater than 4 in the second column are considered. The output should look like this. I tried different things, but none of them worked.
#############
109027 109078 (which is 109028 + 50) 30
116958 117013 (which is 116963 + 50) 72
153087 153137 (which is 153087 + 50) 32
#############
I hope that explains the problem. I couldn't make it work.
This would be a great help, and it is not an assignment. I am trying to learn Perl by myself, and this is a problem I just came across.
Thank you in advance,
Deepak
Re: Sorting the numbers: A little tricky. by Anonymous Monk on Jan 25, 2013 at 09:28 UTC 
Re: Sorting the numbers: A little tricky. by choroba (Canon) on Jan 25, 2013 at 09:53 UTC 
Can you explain the algorithm in greater detail? Why did you pick 116963 after 116958?
Re: Sorting the numbers: A little tricky. by Rahul6990 (Beadle) on Jan 25, 2013 at 10:07 UTC 
Sorry, but I didn't understand your question.

We don't understand what you are trying to do. Picking out the numbers that are higher than 4 is fine, but please explain how you arrive at those results. We don't see the math.
Re: Sorting the numbers: A little tricky. by flexvault (Prior) on Jan 25, 2013 at 14:30 UTC 
oxydeepu,
Like everyone else, I find it hard to understand the math. But your original question can be answered by reading the file one line at a time, doing a 'chomp' and then a 'split' of the line into 2 strings. Then, using a hash, populate it with keys and values where the value is greater than 4.
Now you can process your hash using a 'foreach' with a numeric 'sort' of the 'keys' of the hash, and use whatever Perl math code you want to get whatever results you want.
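A minimal sketch of the steps just described, under some assumptions: the helper name load_rows, the in-memory sample, and the choice of the first column as the hash key are illustrative only and not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "position value" lines into a hash,
# keeping only rows whose value is greater than $min (4 in the question).
sub load_rows {
    my ( $fh, $min ) = @_;
    my %value_at;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $position, $value ) = split ' ', $line;
        next unless defined $value;    # skip blank or malformed lines
        $value_at{$position} = $value if $value > $min;
    }
    return \%value_at;
}

# A few sample rows from the question, read through an in-memory filehandle.
my $sample = "109026 3\n109027 28\n109028 30\n147184 2\n153087 32\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $rows = load_rows( $fh, 4 );
close $fh;

# Walk the surviving positions in ascending numeric order.
for my $position ( sort { $a <=> $b } keys %$rows ) {
    print "$position $rows->{$position}\n";
}
```

Inside the final loop is where the grouping math would go, once it is clear what that math should be.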
Like all things in Perl, this is one of many ways to do it, so experiment.
Good Luck
"Well done is better than well said."  Benjamin Franklin
Re: Sorting the numbers: A little tricky. by thundergnat (Deacon) on Jan 25, 2013 at 14:40 UTC 
Your problem is poorly specified. When we have to make guesses as to how to derive your output from your input, it is likely that we will guess the simplest thing that could possibly work and do that, or, more likely, just ask for clarification. Since there have already been several of the latter, I'll take a shot at the former.
Using the following specs for a file with 2 columns of numbers (let's call them pointer and value):
 - Look for consecutive runs where the value (in the second column) is:
    - Greater than 4
    - Greater than the value of the previous entries. *** < ASSUMPTION
 - Print the pointer of the start of the run, the pointer + 50 of the end of the run, and the value of the end of the run
If it was me, I would do something like:
use warnings;
use strict;

my ( $start, $lastp, $lastv );

while ( my $line = <DATA> ) {
    my ( $pointer, $value ) = split /\s+/, $line;

    # A run ends on a value of 4 or less, or when the value drops.
    flush() if ( $value <= 4 or defined($lastv) && $value < $lastv );

    # Open a new run at the first qualifying value.
    $start = $pointer unless ( defined($start) or $value <= 4 );
    ( $lastp, $lastv ) = ( $pointer, $value );
}
flush();    # print the final run, if any

# Print the current run (if one is open) and reset the state.
sub flush {
    if ( defined $start ) {
        printf "%d %d (which is %d + 50) %d\n",
            $start, $lastp + 50, $lastp, $lastv;
    }
    undef $_ for ( $start, $lastp, $lastv );
}
__DATA__
109026 3
109027 28
109028 30
116958 15
116960 35
116961 39
116962 70
116963 72
147184 2
147588 1
153087 32
Yields:
109027 109078 (which is 109028 + 50) 30
116958 117013 (which is 116963 + 50) 72
153087 153137 (which is 153087 + 50) 32
Re: Sorting the numbers: A little tricky. by oxydeepu (Novice) on Jan 25, 2013 at 15:13 UTC 
Thank you all for the comments.
So what I want is to consider only those lines which have a value greater than 4 in the 2nd column.
#########
109027 28
109028 30
116958 15
116960 35
116961 39
116962 70
116963 72
147184 2
147588 1
153087 32
########
in this set
109028 - 109027 is not greater than 50
116958 - 109028 is greater than 50
so I take
109027 109078 (109028 + 50)
so the next one will start from 116958
similarly to the above
153087 - 116963 > 50
so
116958 117013 (which is 116963 + 50)
Since 153087 did not have any neighbours:
153087 153137
I hope that makes a little more sense.
I am sorry, guys, for the vague explanation.
Thank you in advance,
Deepak

Re: Sorting the numbers: A little tricky. by choroba (Canon) on Jan 25, 2013 at 21:46 UTC 
I am still not sure I understand your specification. You should test this code with more data to see whether it behaves well in all the border cases:
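#!/usr/bin/perl
use warnings;
use strict;

# '0e0' is numerically zero but marks "no group started yet".
my ( $first, $last, $last_small ) = ( '0e0', 0, 0 );

while (<DATA>) {
    my ( $big, $small ) = split;
    next if 4 >= $small;    # ignore rows whose value is 4 or less

    # Close the current group once we are more than 50 past its start.
    if ( $big > $first + 50 ) {
        show( $first, $last, $last_small );
        $first = $big;
    }
    $last       = $big;
    $last_small = $small;
}
show( $first, $last, $last_small );    # print the final group

# Print one group unless it is the initial placeholder.
sub show {
    my ( $first, $last, $last_small ) = @_;
    print "$first ", $last + 50, " $last_small\n" unless '0e0' eq $first;
}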
__DATA__
109026 3
109027 28
109028 30
116958 15
116960 35
116961 39
116962 70
116963 72
147184 2
147588 1
153087 32
Re: Sorting the numbers: A little tricky. by oxydeepu (Novice) on Jan 28, 2013 at 09:39 UTC 
Hi all,
I will try to explain it one last time. The file has two columns: the first is the position, and the second is the cumulative frequency within 50 numbers of the position.
for ex,
109026 3
109027 25
109028 2
became
109026 3
109027 28
109028 30.
So what I have to do is iterate through the positions (column 1), collecting positions as they increase until the difference between the current position and (previous position + 50) becomes > 50.
for example
109027 28
109028 30
116958 15
116960 35
116961 39
116962 70
116963 72
153087 32
in the above set,
I will start with 109027, then 109028 and then 116958. Here 116958 - (109028 + 50) is greater than 50,
so the first line of the output will be
109027 109078 (which is 109028 + 50) 30
where 30 is the value of position 109028.
In the next step I have to start from 116958 and go through 116963 up to 153087, since the difference 153087 - (116963 + 50) becomes > 50.
So I will stop the iteration and output the next line, which is
116958 117013 (which is 116963 + 50) 72
where 72 is the value for 116963.
Then I will start from 153087. Since no further positions follow it, I have to stop the iteration and output this:
153087 153137 (which is 153087 + 50) 32
This is the problem. I don't know whether I explained it better than last time. I don't have any code; I'm still stuck on how to implement it. Hoping for help.
Thank you in advance.
regards,
Deepak

As far as I understand, that is what my code here does.
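For comparison, here is a rough sketch that follows the restated rule literally. It rests on an assumption about what was meant: a new group starts when the current position minus (previous kept position + 50) exceeds 50. The helper name group_rows and its parameters are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group [position, value] rows.
# Assumption: a new group starts when
#   position - (previous kept position + $pad) > $pad.
sub group_rows {
    my ( $rows, $min, $pad ) = @_;
    my ( @groups, $first, $last, $last_value );
    for my $row (@$rows) {
        my ( $position, $value ) = @$row;
        next if $value <= $min;    # ignore low-value rows
        if ( defined $first and $position - ( $last + $pad ) > $pad ) {
            push @groups, [ $first, $last + $pad, $last_value ];
            undef $first;
        }
        $first = $position unless defined $first;
        ( $last, $last_value ) = ( $position, $value );
    }
    # Emit the group that is still open at end of input.
    push @groups, [ $first, $last + $pad, $last_value ] if defined $first;
    return \@groups;
}

my @rows = (
    [ 109026, 3 ],  [ 109027, 28 ], [ 109028, 30 ], [ 116958, 15 ],
    [ 116960, 35 ], [ 116961, 39 ], [ 116962, 70 ], [ 116963, 72 ],
    [ 147184, 2 ],  [ 147588, 1 ],  [ 153087, 32 ],
);
print "@$_\n" for @{ group_rows( \@rows, 4, 50 ) };
```

On the sample data this prints the three start/end/value rows from the question, without the parenthetical note. Because it measures the gap from the previous kept position rather than from the start of the group, it may behave differently on other inputs, so it is worth trying on more data.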

