PerlMonks

### Sorting the numbers: A little tricky.

by oxydeepu (Novice)
 on Jan 25, 2013 at 08:40 UTC
oxydeepu has asked for the wisdom of the Perl Monks concerning the following question:

Hi all perl monks,

I have a basic question. I have a file of numbers with two columns

##############
109026 3
109027 28
109028 30
116958 15
116960 35
116961 39
116962 70
116963 72
147184 2
147588 1
153087 32
#############

like this. I have to format them so that only those lines with a value greater than 4 in the second column are considered. The output should look like this. I tried different things it is not working out.

#############

109027 109078 (which is 109028 + 50) 30
116958 117013 (which is 116963 + 50) 72
153087 153137 (which is 153087 + 50) 32

#############

I hope this explains the problem. I couldn't make it work.
Any help would be greatly appreciated, and this is not an assignment. I am trying to learn Perl by myself, and this is a problem I just came across.

Deepak

Replies are listed 'Best First'.
Re: Sorting the numbers: A little tricky.
by Anonymous Monk on Jan 25, 2013 at 09:28 UTC

I tried different things it is not working out.

Re: Sorting the numbers: A little tricky.
by choroba (Bishop) on Jan 25, 2013 at 09:53 UTC
Can you explain the algorithm in greater detail? Why did you pick 116963 after 116958?
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Sorting the numbers: A little tricky.
by flexvault (Monsignor) on Jan 25, 2013 at 14:30 UTC

Like everyone else, I find it hard to understand the math. But your original question can be answered by reading the file one line at a time, doing a 'chomp', and then using 'split' to break the line into 2 strings. Then populate a hash with keys and values, keeping only entries where the value is greater than 4.

Now you can process your hash using a 'foreach' with a numeric 'sort' of the 'keys' of the hash, and use whatever Perl math code you want to get whatever results you want.
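As a rough sketch of the steps just described (chomp, split, fill a hash, then walk the numerically sorted keys), something like the following could serve as a starting point. The sub name `filter_rows` and the small inline sample are made up for illustration; a real script would read from a file handle instead.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only rows whose second column exceeds $min,
# stored as position => value.
sub filter_rows {
    my ($lines, $min) = @_;
    my %kept;
    for my $line (@$lines) {
        chomp( my $copy = $line );          # strip the trailing newline
        my ($pos, $val) = split ' ', $copy; # split on whitespace
        next unless defined $val;           # skip blank or malformed lines
        $kept{$pos} = $val if $val > $min;
    }
    return \%kept;
}

# Inline sample standing in for lines read from the file.
my @lines = ("109026 3\n", "109027 28\n", "109028 30\n", "147184 2\n", "153087 32\n");
my $kept  = filter_rows(\@lines, 4);

# Hash keys come back in no particular order, so sort them numerically.
for my $pos ( sort { $a <=> $b } keys %$kept ) {
    print "$pos $kept->{$pos}\n";
}
```

From there, whatever grouping math is wanted can be applied inside the final loop.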

Like all things in Perl, this is one of many ways to do it, so experiment.

Good Luck

"Well done is better than well said." - Benjamin Franklin

Re: Sorting the numbers: A little tricky.
by thundergnat (Deacon) on Jan 25, 2013 at 14:40 UTC

Your problem is poorly specified. When we have to make guesses as to how to derive your output from your input, it is likely that we will guess the simplest thing that could possibly work and do that, or, more likely, just ask for clarification. Since there have already been several of the latter, I'll take a shot at the former.

Using the following specs - for a file with 2 columns of numbers; let's call them pointer and value:

• look for consecutive runs where the value (in the second column) is:
1. Greater than 4
2. Greater than the value of the previous entries. *** <-- ASSUMPTION
• Print the pointer of the start of the run, the pointer + 50 of the end of the run, and the value of the end of the run

If it was me, I would do something like:

```
use warnings;
use strict;

my ($start, $lastp, $lastv);

while ( my $line = <DATA> ){
    my ( $pointer, $value ) = split /\s+/, $line;
    flush() if ( $value <= 4 or $value > 4 && $value < $lastv );
    $start = $pointer unless ( defined $start || $value <= 4 );
    ( $lastp, $lastv ) = ( $pointer, $value );
}
flush();

sub flush {
    if ( defined $start ){
        printf "%d %d (which is %d + 50) %d\n", $start, $lastp + 50, $lastp, $lastv;
    }
    undef $_ for ( $start, $lastp, $lastv );
}

__DATA__
109026 3
109027 28
109028 30
116958 15
116960 35
116961 39
116962 70
116963 72
147184 2
147588 1
153087 32
```

Yields:

```
109027 109078 (which is 109028 + 50) 30
116958 117013 (which is 116963 + 50) 72
153087 153137 (which is 153087 + 50) 32
```
Re: Sorting the numbers: A little tricky.
by choroba (Bishop) on Jan 25, 2013 at 21:46 UTC
I am still not sure I understand your specification. You should test this code with more data to see whether it behaves well in all the border cases:
```
#!/usr/bin/perl
use warnings;
use strict;

my ($first, $last, $last_small) = ('0e0', 0, 0);
while (<DATA>) {
    my ($big, $small) = split;
    next if 4 >= $small;
    if ($big > $first + 50) {
        show($first, $last, $last_small);
        $first = $big;
    }
    $last       = $big;
    $last_small = $small;
}
show($first, $last, $last_small);

sub show {
    my ($first, $last, $last_small) = @_;
    print "$first ", $last + 50, " $last_small\n" unless '0e0' eq $first;
}

__DATA__
109026 3
109027 28
109028 30
116958 15
116960 35
116961 39
116962 70
116963 72
147184 2
147588 1
153087 32
```
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Sorting the numbers: A little tricky.
by Rahul6990 (Beadle) on Jan 25, 2013 at 10:07 UTC
Sorry, but I didn't get your question.
We don't understand what you are trying to do. Picking up the number when it is higher than 4 is OK, but the output: please explain how you arrived at those results. We don't see the math.
Re: Sorting the numbers: A little tricky.
by oxydeepu (Novice) on Jan 28, 2013 at 09:39 UTC

Hi all,
I will try to explain it one last time. The file has two columns: the first is the position, and the second is the cumulative frequency within 50 numbers of the position.

for ex,

109026 3
109027 25
109028 2

became

109026 3
109027 28
109028 30.
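Read literally, this small example is a running total of the second column (3, then 3 + 25, then 28 + 2). A minimal sketch of that step, assuming the three rows all fall inside one window and ignoring the "within 50 numbers" rule, which the post does not spell out; `running_total` is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the running (cumulative) sum of a list.
sub running_total {
    my @counts = @_;
    my ($sum, @totals) = (0);
    for my $count (@counts) {
        $sum += $count;
        push @totals, $sum;   # push stores a copy of the current sum
    }
    return @totals;
}

my @positions = (109026, 109027, 109028);
my @totals    = running_total(3, 25, 2);

# Prints the three rows from the "became" example: 3, 28, 30.
print "$positions[$_] $totals[$_]\n" for 0 .. $#positions;
```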

So what I have to do is iterate through the positions (column 1), collecting positions as they increase, until the difference between the current position and (the previous position + 50) becomes > 50.

for example

109027 28
109028 30
116958 15
116960 35
116961 39
116962 70
116963 72
153087 32

in the above set,
I will start with 109027, then 109028, and then 116958. Here 116958 - (109028 + 50) is greater than 50,
so the first line in the output will be

109027 109078 (which is 109028 + 50) 30

the 30 is the value of position 109028.

Next, I have to start from 116958 and go through 116963 until 153087, since the difference 153087 - (116963 + 50) becomes > 50.
So I will stop the iteration and output the next line, which is

116958 117013 (which is 116963 + 50) 72

where 72 is the value for 116963

Then I will start from 153087. Since there is no further increase, I have to stop the iteration and output this:

153087 153137 (which is 153087 + 50) 32

This is the problem. I don't know whether I explained it better than last time. I don't have any code; I'm still stuck on how to implement it. Hoping for help.

regards,
Deepak

As far as I understand, that is what my code here does.
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Sorting the numbers: A little tricky.
by oxydeepu (Novice) on Jan 25, 2013 at 15:13 UTC

Thank you all for the comments.

So what I want is to consider only those lines which have a value greater than 4 in the second column.

#########
109027 28
109028 30
116958 15
116960 35
116961 39
116962 70
116963 72
147184 2
147588 1
153087 32
########
in this set
109028 - 109027 is not greater than 50
116958 - 109028 is greater than 50

so I take
109027 109078 (109028 + 50)
so the next one will start from 116958
similarly, as above,
153087 - 116963 > 50
so
116958 117013 (which is 116963 + 50)
Since 153087 does not have any neighbours:
153087 153137

I hope that makes a little more sense.
I am sorry, guys, for the vague explanation.
Deepak
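One possible reading of the rule in this post: skip rows whose value is 4 or less, and start a new group whenever the gap between consecutive kept positions is greater than 50. The sketch below follows that reading only. The sub name `group_runs` and its parameters are invented for illustration, and it compares each position with the previous kept position, as in the arithmetic above (116958 - 109028).

```perl
use strict;
use warnings;

# Hypothetical helper. Each row is [position, value]. Rows with
# value <= $min_value are ignored; a new run starts when the gap to the
# previous kept position exceeds $max_gap. Each run is returned as
# [start, last position + $pad, value at last position].
sub group_runs {
    my ($rows, $min_value, $max_gap, $pad) = @_;
    my (@runs, $start, $last_pos, $last_val);
    for my $row (@$rows) {
        my ($pos, $val) = @$row;
        next if $val <= $min_value;
        if ( defined $start && $pos - $last_pos > $max_gap ) {
            push @runs, [ $start, $last_pos + $pad, $last_val ];
            undef $start;
        }
        $start = $pos unless defined $start;
        ($last_pos, $last_val) = ($pos, $val);
    }
    # Close the final run, if any row was kept.
    push @runs, [ $start, $last_pos + $pad, $last_val ] if defined $start;
    return @runs;
}

my @rows = (
    [109026, 3],  [109027, 28], [109028, 30], [116958, 15],
    [116960, 35], [116961, 39], [116962, 70], [116963, 72],
    [147184, 2],  [147588, 1],  [153087, 32],
);

# Prints one line per run: start, end + 50, value at the end of the run.
print "@$_\n" for group_runs(\@rows, 4, 50, 50);
```

Keeping the thresholds as arguments makes it easy to try other gap sizes on more data, which is worth doing before trusting this on border cases.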

And if this problem is "A little tricky", you must have already gotten a start on it. Please show the code that you've started with so we can diagnose where you're going wrong, and how to correct it.

Dave

Node Type: perlquestion [id://1015288]
Approved by Corion