
Re: How best to fill missing values in a sparse matrix?

by ELISHEVA (Prior)
on Mar 22, 2011 at 07:18 UTC ( #894717=note )

in reply to How best to fill missing values in a sparse matrix?

I'm a bit confused about your logic. You indicate that you want to look ahead for a good value, but you also seem to be relying on that first good row for missing values (as indicated by your sample output and your mention that the first line is guaranteed to be good). You also say that the first line is guaranteed to be filled, but your sample input has out-of-range values. Does "filled" mean "has a value" (even an out-of-range one), or does it mean "has an in-range value"? What determines when you look ahead and when you look behind?

Also, how many lines do you typically need to look ahead before you find a complete set of good values? And do you even need a complete set, or do you just need in-range values for the columns explicitly specified in the current line? Put another way, will your output file have 50 values for each row? Or will it have only two values if the input row also had two values?

Assuming you always want to look forward for the next best value, why not use the *nix command tac to put the file in reverse order (last line first)?

This would allow a much simpler look-behind. After you read enough lines to fill up your default array with good values for each column, your script would never need to hold more than two lines in memory at a time (the composite good-value array and the current line), no matter how sparse the data or how far away the next good value is.
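A minimal sketch of that reversed pass, in Python rather than Perl and resting on several assumptions that are not in the original question: each row is already parsed into a list with one entry per column, a missing cell is `None`, and "good" means the value lies within `low..high`. The names `fill_reversed`, `low` and `high` are made up for illustration. The generator expects rows in reverse order (as `tac` would supply them) and keeps only the composite array and the current row in memory.

```python
from typing import Iterable, Iterator, List, Optional

Row = List[Optional[float]]


def fill_reversed(reversed_rows: Iterable[Row], low: float, high: float) -> Iterator[Row]:
    """Yield rows (still in reversed order) with bad cells replaced.

    Assumption: a cell is "good" when it is not None and low <= value <= high.
    A bad cell takes the most recently seen good value in the same column,
    which, because the input is reversed, is the *next* good value in the
    original file order.
    """
    carry: Optional[Row] = None  # composite array of last good value per column
    for row in reversed_rows:
        if carry is None:
            carry = [None] * len(row)  # assumes every row has the same width
        filled: Row = []
        for col, value in enumerate(row):
            if value is not None and low <= value <= high:
                carry[col] = value  # remember the good value for this column
            filled.append(carry[col])  # stays None if no good value seen yet
        yield filled


# Hypothetical data in original file order; 99.0 is treated as out of range.
rows = [
    [1.0, 2.0],
    [None, 99.0],
    [3.0, None],
    [4.0, 5.0],
]

# reversed() stands in for tac; reverse again to restore the original order.
result = list(fill_reversed(reversed(rows), low=0.0, high=10.0))[::-1]
```

Note that a column with no good value anywhere below a given row is left as `None` here; what to do in that case depends on the answers to the questions above.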

Of course, if you are processing a real-time feed or do not have access to tac or an equivalent, then maybe you have no choice but to do a look-ahead, since you never really know what the "next" good value is until it happens. But if you don't really need to do it, there is no sin in taking the easy way out and using the tools at hand.

Update: realized that there are lots of unanswered questions here and added some. Also withdrew the suggestion of using tac: whether you read forwards or backwards there are missing values, so one will still need a multi-line look-ahead to construct a full set of good values.
