
Re: How best to fill missing values in a sparse matrix?

by ELISHEVA (Prior)
on Mar 22, 2011 at 07:18 UTC (#894717)

in reply to How best to fill missing values in a sparse matrix?

I'm a bit confused about your logic. You indicate that you want to look ahead for a good value, but you also seem to be relying on that first good row for missing values (as indicated by your sample output and your mention that the first line is guaranteed to be good). You also say that the first line is guaranteed to be filled, yet your sample input has out-of-range values. Does "filled" mean "has a value (even an out-of-range one)" or "has an in-range value"? What determines when you look ahead and when you look behind?

Also, how many lines do you typically need to look ahead before you find a complete set of good values? And do you even need a complete set, or do you just need in-range values for the columns explicitly specified in the current line? Put another way, will your output file have 50 values for each row? Or will it have only two values if the input row also had two values?

Assuming you always want to look forward for the next best value, why not use the *nix command tac to put the file in reverse order (last line first)?

This would allow a much simpler look-behind. After you read enough lines to fill up your default array with good values for each column, your script would never need to hold more than two lines in memory at a time (the composite good value array and the current line), no matter how sparse the data or how far away the next good value is.
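To make the look-behind idea concrete, here is a minimal sketch in Python (the same loop translates directly to Perl). It assumes things the original question does not confirm: each row has already been parsed into a list with None for a missing column, and "good" is decided by a caller-supplied test. The names fill_from_reversed and is_good are hypothetical, as is the sample data.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Row = List[Optional[float]]


def fill_from_reversed(
    rows_reversed: Iterable[Row],
    is_good: Callable[[float], bool],
) -> Iterator[Row]:
    """Fill missing or bad values while reading rows in reverse file order.

    Only the composite array and the current row are held at any time.
    Rows come out in the same (reversed) order they went in.
    """
    composite: Optional[Row] = None  # latest good value seen per column
    for row in rows_reversed:
        if composite is None:
            composite = [None] * len(row)
        filled: Row = []
        for col, value in enumerate(row):
            if value is not None and is_good(value):
                composite[col] = value  # remember this good value
            # Use the remembered good value; stays None if none seen yet.
            filled.append(composite[col])
        yield filled


# Hypothetical data: None marks a missing value, 99.0 is out of range.
rows = [
    [1.0, 2.0, 3.0],
    [None, 99.0, 3.5],
    [1.5, None, None],
    [2.0, 2.5, None],
]
filled = list(fill_from_reversed(reversed(rows), lambda v: 0.0 <= v <= 10.0))
filled.reverse()  # back to original file order
```

With this sample, the second row picks up 1.5 and 2.5 from the rows after it. The last column of the final two rows stays None because no later good value exists for it, so this sketch alone does not guarantee a complete row.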

Of course, if you are processing a real-time feed or do not have access to tac or an equivalent, then maybe you have no choice but to do a look-ahead, since you never really know what the "next" good value is until it happens. But if you don't really need to do it, there is no sin in taking the easy way out and using the tools at hand.

Update: realized that there are lots of unanswered questions here and added some. Also withdrew the suggestion of using tac: whether reading forward or backward, there are missing values, so one will still need a multi-line look-ahead to construct a full set of good values.
