Re: Remove array if data inconsistent

by jhourcle (Prior)
on Apr 22, 2009 at 14:59 UTC (#759308)

in reply to Remove array if data inconsistent

This isn't a direct answer to your question, which I think people have already commented on, but once you've pared down the list, you might want to consider plotting each line so that you can scan the results quickly and see what might be worth investigating further.

For the data that you're dealing with, I'd probably look at using sparklines; there are a few CPAN modules that generate them.

Then, you can look at a page of graphs, and see which ones are stable / going up / random / etc.
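As a rough idea of what that scan could look like, here is a minimal sketch that draws text-only sparklines with Unicode block characters. It deliberately does not use any of the CPAN sparkline modules, so treat it as a stand-in rather than their API; the `sparkline` sub and the sample rows are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min max);

binmode STDOUT, ':encoding(UTF-8)';

# Eight block characters, from the shortest bar to the tallest.
my @BARS = map { chr($_) } 0x2581 .. 0x2588;

# Turn a list of numbers into a one-line text sparkline.
sub sparkline {
    my @values = @_;
    return '' unless @values;
    my ($lo, $hi) = (min(@values), max(@values));
    my $range = $hi - $lo;
    return join '', map {
        # A flat series has no range, so draw it at mid height.
        my $idx = $range ? int( ($_ - $lo) * $#BARS / $range ) : 3;
        $BARS[$idx];
    } @values;
}

# Hypothetical sample rows: a label and its measurements.
my %rows = (
    rising => [1, 2, 3, 4, 5, 6, 7, 8],
    stable => [5, 5, 5, 5, 5, 5, 5, 5],
    noisy  => [3, 9, 1, 7, 2, 8, 4, 6],
);

printf "%-8s %s\n", $_, sparkline(@{ $rows{$_} }) for sort keys %rows;
```

Each row is scaled to its own minimum and maximum, so the shapes are comparable but the heights are not. If absolute levels matter, pass a shared minimum and maximum instead.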

Replies are listed 'Best First'.
Re^2: Remove array if data inconsistent
by BJ_Covert_Action (Beadle) on Apr 22, 2009 at 16:16 UTC
    Along the same lines, you could simply sort the lines into two different output files if you are worried about record keeping. For instance, once you decide how you want to filter/screen the data (probably by one of the methods discussed in the previous posts), you could add a simple if-else check that sends the 'consistent' data lines to a file named "good_data.csv" and the 'inconsistent' data lines to a file named "discarded_data.csv" (or whichever names you like). That way you can review the discarded data later if need be, or recover any lines that were filtered incorrectly. It might also help you find errors or bugs in your code as you test it more thoroughly.

    I am thinking something along these lines (pseudocode, in case you aren't terribly familiar with the syntax):

    # data_meets_good_condition() is a placeholder for your own check
    open my $in,   '<', 'name_of_input_file' or die $!;
    open my $good, '>', 'good_data.csv'      or die $!;
    open my $bad,  '>', 'discarded_data.csv' or die $!;

    while (my $line = <$in>) {
        chomp $line;
        if (data_meets_good_condition($line)) {
            print {$good} "$line\n";
        }
        else {
            print {$bad} "$line\n";
        }
    }

    That's a little rudimentary (and verbose if you are a fan of golf) and needs an appropriate logical check in the if condition, but the primary idea is the if-else check, so that you don't totally wipe data by accident. Again, in terms of filtering the data, some of the methods discussed above are probably better.
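To make the if-else idea concrete, here is one possible shape for the missing check. The rule itself is purely an assumption and is not taken from the thread: a line counts as "good" when every comma-separated field is numeric and the spread between the largest and smallest value is within a tolerance. `partition_lines`, the tolerance of 5, and the sample lines are likewise hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min max);
use Scalar::Util qw(looks_like_number);

# Hypothetical rule: every field is numeric and max - min <= $tolerance.
sub data_meets_good_condition {
    my ($line, $tolerance) = @_;
    my @fields = split /,/, $line;
    return 0 unless @fields;
    return 0 if grep { !looks_like_number($_) } @fields;
    return ( max(@fields) - min(@fields) ) <= $tolerance ? 1 : 0;
}

# Sort lines into kept and discarded lists so nothing is thrown away.
sub partition_lines {
    my ($tolerance, @lines) = @_;
    my (@good, @bad);
    for my $line (@lines) {
        if ( data_meets_good_condition($line, $tolerance) ) {
            push @good, $line;
        }
        else {
            push @bad, $line;
        }
    }
    return (\@good, \@bad);
}

# Made-up sample data, for illustration only.
my @sample = ('10,11,10,12', '10,50,3,99', '7,7,oops,7');
my ($good, $bad) = partition_lines(5, @sample);
print "good: $_\n" for @$good;
print "bad:  $_\n" for @$bad;
```

The two lists can then be written to "good_data.csv" and "discarded_data.csv" exactly as in the loop above. Keeping the check in its own sub means you can swap in whichever filtering method you settle on without touching the file handling.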
