
Re: Remove array if data inconsistent

by jhourcle (Prior)
on Apr 22, 2009 at 14:59 UTC (#759308)

in reply to Remove array if data inconsistent

It's not a direct answer to your question, which I think people have already commented on -- but once you've pared down the list, you might want to consider making some sort of visualization of each line, so that you can quickly scan it visually and see what might be worth investigating further.

For the data that you're dealing with, I'd probably look at using sparklines -- there are a few CPAN modules to generate them.

Then, you can look at a page of graphs, and see which ones are stable / going up / random / etc.
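As a rough illustration of the idea, and without relying on any particular CPAN sparkline module, a text-only sparkline can be built from the Unicode block characters U+2581 through U+2588. The sub name `sparkline` and the sample rows below are made up for this sketch; a real module would give you proper images.

```perl
use strict;
use warnings;
use List::Util qw(min max);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: scale each value to one of eight block characters,
# lowest value -> shortest bar, highest value -> tallest bar.
sub sparkline {
    my @values = @_;
    return '' unless @values;
    my @bars = map { chr($_) } 0x2581 .. 0x2588;
    my ($lo, $hi) = (min(@values), max(@values));
    my $range = $hi - $lo;
    my @out;
    for my $v (@values) {
        # A flat series has no range, so draw it with the shortest bar.
        my $idx = $range ? int( ($v - $lo) / $range * $#bars ) : 0;
        push @out, $bars[$idx];
    }
    return join '', @out;
}

# Made-up sample rows, one series per line.
my %rows = (
    stable => [5, 5, 5, 5, 5],
    rising => [1, 2, 3, 4, 5],
    random => [3, 9, 1, 7, 2],
);
printf "%-8s %s\n", $_, sparkline(@{ $rows{$_} }) for sort keys %rows;
```

Printing one such line per record gives the "page of graphs" effect in a terminal, which may be enough for a first pass before generating real images.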

Replies are listed 'Best First'.
Re^2: Remove array if data inconsistent
by BJ_Covert_Action (Beadle) on Apr 22, 2009 at 16:16 UTC
    Along the same lines, you could simply sort the lines into two different output files if you are worried about record keeping. For instance, once you decide how you want to filter/screen the data (probably by one of the methods discussed in the previous posts), you could use a simple if-else check to send the 'consistent' data lines to a file named "good_data.csv" and the 'inconsistent' data lines to a file named "discarded_data.csv" (or whichever names you like). That way you can review the discarded data at a later time if need be, or recover any lines that were filtered incorrectly. This also might help you find any errors or bugs in your code later on as you test it more thoroughly.

    I am thinking something along these lines (pseudocode, in case you aren't terribly familiar with the syntax):

    open INPUT, "<", "name_of_input_file" or die $!;
    open GOOD_OUTPUT, ">", "good_data.csv" or die $!;
    open BAD_OUTPUT, ">", "discarded_data.csv" or die $!;
    while (my $line = <INPUT>) {
        chomp $line;
        if (data_meets_good_condition($line)) {   # placeholder: put your check here
            print GOOD_OUTPUT "$line\n";
        } else {
            print BAD_OUTPUT "$line\n";
        }
    }

    That's a little rudimentary (and verbose if you are a fan of golf) and needs an appropriate logical check in the if condition, but the primary idea is the if-else check: that way you don't totally wipe data by accident. Again, in terms of filtering the data, some of the methods discussed above are probably better.
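For anyone who wants something that runs as-is, here is a minimal sketch of the same if-else split. The check `is_consistent` and its 20% tolerance are assumptions invented for the example, not a method from this thread, and `split_lines` is just a hypothetical helper name. The demo uses in-memory filehandles and made-up rows so that it needs no files on disk.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Placeholder check (an assumption, not from the thread): a CSV row is
# "consistent" if no value strays more than 20% from the row's mean.
sub is_consistent {
    my ($line) = @_;
    my @v = split /,/, $line;
    return 0 unless @v;
    my $mean = sum(@v) / @v;
    return 0 if $mean == 0;
    return max(map { abs($_ - $mean) } @v) <= 0.2 * abs($mean);
}

# Route every line to exactly one of two handles, so nothing is lost.
sub split_lines {
    my ($in, $good, $bad, $check) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my $out = $check->($line) ? $good : $bad;
        print {$out} "$line\n";
    }
}

# Demo with in-memory filehandles and made-up rows; for real data, open
# the input file and the two CSV files with three-argument open instead.
my $input = "10,11,10,9\n10,50,2,30\n7,7,7,7\n";
my ($good_data, $bad_data) = ('', '');
open my $in,   '<', \$input     or die "Cannot read input: $!";
open my $good, '>', \$good_data or die "Cannot open good output: $!";
open my $bad,  '>', \$bad_data  or die "Cannot open bad output: $!";
split_lines($in, $good, $bad, \&is_consistent);
close $_ for $in, $good, $bad;

print "kept:\n$good_data", "discarded:\n$bad_data";
```

Passing the check in as a code reference keeps the routing loop unchanged when you swap in one of the filtering methods discussed elsewhere in the thread.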
