Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Strategy for simple data management

by sundialsvc4 (Abbot)
on Jan 10, 2014 at 23:25 UTC ( [id://1070209]=note: print w/replies, xml ) Need Help??


in reply to Strategy for simple data management

Pragmatically speaking, you will have to approach tasks like this one in several very-distinct “layers” ...

  1. The first step is to get all of the data from any spreadsheet-file into a common data store ... e.g. an SQL database (SQLite file?).   Grab it exactly as-is, and arrange this data-intake script so that you are able to verify (from the database entries) that all of the available spreadsheets have in fact been imported ... when, by whom, and so on.   If you re-import a file that has already been previously imported, all of the preceding data should be cleanly replaced.   After all, the greatest threat to the data-integrity of the entire study is that data is missing, or that it is duplicated.
  2. The next step is standardization:   without altering the original “capture” data, this step converts apples to consistent oranges.   This process, once again, must be entirely reproducible.   It should create new, standardized data-tables from the data-capture originals.   If any of the input data does not conform to whatever validation rules you can come up with, it should be very-clearly flagged as non-conforming.
  3. The final step is ... whatever your analysis needs to be.   This step will rely very heavily upon all of the preceding steps to have delivered a data-set that is both complete and consistent, and/or to have clearly “blown the whistle” if something is wrong ... even if (especially if?) the source of the inconsistency is “the work of an experimenter.”   Always bear in mind that “only the computer itself” can be relied-upon to detect omissions or inconsistencies in a mass of collected data.   The scripts that comprise your pipeline must be not ony reliable but error-aware.

You can certainly use Perl for each of these steps.   (In a Windows environment, yes, Perl does OLE...)   Unfortunately, the exact nature of what needs to be built, and of how to correctly use what has been built, will be completely determined by what you need to do in this project.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1070209]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (2)
As of 2024-04-25 19:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found