Statistics::Covid : module for fetching and storing covid19-related data for analysis
by bliako (Prior) on Mar 26, 2020 at 19:42 UTC
I have just submitted to CPAN a very alpha release of a module which collects data from various online providers of Covid19-related statistics (e.g. the number of confirmed cases). Examples are the data provided by Johns Hopkins University (as an ArcGIS "dashboard") and the data provided by the UK government for the UK local authorities.
All the providers I have used so far (Johns Hopkins University and the UK government) offer an API which returns JSON data. The scraper can be easily configured (that is, subclassed) to set the URL entry point of the API and to specify how the fetched data should be converted to a Perl object. So it is relatively easy to create more data fetchers, all of which can store to the same database.
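As a rough illustration of the subclassing idea, here is a minimal sketch using only core Perl. The class names (My::Fetcher, My::Fetcher::Example), the method names, the URL and the JSON layout are all hypothetical, made up for this example; they are not the module's actual API, for which see the synopsis.

```perl
use strict;
use warnings;
use JSON::PP ();   # core module

# Hypothetical base class: holds the logic shared by all fetchers.
package My::Fetcher;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# A subclass supplies the API entry point and the JSON-to-Perl conversion.
sub url             { die "subclass must implement url()" }
sub json_to_records { die "subclass must implement json_to_records()" }

# Decode a JSON payload and let the subclass shape it into records.
sub parse {
    my ($self, $json_text) = @_;
    my $data = JSON::PP::decode_json($json_text);
    return $self->json_to_records($data);
}

# Hypothetical provider-specific subclass.
package My::Fetcher::Example;
use parent -norequire, 'My::Fetcher';

sub url { 'https://example.com/api/cases' }   # placeholder, not a real provider

sub json_to_records {
    my ($self, $data) = @_;
    return [
        map { +{
            location  => $_->{name},
            confirmed => $_->{cases},
            timepoint => $data->{updated},
        } } @{ $data->{areas} }
    ];
}

package main;

# Made-up payload standing in for what a provider's API might return.
my $payload = '{"updated":"2020-03-26T19:00:00Z",'
            . '"areas":[{"name":"Area A","cases":10},{"name":"Area B","cases":7}]}';

my $records = My::Fetcher::Example->new->parse($payload);
printf "%s: %d confirmed at %s\n", @{$_}{qw(location confirmed timepoint)}
    for @$records;
```

Under this layout, supporting another provider would mean writing one more small subclass, while decoding and (eventually) storage stay in the base class.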
Fetched data is stored in an SQLite database (support for MySQL exists but is untested and probably broken, though easily fixed) and there is a high-level interface (thank you DBIx::Class) for saving and retrieving this data. This makes it easy to save a data point only if it is more "up-to-date" (as judged by heuristics) than what currently exists in the database for the same location and time point. It also lets you retrieve all data for a single location over time, or for a single time point or range over all or some locations.
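To make the "save only if newer" idea and the two kinds of retrieval concrete, here is a small sketch with plain DBI and an in-memory SQLite database (it needs DBD::SQLite installed). It does not use the module's DBIx::Class interface: the table name datum, its columns, the helper save_if_newer and the sample rows are all invented for illustration, and the "newer" heuristic (a higher confirmed count wins) is only an assumption, not necessarily what the module does.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: one row per (location, timepoint).
$dbh->do(q{
    CREATE TABLE datum (
        location  TEXT    NOT NULL,
        timepoint TEXT    NOT NULL,
        confirmed INTEGER NOT NULL,
        PRIMARY KEY (location, timepoint)
    )
});

# Insert the row, or overwrite the stored one only if the incoming
# count is higher (assumed heuristic for "more up-to-date").
sub save_if_newer {
    my ($dbh, $row) = @_;
    my ($old) = $dbh->selectrow_array(
        'SELECT confirmed FROM datum WHERE location = ? AND timepoint = ?',
        undef, @{$row}{qw(location timepoint)});
    if (!defined $old) {
        $dbh->do('INSERT INTO datum (location, timepoint, confirmed) VALUES (?, ?, ?)',
            undef, @{$row}{qw(location timepoint confirmed)});
        return 'inserted';
    }
    return 'kept' if $row->{confirmed} <= $old;
    $dbh->do('UPDATE datum SET confirmed = ? WHERE location = ? AND timepoint = ?',
        undef, @{$row}{qw(confirmed location timepoint)});
    return 'replaced';
}

# Made-up sample rows, including a stale and a fresher duplicate.
my @log = map { save_if_newer($dbh, $_) } (
    { location => 'Area A', timepoint => '2020-03-25', confirmed => 10 },
    { location => 'Area A', timepoint => '2020-03-26', confirmed => 15 },
    { location => 'Area A', timepoint => '2020-03-26', confirmed => 12 },  # stale
    { location => 'Area A', timepoint => '2020-03-26', confirmed => 18 },  # fresher
    { location => 'Area B', timepoint => '2020-03-26', confirmed => 7  },
);
print "@log\n";

# All data for a single location over time.
my $series = $dbh->selectall_arrayref(
    'SELECT timepoint, confirmed FROM datum WHERE location = ? ORDER BY timepoint',
    { Slice => {} }, 'Area A');

# All locations at a single time point.
my $snapshot = $dbh->selectall_arrayref(
    'SELECT location, confirmed FROM datum WHERE timepoint = ? ORDER BY location',
    { Slice => {} }, '2020-03-26');

print "$_->{timepoint}: $_->{confirmed}\n" for @$series;
print "$_->{location}: $_->{confirmed}\n"  for @$snapshot;
```

The composite primary key on (location, timepoint) is what keeps a single row per location and time point, so a repeated fetch can only replace a row, never duplicate it.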
The CPAN module is Statistics::Covid. It is also hosted on GitHub at https://github.com/hadjiprocopis/statistics-covid which additionally provides the data I have collected over the past couple of weeks.
If anyone has any comments or suggestions please leave me a message.
If anyone wishes to contribute, e.g. data analysis or plot generation, under this or any other namespace, please let me know so that I can link to that work. I am also starting to write my own analysis, which will live under the namespace Statistics::Covid::Analysis.
Here is some code from the synopsis as a quick start: