Re: Caching Format
by moritz (Cardinal) on Jan 11, 2012 at 17:41 UTC
You could just keep the data in memory... or is there a reason not to?
Now on to the question. Any ideas on suggestions to make this quicker and easier?
Quicker to program or quicker to run?
If you want it quicker to run, you could just put the data on a RAM disc instead of a hard disc. If you want it easier to program, you could just use Perl data structures and JSON::XS or Storable for serializing and deserializing.
Currently the use of a DB is not an option.
And what is an option? How much control do you have over your environment? What about non-databases like memcached?
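A minimal sketch of the serialize-to-disk approach suggested above, using core-module Storable (the data layout here is hypothetical — a collection ID mapping to an expected count and the images seen so far):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Hypothetical structure: collection ID => expected count + images received
my %collections = (
    'C001' => { expected => 3, images => [ 'a.jpg', 'b.jpg' ] },
);

nstore \%collections, 'collections.dat';    # serialize to disk
my $loaded = retrieve('collections.dat');   # ...and read it back

# Completeness report on the restored structure
for my $id ( sort keys %$loaded ) {
    my $c = $loaded->{$id};
    printf "%s: %d of %d received\n",
        $id, scalar @{ $c->{images} }, $c->{expected};
}
```

JSON::XS would work the same way (`encode_json`/`decode_json` plus a file write/read), with the advantage of a human-readable cache file, but unlike Storable it is not a core module.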
Thanks everyone for the input and suggestions.
The environment I am working with is accredited (read: it takes an act of god to add/remove software requirements such as SQL, Postgres, etc.) and hardened. Additionally, it is hosted on one of our customer's systems. So, short answer concerning the environment: I have very little control.
The images are being sent to this one vendor from multiple hands, all with a defined file format, so unless I decide to rename everything I need to store additional info. I should also mention I plan to write a web app to run reports on submitted/non-submitted/incomplete collections.
Using memory might be an option, though I do not know how long I need to keep this information available. I do know that tens of thousands of images come in daily, so I will have to weigh file IO choking against hogging resources.
hok
Re: Caching Format
by Eliya (Vicar) on Jan 11, 2012 at 18:06 UTC
If I'm understanding you correctly, you could do away with the extra file (and the need for locking) by putting the info in the filenames themselves. For example
CollectionID_IDX_N_imgname.jpg
where CollectionID is a unique collection identifier, N is the expected total number of images, and IDX the image number within the collection (the collection ID could also be a directory). All you then have to do after having received a new image is a simple glob plus a check for completeness.
Update: forgot to mention that, to avoid potential concurrency issues (reading not-yet-completely-written files), you'd rename a file to its final name only after you've finished writing it.
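The glob-plus-completeness check described above could look something like this (the directory and collection ID are hypothetical; the filename fields follow the CollectionID_IDX_N_imgname.jpg scheme):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given a directory and a collection ID, glob the collection's files,
# parse the IDX and N fields out of CollectionID_IDX_N_imgname.jpg,
# and report whether all N images have arrived.
sub collection_complete {
    my ($dir, $collection_id) = @_;
    my @files = glob("$dir/${collection_id}_*");
    my (%seen, $total);
    for my $f (@files) {
        my ($idx, $n) = $f =~ /\Q$collection_id\E_(\d+)_(\d+)_/;
        next unless defined $idx;   # skip files that don't match the scheme
        $seen{$idx} = 1;
        $total = $n;
    }
    return defined $total && keys(%seen) == $total;
}
```

No lock file is needed: each image is a separate file, and the rename-when-done trick from the update above guarantees a globbed file is fully written.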
Hello RichardK, friends,
Sorry for answering via a shotgun message. First, again, thanks for your time and help. After I posted my first reply I began playing around with http://search.cpan.org/~cleishman/Cache-2.04/ and fell in love with it. It was simple and handled the metadata caching format nicely: collection entries could be pushed/popped as hash key=>value pairs. It also handled file locking and provided methods for everything I needed to do. Unfortunately, I found out later from my boss that not only are databases not allowed, but any Perl module that is not a Perl 5 core module cannot be used either. Mulligan!
Regarding the heap-vs-files debate: I learned that the required level of persistence is actually quite high, certainly high enough to warrant the use of a database if that were an option. Essentially, collections will be kept indefinitely. That is the reason I chose to use files. I also found out for certain that I cannot modify file names. As of now I plan on creating a pseudo-namespace for each collection by putting collection metadata and files into unique directories.
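A core-modules-only sketch of that per-collection-directory plan (all names here are hypothetical; File::Path, File::Spec, and Storable all ship with Perl 5):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use Storable  qw(nstore retrieve);

my $base = 'collections';   # hypothetical base directory

# Record one received image in its collection's directory and return
# true once the collection is complete.
sub record_image {
    my ($collection_id, $imgname, $expected) = @_;
    my $dir = File::Spec->catdir($base, $collection_id);
    make_path($dir);        # pseudo-namespace: one directory per collection
    my $meta_file = File::Spec->catfile($dir, 'meta.stor');
    my $meta = -e $meta_file
        ? retrieve($meta_file)
        : { expected => $expected, images => {} };
    $meta->{images}{$imgname} = time;   # timestamp of receipt
    nstore $meta, $meta_file;
    return scalar( keys %{ $meta->{images} } ) == $meta->{expected};
}
```

The original image files keep their vendor-defined names untouched; only the extra metadata lives in the Storable file alongside them.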
Cheers,
Hok
P.S. I used a lot of buzzwords and somehow left out "Cloud" so there I said it.
Re: Caching Format
by RichardK (Parson) on Jan 11, 2012 at 18:03 UTC
I second that motion. SQLite isn't an SQL database server; it is a public-domain(!) single-file format that supports a rather full SQL-database model within that file, including reliable transactions and therefore locking and known-good file sharing. You don't have to install anything beyond a single package. Since the database is "just a disk file," in the same way that the file you are now contemplating is just a disk file, this approach would give you tremendous "bang for your buck," and I would argue that no proprietary approvals of any kind would be necessary ... even in the most "hardened" business environment. You are storing the data "in a file," as originally contemplated, but now that file happens to be an extremely smart file. Furthermore, the odds that SQLite is already there, even for run-of-the-mill purposes like hosting the CPAN modules database, are pretty near 100%.
P.S.: Yes, I said public domain. There is no license.
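For what the "smart file" route would look like in practice — a sketch assuming DBD::SQLite is available (table and column names are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# The entire "database" is the single disk file collections.db.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=collections.db', '', '',
                        { RaiseError => 1, AutoCommit => 1 } );

$dbh->do(q{
    CREATE TABLE IF NOT EXISTS images (
        collection_id TEXT,
        idx           INTEGER,
        expected      INTEGER,
        imgname       TEXT
    )
});

# Record a received image, then count what the collection holds so far.
$dbh->do( 'INSERT INTO images VALUES (?,?,?,?)',
          undef, 'C001', 1, 3, 'a.jpg' );
my ($received) = $dbh->selectrow_array(
    'SELECT COUNT(*) FROM images WHERE collection_id = ?',
    undef, 'C001' );
```

SQLite handles the file locking itself, so concurrent submissions don't need any hand-rolled lock files.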
Re: Caching Format
by oko1 (Deacon) on Jan 12, 2012 at 04:06 UTC
Perl comes with a number of DB-like access methods; see the listing in AnyDBM_File. And you just might have YAML installed - a number of other modules call for it - which would make life just peachy (stores the data in a cleverly-arranged text file, essentially.) Write a simple script that prompts you for the above data and rolls it into YAML as a hash, then retrieve it whenever needed.
#!/usr/bin/perl
use warnings;
use strict;
use YAML qw(DumpFile LoadFile);
$|++;
# Cheating in a bit of data here...
my %images;
@images{1..5} = map "image$_.jpg", 1..5;
# Voila!
DumpFile("yaml.db", \%images);
# ...and later, whenever you need it back:
my $restored = LoadFile("yaml.db");
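The AnyDBM_File route mentioned above is even lighter — no YAML required, everything core. A minimal sketch (the DBM filename and keys are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;         # for the O_* open flags
use AnyDBM_File;   # picks whichever DBM flavor this perl was built with

# Record a receipt: tie a hash to an on-disk DBM file and write to it.
tie my %status, 'AnyDBM_File', 'coll_status', O_RDWR|O_CREAT, 0640
    or die "Can't tie DBM file: $!";
$status{'C001/image1.jpg'} = time;   # key: collection/image, value: when seen
untie %status;

# Later (e.g. in the reporting web app), read it back:
tie my %check, 'AnyDBM_File', 'coll_status', O_RDONLY, 0640
    or die "Can't tie DBM file: $!";
print "seen\n" if exists $check{'C001/image1.jpg'};
untie %check;
```

Note that DBM values must be plain strings, so nested collection metadata would need to be flattened into the keys (as above) or serialized per-value.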
--
Education is not the filling of a pail, but the lighting of a fire.
-- W. B. Yeats
Re: Caching Format
by FloydATC (Deacon) on Jan 13, 2012 at 13:59 UTC
If even SQLite is overkill, how about just using Storable? I've used this in a couple of places where I just wanted my script to "remember" a single hash between runs and didn't have to worry about concurrency etc. All it really does is save you the trouble of formatting/parsing the file.
use Storable;
store \%table, 'file';             # serialize the hash to disk
my $hashref = retrieve('file');    # ...and read it back on the next run
--
Time flies when you don't know what you're doing