Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hi,
I have a list of coordinates and a subroutine that tells me in which JSON file, among many, I must retrieve some content for each coordinate pair.
The values in the JSON files change periodically.
Could you please give me some indication of how to implement caching of the JSON information in a hash (I saw there is File::Cache) and how to monitor changes to the JSON files so that the hash is updated?
Thank you!
Re: Caching files
by choroba (Cardinal) on Jan 24, 2020 at 14:50 UTC
Tell us more. What OS are you on? On Linux, I've had good experience with inotify to watch a directory tree for changes. It doesn't detect changes made by mmap, though, so tell us also how the JSON files change. Other OSes use different notification tools.
Checking (stat)[9] would be slower than running a notification tool, but it should still be faster than reading the file every time. Again, this method might fail to invalidate the cache properly if mmap was used to modify the files.
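That mtime check can be wrapped in a tiny helper, roughly like this (a sketch only; the cache layout and the reader callback are invented for illustration):

```perl
use strict;
use warnings;

my %cache;

# Return the cached value if the file's mtime is unchanged since we
# last read it; otherwise call $reader to reread and refresh the entry.
sub cached_read {
    my ($file, $reader) = @_;
    my $mtime = (stat $file)[9];
    if (exists $cache{$file} && $cache{$file}{mtime} == $mtime) {
        return $cache{$file}{value};
    }
    $cache{$file}{mtime} = $mtime;
    return $cache{$file}{value} = $reader->($file);
}
```

The reader is only invoked on a miss or after the mtime changes, so repeated queries within one cron interval cost a stat() rather than a full read and parse.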
Given the coordinates, do you know what JSON file holds the relevant information, or does this periodically change as well? If the latter, cache both the value and the filename. (But what would happen if there was conflicting information in two JSON files?)
What kind of information do the JSON files provide? If it's a structure that JSON can represent, you can store the decoded structure in the hash directly.
$cache{$x}{$y}{$z} = $decoded_structure;
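For example, decoding once and keeping the resulting Perl structure avoids re-parsing on every lookup. A sketch, with an invented JSON payload and key names, using core JSON::PP as a stand-in for whichever JSON module you prefer:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core stand-in for Cpanel::JSON::XS etc.

my %cache;

# Decode once, then store the whole structure under the coordinates.
my $json = '{"tile":"t_03_05","window":["2020-01-24T00:00","2020-01-24T06:00"]}';
my ($x, $y) = (3, 5);
$cache{$x}{$y} = decode_json($json);

# Later lookups touch only the hash, with no file I/O or re-parsing:
print $cache{3}{5}{tile}, "\n";         # t_03_05
print $cache{3}{5}{window}[0], "\n";    # 2020-01-24T00:00
```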
BTW, File::Cache is now discouraged and recommends Cache::Cache which itself is not actively developed anymore and recommends CHI instead.
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
I'm on Linux, and there is a cron process that writes, at regular times, the time window of validity of some forecast data for a number of geographical tiles.
Each tile has a corresponding JSON file, written by the cron process, where, for example, its time window is stored.
The data generated by the process are 4D arrays saved to ASCII files with a regular structure, so that value(i,j,k,t) can be queried with direct access via seek(), after computing the byte offset of value(i,j,k,t) with a function that takes (i,j,k,t) as input.
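That fixed-stride direct access comes down to one row-major offset computation. A hedged sketch (the dimension order, names, and record width here are my assumptions, not the actual layout):

```perl
use strict;
use warnings;

# Row-major byte offset of value(i,j,k,t) in a file of fixed-width
# ASCII records; $reclen includes the trailing newline/padding.
sub record_offset {
    my ($i, $j, $k, $t, $nj, $nk, $nt, $reclen) = @_;
    return ((($i * $nj + $j) * $nk + $k) * $nt + $t) * $reclen;
}

# Seek straight to the record and read it, without scanning the file.
sub read_value {
    my ($fh, @idx_and_dims) = @_;
    my $reclen = $idx_and_dims[-1];
    seek $fh, record_offset(@idx_and_dims), 0 or die "seek: $!";
    read $fh, my $buf, $reclen;
    $buf =~ s/\s+\z//;    # strip padding and newline
    return $buf;
}
```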
At the moment, the process consists of finding the tile where a given point falls and then reading that tile's JSON when making the query.
I wonder if there is a way to preload all the JSON files into a hash, and then update them when they change upon cron process execution.
Below are parts of the code, so that maybe it is possible to understand the situation. In practice, I'd like to cache the sub _get_tile_info().
get_data();
#-------------
sub get_data {
my %args = @_;
my @coords = @{$args{Coords} || []};
my @tiles_and_ids = _get_tile_and_ids(%args);
foreach my $point (@tiles_and_ids) {
my $data = _extract_ts(Point=>$point,WS2D=>1,WD2D=>1,TEMP2D=>1);
}
}
#----------------------
sub _get_tile_and_ids {
my %args = @_;
my @coords = @{$args{Coords} || []};
my @results;
foreach my $pair (@coords) {
my ($xx,$yy) = @$pair;
my ($status,$tile,$ii,$jj) = _find_tile_and_ids(X=>$xx,Y=>$yy);
push @results,[$tile,$ii,$jj];
}
return @results;
}
#-----------------------
sub _find_tile_and_ids {
my %args = @_;
my $x = $args{X};
my $y = $args{Y};
# Find tile
....
my ($icell,$jcell) = _find_cell(X=>$x,Y=>$y,Xmin=>$xll_tile,Ymin=>$yll_tile,Dxy=>$info{dxy});
return('',$tile,$icell,$jcell);
}
#----------------------
sub _find_cell {
my %args = @_;
my $x = $args{X};
my $y = $args{Y};
my $xmin = $args{Xmin};
my $ymin = $args{Ymin};
my $dxy = $args{Dxy};
my $ii = floor(($x - $xmin) / $dxy) + 1;
my $jj = floor(($y - $ymin) / $dxy) + 1;
return ($ii,$jj);
}
#----------------
sub _extract_ts {
my %args = @_;
my $point = $args{Point} || die;
my ($tile,$ii,$jj) = @{$point}[0,1,2];
my %file = (
WS2D => "$tile/ws3d.dat",
TEMP2D => "$tile/temp3d.dat",
);
my $tile_info = _get_tile_info(Tile=>$tile);
....
}
#-------------------
sub _get_tile_info {
my %args = @_;
my $tile = $args{Tile};
my $json_file = "$tile/info.json";
my $tile_info = read_file($json_file); # read_file() from File::Slurp
return $tile_info;
}
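One way to cache _get_tile_info() along the lines asked about: keep the decoded info per tile together with the JSON file's mtime, and reread only when the mtime changes. A sketch (the names follow the listing above, but the cache layout is an assumption, and core JSON::PP stands in for your JSON module):

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

my %tile_cache;

# Like _get_tile_info(), but only reread and re-decode the JSON
# when the file's modification time has changed.
sub _get_tile_info_cached {
    my %args      = @_;
    my $tile      = $args{Tile};
    my $json_file = "$tile/info.json";
    my $mtime     = (stat $json_file)[9];
    my $slot      = $tile_cache{$tile};
    return $slot->{info} if $slot && $slot->{mtime} == $mtime;

    open my $fh, '<', $json_file or die "$json_file: $!";
    my $info = decode_json(do { local $/; <$fh> });
    $tile_cache{$tile} = { mtime => $mtime, info => $info };
    return $info;
}
```

As long as the cron process replaces info.json atomically (e.g. via rename), this serves hash lookups between cron runs and picks up each rewrite on the next query.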
|
Run
make simulate_cron
to generate the input data and start modifying them randomly.
Then, run
make query
in a different terminal. The Perl program is the following:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use Cpanel::JSON::XS qw{ decode_json };
my %cache;
for (1 .. 1000) {
my @queries = map [ map int 1 + rand 10, 1, 2 ], 1 .. 50;
for my $query (@queries) {
my ($x, $y) = @$query;
# delete $cache{$x}{$y}; # <- Uncomment to simulate no cache.
my $value;
if (exists $cache{$x}{$y}
&& (stat "$x-$y.json")[9] == $cache{$x}{$y}{last}
) {
$value = $cache{$x}{$y}{value};
} else {
open my $in, '<', "$x-$y.json" or die $!;
$cache{$x}{$y}{last} = (stat $in)[9];
$value = $cache{$x}{$y}{value} = decode_json(do { local $/; <$in> })->[2];
}
say "$x, $y: $value";
}
}
With the delete line uncommented, it takes about 0.400s to terminate. With the line commented, it runs under 0.100s, i.e. slightly more than 4 times faster.
Notes:
- The simulation uses mv to create the JSON files, so the change is atomic. If we wrote to the file directly instead, we could get occasional errors when reading it.
- We store the modification time before we read the value. There's a race condition: the value may change after we retrieve the modification time but before we read the value. It doesn't break the code, though: we still return the correct value; we just might read it from the file once more next time.
- I guess the cron process doesn't change all the files all the time, so the real benefit of this kind of cache might be much smaller in your real environment.