Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Net::Clacks Important data corruption bugfix released

by cavac (Parson)
on Jul 07, 2021 at 07:39 UTC ( [id://11134749]=perlnews: print w/replies, xml ) Need Help??

I usually try to avoid clobbering the PM News section with release information about my modules. But i found a problem in my Net::Clacks::Server which can, under the wrong circumstances, either prevent startup or lead to all sorts of data corruption in cached data.

I have fixed this in Version 18 (at least i think it is completely fixed, bug reports are welcome).

Unfortunately, this upgrade features an incompatible change that clobbers the persistance file. There is a workaround for that to keep your cached data, though. From the UpgradeGuide.pod:

WARNING: BREAKING CHANGES regarding persistance files.

Version 18 provides better reliability for the Clacks Server when using a persistance file, including startup problems and prevention of data loss.

Here is the problem in older versions: If the server process is killed while writing the persistance file, the file on disk may be invalid (partial data) or completely blank. Blank files at least prevented the server from starting up, but otherwise incomplete files could lead to data corruption in cached data and/or all other sorts of problems. This relates to the fact that the server only does very minimal checks on its internal cache during runtime for performance reasons. The original persistance file was just a quick hack to dump/restore the cache between runs and did not account for system crashes or the process being killed while writing to disk.

This new version of Net::Clacks::Server tackles this in in a multi-stage approach:

1.) The persistance file now includes the ENDBYTE string as a third line. This is used to check if the file is complete.

2.) Before overwriting the current persistance file, it is COPIED to a file with '_bck' added to the configured name.

3.) The file is written under a new name (with '_' added at the end) and then MOVED over the correct file name. File move (rename) within the same directory should be an atomic operation of the operating system, so it either works or doesn't (no stopping half-way inbetween).

4.) An invalid persistance file doesn't prevent the clacks server startup. If it detects an invalid file, it first tries to load the previous version ('_bck'), then as a last desperate measure to prevent data loss, it tries to load the temporary file ('_'). If that also fails, it starts 'blankety-blank'.

Clacks caching was always designed as a CACHE, not as the final storage for important data. This upgrade to Version 18 changes the file format of the persistance file slighty. The older format is now detected as invalid, resulting in a "blankety-blank" startup. If you really want or need (?!?!?!) to preserve the cached data for some reason, there are two ways to accomplish this:

1.) Stop the clacks server and upgrade the Net::Clacks package. Edit the persistance file and add a third line with the word "ENDBYTES" without quotes as its only content.

or

2.) If you are running a master/slave setup with interclacks, you can stop ONLY the server that has the persistance file. Then upgrade Net::Clacks and start the server. It should start "blankety-blank" and automatically resyncronize cached data from the interclacks network. It works the same if all your clacks servers use persistance files for some weird reason, just restart them one-by-one, waiting inbetween for the interclacks sync to finish. You can use the example rawclient.pl. Connect to the server you keep running and start the MONITOR command. While interclacks sync is in progress, you will see KEYSYNC commands flying between the servers. If you stop seeing them for a few seconds, KEYSYNC is finished.

perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'

Replies are listed 'Best First'.
Re: Net::Clacks Important data corruption bugfix released
by stevieb (Canon) on Jul 08, 2021 at 07:54 UTC

    Three suggestions:

    • Write tests. You have no functional tests at all for anything, let alone regression tests for the issue you've fixed here
    • Break up your subs. A subroutine should do one thing, and do it well. When it gets longer than a short scroll, it's time to think about breaking it up into smaller pieces of functionality. The run() method is almost 1,000 lines. That's almost 1,000 lines too many. IMHO, a run() method should be concise in what it does. Its flow should be easy to follow and troubleshoot. Organize all of the direct functionality it does into logical code units (subs) with descriptive names. Dispatch tables can help you seriously clean up the if/else stuff going on
    • Write tests. Seriously. You've got a whole lot of conditional code in the module, and without tests, I can see a whole lot of problems arising out of even a simple change to the logic going on. Without tests, you might break 10 things without noticing before you even release the next version. Then to fix them without tests, you're bound to break another 10 unnoticed things

      Write tests. You have no functional tests at all for anything, let alone regression tests for the issue you've fixed here

      I find automated tests of limited use, and a "regression test" wouldn't have caught that specific issue since it's a timing issue/race condition. Instead, i run a couple of non-critical (private) servers that use Net::Clacks under real life conditions and exercise every part of the protocol. I usually upgrade those servers a few days before a new release to see if everything works as it should.

      I'm planning to clean up the code (up to a certain point). But the main function will most likely stay over 500 lines long, simply because it's a state machine and i like to be able to follow all the critical parts of the state flow without jumping in and out of subroutines in when reviewing the code. It's a personal preference thing.

      perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'
Re: Net::Clacks Important data corruption bugfix released
by jdporter (Paladin) on Jul 07, 2021 at 12:16 UTC

    I think something's broken in your POD. Net::Clacks is (now) a broken link*; and Net-Clacks (now) lists two - different - Net::Clacks::Servers. I believe the first of those is supposed to be Net::Clacks.

    * It's only "broken" within the context of Net-Clacks-18. Search finds it in Net-Clacks-10.

    I reckon we are the only monastery ever to have a dungeon staffed with 16,000 zombies.

      I think something's broken in your POD.

      I was working on that when the issue cropped up. It will be in the next release. But thanks for the heads up.

      perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlnews [id://11134749]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (5)
As of 2024-04-24 19:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found