Efficient coupling/decoupling of serialized data

Here is another easy-to-solve problem that leads to more complex questions.

A standard DNS serial has the following syntax: YYYYMMDDVV, that is: the modification date in year-month-day format (four, two and two digits, respectively) and a two-digit version number. For example, the third version of today's map should be 2002062403.

Now suppose you have a script that updates your DNS maps: to update the serial your code should:

retrieve today's year, month and day and merge them together, say with a sprintf; let's put it in $today;
compare the serial with $today, e.g. with a pattern match: if ($serial =~ /^$today/) {...};
if the match doesn't succeed, then setting the new serial is really easy: $newserial = $today."01";
if the match succeeds, it's easy again but a little more work is required: extract the last two chars of $serial and increment by one; no sweat: substr or a regexp are ok.
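The steps above can be sketched roughly as follows. This is only an illustration: the helper name next_serial, its optional $today argument and the guard against a 100th version are assumptions, not part of the original script.

```perl
use strict;
use warnings;

# next_serial is a hypothetical helper name used only for this sketch.
# $serial is the current YYYYMMDDVV serial; $today (YYYYMMDD) is optional
# and defaults to the local date, built with sprintf as described above.
sub next_serial {
    my ($serial, $today) = @_;
    unless (defined $today) {
        my ($day, $mon, $year) = (localtime)[3, 4, 5];
        $today = sprintf '%04d%02d%02d', $year + 1900, $mon + 1, $day;
    }

    # Serial from another day: start again at version 01.
    return $today . '01' unless $serial =~ /^\Q$today\E/;

    # Same day: take the last two characters and increment by one.
    my $version = substr($serial, 8, 2) + 1;
    die "more than 99 versions on $today\n" if $version > 99;   # assumed policy
    return sprintf '%s%02d', $today, $version;
}

print next_serial('2002062402', '20020624'), "\n";   # 2002062403
print next_serial('2002062399', '20020624'), "\n";   # 2002062401
```

Passing $today explicitly makes the helper easy to test; a real script would probably just let it default to the current date.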

Having solved this easy problem, a meditation came to my mind: given some data in the form YYYYMMDDVV, what is the most efficient way to deserialize it? I mean, among the many ways of doing the job:

using substr repeatedly, once per datum;
using a regexp like /^(\d{4})(\d{2})(\d{2})(\d{2})$/;
doing weird karussels with pack (maybe);
...

which, in your opinion, is the best way, judged by whichever aspect you like (speed, memory consumption, CPU consumption, simplicity... choose your favourite :-)?
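For reference, here is a small sketch of the candidates side by side, using the example serial from above. The variable names are made up for the illustration, and the pack line at the end is just one possible way to couple the fields back together.

```perl
use strict;
use warnings;

my $serial = '2002062403';

# 1. substr, once per datum
my @by_substr = (
    substr($serial, 0, 4), substr($serial, 4, 2),
    substr($serial, 6, 2), substr($serial, 8, 2),
);

# 2. the regexp from the list above; in list context it returns the captures
my @by_regex = $serial =~ /^(\d{4})(\d{2})(\d{2})(\d{2})$/;

# 3. unpack with fixed-width ASCII fields
my @by_unpack = unpack 'A4A2A2A2', $serial;

print "@by_substr\n";   # 2002 06 24 03
print "@by_regex\n";    # 2002 06 24 03
print "@by_unpack\n";   # 2002 06 24 03

# Coupling again: pack with the same template rebuilds the serial.
my $again = pack 'A4A2A2A2', @by_unpack;
print "$again\n";       # 2002062403
```

All three give the same four fields, so the choice comes down to speed and taste. To actually time them, the core Benchmark module (for instance its cmpthese function) could be wrapped around each variant; no timings are claimed here.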

Ciao!
--bronto

(MeowChow) Re: Efficient coupling/decoupling of serialized data by MeowChow (Vicar) on Jun 24, 2002 at 09:20 UTC
"doing weird karussels with pack (maybe);" You want unpack, which was made for exactly this sort of thing (in addition to scaring off newbie programmers): `my ($year, $mon, $day, $ver) = unpack 'A4A2A2A2', $serialdate;` MeowChow s aamecha.s a..a\u$&owag.print
Re: (MeowChow) Re: Efficient coupling/decoupling of serialized data by bronto (Priest) on Jun 24, 2002 at 10:31 UTC
Yes, I thought `unpack` but wrote `pack`, I am sorry. Thanks for your meditation, anyway. I look forward to hearing from other people, too! Thanks again! `--bronto`
Re: Efficient coupling/decoupling of serialized data by rob_au (Abbot) on Jun 24, 2002 at 09:13 UTC
An interesting topic, bronto ... Myself, I would lean strongly towards a regular expression or unpack statement for its simplicity in this task, but I would add that the serial number of a DNS zone file does not have to follow this format; see RFC 1035. The serial number of a DNS zone file is an unsigned 32-bit number which wraps around for zone versioning. It does not have to take account of the current date as suggested; indeed, even a simple increment of a zone file's serial number would be sufficient for updates. I add this point because, more often than not, it is administrative policy more so than technical requirements which adds complexity to administrative problems :-)
Re: Re: Efficient coupling/decoupling of serialized data by bronto (Priest) on Jun 24, 2002 at 10:08 UTC
I agree with you: the YYYYMMDDVV format is not a requirement; actually, it is a RIPE recommendation. And I like it because it's both human readable and easily scriptable. Ciao! `--bronto` Update January 31, 2003: I found that the link above is now obsolete, this new link should be ok! --bronto
Re: Efficient coupling/decoupling of serialized data by samtregar (Abbot) on Jun 24, 2002 at 17:04 UTC
Why deserialize it at all? Just treat it as a number. If the serial is < the current YYYYMMDD01 then you know that at least a day has passed, so replace it. If not then ++ the serial. It's always faster not to parse at all! -sam
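A minimal sketch of this numeric approach might look like the following. The name bump_serial is hypothetical, and the comparison is a strict less-than so that a serial already sitting at today's version 01 gets incremented rather than re-issued.

```perl
use strict;
use warnings;

# bump_serial is a hypothetical name; $today is today's date as YYYYMMDD.
sub bump_serial {
    my ($serial, $today) = @_;
    my $first_today = $today * 100 + 1;    # YYYYMMDD01 as a plain number

    # Older than today's first version: a day has passed, so replace it.
    # Otherwise simply add one, with no parsing at all.
    return $serial < $first_today ? $first_today : $serial + 1;
}

print bump_serial(2002062303, 20020624), "\n";   # 2002062401
print bump_serial(2002062401, 20020624), "\n";   # 2002062402
```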
Re: Re: Efficient coupling/decoupling of serialized data by PrakashK (Pilgrim) on Jun 24, 2002 at 22:02 UTC
"++ the serial": Just as long as there are no more than 99 versions in a day. /prakash Update: Of course, your numbering scheme itself breaks if there are more than 99 versions, so my comment above is not really valid.
Re: Re: Re: Efficient coupling/decoupling of serialized data by samtregar (Abbot) on Jun 24, 2002 at 23:22 UTC
Nothing terrible happens if there are, it just rolls to the next day. 2002010199 becomes 2002010200. The algorithm still works - it's just the humans that get nervous about having a serial number of 2002010200 on Jan 1, 2002. -sam
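The rollover is easy to check with a couple of lines. This is just an illustration reusing the unpack template from earlier in the thread; the variable names are made up.

```perl
use strict;
use warnings;

my $serial = 2002010199;
$serial++;    # plain numeric increment, the version digits carry into the day

# Split the result only to show how a human would read it.
my ($year, $mon, $day, $ver) = unpack 'A4A2A2A2', $serial;
print "$serial reads as $year-$mon-$day, version $ver\n";
# 2002010200 reads as 2002-01-02, version 00
```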
Re: Efficient coupling/decoupling of serialized data by Albannach (Monsignor) on Jun 24, 2002 at 17:30 UTC
In this case, I'd argue for optimising readability as you probably aren't updating your DNS maps thousands of times per second. Speed is not an issue here, so for readability I think MeowChow's version is the best if you really need to extract the components of the serial number. Despite the general under-use of unpack, what the line intends is pretty clear I think, though one might want to glance at the docs before modifying it. The intent of samtregar's method is even more obvious to the reader. Of course if you were processing millions of lines of data tagged with a similar serial number, then it may be worth the effort to look at speed, but then samtregar has still got the winner (though I'd be careful how many times I'd `++` the serial number - it's not a real number after all). -- I'd like to be able to assign to an luser
