PerlMonks  

Efficient coupling/decoupling of serialized data

by bronto (Priest)
on Jun 24, 2002 at 08:12 UTC ( #176711=perlmeditation )

Here is another easy-to-solve problem that leads to more complex questions.

A standard DNS serial has the following syntax: YYYYMMDDVV, that is: the modification date in year-month-day format (four, two and two digits, respectively) and a two-digit version number. For example, the third version of today's map should be 2002062403.

Now suppose you have a script that updates your DNS maps: to update the serial your code should:

  • retrieve today's year, month and day and merge them together, say with a sprintf; let's put it in $today;
  • compare the serial with $today, e.g. with a pattern match: if ($serial =~ /^$today/) {...};
  • if the match doesn't succeed, then setting the new serial is really easy: $newserial = $today."01";
  • if the match succeeds, it's easy again but a little more work is required: extract the last two chars of $serial and increment by one; no sweat: substr or a regexp are ok.
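
The four steps above could be sketched roughly as follows. This is only an illustration: next_serial is a made-up helper name, and passing $today in as an argument (rather than computing it inside the function) is an assumption made to keep the sketch easy to test. The guard against running past version 99 is also an addition, since the list does not say what should happen there.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# next_serial() is a hypothetical helper, not code from the original post.
# $serial is the current YYYYMMDDVV value, $today is YYYYMMDD.
sub next_serial {
    my ($serial, $today) = @_;

    # Same day: keep the date part and bump the two-digit version.
    if ($serial =~ /^\Q$today\E(\d{2})$/) {
        die "no version left for $today\n" if $1 >= 99;   # assumed policy
        return sprintf '%s%02d', $today, $1 + 1;
    }

    # Different day: start again at version 01.
    return $today . '01';
}

my $today = strftime('%Y%m%d', localtime);
print next_serial('2002062402', '20020624'), "\n";   # prints 2002062403
print next_serial('2002062307', '20020624'), "\n";   # prints 2002062401
print next_serial($today . '01', $today), "\n";      # today's date followed by 02
```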

With this easy problem solved, one meditation came to mind: given some data in the form YYYYMMDDVV, what is the most efficient way to deserialize it? I mean, among the many ways of doing the job:

  • using substr repeatedly, once per datum;
  • using a regexp like /^(\d{4})(\d{2})(\d{2})(\d{2})$/;
  • doing weird karussels with pack (maybe);
  • ...

which, in your opinion, would be the best way in terms of speed, memory consumption, CPU consumption, simplicity... (choose your favourite :-)?
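
For anyone who wants to try the candidates side by side, here is a minimal sketch. The helper names by_substr, by_regex and by_unpack are invented for the example, and the third one uses unpack (the counterpart of pack for splitting fixed-width fields). The timing part uses the core Benchmark module and only runs when the script is given the argument bench; no results are claimed here, since they will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $serial = '2002062403';

# Hypothetical helpers; each returns (year, month, day, version).
sub by_substr {
    my ($s) = @_;
    return (substr($s, 0, 4), substr($s, 4, 2),
            substr($s, 6, 2), substr($s, 8, 2));
}

sub by_regex {
    my ($s) = @_;
    # In list context a successful match returns the captured groups.
    return $s =~ /^(\d{4})(\d{2})(\d{2})(\d{2})$/;
}

sub by_unpack {
    my ($s) = @_;
    return unpack 'A4A2A2A2', $s;
}

# All three should agree on the same four fields.
print join('-', by_substr($serial)), "\n";
print join('-', by_regex($serial)),  "\n";
print join('-', by_unpack($serial)), "\n";

# Optional timing: run as "perl script.pl bench".
if (@ARGV && $ARGV[0] eq 'bench') {
    cmpthese(-1, {
        substr => sub { my @f = by_substr($serial) },
        regex  => sub { my @f = by_regex($serial) },
        unpack => sub { my @f = by_unpack($serial) },
    });
}
```

Note that only the regexp variant validates its input (it returns an empty list if the string is not ten digits); the other two will happily split anything.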

Ciao!
--bronto

Re: Efficient coupling/decoupling of serialized data
by rob_au (Abbot) on Jun 24, 2002 at 09:13 UTC
    An interesting topic, bronto ... While I would myself lean strongly towards a regular expression or an unpack statement for its simplicity in this task, I would add that the serial number of a DNS zone file does not necessarily have to follow this format; see RFC 1035. The serial number of a DNS zone file is an unsigned 32-bit number which wraps around for zone versioning. It does not have to take account of the current date as suggested; indeed, even a simple increment of a zone file's serial number would be sufficient for updates.

    I add this point because, more often than not, it is administrative policy more so than technical requirements that adds complexity to administrative problems :-)
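
To illustrate the point about a plain increment, here is a small sketch that treats the serial as an unsigned 32-bit counter. bump_serial is a hypothetical name, and wrapping with a modulo is just one straightforward way to model the 32-bit range; it says nothing about how any particular name server compares old and new serials.

```perl
use strict;
use warnings;

# bump_serial() is a hypothetical helper: plain increment of an
# unsigned 32-bit serial, wrapping around after 4294967295.
sub bump_serial {
    my ($serial) = @_;
    return ($serial + 1) % 2**32;
}

print bump_serial(41), "\n";           # prints 42
print bump_serial(4294967295), "\n";   # prints 0

# A date-based serial such as 2002062403 also fits in 32 bits.
print 2002062403 <= 2**32 - 1 ? "fits\n" : "too big\n";   # prints fits
```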

     

      I agree with you: the YYYYMMDDVV format is not a requirement; actually, it is a RIPE recommendation. And I like it because it's both human readable and easily scriptable.

      Ciao!
      --bronto

      Update January 31, 2003: I found that the link above is now obsolete, this new link should be ok! --bronto

(MeowChow) Re: Efficient coupling/decoupling of serialized data
by MeowChow (Vicar) on Jun 24, 2002 at 09:20 UTC
    doing weird karussels with pack (maybe);

    You want unpack, which was made for exactly this sort of thing (in addition to scaring off newbie programmers):

    my ($year, $mon, $day, $ver) = unpack 'A4A2A2A2', $serialdate;
       MeowChow                                   
                   s aamecha.s a..a\u$&owag.print

      Yes, I thought unpack but wrote pack, I am sorry.

      Thanks for your meditation, anyway. I look forward to hearing from other people, too!

      Thanks again!
      --bronto

Re: Efficient coupling/decoupling of serialized data
by samtregar (Abbot) on Jun 24, 2002 at 17:04 UTC
    Why deserialize it at all? Just treat it as a number. If the serial is < the current YYYYMMDD01 then you know that at least a day has passed, so replace it with YYYYMMDD01. If not, then ++ the serial. It's always faster not to parse at all!
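
A minimal sketch of this numeric approach, assuming the serial and today's date are both available as plain numbers. next_serial_numeric is a made-up name, and the comparison is written as a strict < so that a serial already sitting at today's 01 gets incremented rather than rewritten to the same value.

```perl
use strict;
use warnings;

# next_serial_numeric() is a hypothetical helper.
# $serial is the current YYYYMMDDVV, $today is YYYYMMDD, both as numbers.
sub next_serial_numeric {
    my ($serial, $today) = @_;
    my $first_of_day = $today * 100 + 1;   # YYYYMMDD01

    # Older than today's first version: jump to it. Otherwise just add one.
    return $serial < $first_of_day ? $first_of_day : $serial + 1;
}

print next_serial_numeric(2002062305, 20020624), "\n";   # prints 2002062401
print next_serial_numeric(2002062401, 20020624), "\n";   # prints 2002062402
print next_serial_numeric(2002062403, 20020624), "\n";   # prints 2002062404
```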

    -sam

      ++ the serial
      Just as long as there are no more than 99 versions in a day.

      /prakash

      Update: Of course, your numbering scheme itself breaks if there are more than 99 versions, so my comment above is not really valid.

        Nothing terrible happens if there are, it just rolls to the next day. 2002010199 becomes 2002010200. The algorithm still works - it's just the humans that get nervous about having a serial number of 2002010200 on Jan 1, 2002.
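
The rollover can be checked with a few lines of arithmetic. This sketch assumes the same numeric rule as above (jump to today's YYYYMMDD01 if the serial is lower, otherwise add one); the variable names are only for illustration.

```perl
use strict;
use warnings;

# The 100th change on Jan 1, 2002 rolls into what looks like Jan 2.
my $serial = 2002010199;
$serial++;
print "$serial\n";                          # prints 2002010200

# On Jan 2 the usual rule still yields a strictly larger serial.
my $first_of_day = 20020102 * 100 + 1;      # 2002010201
my $next = $serial < $first_of_day ? $first_of_day : $serial + 1;
print "$next\n";                            # prints 2002010201
```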

        -sam

Re: Efficient coupling/decoupling of serialized data
by Albannach (Prior) on Jun 24, 2002 at 17:30 UTC
    In this case, I'd argue for optimising readability, as you probably aren't updating your DNS maps thousands of times per second. Speed is not an issue here, so for readability I think MeowChow's version is the best if you really need to extract the components of the serial number. Despite the general under-use of unpack, what the line intends is pretty clear, I think, though one might want to glance at the docs before modifying it. The intent of samtregar's method is even more obvious to the reader. Of course, if you were processing millions of lines of data tagged with a similar serial number, then it might be worth the effort to look at speed, but then samtregar has still got the winner (though I'd be careful how many times I'd ++ the serial number - it's not a real number after all).

    --
    I'd like to be able to assign to an luser

Node Type: perlmeditation [id://176711]
Approved by davis
Front-paged by samtregar