
Re: Re: Favourite modules April 2003

by cees (Curate)
on Apr 15, 2003 at 17:54 UTC ( #250630=note )

in reply to Re: Favourite modules April 2003
in thread Favourite modules April 2003

Just a comment/question on what you mentioned here:

  • Memoize, especially to memoize the output of the following one,
  • Digest::MD5, to sort through heaps of archived data and figure out what was stored twice under different names

I'm just wondering what benefit Memoize provides in this context. It seems from your brief description that you are computing MD5s of the archived data so that you don't have to keep all that data in memory (just an MD5 hash of the data). This makes it easy to find duplicates and won't take up much memory. But by memoizing it, you are still keeping all the archived data in memory, and you are keeping the MD5 hash in memory as well. You might as well store the data itself in a hash and do a straight comparison on it, saving the time required to compute an MD5 hash.

I'm curious to know if I am blatantly missing something here, or misunderstanding the usefulness of Memoize in this context.

By the way, I think Memoize is a great module, but I don't think there are many situations where it is actually beneficial.

Replies are listed 'Best First'.
Re: Re: Re: Favourite modules April 2003
by mirod (Canon) on Apr 15, 2003 at 18:53 UTC

    The function I memoize gets passed a file name, slurps the file, normalizes spaces and computes its MD5. So I don't think the content of the file is cached, as it is internal to the function.

    I agree that using Memoize only saves me the cost of a hash (filename => MD5). I just like how easy it is to use, and how it removes some extra code. As programmers we are used to adding extra data structures and code to cache that kind of result, but really, using Memoize gets us closer to the initial algorithm for solving the problem. At least that's how I justify using it here ;--)
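
A minimal sketch of the kind of function described above, not the poster's actual code. The names `file_digest` and `find_duplicates` are hypothetical, and the exact whitespace normalization (collapse runs to one space, trim the ends) is an assumption, since the post only says "normalizes spaces". Because the function takes a file name, the Memoize cache should hold only file name => digest pairs.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Memoize;

# Hypothetical name; the original post does not show the function.
sub file_digest {
    my ($filename) = @_;

    # Slurp the whole file; the content lives only in this lexical
    # and is released when the function returns.
    open my $fh, '<', $filename or die "Cannot open $filename: $!";
    my $content = do { local $/; <$fh> };
    close $fh;

    # Assumed normalization: collapse whitespace runs, trim both ends.
    $content =~ s/\s+/ /g;
    $content =~ s/^ | $//g;

    return md5_hex($content);
}

# The cache key is the file name; the cached value is the hex digest.
memoize('file_digest');

# Group file names by digest; a group with more than one name
# is a set of files with the same normalized content.
sub find_duplicates {
    my @files = @_;
    my %by_digest;
    push @{ $by_digest{ file_digest($_) } }, $_ for @files;
    return grep { @$_ > 1 } values %by_digest;
}
```

Calling `find_duplicates(@ARGV)` would return a list of array references, one per group of duplicate files. Later calls to `file_digest` for the same name are answered from the cache without rereading the file.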

      That makes more sense. Memoize only caches the arguments to the function and the return value, so you are fine with your implementation. Sorry for jumping on this, but I thought that you might be doing something like the following:

      use Digest::MD5 qw(md5);
      use Memoize;
      memoize('md5');

      This would use gobs of memory (depending on the input) and wouldn't really accomplish anything useful.

      Do I get an award for coming up with the most unproductive use of Memoize???
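
A small sketch of why memoizing the digest function directly is wasteful, assuming Memoize's default behaviour of building the cache key from the arguments. It uses `md5_hex` instead of `md5` so the value is printable, and passes a caller-supplied hash through the `SCALAR_CACHE` option so the cache can be inspected. The one-megabyte string is just a stand-in for an archived file.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Memoize;

# Supply our own hash as the scalar-context cache so we can look inside it.
my %cache;
memoize('md5_hex', SCALAR_CACHE => [ HASH => \%cache ]);

my $data   = 'x' x 1_000_000;   # stand-in for one large archived file
my $digest = md5_hex($data);    # scalar context, so %cache is used

# The cache key is built from the argument, i.e. the data itself,
# so every distinct input stays in memory alongside its digest.
my ($key) = keys %cache;
printf "key length: %d bytes, value length: %d bytes\n",
    length $key, length $cache{$key};
```

With a single string argument the key should be as long as the input, which is the "gobs of memory" problem: the cache grows with the data rather than with the number of file names.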
