PerlMonks  

Re^2: Tracking down memory leaks

by scain (Curate)
on Apr 13, 2005 at 13:10 UTC


in reply to Re: Tracking down memory leaks
in thread Tracking down memory leaks

I can't imagine code like that working in this application since the files can be quite large. Here is an outline of how the script works:

  • Prepare several DB SELECT statement handles that will be used inside the loop to get useful information.
  • Create tied hashes for caching that information so that I don't have to hit the database every time I need the id of some frequently used term.
  • Create an IO object that will parse the file line by line and hand back information about the line in an OO way.
  • Loop using the IO object's next_feature method. Do lots of bookkeeping using the tied hashes. Write output to several (about 10) files that will later be loaded into postgres using COPY FROM STDIN.
  • Close open files, destroy DB statement handles, and load data into database.

Scott
Project coordinator of the Generic Model Organism Database Project
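The caching step in the outline above can be sketched like this (the names `get_id`, `fetch_id_from_db`, and the counter are illustrative stand-ins, not the actual script; the DB fetch here is a dummy sub in place of a prepared DBI statement handle):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %id_cache;      # in the real script this would be tied to DB_File
my $db_hits = 0;   # counts how often we actually reach the "database"

# Placeholder for executing a prepared SELECT statement handle.
sub fetch_id_from_db {
    my ($term) = @_;
    $db_hits++;
    return length $term;    # dummy id
}

# Check the cache first; only fall back to the database on a miss.
sub get_id {
    my ($term) = @_;
    $id_cache{$term} = fetch_id_from_db($term)
        unless exists $id_cache{$term};
    return $id_cache{$term};
}

get_id('exon') for 1 .. 1000;   # 1000 lookups of one frequently used term
print "db hits: $db_hits\n";    # only the first lookup hits the database
```

With this pattern, a term that appears on thousands of input lines costs one database round trip instead of thousands.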


Re^3: Tracking down memory leaks
by dragonchild (Archbishop) on Apr 13, 2005 at 13:46 UTC
    Create tied hashes for caching that information so that I don't have to hit the database every time I need the id of some frequently used term.

    And you're wondering why your memory usage is increasing? Why don't you try a run with caching disabled and see if that fixes your problem?

      That's why I tied them to DB_File--I can watch those files grow as the program runs, but there are only 4 of them, and they grow to only about 30MB each.

      Scott
      Project coordinator of the Generic Model Organism Database Project
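A disk-backed cache like the ones Scott describes can be set up in a few lines (the file name `term_ids.db` is hypothetical; the real script has four such caches):

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;

# Tie a hash to a Berkeley DB file so the cache lives on disk,
# not in Perl's heap -- the file grows instead of the process.
my %cache;
tie %cache, 'DB_File', 'term_ids.db', O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot tie term_ids.db: $!";

$cache{exon} = 42;               # written through to the file
print "cached id: $cache{exon}\n";

untie %cache;                    # flush and close the underlying file
```

This is why watching the `.db` files grow to ~30MB while the process itself keeps growing suggests the leak is elsewhere.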

Re^3: Tracking down memory leaks
by Anonymous Monk on Apr 13, 2005 at 14:48 UTC
    Create tied hashes for caching that information so that I don't have to hit the database every time I need the id of some frequently used term.
    Did you benchmark this? Repeatedly asking the database for the same thing might not be so bad if your database is good at caching, but tied hashes in Perl are slow. There are many factors involved, and what's best will vary from setup to setup, so don't dismiss repeated queries in favour of tied hashes too easily if it's performance you care about.

    Of course, this has nothing to do with your memory problem.
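    The slowness the monk mentions comes from tie magic: every hash read is dispatched as a FETCH method call. A small sketch (the `CountingHash` class is illustrative, built on the standard `Tie::StdHash`) makes that overhead visible by counting the calls:

```perl
use strict;
use warnings;

# An in-memory tied hash that counts FETCH calls. It behaves like an
# ordinary hash, but every read goes through this method first.
package CountingHash;
use Tie::Hash;
our @ISA = ('Tie::StdHash');
our $fetches = 0;

sub FETCH {
    $fetches++;
    my $self = shift;
    return $self->SUPER::FETCH(@_);
}

package main;
tie my %h, 'CountingHash';
$h{term} = 42;

my $total = 0;
$total += $h{term} for 1 .. 1000;   # 1000 reads => 1000 method calls
print "fetches: $CountingHash::fetches\n";
```

    A plain hash read compiles to a single op; the tied version pays a full method dispatch per access, which is why benchmarking before committing to tied caches is good advice.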

      While it isn't related to the memory problem, we did think about this. Each time through the loop we have to hit the database about 10 times to obtain ids for a relatively small number of possible items. The queries themselves are very fast; it is the overhead of making a round trip to the database at all that takes a while (comparatively). Against that overhead, a small BerkeleyDB database should win.

      What I will probably do, after I get these memory issues out of the way, is offer a command-line option to use either in-memory hashes or tied hashes, depending on the size of the file.

      Scott
      Project coordinator of the Generic Model Organism Database Project
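The option Scott proposes could look something like this with Getopt::Long (the switch name `--tiedcache` and file name `cache.db` are hypothetical); the rest of the script uses the hash identically either way:

```perl
use strict;
use warnings;
use Getopt::Long;

# --tiedcache selects a disk-backed cache; the default is in-memory.
GetOptions('tiedcache!' => \my $use_tied) or die "bad options\n";

my %cache;
if ($use_tied) {
    # Load DB_File only when actually needed.
    require DB_File;
    require Fcntl;
    tie %cache, 'DB_File', 'cache.db',
        Fcntl::O_RDWR() | Fcntl::O_CREAT(), 0666, $DB_File::DB_HASH
        or die "Cannot tie cache.db: $!";
}

# From here on, the code is oblivious to which kind of cache it has.
$cache{exon} = 42;
print "cached: $cache{exon}\n";
```

Because tie is transparent to the code that uses the hash, the switch costs nothing in the main loop: small files get fast plain hashes, huge files get bounded memory.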
