
Re: Fast seeking in a large array of hashes to generate a report.

by demerphq (Chancellor)
on Jun 23, 2005 at 08:26 UTC

in reply to Fast seeking in a large array of hashes to generate a report.

One thing to do is to construct a subroutine to do the processing logic. Your XML input file presumably isn't changing for the duration of the run, so there's no need to loop over the possibilities for each input record. Instead, your loop over the XML file should happen once, manufacturing Perl code that does the condition logic. This is then wrapped up in a subroutine and eval()ed into existence. The resulting sub will be considerably faster than doing all of that processing over and over.
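A minimal sketch of that idea follows. The rule structure (`field`, `op`, `value`), the sample records and the `build_filter` name are all assumptions made up for illustration; your XML will describe its conditions differently. The point is only that the rules are walked once, at generation time, and the per-record work is a single compiled expression.

```perl
use strict;
use warnings;

# Hypothetical rules, standing in for whatever the XML file describes.
my @rules = (
    { field => 'status', op => 'eq', value => 'open' },
    { field => 'amount', op => '>',  value => 100    },
);

# Only these operators may be emitted into the generated code.
my %allowed_op = map { $_ => 1 } qw(eq ne == != < > <= >=);

sub build_filter {
    my @rules = @_;
    my @tests;
    for my $r (@rules) {
        die "bad op '$r->{op}'"       unless $allowed_op{ $r->{op} };
        die "bad field '$r->{field}'" unless $r->{field} =~ /\A\w+\z/;
        # Escape backslashes and single quotes so the value stays a literal.
        (my $lit = $r->{value}) =~ s/([\\'])/\\$1/g;
        push @tests, sprintf q{$rec->{%s} %s '%s'}, $r->{field}, $r->{op}, $lit;
    }
    my $body = @tests ? join("\n        && ", @tests) : '1';
    my $src  = "sub {\n    my (\$rec) = \@_;\n    return $body;\n}";
    # Compile the generated source once; the result is an ordinary code ref.
    my $sub = eval $src or die "generated code failed to compile: $@\n$src";
    return $sub;
}

my $filter = build_filter(@rules);

# Made-up sample data.
my @records = (
    { id => 1, status => 'open',   amount => 50  },
    { id => 2, status => 'open',   amount => 250 },
    { id => 3, status => 'closed', amount => 900 },
);
my @hits = grep { $filter->($_) } @records;
```

Validating field names and operators before they are pasted into the source matters here, since anything that reaches the string eval is executed as code.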

IMO you are encountering a common problem for Perl programmers: using low-level coding tactics inside of a scripting language. Consider that you are in essence writing an interpreter, and you aren't doing it particularly efficiently. Not only that, you are writing an interpreter in a language that is itself interpreted (although at an opcode level). So to bypass the problem of an interpreter running in an interpreter, what you should be doing is exploiting the perl compiler/interpreter to do this for you.

IOW what you want to be writing is a translator that converts from your mini-language to Perl. Then let perl rip. Also, since the code you write will be writing code, some of the normal style and good practice rules don't apply to the generated code, which allows you to do things like strip out all of the unnecessary code and do things very efficiently. For instance, by unrolling the field processing loop you might end up with hundreds of statements that are almost the same but not quite, the sort of thing one would normally advise against. But since it's generated code, who cares? The code doing the generation doesn't look like that, so it's not really a problem.
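Here is a rough sketch of what unrolling at generation time can look like. The column layout, the `build_formatter` name and the sample record are invented for the example. The loop over columns runs in the generator, and the generated sub contains one near-identical statement per column and no loop at all.

```perl
use strict;
use warnings;

# Hypothetical report layout: field name and column width.
my @columns = ( [ name => 12 ], [ city => 10 ], [ total => 8 ] );

sub build_formatter {
    my @columns = @_;
    my @stmts;
    for my $col (@columns) {
        my ($field, $width) = @$col;
        die "bad field '$field'" unless $field =~ /\A\w+\z/;
        die "bad width '$width'" unless $width =~ /\A\d+\z/;
        # Emits e.g.:  $line .= sprintf('%-12s', $rec->{name});
        push @stmts,
            sprintf q{    $line .= sprintf('%%-%ds', $rec->{%s});}, $width, $field;
    }
    my $src = join "\n",
        'sub {',
        '    my ($rec) = @_;',
        q{    my $line = '';},
        @stmts,
        '    return $line;',
        '}';
    my $sub = eval $src or die "generated code failed to compile: $@\n$src";
    return $sub;
}

my $format = build_formatter(@columns);
my $line   = $format->({ name => 'Ann', city => 'Oslo', total => 42 });
```

The generator stays short and readable; only the generated source is repetitive, and nobody has to maintain that by hand.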

I've seen approaches like this cut per-record processing times massively. Make the processing loop bare: unroll every loop, method call and subroutine inside of it that you can. IOW treat subroutines as macros and do the looping in the code generation if you possibly can. Reuse lexical vars when you know you can, and precompute as much as you can outside of the subroutine and bind it in via Perl's closure mechanism. Even things like using hashes as records can be sacrificed. Instead, in the generated code use fetchrow_array and put the results into lexicals, and then wherever your logic would normally have said $rec->{field}, use $field to avoid the unnecessary hash lookup (i.e., where you can preresolve a hash lookup at compile time, do so). All of this stuff can make a really serious difference to how fast your code runs.
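Something along these lines, as a sketch only. The column names, the `%is_vip` lookup, `$threshold` and the sample rows are assumptions for the example, and the rows are plain array refs standing in for what fetchrow_array would hand back. The generated sub unpacks each row straight into lexicals, and the precomputed data is picked up by the closure because the string eval is compiled in the scope where those variables live.

```perl
use strict;
use warnings;

# Precomputed once, outside the generated sub (hypothetical lookup data).
my %is_vip    = map { $_ => 1 } qw(alice carol);
my $threshold = 100;

# Column order as a fetchrow_array call would return it (assumed).
my @cols = qw(customer status amount);
for my $c (@cols) {
    die "bad column '$c'" unless $c =~ /\A[a-z_]\w*\z/i;
}

# Each column becomes a plain lexical: no per-record hash is ever built.
my $decl = join ', ', map { "\$$_" } @cols;

my $src = <<"CODE";
sub {
    my ($decl) = \@_;
    return \$status eq 'open' && \$amount > \$threshold && \$is_vip{\$customer};
}
CODE

# The anonymous sub closes over %is_vip and $threshold from this scope.
my $match = eval $src or die "generated code failed to compile: $@\n$src";

# Made-up rows in the same column order.
my @rows = (
    [ 'alice', 'open',   250 ],
    [ 'bob',   'open',   500 ],
    [ 'carol', 'closed', 300 ],
    [ 'carol', 'open',   50  ],
);
my @hits = grep { $match->(@$_) } @rows;
```

With DBI the call site would be roughly `while (my @row = $sth->fetchrow_array) { ... $match->(@row) ... }`, but that part is left out so the sketch runs without a database.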

Update: A note about using a DB: if you can possibly do your filtering in a DB then you should. But it's worthwhile learning techniques for when using the DB is not possible or practical. For instance, if the filtering rules were complex then it might mean a fetch per rule, which on a large table or one without relevant indexes could be quite problematic. Likewise the data source could be a flat file, or some other medium where queries against preestablished indexes aren't possible.

