Beefy Boxes and Bandwidth Generously Provided by pair Networks vroom
Welcome to the Monastery
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

One thing to do is to construct a subroutine to do the processing logic. Your XML input file presumably isnt changing for the duration of the run, so theres no need to do a loop of the possibilities for each input record. Instead your loop over the xml file should happen once, manufacturing perl code that does the condition logic. This is then wrapped up in a subroutine and eval()ed into existance. The resulting sub will be considerably faster than the doing all of that processing over and over.

IMO you are encountering a common problem for perl programmers, using low level coding tactics inside of a scripting language. Consider that you are in essence writing an interpreter, and you arent doing it particularly efficiently. Not only that you are writing an interpreter in a language that is itself interpreted (although at an opcode level). So in order to bypass the problem of an interpreter running in an interpreter what you should be doing is exploiting the perl compiler/interpreter to do this for you.

IOW what you want to be writing a translator, that converts from your mini-language to perl. Then let perl rip. Also since the code you write will be writing code some of the normal style and good practice rules dont apply in the generated code, which allows you to do things like strip out all of the unnecessary code and do things very efficiently. For instance by unrolling the field processing loop you might end up with hundreds of statements almost the same but not quite, the sort of thing that one would advise against, but since its generated code who care? The code doing the generation doesnt look like that so its not really a problem.

Ive seen approaches like this cut per record processing times massively. Make the processing loop bare, unroll every loop and method call and subroutine inside of it that you can. IOW treat subroutines as macros and do the looping in the code generation if you possibly can. Reuse lexical vars when you know you can, precompute as much as you can outside of the subroutine and bind it in via perls closure mechanism. Even things like using hashes as records can be sacrificed. Instead in the generated code use fetchrow_array and put the results into lexicals and then where your logic normally would have been $rec->{field} replace it with $field to avoid the unnecessary hash lookup (ie, where you can preresolve a hash lookup at compile time do so). All of this stuff can make a really serious difference to how fast your code runs.

Update: A note about using a DB, while if you can possibly do your filtering in a DB then you should. But its worthwhile learning techniques for when using the DB is not possible or practical. For instance if the filtering rules were complex then it might mean a fetch per rule, which on a large table or one without relevent indexes could be quite problematic. Likewise the data source could be flat file, or some other medium where queries against prestablished indexes werent possible.

---
$world=~s/war/peace/g


In reply to Re: Fast seeking in a large array of hashes to generate a report. by demerphq
in thread Fast seeking in a large array of hashes to generate a report. by jbrugger

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others cooling their heels in the Monastery: (10)
    As of 2014-04-18 18:49 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      April first is:







      Results (471 votes), past polls