One thing to do is to construct a subroutine to do the processing logic. Your XML input file presumably isn't changing for the duration of the run, so there's no need to loop over the possibilities for each input record. Instead, your loop over the XML file should happen once, manufacturing Perl code that does the condition logic. This is then wrapped up in a subroutine and eval()ed into existence. The resulting sub will be considerably faster than doing all of that processing over and over.
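A minimal sketch of that idea, under assumptions: the @rules list below is hypothetical and stands in for whatever your XML file describes, and the field names, operators and values are made up for illustration. The point is only that the rules are walked once, turned into Perl source, and compiled with eval.

```perl
use strict;
use warnings;

# Hypothetical rules, standing in for whatever the XML file describes.
my @rules = (
    { field => 'age',    op => '>=', value => 18 },
    { field => 'status', op => 'eq', value => 'active' },
);

# Only emit operators we expect; generated code should never contain raw input.
my %allowed_op = map { $_ => 1 } qw(== != < <= > >= eq ne lt gt);

# Walk the rules once, emitting one Perl condition per rule.
my @conds;
for my $rule (@rules) {
    die "bad field: $rule->{field}" unless $rule->{field} =~ /\A\w+\z/;
    die "bad op: $rule->{op}"       unless $allowed_op{ $rule->{op} };

    my $value = $rule->{value};
    if ( $rule->{op} =~ /\A[a-z]+\z/ ) {
        # String operator: emit a single-quoted literal with escapes.
        $value =~ s/([\\'])/\\$1/g;
        $value = "'$value'";
    }
    else {
        die "bad number: $value" unless $value =~ /\A-?\d+(?:\.\d+)?\z/;
    }
    push @conds, "\$rec->{$rule->{field}} $rule->{op} $value";
}

# Wrap the conditions in a sub and compile it once.
my $source  = "sub { my \$rec = shift; return " . join( ' && ', @conds ) . "; }";
my $matches = eval $source or die "generated code failed to compile: $@";

# The per-record loop now just calls the compiled sub.
my @records = (
    { age => 30, status => 'active' },
    { age => 12, status => 'active' },
    { age => 40, status => 'retired' },
);
my @hits = grep { $matches->($_) } @records;
print scalar(@hits), " record(s) matched\n";
```

Printing $source while developing is worth doing, since reading the generated code is usually the quickest way to find a bug in the generator.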

IMO you are encountering a common problem for Perl programmers: using low-level coding tactics inside of a scripting language. Consider that you are in essence writing an interpreter, and you aren't doing it particularly efficiently. Not only that, you are writing an interpreter in a language that is itself interpreted (although at an opcode level). So to bypass the problem of an interpreter running in an interpreter, what you should be doing is exploiting the perl compiler/interpreter to do this for you.

IOW what you want to be writing is a translator that converts from your mini-language to Perl. Then let perl rip. Also, since the code you write will be writing code, some of the normal style and good-practice rules don't apply in the generated code, which allows you to do things like strip out all of the unnecessary code and do things very efficiently. For instance, by unrolling the field processing loop you might end up with hundreds of statements that are almost the same but not quite, the sort of thing that one would advise against, but since it's generated code, who cares? The code doing the generation doesn't look like that, so it's not really a problem.
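To make the unrolling concrete, here is a small sketch. The column list and the two aggregate kinds (sum, max) are hypothetical, not taken from the original question. The loop over fields runs in the generator, so the generated sub has one flat statement per field and no inner loop at all.

```perl
use strict;
use warnings;

# Hypothetical report columns; in practice these come from the mini-language.
my @columns = (
    { name => 'price',    agg => 'sum' },
    { name => 'quantity', agg => 'sum' },
    { name => 'weight',   agg => 'max' },
);

# One statement template per aggregate kind; %1$s is the field name.
my %template = (
    sum => '$acc->{%1$s} += $rec->{%1$s};',
    max => '$acc->{%1$s} = $rec->{%1$s} if !defined $acc->{%1$s} || $rec->{%1$s} > $acc->{%1$s};',
);

# The loop over fields happens here, at generation time.
my @stmts;
for my $col (@columns) {
    die "bad column name: $col->{name}" unless $col->{name} =~ /\A\w+\z/;
    my $tmpl = $template{ $col->{agg} } or die "unknown aggregate: $col->{agg}";
    push @stmts, sprintf $tmpl, $col->{name};
}

# The generated sub is a flat run of near-identical statements.
my $source = join "\n",
    'sub {',
    '    my ($acc, $rec) = @_;',
    ( map { "    $_" } @stmts ),
    '}';
my $accumulate = eval $source or die "generated code failed to compile: $@";

my %acc;
$accumulate->( \%acc, $_ ) for (
    { price => 10, quantity => 2, weight => 5 },
    { price => 5,  quantity => 1, weight => 9 },
);

print $source, "\n";
print "price=$acc{price} quantity=$acc{quantity} weight=$acc{weight}\n";
```

The generator stays short and readable; only the printed source is repetitive, and nobody has to maintain that by hand.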

I've seen approaches like this cut per-record processing times massively. Make the processing loop bare: unroll every loop, method call and subroutine inside of it that you can. IOW treat subroutines as macros and do the looping in the code generation if you possibly can. Reuse lexical vars when you know you can, and precompute as much as you can outside of the subroutine and bind it in via Perl's closure mechanism. Even things like using hashes as records can be sacrificed. Instead, in the generated code use fetchrow_array and put the results into lexicals, and then where your logic normally would have been $rec->{field}, replace it with $field to avoid the unnecessary hash lookup (i.e., where you can preresolve a hash lookup at compile time, do so). All of this stuff can make a really serious difference to how fast your code runs.
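A rough sketch of the last two points, with assumptions flagged: the column list, the %is_wanted_status table, $min_amount and the condition string are all hypothetical, and the rows are plain array refs standing in for what DBI's fetchrow_array would hand back, so that the example runs without a database. The condition starts out written against $rec->{field}; the generator rewrites it to use lexicals, and the compiled sub closes over the precomputed data.

```perl
use strict;
use warnings;

# Hypothetical column order, as a SELECT list might return it.
my @columns = qw(id status amount);

# Precomputed once, outside the generated sub; the eval'd code closes over these.
my %is_wanted_status = map { $_ => 1 } qw(active pending);
my $min_amount       = 100;

# Hypothetical condition, written the way the hash-based code would have it.
my $condition = '$is_wanted_status{ $rec->{status} } && $rec->{amount} >= $min_amount';

# Preresolve the hash lookups: $rec->{field} becomes the lexical $field.
my %known = map { $_ => 1 } @columns;
$condition =~ s/\$rec->\{(\w+)\}/ $known{$1} ? "\$$1" : die "unknown field $1" /ge;

# The generated sub unpacks the row straight into lexicals.
my $source = sprintf 'sub { my (%s) = @_; return %s; }',
    join( ', ', map { "\$$_" } @columns ), $condition;
my $keep = eval $source or die "generated code failed to compile: $@";

# Rows as plain lists, the shape fetchrow_array returns them in.
my @rows = ( [ 1, 'active', 250 ], [ 2, 'closed', 900 ], [ 3, 'pending', 50 ] );
my @kept = grep { $keep->(@$_) } @rows;

print $source, "\n";
print "kept ids: @{[ map { $_->[0] } @kept ]}\n";
```

With a real statement handle the loop would be along the lines of while ( my @row = $sth->fetchrow_array ) { ... $keep->(@row) ... }, and you could go one step further and generate that loop too, so there is no sub call per record.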

Update: A note about using a DB: if you can possibly do your filtering in a DB then you should. But it's worthwhile learning techniques for when using the DB is not possible or practical. For instance, if the filtering rules were complex then it might mean a fetch per rule, which on a large table or one without relevant indexes could be quite problematic. Likewise the data source could be a flat file, or some other medium where queries against pre-established indexes aren't possible.


In reply to Re: Fast seeking in a large array of hashes to generate a report. by demerphq
in thread Fast seeking in a large array of hashes to generate a report. by jbrugger
