PerlMonks  

Re: Algorithm To Select Lines Based On Attributes

by xdg (Monsignor)
on Jan 15, 2009 at 17:13 UTC (#736593)


in reply to Algorithm To Select Lines Based On Attributes

Some quick thoughts:

  • #1: Use Devel::NYTProf or another profiler to see what the actual hot-spots in your code are!

  • Read lines from disk one at a time rather than slurping into @lines

  • Consider defining rules as subroutines acting on an argument and then use Memoize to cache results (assuming attributes re-occur frequently)

  • If you're re-running this against the same set of lines and rules frequently, cache the rule test results in a file or DB, so that for each DEFECTID you already have the list of rules it matches.

  • Perhaps reorganize the rules (if you can): $hash->{RULETYPE}->{RULENUMBER} = value. Then iterate the list of rules for each attribute, rather than (as you have it), iterating the attributes for each rule. I think that saves a lot of if ( defined $rulelist->{$rulenum}->{REGION} ) comparisons.
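
The last bullet, together with the Memoize suggestion, can be sketched roughly as follows. This is only an illustration under assumptions: the rule values, the CLASS attribute, the equality test in value_matches, and the helper name matching_rules are all hypothetical, since the original rule set is not shown here. Rules are stored as $hash->{RULETYPE}->{RULENUMBER} = value, so a rule only appears under the types it actually constrains and no per-rule defined check on the rule is needed.

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical rules, keyed by rule type and then by rule number
# (the $hash->{RULETYPE}->{RULENUMBER} = value layout).
my %rules_by_type = (
    REGION => { 1 => 'A', 3 => 'B' },
    CLASS  => { 1 => 7,   2 => 7 },
);

# Hypothetical test: a rule value matches when it equals the attribute.
# Memoize caches the result per argument list, which only pays off if
# the same (rule value, attribute value) pairs recur often.
sub value_matches {
    my ($rule_value, $attr_value) = @_;
    return $rule_value eq $attr_value ? 1 : 0;
}
memoize('value_matches');

# Return the sorted numbers of the rules whose every constraint is
# satisfied by the attributes of one line.
sub matching_rules {
    my ($line_attrs) = @_;
    my (%constraints, %passed);
    for my $type (keys %rules_by_type) {
        my $rules = $rules_by_type{$type};
        for my $num (keys %$rules) {
            $constraints{$num}++;                      # rule $num constrains $type
            next unless defined $line_attrs->{$type};  # line lacks this attribute
            $passed{$num}++
                if value_matches($rules->{$num}, $line_attrs->{$type});
        }
    }
    return sort { $a <=> $b }
        grep { ($passed{$_} // 0) == $constraints{$_} } keys %constraints;
}

my @hits = matching_rules({ DEFECTID => 42, REGION => 'A', CLASS => 7 });
print "defect 42 matches rules: @hits\n";
```

Whether Memoize helps at all depends on how often attribute values repeat, which is the kind of question the profiler from the first bullet can answer.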

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.


Re^2: Algorithm To Select Lines Based On Attributes
by ~~David~~ (Hermit) on Jan 15, 2009 at 18:06 UTC
    Thanks for the suggestions. I have one question about bullet #2:
    I need to read to the end of the file before I enter this subroutine, because I need to know some information at the bottom of the file before I decide which rule set to use. I figured it would be better to cache that DEFECTLIST in memory than to read the file again. Is that best? Or is there some way I could store the file positions of the beginning and the end of the defect list, and always treat everything between them as the DEFECTLIST? I don't have experience with stuff like that...
    I will definitely think about using Memoize and see if I can implement it.
    Thanks again.
      I need to know some information at the bottom of the file before I decide which rule set to use

      Maybe you can use File::ReadBackwards to find the information you need, then jump back to the start of the file and read forwards. If memory isn't an issue, then it may not matter, but anytime I see a file that large being slurped for a linear scan, I wonder if it could be done line by line instead.
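
The "store the position" idea from the question can be sketched with Perl's built-in tell and seek. This is a core-only illustration, not the File::ReadBackwards approach: it still scans the whole file once, but it keeps only two byte offsets in memory instead of the whole list. The marker lines (DefectList, EndDefectList), the RuleSet trailer, and the helper names scan_file and each_defect_line are assumptions made up for the example, since the real file format is not shown.

```perl
use strict;
use warnings;

# First pass: remember where the defect list starts and ends, and pick
# up the trailer value. The marker and trailer names are hypothetical.
sub scan_file {
    my ($fh) = @_;
    my ($start, $end, $ruleset);
    while (1) {
        my $pos  = tell $fh;    # byte offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        if    ($line =~ /^DefectList\b/)    { $start   = tell $fh }  # list begins on the next line
        elsif ($line =~ /^EndDefectList\b/) { $end     = $pos }      # list ends before this line
        elsif ($line =~ /^RuleSet\s+(\S+)/) { $ruleset = $1 }
    }
    return ($start, $end, $ruleset);
}

# Second pass: jump back and hand each defect line to a callback,
# one line at a time.
sub each_defect_line {
    my ($fh, $start, $end, $callback) = @_;
    seek $fh, $start, 0 or die "seek failed: $!";
    while (tell($fh) < $end) {
        my $line = <$fh>;
        last unless defined $line;
        chomp $line;
        $callback->($line);
    }
    return;
}

# Demo on an in-memory file so the sketch runs without an input file.
my $data = "Header\nDefectList\n1 A 7\n2 B 7\nEndDefectList\nRuleSet fine\n";
open my $fh, '<', \$data or die "open failed: $!";
my ($start, $end, $ruleset) = scan_file($fh);
my @seen;
each_defect_line($fh, $start, $end, sub { push @seen, $_[0] });
print "rule set $ruleset, ", scalar(@seen), " defect lines\n";
```

With File::ReadBackwards the first full scan could be avoided when the needed information really is near the end of the file. The second pass would look the same.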

      -xdg

