Re: Algorithm To Select Lines Based On Attributes

by xdg (Monsignor)
on Jan 15, 2009 at 17:13 UTC (#736593)


in reply to Algorithm To Select Lines Based On Attributes

Some quick thoughts:

  • #1: Use Devel::NYTProf or another profiler to see what the actual hot-spots in your code are!

  • Read lines from disk one at a time rather than slurping into @lines

  • Consider defining rules as subroutines acting on an argument and then use Memoize to cache results (assuming attributes re-occur frequently)

  • If you're re-running this against the same set of lines and rules frequently, cache the rule test results in a file or DB so you have DEFECTID and a list of rules it matches.

  • Perhaps reorganize the rules (if you can): $hash->{RULETYPE}->{RULENUMBER} = value. Then iterate the list of rules for each attribute, rather than (as you have it) iterating the attributes for each rule. I think that saves a lot of if ( defined $rulelist->{$rulenum}->{REGION} ) comparisons.
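
A rough sketch of how the line-by-line reading, Memoize, and attribute-first rule layout suggested above might fit together. The rule table, the whitespace-separated "DEFECTID REGION CLASS" line format, and the helper names failed_rules and matching_rules are all assumptions made for illustration, not taken from the original code. It also assumes a rule matches when every attribute it constrains equals the value on the line.

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical rule table: attribute type first, rule number second.
my %rules = (
    REGION => { 1 => 'A', 2 => 'B' },
    CLASS  => { 1 => 7 },
);

# Every rule number that appears under any attribute.
my %all_rules = map { $_ => 1 } map { keys %$_ } values %rules;

# For one attribute/value pair, return (as an array ref) the rule
# numbers that this value rules out.
sub failed_rules {
    my ($attr, $value) = @_;
    my $by_rule = $rules{$attr} or return [];
    return [ grep { $by_rule->{$_} ne $value } keys %$by_rule ];
}

# Cache the answer per (attribute, value) pair; this only pays off
# if the same values recur on many lines.
memoize('failed_rules');

# Start with all rules alive and drop those that any attribute rejects.
sub matching_rules {
    my (%defect) = @_;
    my %alive = %all_rules;
    for my $attr (keys %rules) {
        my $value  = defined $defect{$attr} ? $defect{$attr} : '';
        my $failed = failed_rules($attr, $value);
        delete @alive{@$failed};
    }
    return sort { $a <=> $b } keys %alive;
}

# Sample data read one line at a time from an in-memory filehandle;
# a real script would open the defect file here instead.
my $data = "1 A 7\n2 B 3\n3 C 7\n";
open my $fh, '<', \$data or die "Cannot open sample data: $!";

my %matches;
while (my $line = <$fh>) {
    chomp $line;
    my ($id, $region, $class) = split ' ', $line;
    $matches{$id} = [ matching_rules(REGION => $region, CLASS => $class) ];
}
close $fh;

print "$_: @{ $matches{$_} }\n" for sort { $a <=> $b } keys %matches;
```

Whether the cache helps depends on how often attribute values repeat, which is something the profiler run from the first bullet should show.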

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Replies are listed 'Best First'.
Re^2: Algorithm To Select Lines Based On Attributes
by ~~David~~ (Hermit) on Jan 15, 2009 at 18:06 UTC
    Thanks for the suggestions. I have one question about bullet #2:
    I need to read to the end of the file before I enter this subroutine because I need to know some information at the bottom of the file before I decide which rule set to use. I figured it would be better to cache that DEFECTLIST in memory rather than re-read the file. Is that best? Or is there some way I could store the position in the file of the beginning and the end of the defect list, and always ensure that all characters between them are the DEFECTLIST? I don't have experience with stuff like that...
    I will definitely think about using Memoize and see if I can implement it.
    Thanks again.
      I need to know some information at the bottom of the file before I decide which rule set to use

      Maybe you can use File::ReadBackwards to find the information you need, then jump back to the start of the file and read forwards. If memory isn't an issue, then it may not matter, but anytime I see a file that large being slurped for a linear scan, I wonder if it could be done line by line instead.
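
On the question of storing the start and end positions of the defect list: one possible approach, different from File::ReadBackwards, uses only the built-in tell and seek. The sketch below is an illustration under assumptions: the file layout (DefectList and EndOfList marker lines, and a trailing RuleSet line) is invented for the example, and the real file will need its own tests for where the list begins and ends.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Build a small sample file. The marker lines and the trailing
# "RuleSet" line are a made-up layout for this sketch.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "Header\nDefectList\n1 A 7\n2 B 3\nEndOfList\nRuleSet strict\n";
close $out or die "Cannot close $path: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";

# Pass 1: scan line by line, remembering byte offsets instead of lines.
my ($start, $end, $rule_set);
while (1) {
    my $pos  = tell($fh);          # offset of the line about to be read
    my $line = <$fh>;
    last unless defined $line;
    chomp $line;
    if    ($line eq 'DefectList')       { $start = tell($fh) }  # first byte after the marker
    elsif ($line eq 'EndOfList')        { $end   = $pos }       # first byte of the end marker
    elsif ($line =~ /^RuleSet\s+(\S+)/) { $rule_set = $1 }
}
die "Defect list not found" unless defined $start && defined $end;

# Pass 2: jump back and read only the bytes between the two offsets.
seek($fh, $start, SEEK_SET) or die "Cannot seek in $path: $!";
my @defect_ids;
while (tell($fh) < $end) {
    my $line = <$fh>;
    last unless defined $line;
    my ($id) = split ' ', $line;
    push @defect_ids, $id;         # a real script would apply $rule_set here
}
close $fh;

print "rule set: $rule_set, defects: @defect_ids\n";
```

This reads the file twice but never holds more than one line in memory, so it trades some extra I/O for a small footprint.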

      -xdg

