
Re: How can I improve the efficiency of this very intensive code?

by sk (Curate)
on Aug 06, 2005 at 21:23 UTC

in reply to How can I improve the efficiency of this very intensive code?

I feel a hash might not be required for your task. You have a recordID that can act as an index into an array, so why put the records in a hash and mess up the order? Indexing into an array is constant-time anyway, and you avoid the overhead of the hash table.
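A minimal sketch of the array-instead-of-hash idea, written in Python rather than Perl and assuming (hypothetically) that the record ids are small, dense integers starting at 0: the id is used directly as a list index, so each lookup is constant-time and iteration follows id order.

```python
# Hypothetical records: (record_id, payload) pairs with dense ids 0..n-1.
records = [(2, "c"), (0, "a"), (1, "b")]

# Preallocate one slot per id and store each payload at its id.
by_id = [None] * len(records)
for record_id, payload in records:
    by_id[record_id] = payload

# Constant-time lookup by index; iteration follows id order.
print(by_id[1])  # b
print(by_id)     # ['a', 'b', 'c']
```

This only works while the ids stay small and dense; sparse ids would leave the list mostly empty.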

That said, I would use a matrix (n x n, though square is not a requirement; the dimensions depend on the number of records in each file) to keep track of the scores. Consider a table with one row per record in your first file and one column per record in your second file.

The values inside the cells are the scores. If you want the best-matching score (the maximum value) for a record, one O(n) scan across its row gives you the answer, and you do that once for each of the n records in your first file, so the whole job is O(n²).
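A minimal Python sketch of the score matrix and the per-row linear maximum. The matrix values are made up for illustration and the function name `best_matches` is hypothetical. Each row costs one pass over its columns, so n rows by m columns cost O(n·m) in total; on ties, the first column wins.

```python
def best_matches(scores):
    """For each row (a record in the first file), return (column, score)
    of the highest score in that row, using one linear scan per row.
    Assumes every row is non-empty."""
    best = []
    for row in scores:
        best_col = 0
        for col in range(1, len(row)):
            if row[col] > row[best_col]:
                best_col = col
        best.append((best_col, row[best_col]))
    return best

# Hypothetical 3 x 4 score matrix: rows are records from the first file,
# columns are records from the second file.
scores = [
    [1, 7, 3, 2],
    [9, 4, 4, 0],
    [2, 2, 5, 8],
]
print(best_matches(scores))  # [(1, 7), (0, 9), (3, 8)]
```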

Sorting to find the max/min is overkill: a sort costs O(n log n) per row, while a single pass costs O(n). I might be missing something about your problem, so please correct me if I am wrong.
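To illustrate the point about sorting (a Python sketch with made-up values): both lines below find the key with the smallest value, but `sorted` does O(n log n) work and builds a whole new list just to read its first element, while `min` with a `key` function makes a single O(n) pass.

```python
scores = {"rec_a": 42, "rec_b": 7, "rec_c": 19}

# Sorting every candidate just to read off the first one: O(n log n).
via_sort = sorted(scores, key=scores.get)[0]

# A single pass that tracks the smallest value seen: O(n).
via_min = min(scores, key=scores.get)

print(via_sort, via_min)  # rec_b rec_b
```

The same holds for the maximum: `max(scores, key=scores.get)` replaces sorting in descending order.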




Re^2: How can I improve the efficiency of this very intensive code?
by clearcache (Beadle) on Aug 06, 2005 at 21:41 UTC

    I was thinking about using the ids, but they are pretty big numbers, so I wouldn't use them alone as array indices. I could, however, use $. (Perl's current input line number) rather than the id when I read in the file.
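    A Python sketch of the $. idea, assuming one record per line in a hypothetical "id,timestamp" format. Here `enumerate` plays the role of Perl's $. (it counts from 0, while $. starts at 1): the line position becomes the array index, the large id is kept in a parallel list so it is not lost, and a hypothetical `index_of_id` dict maps an id back to its index if that is ever needed.

```python
import io

# Hypothetical file contents: large, sparse record ids, one record per line.
data = io.StringIO("90817263,1123356000\n10293847,1123356042\n55501234,1123356101\n")

ids = []          # ids[i] is the original record id on line i
timestamps = []   # timestamps[i] is that record's timestamp in epoch seconds
index_of_id = {}  # maps a big id back to its small index
for line_no, line in enumerate(data):
    record_id, ts = line.strip().split(",")
    ids.append(int(record_id))
    timestamps.append(int(ts))
    index_of_id[int(record_id)] = line_no

# Small, dense indices 0..n-1 instead of the big ids.
print(ids[1], timestamps[1])  # 10293847 1123356042
```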

    My ranking is based on the number of seconds from the last log entry in one file to the first log entry in the second file, so I score each pair of records by the number of seconds between them. My ability to identify a "strong match" depends on the number of concurrent users in the application my data comes from. With few concurrent users, I'll have lots of strong matches: records that clearly line up. With many concurrent users and lots of log file entries, I have to get a little creative.
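    One way the gap scoring and the "strong match" test could look, as a Python sketch under stated assumptions: each record is reduced to a hypothetical (first_ts, last_ts) pair of epoch seconds, the score is the gap from a record's last entry in the first file to a candidate's first entry in the second file, and a match is called strong only when exactly one candidate falls inside a hypothetical `WINDOW` of seconds. The window value is made up; the second record below shows how several users active at once produce more than one candidate and no clear match.

```python
WINDOW = 30  # hypothetical threshold in seconds, not from the original post

def gap_seconds(rec_a, rec_b):
    """Seconds from the last entry of rec_a to the first entry of rec_b.
    Each record is a (first_ts, last_ts) pair of epoch seconds."""
    return rec_b[0] - rec_a[1]

def strong_match(rec_a, candidates, window=WINDOW):
    """Return the index of the single candidate whose gap is within
    [0, window], or None if zero or several candidates qualify."""
    hits = [i for i, rec_b in enumerate(candidates)
            if 0 <= gap_seconds(rec_a, rec_b) <= window]
    return hits[0] if len(hits) == 1 else None

file_a = [(100, 160), (200, 260)]
file_b = [(175, 300), (270, 330), (280, 400)]

results = [strong_match(rec_a, file_b) for rec_a in file_a]
print(results)  # [0, None]
```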

    I was sorting because my hash stores the number of elapsed seconds, not a true "rank" such as 1, 2, 3, etc.

    I'm considering using arrays, but I don't want to lose the elapsed seconds as data quite yet, because they will be used in the next step to figure out the best match from the remaining data.
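    A Python sketch of keeping the elapsed seconds while switching to arrays: the gaps go into a list of lists indexed by line position (hypothetical values below), and the best match for each row is recorded as a column index next to the matrix, so the seconds themselves stay available for the next step. It assumes, as a guess, that the smallest non-negative gap is the best match and that a negative gap (the second-file record starts before the first-file record ends) is not a candidate.

```python
# Hypothetical elapsed-seconds matrix: elapsed[i][j] is the gap from the last
# entry of record i (first file) to the first entry of record j (second file).
elapsed = [
    [15, 110, 120],
    [-85, 10, 20],
]

def best_column(row):
    """Index of the smallest non-negative gap in row, or None if there is none."""
    best = None
    for j, secs in enumerate(row):
        if secs >= 0 and (best is None or secs < row[best]):
            best = j
    return best

best = [best_column(row) for row in elapsed]
print(best)  # [0, 1]

# The seconds are still in the matrix for the next step.
print([elapsed[i][j] for i, j in enumerate(best) if j is not None])  # [15, 10]
```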
