mapping coordinates - suggestion needed
by baxy77bax (Deacon)
on Oct 14, 2010 at 16:05 UTC
baxy77bax has asked for the
wisdom of the Perl Monks concerning the following question:
Hi ppl, i need some clever suggestions for the following problem. what i have is a set of coordinates with, let's say, user names associated with them:
on the other side i have another set of coordinates:
what i'm trying to figure out is which users ran a specific job at specific intervals, so i'm trying to map job_id intervals to uname intervals. the rule is that even if only part of a job_id interval crosses a uname interval, this should be reported. the thing is, there are over 20 million such intervals in each group (and intervals overlap), and the allowed coordinate range is the same in both cases, spanning from 1 to 20 million.
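to pin down the reporting rule, here is a minimal sketch of the overlap test, assuming the intervals are closed (both start and stop are included) and stored as plain numbers. the sub name intervals_overlap is made up for illustration:

```perl
use strict;
use warnings;

# Two closed intervals [s1, e1] and [s2, e2] share at least one coordinate
# exactly when each one starts no later than the other one ends.
# (Closed intervals are an assumption, the post does not say.)
sub intervals_overlap {
    my ($s1, $e1, $s2, $e2) = @_;
    return ($s1 <= $e2 && $s2 <= $e1) ? 1 : 0;
}

print intervals_overlap(10, 20, 15, 30), "\n";   # partial overlap, reported
print intervals_overlap(10, 20, 21, 30), "\n";   # disjoint, not reported
```

under this assumption two intervals that only touch at one endpoint also count as overlapping.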
now what i was thinking about is to use the Bit::Vector library to create a bit vector and map both sets of coordinates onto that vector space, then see where they overlap and just remove the non-overlapping fields. but then it hit me: how will i track down which unames and job_ids those overlaps belong to? then i thought about hashing. but how will i then find a coordinate among my hash keys that is less than X?
i mean, i would need to sort the hash keys and then loop through them to find the keys that are >= some start key (id) and <= some stop key (id).
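one way to avoid looping over every key, sketched here under the assumption that the coordinates are integers and can be kept in a sorted array rather than as hash keys, is a binary search for both ends of the range. the names lower_bound and keys_in_range and the sample numbers are made up for illustration:

```perl
use strict;
use warnings;

# Index of the first element in a sorted array ref that is >= $x,
# or the array length if every element is smaller.
sub lower_bound {
    my ($sorted, $x) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] < $x) { $lo = $mid + 1 }
        else                      { $hi = $mid }
    }
    return $lo;
}

# All keys k with $start <= k <= $stop form one contiguous slice
# of the sorted array (integer coordinates assumed).
sub keys_in_range {
    my ($sorted, $start, $stop) = @_;
    my $first = lower_bound($sorted, $start);
    my $past  = lower_bound($sorted, $stop + 1);
    return @{$sorted}[$first .. $past - 1];
}

# Hypothetical coordinates, sorted once up front.
my @keys = sort { $a <=> $b } (500, 120, 90, 300, 42);

print join(",", keys_in_range(\@keys, 100, 300)), "\n";
```

the sort is paid once, and each lookup then takes a number of steps logarithmic in the number of keys instead of a full pass over them.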
and now i'm stuck and crying to you for help.
so let me summarize my problem: i need to map, if possible, job_id intervals to uname intervals and preserve the >uname1 >job_id2 tags. keep in mind that those datasets have piled up over the years and are quite large, so a simple loop within a loop would not be a good solution.
max for the coordinates in both cases is 20000000
this is a fraction of the real data set, but don't worry about that. the dataset is, as i said, large, and i cannot hand-pick really representative data to illustrate the problem
and these are unames :
as i said, this is probably not a good example for illustrating the problem, so please do refer to the example above :)
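for the summarized problem (mapping job_id intervals to uname intervals without a loop within a loop), one possible direction is a sort-and-sweep pass. this is only a sketch under assumptions: intervals are closed, each record is [tag, start, stop], everything fits in memory, and the sub name overlap_pairs and the sample records are made up, not taken from the real data:

```perl
use strict;
use warnings;

# Report every (uname, job_id) pair whose closed intervals overlap.
# Input: two array refs of [tag, start, stop] records.
sub overlap_pairs {
    my ($unames, $jobs) = @_;

    # Merge both sets into one list, remember the side, sort by start.
    my @all;
    push @all, [0, @$_] for @$unames;            # side 0 = uname
    push @all, [1, @$_] for @$jobs;              # side 1 = job_id
    @all = sort { $a->[2] <=> $b->[2] } @all;    # [side, tag, start, stop]

    my @active = ([], []);    # intervals that may still be open, per side
    my @pairs;
    for my $iv (@all) {
        my ($side, $tag, $start, $stop) = @$iv;
        my $other = 1 - $side;

        # Drop intervals on the other side that ended before this start.
        $active[$other] = [ grep { $_->[3] >= $start } @{ $active[$other] } ];

        # Whatever is left started earlier (or at the same coordinate)
        # and has not ended yet, so it overlaps the new interval.
        for my $o (@{ $active[$other] }) {
            push @pairs, $side == 0 ? [$tag, $o->[1]] : [$o->[1], $tag];
        }
        push @{ $active[$side] }, $iv;
    }
    return \@pairs;    # list of [uname_tag, job_id_tag]
}

# Hypothetical records for illustration only.
my @unames = (['uname1', 1, 100], ['uname2', 50, 200], ['uname3', 500, 600]);
my @jobs   = (['job_id1', 90, 120], ['job_id2', 300, 400], ['job_id3', 550, 1000]);

print join(" ", @$_), "\n" for @{ overlap_pairs(\@unames, \@jobs) };
```

each pair is reported once, when the later-starting of the two intervals is reached, so apart from the sort the work is roughly proportional to the number of intervals plus the number of overlaps reported.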