Beefy Boxes and Bandwidth Generously Provided by pair Networks RobOMonk
laziness, impatience, and hubris
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

Hi. I'm glad to see it's not just me that's been thinking along these lines :)

Not that I've come to any real conclusions, but I think that it's all just set theory. And the way to describe it may be those terms. i.e. sets, unions, intersections and complements.

If all the records fit in memory in a hash it's reasonably easy to describe a set that passes some test function

my @set_1 = grep { func($_) } keys %records_1;

then you can describe the relationship you're looking for in those terms.

So you might end up with something like this:-

set1 = set of all records_1 that pass func() set2 = set of all records_2 that fail func() results = intersection of set1 and set2; set3 = ... etc
and so you can describe any arbitrary combination of sets.

We will needs some support functions , but that's 'just a simple matter of programming' ;)

The main problem, as I see it , is how to deal with data sets that are too big to fit into memory. My only thought is to keep a hash of the (key, file offset) and re-parse each record on demand. Or maybe Tie::File could do the job ?

Arrgh! - you're making me want to try code this again :)

R.

In reply to Re: Thinking out loud (was: Re^2: The Eternal "filter.pl") by RichardK
in thread The Eternal "filter.pl" by Voronich

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others contemplating the Monastery: (6)
    As of 2014-04-20 12:53 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      April first is:







      Results (485 votes), past polls