Re: Processing while reading in input

by AnomalousMonk (Chancellor)
on Sep 20, 2018 at 00:29 UTC (#1222683)

in reply to Processing while reading in input

In your example input, all the clusters occur contiguously, i.e., all Osat_a members (just the one), then all the Atha_b members, all Fves_d members, etc. Is this the case in your real data, or might you have data like, e.g.,

    Osat_a Osat_a   # just one cluster member
    Atha_b Atha_b   # >1 cluster member, this & next line = 2 members
    Fves_d Fves_d   # this & next 2 lines = 3 cluster members
    Osat_h Osat_h
    Atha_b Mtru_c
    Fves_d Osat_e
    Atha_g Atha_g   # just 1 cluster member
    Fves_d Atha_f
    Osat_h Atha_i
    ...    ...
where cluster members are promiscuously mingled?

If the former case holds (all members of a cluster are contiguous), processing very large files is easy: buffer the members of the current cluster until you detect the transition from one cluster to the next, then write out the buffered members and start a fresh buffer. Because only one cluster is held in memory at a time, this could scale to files with millions of cluster members.
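The buffering idea above can be sketched roughly as follows. This is a minimal illustration, not the original poster's code: it assumes each input line holds two whitespace-separated fields (cluster ID first, then member ID) and that anything after a `#` is a comment. The name `iter_clusters` is hypothetical, and Python is used here only to keep the sketch short; the same loop carries over to Perl directly.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_clusters(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (cluster_id, members) for each contiguous run of one cluster ID.

    Assumed format (not confirmed by the thread): "<cluster_id> <member_id>"
    per line, with optional trailing "# comment".
    """
    current_id: Optional[str] = None
    buffer: List[str] = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        cluster_id, member = fields[0], fields[1]
        # Transition to a new cluster: flush what has been buffered so far.
        if cluster_id != current_id and buffer:
            yield current_id, buffer
            buffer = []
        current_id = cluster_id
        buffer.append(member)
    # Flush the final cluster at end of input.
    if buffer:
        yield current_id, buffer


if __name__ == "__main__":
    sample = [
        "Osat_a Osat_a",
        "Atha_b Atha_b",
        "Atha_b Mtru_c",
        "Fves_d Fves_d",
        "Fves_d Osat_e",
        "Fves_d Atha_f",
    ]
    for cluster_id, members in iter_clusters(sample):
        print(cluster_id, len(members), " ".join(members))
```

Passing an open file handle instead of a list reads the input line by line, so memory use is bounded by the largest single cluster. Note that if the same cluster ID reappears after other clusters (the mingled case), this sketch reports it as a separate run, so it is only suitable for contiguous input.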

In the latter case, something like LanX's suggestion seems the way to go.

Give a man a fish:  <%-{-{-{-<

Replies are listed 'Best First'.
Re^2: Processing while reading in input
by onlyIDleft (Scribe) on Sep 20, 2018 at 02:37 UTC

    The input is ordered contiguously. You are correct in your observation.
