http://www.perlmonks.org?node_id=9216


in reply to Schwartzian Transform

Fellow Seekers of Perl Wisdom, maybe I'm wrong, but I suspect the Schwartzian Transform is only good for data that fits in memory all at once. Given that Perl 5.6 can handle files over 2 GB (on systems that support files this big), can anyone suggest an elegant modification of the transform to accommodate really huge files?
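For reference, here is a minimal sketch of the usual in-memory form of the transform, which shows where the memory goes. The sample records and the choice of the second field as the sort key are made up for illustration; they are not from the original node.

```perl
use strict;
use warnings;

# Hypothetical sample records: "name size" pairs, to be sorted by size.
my @input = ("alpha 30", "beta 4", "gamma 17");

# Classic Schwartzian Transform: decorate, sort, undecorate.
# Every record, plus one anonymous array per record, is held in
# memory at the same time, which is the concern raised above.
my @sorted =
    map  { $_->[0] }                      # undecorate: keep the original record
    sort { $a->[1] <=> $b->[1] }          # compare the precomputed numeric keys
    map  { [ $_, (split ' ', $_)[1] ] }   # decorate: [record, sort key]
    @input;

print "$_\n" for @sorted;
```

The list of anonymous arrays built by the inner map is at least as large as the input itself, so the approach only scales as far as available memory does.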

RE: RE: Schwartzian Transform
by chromatic (Archbishop) on Apr 26, 2000 at 19:01 UTC
    Use a tied array (@input) that operates on a disk file?
      No.

      First of all, mixing references and arrays tied to disk without thinking carefully about it is asking for serious problems.

      Secondly, passing the array to Perl's sort function is asking for very serious trouble. That will pull it all into memory!

      What I would do for large data structures would be to use DB_File to tie a hash to a BTree, and use properly formatted strings as keys. Blech. But it will work up to the maximum file size for your OS. (OK, up to about half that - BTrees waste something like 40% of the space in the tree.) Or use a proper database.
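A minimal sketch of the DB_File idea follows, assuming the sort key is a non-negative integer that fits in ten digits, so that zero-padded strings compare the same way lexically as the numbers do numerically. The sample records, the temporary file name, and the key layout (key, then a sequence number to keep duplicate keys distinct) are illustrative assumptions, not something spelled out in the post.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use DB_File;                    # exports $DB_BTREE
use File::Temp qw(tempdir);

# Hypothetical scratch location for the on-disk BTree.
my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/sort.db";

# Tie a hash to a BTree file; keys are kept in lexical order on disk.
tie my %index, 'DB_File', $dbfile, O_RDWR|O_CREAT, 0666, $DB_BTREE
    or die "Cannot tie $dbfile: $!";

# Stand-in for a huge input file; real code would read line by line
# from a filehandle instead of holding an array.
my @records = ("alpha 30", "beta 4", "gamma 17");

my $seq = 0;
for my $rec (@records) {
    my $size = (split ' ', $rec)[1];
    # "Properly formatted" key: fixed-width, zero-padded sort field,
    # followed by a sequence number so equal keys do not collide.
    my $key = sprintf "%010d:%010d", $size, $seq++;
    $index{$key} = $rec;
}

# each() walks the BTree in key order, one record at a time.
# @sorted is collected only so this small demo can be checked;
# a real run would print or write each record as it comes out.
my @sorted;
while (my ($key, $rec) = each %index) {
    push @sorted, $rec;
}

untie %index;
print "$_\n" for @sorted;
```

No call to sort appears anywhere: the ordering is a side effect of inserting into the BTree, so only one record needs to be in memory at a time.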

      Thanks, chromatic. Despite the tie of @input, shouldn't the list of references to lists still overflow memory with very large files? Also, should @a and @b be tied, and what about the anonymous array? Is it possible to tie anonymous arrays, and if so, would it, and all the others, require serialization? TIA