PerlMonks

Re: Reducing application footprint: large text files

by johngg (Canon)
on Feb 28, 2018 at 22:53 UTC (id://1210104)


in reply to Reducing application footprint: large text files

Difficult to tell without seeing more of the data. For instance, is the number of fields consistent or does it vary, and what are the maximum and minimum values of the hex numbers? If the fields vary so much that pack templates would be impractical, you might want to have a look at the core Storable module, perhaps in conjunction with IO::Compress::Bzip2 and its sibling IO::Uncompress::Bunzip2.
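A minimal sketch of the Storable-plus-bzip2 idea, assuming the data is already held in a Perl hash. The `%registers` contents and the `save_compressed`/`load_compressed` helpers are hypothetical, made up for illustration; all modules used ship with Perl. It uses `nfreeze` (network byte order) so the file can be read on a different architecture; plain `freeze` would also do if the files never leave the machine that wrote them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Storable qw(nfreeze thaw);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical sample data: register name => list of [ field name, lsb, width ].
my %registers = (
    CTRL   => [ [ 'ENABLE', 0, 1 ], [ 'MODE', 1, 3 ] ],
    STATUS => [ [ 'BUSY',   0, 1 ] ],
);

# Serialise a reference with Storable, then bzip2 the bytes into $file.
sub save_compressed {
    my ( $ref, $file ) = @_;
    my $frozen = nfreeze($ref);    # portable between hosts
    bzip2 \$frozen => $file
        or die "bzip2 failed: $Bzip2Error\n";
    return;
}

# Reverse the process: bunzip2 $file into memory, then thaw it.
sub load_compressed {
    my ($file) = @_;
    my $frozen;
    bunzip2 $file => \$frozen
        or die "bunzip2 failed: $Bunzip2Error\n";
    return thaw($frozen);
}

# Round-trip through a temporary file (deleted at exit).
my ( $tmp_fh, $file ) = tempfile( UNLINK => 1 );
close $tmp_fh;
save_compressed( \%registers, $file );
my $restored = load_compressed($file);
print "CTRL fields: ", join( ', ', map { $_->[0] } @{ $restored->{CTRL} } ), "\n";
```

The structure comes back exactly as it went in, so the application code that uses it need not change; only the loading step does.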

Cheers,

JohnGG

Re^2: Reducing application footprint: large text files
by swl (Parson) on Feb 28, 2018 at 23:58 UTC
      Very interesting. Sereal deserves some attention too. I'll read through that.

      Thanks, Matt.

Re^2: Reducing application footprint: large text files
by Anonymous Monk on Mar 01, 2018 at 00:58 UTC
    There are two data structures that remain the same: the first describes bit-fields within a 64-bit register, and the second describes some meta-attributes of the register.

    The values will range from 0 to 2^64 - 1.

    So, given that these data structures don't vary, it sounds like pack templates might be the way to go. One challenge may be that the first data structure is an array with a varying number of elements, although its structure will always be the same.
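A rough sketch of how a pack template can cope with a varying number of elements: write a count, then repeat a fixed sub-template once per element. The record layout (NUL-terminated name, 64-bit address, then `[name, lsb, width]` triples) and the `pack_register`/`unpack_register` helpers are hypothetical, not the real format. `Q<` (little-endian unsigned 64-bit) needs a perl built with 64-bit integer support, which the 0 to 2^64 - 1 range requires anyway.

```perl
use strict;
use warnings;

# Hypothetical layout: name (NUL-terminated), 64-bit address, field count,
# then one (name, lsb, width) triple per bit-field.
my $HEADER = 'Z* Q< n';
my $FIELD  = 'Z* C C';

sub pack_register {
    my ( $name, $addr, @fields ) = @_;
    my $buf = pack $HEADER, $name, $addr, scalar @fields;
    $buf .= pack $FIELD, @$_ for @fields;    # one group per bit-field
    return $buf;
}

sub unpack_register {
    my ($buf) = @_;
    # The trailing (...)* group repeats until the buffer is used up.
    my ( $name, $addr, $count, @flat ) = unpack "$HEADER ($FIELD)*", $buf;
    die "field count mismatch\n" unless @flat == 3 * $count;
    my @fields;
    push @fields, [ splice @flat, 0, 3 ] while @flat;
    return ( $name, $addr, @fields );
}

# Largest 64-bit value, to exercise the full range.
my $buf = pack_register( 'CTRL', 18446744073709551615,
    [ 'ENABLE', 0, 1 ], [ 'MODE', 1, 3 ] );
my ( $name, $addr, @fields ) = unpack_register($buf);
printf "%s at %s: %d fields in %d bytes\n", $name, $addr, scalar @fields, length $buf;
```

The stored count is redundant with the `(...)*` group, but it gives a cheap sanity check that the file was not truncated.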

    Thanks for pointing out Storable and BZip2. That is more food for thought along the way.

    Thanks, Matt.

      The OP shows two hashes, one of which is a hash of arrays, yet above you say "the 1st data structure is an array with varying numbers of elements". The OP also mentions "many 10's of MB of computer generated data files" but shows only two small data structures. My point is that you are not giving us clear information. If you want actual help rather than speculative possibilities, you need to be clearer and more accurate in specifying the problem.

      I.e., are there two files, each containing a huge version of one of the OP's data structures? Or are there myriad files for each type of data structure? Or myriad files, each containing both of the OP's data structures?

      • How many MBs?
      • Spread across how many files?
      • Are the sub data structures fixed or variable in length?

        Note: if the top-level entity in a file has a variable length, that's easily accommodated, but if the substructures vary in length, that's harder. I.e., if the hash of arrays contains a variable number of hash elements but the values are fixed-length arrays, that's easily handled; if the arrays vary in length, it's much harder.

      • Does the application need to load all of the "10s of MBs" at once for every run, or does it only use a small subset for each run?
      • So many more questions to answer before I would choose an approach to solving your problem.
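To illustrate why fixed-length substructures are the easy case, here is a rough sketch assuming a hash of arrays whose values always hold exactly three 64-bit numbers. The key width, the template and the `pack_hoa`/`read_record` helpers are made up for the example. Because every record has the same size, record N starts at byte N × record length, so a program that needs only a small subset can `seek` straight to it instead of loading everything. An in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical layout: every array holds exactly three unsigned 64-bit values.
# 'Q<' needs a perl with 64-bit integers; keys longer than 15 chars get truncated.
my $TEMPLATE = 'Z16 Q<3';                                # key + NUL pad, three uint64
my $RECLEN   = length pack( $TEMPLATE, '', 0, 0, 0 );    # bytes per record

# Write one fixed-size record per key, in sorted key order.
sub pack_hoa {
    my ($hoa) = @_;
    return join '', map { pack( $TEMPLATE, $_, @{ $hoa->{$_} } ) } sort keys %$hoa;
}

# Fetch record $index without reading anything before it.
sub read_record {
    my ( $fh, $index ) = @_;
    seek( $fh, $index * $RECLEN, 0 ) or die "seek failed: $!\n";
    my $rec;
    my $got = read( $fh, $rec, $RECLEN );
    die "short read\n" unless defined $got && $got == $RECLEN;
    return unpack $TEMPLATE, $rec;
}

my %hoa = (
    CTRL   => [ 0, 1, 3 ],
    STATUS => [ 8, 4, 18446744073709551615 ],
);
my $buf = pack_hoa( \%hoa );

# An in-memory filehandle stands in for a real file on disk.
open my $fh, '<', \$buf or die "open failed: $!\n";
my ( $key, @values ) = read_record( $fh, 1 );    # 2nd key in sorted order
print "$key: @values\n";
```

If the arrays varied in length, the N × record length arithmetic would no longer hold and a separate offset index would be needed, which is the "much harder" case.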

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
      In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
