
Re: 3-byte representation

by TomDLux (Vicar)
on Oct 18, 2011 at 15:10 UTC (#932171)

in reply to 3-byte representation

How many numbers do you have in a file? How many files? How many files can you store on a $65 2-TB drive?

The basic premise of Unix is to store data as text, if at all possible. This makes it simple to process it using utilities you hadn't considered when the file was created.

Your numbers fit in 24 bits, so the range is -8,388,608 to +8,388,607 ... in fact possibly a smaller range, since you suggest adding a constant to shift the numbers to all-positive. Stored as text, each number needs a separator plus 1-7 digits, plus a possible minus sign. If the numbers are evenly distributed over that range, most of them have 7 digits, so that works out to an average of a little over 8 bytes per number in ASCII, against 3 bytes packed. If there is any asymmetry that makes small values more likely than large ones, it will be less than that. For that loss you are now able to feed your files through grep, dc, tr, sed, awk, perl.
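As a rough check on the text-size estimate, here is a minimal sketch that counts the ASCII bytes needed per number. It assumes values spread uniformly over the full signed 24-bit range and one separator byte per value. The function names `digit_chars` and `average_text_bytes` are made up for this illustration.

```python
def digit_chars(limit: int) -> int:
    """Total decimal digits needed to write every integer in 1..limit."""
    total = 0
    digits = 1
    while 10 ** (digits - 1) <= limit:
        low = 10 ** (digits - 1)                # smallest number with this many digits
        high = min(limit, 10 ** digits - 1)     # largest one that is still in range
        total += digits * (high - low + 1)
        digits += 1
    return total


def average_text_bytes(bits: int = 24) -> float:
    """Mean ASCII bytes per value for a uniform signed `bits`-bit range."""
    half = 1 << (bits - 1)      # 8,388,608 when bits == 24
    count = 2 * half            # values run from -half to half - 1
    # Digits of the negatives (1..half), the positives (1..half-1), and "0".
    digits = digit_chars(half) + digit_chars(half - 1) + 1
    signs = half                # one '-' for each negative value
    separators = count          # assumed: one separator byte per value
    return (digits + signs + separators) / count


if __name__ == "__main__":
    print(f"text: {average_text_bytes():.2f} bytes/number, packed: 3 bytes/number")
```

Under those assumptions the average comes out a bit above 8 bytes per number. Real data skewed toward smaller magnitudes would need fewer.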

As Occam said: Entia non sunt multiplicanda praeter necessitatem.

Re^2: 3-byte representation
by gerleu (Novice) on Oct 19, 2011 at 07:52 UTC
    Hello TomDLux and thanks for your answer! I have several thousand pairs of numbers (in fact latitudes and longitudes, originally with 4-decimal precision, but converted to integers when the file is created). Physical storage space is not an issue, only memory, if the fastest parsing solution turns out to be memory-intensive, because many different files will be processed by many different users at the same time.
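For reference, the conversion described here (4-decimal coordinates scaled to integers, shifted to be non-negative, then stored in 3 bytes) might look like the sketch below. It is only an illustration: the offset of 1,800,000, the big-endian byte order, and the names `pack_coord` and `unpack_coord` are assumptions, not details taken from the thread.

```python
SCALE = 10_000        # 4 decimal places, as described in the post
OFFSET = 1_800_000    # assumed shift: maps -180.0000 degrees to 0


def pack_coord(degrees: float) -> bytes:
    """Encode one latitude or longitude as a 3-byte big-endian unsigned integer."""
    value = round(degrees * SCALE) + OFFSET
    if not 0 <= value < (1 << 24):
        raise ValueError(f"{degrees} does not fit in 24 bits")
    return value.to_bytes(3, "big")


def unpack_coord(raw: bytes) -> float:
    """Reverse pack_coord: 3 bytes back to degrees."""
    return (int.from_bytes(raw, "big") - OFFSET) / SCALE


if __name__ == "__main__":
    lat, lon = 48.8566, 2.3522
    packed = pack_coord(lat) + pack_coord(lon)      # 6 bytes per pair
    as_text = f"{lat:.4f},{lon:.4f}\n".encode()     # same pair stored as text
    print(len(packed), len(as_text))
    print(unpack_coord(packed[:3]), unpack_coord(packed[3:]))
```

With the shift applied, the largest value is 3,600,000, well inside the 24-bit limit of 16,777,215.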

      Considering a process is allocated megabytes of memory when it runs, worrying about a few thousand kilobytes of wasted space is not really productive.

      Wasting memory or wasting CPU resources is never good, of course, but correctness is the first priority. Once you have a solution that works correctly, tighten up memory use IF it limits the number of instances you can run; tighten up the algorithms involved IF there's a problem with run time.

      No problem? Go on to the next task.

      As Occam said: Entia non sunt multiplicanda praeter necessitatem.
