in reply to advice needed for processing largish data in multiple files
I have no idea what the OP is trying to do here. Perhaps a simplified example of the input and output data would help? In general, though, if you don't have enough memory, the best thing to do is process one section of your data at a time, which in this case means handling only a smaller range of the phone numbers on each pass. Split the phone numbers into N ranges and make N passes through all the files, one pass per range. On each pass, ignore the data for phone numbers outside the current range and dump the results into that pass's own file. Then merge those files into a single file. It's not the most efficient way to solve the problem, since every input file gets read N times, but it's rather simple to implement.
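
Since the OP's data format isn't known, here is only a rough sketch of the multi-pass idea in Python. Everything specific in it is an assumption made for illustration: each input line is taken to be `phone,value`, the "processing" is taken to be summing the values per phone number, and the names `process_in_passes`, `num_passes` and `max_phone` are made up.

```python
import os
import shutil
from collections import defaultdict


def process_in_passes(input_paths, out_path, num_passes, max_phone=10**10):
    """Sum values per phone number, keeping only one phone range in memory.

    Assumed (hypothetical) input format: one "phone,value" pair per line.
    """
    # Ceiling division, so the num_passes ranges cover all of [0, max_phone).
    step = -(-max_phone // num_passes)
    part_paths = []

    for p in range(num_passes):
        lo, hi = p * step, (p + 1) * step
        totals = defaultdict(int)

        # One full pass over every input file for this range.
        for path in input_paths:
            with open(path) as fh:
                for line in fh:
                    if not line.strip():
                        continue  # skip blank lines
                    phone, value = line.rstrip("\n").split(",")
                    # Ignore phone numbers outside the current range.
                    if lo <= int(phone) < hi:
                        totals[phone] += int(value)

        # Dump this pass's results into its own file, sorted by phone number.
        part = f"{out_path}.part{p}"
        with open(part, "w") as out:
            for phone in sorted(totals, key=int):
                out.write(f"{phone},{totals[phone]}\n")
        part_paths.append(part)

    # The ranges are disjoint and ascending, so merging is plain concatenation.
    with open(out_path, "w") as merged:
        for part in part_paths:
            with open(part) as fh:
                shutil.copyfileobj(fh, merged)
            os.remove(part)
```

Peak memory is roughly the size of the largest range's results, so a larger `num_passes` trades more reads of the input for a smaller footprint. If the phone numbers are unevenly distributed, equal-width ranges like these may give very uneven pass sizes.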