Huge files manipulation by klashxx (Initiate)
on Nov 10, 2008 at 11:55 UTC
klashxx has asked for the wisdom of the Perl Monks concerning the following question:
Hi, I need a fast way to delete duplicate entries from very large plain-text files (> 2 GB). To clarify, this is the structure of the file:
30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0005655,00|||+0000000000000,00
30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0000000000000,00|||+0000000000000,00
30xx|4150010003502043|CARDS|20081031|MP415001|00000024265698|01|F|1804|00|||+0000000000000,00|||+0000000000000,00
The key is formed by the first 7 fields (the delimiter is the pipe character), and I want to print or delete only the duplicates.
I tried all the usual tools (awk, sort, uniq, sed, grep, ...) but they always ended with the same result: out of memory!
I'm using large HP-UX servers.
I'm very new to Perl, but I read somewhere that the Tie::File module can handle very large files. I tried it but could not get the code right...
Any advice will be very welcome.
Thank you in advance.
P.S.: I do not want to split the files.
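To make the key definition above concrete, here is a minimal Perl sketch of what "duplicate" means for these records. It is only an illustration, not a tested solution for files of this size: `split_duplicates` is a hypothetical helper name, the sample data is the three records quoted above, and the approach assumes the set of distinct keys fits in memory, which may not hold for a file larger than 2 GB.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): reads records from $in,
# writes the first record seen for each key to $uniq_out and every
# later record with the same key to $dup_out. Returns the duplicate count.
sub split_duplicates {
    my ($in, $uniq_out, $dup_out) = @_;
    my %seen;          # one entry per distinct key; grows with the data
    my $dups = 0;
    while (my $line = <$in>) {
        chomp(my $record = $line);
        # A limit of 8 stops splitting after the 7th field.
        my @fields = split /\|/, $record, 8;
        my $last   = $#fields < 6 ? $#fields : 6;
        my $key    = join '|', @fields[0 .. $last];
        if ($seen{$key}++) {
            print {$dup_out} $line;
            $dups++;
        }
        else {
            print {$uniq_out} $line;
        }
    }
    return $dups;
}

# Demonstration on the three sample records, using in-memory filehandles.
my $sample = join '', map { "$_\n" } (
    '30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0005655,00|||+0000000000000,00',
    '30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0000000000000,00|||+0000000000000,00',
    '30xx|4150010003502043|CARDS|20081031|MP415001|00000024265698|01|F|1804|00|||+0000000000000,00|||+0000000000000,00',
);
my ($uniq, $dup) = ('', '');
open my $in,      '<', \$sample or die "cannot open sample: $!";
open my $uniq_fh, '>', \$uniq   or die "cannot open unique buffer: $!";
open my $dup_fh,  '>', \$dup    or die "cannot open duplicate buffer: $!";
my $count = split_duplicates($in, $uniq_fh, $dup_fh);
close $_ for $in, $uniq_fh, $dup_fh;
print "duplicates found: $count\n";
```

On the real data the same sub could be called with a filehandle opened on the large file instead of the in-memory sample. The first two sample records share the same first 7 fields, so the second one is treated as the duplicate even though its amount field differs.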