PerlMonks  

Re: 15 billion row text file and row deletes - Best Practice?

by imp (Priest)
on Dec 01, 2006 at 05:46 UTC


in reply to 15 billion row text file and row deletes - Best Practice?

If you have to work with the text file, then I would recommend using sed instead of Perl.

    # If your version of sed supports editing in place
    sed -i -e '/^00020123837$/d' somefile.txt

    # Otherwise
    sed -e '/^00020123837$/d' somefile.txt > tmp.txt
    mv tmp.txt somefile.txt
If this is done regularly or other maintenance work is going to be done then a database becomes a much more attractive option.


Re^2: 15 billion row text file and row deletes - Best Practice?
by graff (Chancellor) on Dec 01, 2006 at 06:08 UTC
    And bear in mind that sed supports the use of an "edit script" file -- one could take a list of patterns that should be deleted from a file, and turn that into an edit script. Based on the OP's description, the list of serial numbers to kill could be saved as:
    /0001234/d
    /0004567/d
    /0089123/d
    ...
    If that's in a file called "kill.list", then just run sed like this:
    sed -f kill.list big.file > tmp.copy
    mv tmp.copy big.file
    On a file of a few hundred GB, I agree that using sed for this very simple sort of editing would be a significant win, saving a lot of run time.
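The kill list itself can be generated mechanically from a plain list of serial numbers. A sketch, assuming a hypothetical serials.txt with one serial per line (the patterns are anchored with ^ and $ here to avoid substring matches, which the unanchored examples above would allow):

```shell
# serials.txt (hypothetical) holds one serial number per line.
printf '0001234\n0004567\n0089123\n' > serials.txt

# Wrap each serial in an anchored sed delete command to build the edit script.
sed 's|.*|/^&$/d|' serials.txt > kill.list

# Sample data file: two rows to delete, one to keep.
printf '0001234\nkeep-me\n0089123\n' > big.file

# Apply the whole edit script in a single pass over the file.
sed -f kill.list big.file > tmp.copy
mv tmp.copy big.file

cat big.file
```

The single -f pass is the point: sed reads the file once and tests every script line against each row, instead of rerunning sed once per serial number.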
