Re: 15 billion row text file and row deletes - Best Practice?

by imp (Priest)
on Dec 01, 2006 at 05:46 UTC


in reply to 15 billion row text file and row deletes - Best Practice?

If you have to work with the text file, then I would recommend using sed instead of perl.

# If your version of sed supports editing in place
sed -i -e '/^00020123837$/d' somefile.txt

# Otherwise
sed -e '/^00020123837$/d' somefile.txt > tmp.txt
mv tmp.txt somefile.txt
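For comparison, a rough perl equivalent (a sketch, using the same file and serial number as the example above) would be:

# Rough perl equivalent: same filter, edited in place -- shown only for comparison
perl -i -ne 'print unless /^00020123837$/' somefile.txt

The filtering logic is identical; sed simply does the same job with less per-line overhead, which adds up over 15 billion rows.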
If this is done regularly, or if other maintenance work is planned, then a database becomes a much more attractive option.
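A minimal sketch of that route, assuming SQLite as the engine and a one-column file of serial numbers (the database, table, and file names here are just illustrative):

# Load the file into a single-column table and index it once
sqlite3 rows.db 'CREATE TABLE rows (serial TEXT);'
sqlite3 rows.db '.import somefile.txt rows'
sqlite3 rows.db 'CREATE INDEX rows_serial ON rows (serial);'

# Each delete is then an indexed lookup, not a full-file rewrite
sqlite3 rows.db "DELETE FROM rows WHERE serial = '00020123837';"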


Re^2: 15 billion row text file and row deletes - Best Practice?
by graff (Chancellor) on Dec 01, 2006 at 06:08 UTC
    And bear in mind that sed supports the use of an "edit script" file -- one could take a list of patterns that should be deleted from a file, and turn that into an edit script. Based on the OP's description, the list of serial numbers to kill could be saved as:
    /0001234/d
    /0004567/d
    /0089123/d
    ...
    If that's in a file called "kill.list", then just run sed like this:
    sed -f kill.list big.file > tmp.copy
    mv tmp.copy big.file
    On a file of a few hundred GB, I agree that using sed for this very simple sort of editing would be a significant win (saving a lot of run time).
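    Generating kill.list from a plain list of serial numbers is itself a one-liner (a sketch; serials.txt is a hypothetical one-serial-per-line file, and the patterns are anchored with ^...$ as in the parent note so a serial can't match inside a longer line):

    # Wrap each serial number in an anchored sed delete command
    sed 's|.*|/^&$/d|' serials.txt > kill.list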
