PerlMonks  

Re: search a large text file

by erix (Vicar)
on Feb 08, 2011 at 17:27 UTC (#887014)


in reply to search a large text file

I put together an example in case you want to use PostgreSQL.

The file I used is available here:

ftp://ftp.ncbi.nih.gov/genbank/livelists

It's similar to yours, but it has three columns.

I unzipped it and loaded it into PostgreSQL, in a table t; it has more than 223 million rows.

    $ ls -lh GbAccList.0206.2011
    -rw-rw-r-- 1 aardvark aardvark 4.6G Feb 8 17:21 GbAccList.0206.2011

    $ head -n 3 GbAccList.0206.2011
    AACY024124353,1,129566152
    AACY024124495,1,129566175
    AACY024124494,1,129566176

    $ time < GbAccList.0206.2011 psql -qc "
        create table t (c text, i1 integer, i2 integer);
        copy t from stdin csv delimiter E',';"
    real 3m47.448s

    $ time echo "
        create index t_i2_idx on t (i2);
        analyze t;" | psql -q
    real 5m50.291s

Searches now take around a tenth of a millisecond:

    # 5 'random' searches like:
    echo "explain analyze select * from t where i2 = $gi;" | psql

Just showing the timings of five searches:

    Index Cond: (i2 = 2017697)
    Total runtime: 0.157 ms
    Index Cond: (i2 = 6895719)
    Total runtime: 0.109 ms
    Index Cond: (i2 = 3193323)
    Total runtime: 0.119 ms
    Index Cond: (i2 = 8319666)
    Total runtime: 0.091 ms
    Index Cond: (i2 = 1573171)
    Total runtime: 0.119 ms
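For anyone who wants to repeat the five lookups, here is a minimal sketch of how they might be scripted. Only the table t, the column i2 and the query text come from the session above; the function names, the 1 to 10000000 value range, and the PSQL override variable are assumptions made for illustration, not the commands actually used.

```shell
# Sketch only: table t and column i2 come from the post above; the function
# names, the 1..10000000 range and the PSQL variable are assumptions.

# Print one EXPLAIN ANALYZE statement for a given i2 value.
make_query() {
    printf 'explain analyze select * from t where i2 = %d;\n' "$1"
}

# Print a pseudo-random integer between 1 and 10000000,
# built from four bytes read from /dev/urandom.
random_gi() {
    echo $(( $(od -An -N4 -tu4 /dev/urandom) % 10000000 + 1 ))
}

# Run N lookups (default 5), keeping only the condition and timing lines.
run_searches() {
    n=${1:-5}
    i=0
    while [ "$i" -lt "$n" ]; do
        make_query "$(random_gi)" | ${PSQL:-psql} | grep -E 'Index Cond|Total runtime|Execution time'
        i=$((i + 1))
    done
}
```

Calling run_searches 5 against the database that holds t should print one condition line and one timing line per lookup. The grep also accepts "Execution time" because newer PostgreSQL releases print that label instead of "Total runtime".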

Of course, performance depends on the hardware used.

(a similar problem/solution here: Re^3: sorting very large text files (slander))


Re^2: search a large text file
by BrowserUk (Pope) on Feb 08, 2011 at 17:35 UTC

    Nice one again++ :)
