In reply to: search a large text file
I put together an example in case you want to use PostgreSQL:
The file I used is available here:
ftp://ftp.ncbi.nih.gov/genbank/livelists
It's similar to yours, but it has three columns.
I unzipped it and loaded it into PostgreSQL, into a table t; there are more than 223 million rows.
$ ls -lh GbAccList.0206.2011
-rw-rw-r-- 1 aardvark aardvark 4.6G Feb  8 17:21 GbAccList.0206.2011

$ head -n 3 GbAccList.0206.2011
AACY024124353,1,129566152
AACY024124495,1,129566175
AACY024124494,1,129566176

$ time < GbAccList.0206.2011 psql -qc "
    create table t (c text, i1 integer, i2 integer);
    copy t from stdin csv delimiter E',';"
real    3m47.448s

$ time echo "
    create index t_i2_idx on t (i2);
    analyze t;" | psql -q
real    5m50.291s
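For readers who want to try the same pipeline (CSV → table → index → lookup) without setting up a PostgreSQL server, here is a minimal self-contained sketch using SQLite instead. The table and index names match the session above, and the sample rows are the three lines from the `head -n 3` output; everything else about the setup is an illustrative assumption, not the original environment.

```python
# Sketch of the CSV -> table -> index pipeline from the psql session above,
# using SQLite so it runs standalone. Sample rows come from `head -n 3`.
import csv
import io
import sqlite3

sample_csv = """AACY024124353,1,129566152
AACY024124495,1,129566175
AACY024124494,1,129566176
"""

con = sqlite3.connect(":memory:")
con.execute("create table t (c text, i1 integer, i2 integer)")
# PostgreSQL's `copy t from stdin csv` becomes executemany over csv.reader:
con.executemany("insert into t values (?, ?, ?)",
                csv.reader(io.StringIO(sample_csv)))
con.execute("create index t_i2_idx on t (i2)")
con.commit()

# Point lookup on the indexed column, as in the searches below.
row = con.execute("select c from t where i2 = ?", (129566175,)).fetchone()
print(row[0])  # AACY024124495
```

On the real 223-million-row file you would of course stream the CSV from disk rather than hold it in a string; the structure of the load is the same.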
Searches are now around a tenth of a millisecond:
# 5 'random' searches like:
echo "explain analyze select * from t where i2 = $gi;" | psql
Just showing the timings of five searches:
Index Cond: (i2 = 2017697)   Total runtime: 0.157 ms
Index Cond: (i2 = 6895719)   Total runtime: 0.109 ms
Index Cond: (i2 = 3193323)   Total runtime: 0.119 ms
Index Cond: (i2 = 8319666)   Total runtime: 0.091 ms
Index Cond: (i2 = 1573171)   Total runtime: 0.119 ms
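The `Index Cond` lines above show the PostgreSQL planner hitting `t_i2_idx` instead of scanning all 223 million rows, which is why each lookup finishes in a fraction of a millisecond. The same thing can be checked in any SQL engine; here is a hedged sketch using SQLite's `EXPLAIN QUERY PLAN` (the data is synthetic and the engine differs, but the table and index names follow the post):

```python
# Sketch: verify the planner uses the index for `where i2 = ?`,
# analogous to the `Index Cond` lines in the EXPLAIN ANALYZE output above.
# SQLite stands in for PostgreSQL; rows are synthetic placeholders.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (c text, i1 integer, i2 integer)")
con.executemany("insert into t values (?, ?, ?)",
                [("ACC%09d" % n, 1, n) for n in range(10_000)])
con.execute("create index t_i2_idx on t (i2)")

# The `detail` column of EXPLAIN QUERY PLAN names the access path.
plan = con.execute(
    "explain query plan select * from t where i2 = ?", (2017,)
).fetchall()
print(plan[0][3])  # a SEARCH ... USING INDEX t_i2_idx line, not a full SCAN
```

If you drop the index, the plan line changes to a full table scan, and on a file this size each lookup would take minutes instead of ~0.1 ms.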
Of course, performance depends on the hardware used.
(A similar problem and solution are discussed here: Re^3: sorting very large text files (slander).)
Re^2: search a large text file
by BrowserUk (Patriarch) on Feb 08, 2011 at 17:35 UTC
In Section: Seekers of Perl Wisdom