Re: search a large text file

by erix (Parson)
on Feb 08, 2011 at 17:27 UTC

in reply to search a large text file

I put together an example in case you want to use PostgreSQL:

The file I used is available here:

It's similar to yours, but it has three columns.

I unzipped it and loaded it into PostgreSQL as a table t; it has more than 223 million rows.

$ ls -lh GbAccList.0206.2011
-rw-rw-r-- 1 aardvark aardvark 4.6G Feb 8 17:21 GbAccList.0206.2011

$ head -n 3 GbAccList.0206.2011
AACY024124353,1,129566152
AACY024124495,1,129566175
AACY024124494,1,129566176

$ time < GbAccList.0206.2011 psql -qc "
create table t (c text, i1 integer, i2 integer);
copy t from stdin csv delimiter E',';"
real 3m47.448s

$ time echo "
create index t_i2_idx on t (i2);
analyze t;" | psql -q
real 5m50.291s

Searches are now around a tenth of a millisecond:

# 5 'random' searches like:
echo "explain analyze select * from t where i2 = $gi;" | psql

Just showing the timings of five searches:

Index Cond: (i2 = 2017697)
Total runtime: 0.157 ms
Index Cond: (i2 = 6895719)
Total runtime: 0.109 ms
Index Cond: (i2 = 3193323)
Total runtime: 0.119 ms
Index Cond: (i2 = 8319666)
Total runtime: 0.091 ms
Index Cond: (i2 = 1573171)
Total runtime: 0.119 ms
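
For anyone who wants to repeat the timing run, here is a rough sketch of how such a loop might be scripted. The helper names (lookup_sql, timings_only, run_searches) are made up for illustration and are not from the original run, and how the i2 values were picked is not stated, so they are simply passed in as arguments. It assumes psql connects to the database holding table t without extra options.

```shell
#!/bin/sh
# Hypothetical helper: print the EXPLAIN ANALYZE statement for one i2 value.
lookup_sql() {
    printf 'explain analyze select * from t where i2 = %s;\n' "$1"
}

# Hypothetical helper: keep only the index condition and the timing line.
# "Total runtime" is what the PostgreSQL used above prints; newer releases
# label the same figure "Execution time", so both spellings are matched.
timings_only() {
    grep -E 'Index Cond:|Total runtime:|Execution [Tt]ime:'
}

# Run one lookup per argument and show just the timings.
# Assumption: psql needs no connection options on this machine.
run_searches() {
    for gi in "$@"; do
        lookup_sql "$gi" | psql | timings_only
    done
}
```

Calling it as `run_searches 2017697 6895719 3193323` would issue one lookup per value; the actual timings will of course differ per machine.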

Of course, performance depends on the hardware used.

(a similar problem/solution here: Re^3: sorting very large text files (slander))

Re^2: search a large text file
by BrowserUk (Pope) on Feb 08, 2011 at 17:35 UTC

    Nice one again++ :)
