Best way to find patterns in csv file?
by punch_card_don (Curate) on Nov 30, 2004 at 20:31 UTC
punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:
I have a text file that looks like this:
record_id, datum_1, datum_2, ... , datum_30;
where record_id is an integer and 99% of the datums are integers, although about 1% may be text or decimals. Datums can be null. There are 1.5 million records.
Then I have a collection of about 35,000 patterns I have to search for. That is, I need to find all records that have, for example, datum_1 = x and datum_8 = y and datum_20 = z, regardless of what might be in the other columns. A single record may match several patterns, so each line has to be checked against every pattern.
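To make the matching rule concrete, here is a minimal sketch of the brute-force check, every line against every pattern. The sample records, the pattern names (p1, p2), and the helper subs parse_record and matches are all made up for illustration, and the sample uses only three datum columns instead of 30. It also assumes values are compared as strings, so "5" and "5.0" would not match each other.

```perl
use strict;
use warnings;

# Hypothetical sample records in the format described above (values are made up).
my @lines = (
    '101, 5, 7, 9;',
    '102, 5, , 9;',      # datum_2 is null
    '103, 4, 7, abc;',   # datum_3 is text
);

# Hypothetical patterns: column number (1 = datum_1) => required value.
# Columns not listed in a pattern may hold anything.
my %patterns = (
    p1 => { 1 => '5', 3 => '9' },
    p2 => { 2 => '7' },
);

# Split a line into fields: index 0 is record_id, index N is datum_N.
sub parse_record {
    my ($line) = @_;
    $line =~ s/^\s+//;       # drop leading whitespace
    $line =~ s/;\s*$//;      # drop the trailing semicolon
    return split /\s*,\s*/, $line, -1;   # -1 keeps trailing empty fields
}

# True if every column named in the pattern holds the required value.
sub matches {
    my ($fields, $pattern) = @_;
    for my $col (keys %$pattern) {
        my $value = $fields->[$col];
        return 0 unless defined $value && $value eq $pattern->{$col};
    }
    return 1;
}

# Brute force: test every record against every pattern.
my %hits;   # pattern name => list of matching record_ids
for my $line (@lines) {
    my @fields = parse_record($line);
    for my $name (sort keys %patterns) {
        push @{ $hits{$name} }, $fields[0]
            if matches(\@fields, $patterns{$name});
    }
}

print "$_: @{ $hits{$_} || [] }\n" for sort keys %patterns;
```

With the sample data, p1 should pick up records 101 and 102, and p2 should pick up 101 and 103. The nested loop is the part that does not scale: it performs one check per record per pattern.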
I realize this is just mimicking the functionality of a database (select record_id from theTable where datum_1 = x and datum_8 = y and datum_20 = z), but I was wondering if there's a very efficient way of doing this directly on the file, without setting up a database and without scanning 1.5 million lines 35,000 times. (I wonder how long roughly 50 billion line scans would take?) I've thought about this most of the day and come up with nothing promising...