Searching a text file

monoxide has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to search a text file by reading it into my script, tokenising it, and then searching each token for $search. For reference, there are 7 elements in the @database array.

sub search
{
  # Load the flatfile into a data structure.
  open(DB, $datafile) or die "Cannot open $datafile: $!";
  {local $/; $_ = <DB>}
  close(DB);
  # Start printing template
  print $templatehead;
  # Get ready for interpreting the datafile
  $recnum = 0;
  RECORD: while (m/\G(.*?(?<!\\)(\\\\)*)\Q$dataseperator\E/gs)
  {
    # Get the next field and unescape it.
    $data = $1;
    $data =~ m/^\n?(.*?)\n?\\?$/s;
    $data = $1;
    $data =~ s@\\\\@\\@g;
    # Store the unescaped field itself; $1 is unreliable after the s/// above.
    push @data, $data;
    if ($#data >= $#database)
    {
      # Get what we are searching for.... the reason we are here.
      $search = &getparam("search"); #gets a CGI::param()
      foreach $field (@data)
      {
        if(-1 != index($field,$search))
        {
          &printfield(@data);
          @data = ();
          $recnum++;
          next RECORD;
        }
      }
      # No field matched: discard this record so the next one starts empty.
      @data = ();
    }
  }
  print $templatetail;
  exit();
}

An example of the data in $datafile:

Test|Subject|test@subject.com|Male|Perl-Database|good|on|
My|Tester|my@tester.com|Female|Perl-Database|very good|on|
Blank|Worker|blank@worker.com|Male|MySql-Database|the best!|off|
Someone|Somewhere|someone@somewhere.com|Female|Mysql-Database|great|off|
Adding|Test|add@test.com|Male|Perl-Database|Great product!|on|

monoxide
I am undecided...
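A minimal sketch of the read, tokenise, search approach described in the question, reading one line at a time instead of slurping the file. The escaping scheme is an assumption: the question's regex hints at backslash escapes without spelling them out, so this sketch treats a backslash as escaping whatever character follows it. `parse_record` and `search_records` are hypothetical names.

```perl
use strict;
use warnings;

# Assumed escaping scheme (not confirmed by the question): a backslash
# escapes the character after it, so "\|" is a literal pipe inside a field
# and "\\" is a literal backslash. Every field, including the last, is
# terminated by an unescaped "|".
sub parse_record {
    my ($line) = @_;
    my @fields;
    # A field is a run of ordinary characters or backslash-escaped pairs,
    # ended by an unescaped "|".
    while ($line =~ /\G((?:[^\\|]|\\.)*)\|/gs) {
        (my $field = $1) =~ s/\\(.)/$1/gs;   # drop the escaping backslashes
        push @fields, $field;
    }
    return @fields;
}

# Read one record per line and return those in which any field contains
# $search as a plain substring (index, not a regex).
sub search_records {
    my ($fh, $search) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = parse_record($line);
        push @hits, \@fields if grep { index($_, $search) != -1 } @fields;
    }
    return @hits;
}

my $data = "Test|Subject|test\@subject.com|Male|Perl-Database|good|on|\n"
         . "Blank|Worker|blank\@worker.com|Male|MySql-Database|the best!|off|\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
# prints: Blank, Worker, blank@worker.com, Male, MySql-Database, the best!, off
print join(', ', @$_), "\n" for search_records($fh, 'MySql');
close $fh;
```

Matching the whole field with one regex, rather than splitting on `(?<!\\)\|`, also copes with a field that ends in an escaped backslash (`\\|`).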


Re: Searching a text file by Zaxo (Archbishop) on Nov 14, 2004 at 02:10 UTC
I'd approach this with DBI and one of the CSV drivers, such as DBD::CSV. That will save you a lot of code and leave most of your search in the hands of DBI.

After Compline,
Zaxo
Re^2: Searching a text file by monoxide (Beadle) on Nov 14, 2004 at 03:03 UTC
Thanks Zaxo, but how would I write an SQL query that would let me search for a string in any field in the table? Also, I am not such a big fan of DBD::CSV.

monoxide
I am undecided...
Re^3: Searching a text file by tachyon (Chancellor) on Nov 14, 2004 at 09:18 UTC
Why would you want to search for a string in any field? Your last field looks like a boolean (on/off), and your 4th field looks boolean too (male/female). If you are searching for emails, you would not really want to look there....

Presumably you have a data entry form. This can also be a search form: just add a button that says Search. When a user enters data in one or more fields and clicks Search, all you do is make a query like:

$sth = $dbh->prepare('SELECT foo,bar,baz FROM widget WHERE foo=? AND bar=? AND baz=?');
$sth->execute($foo,$bar,$baz);

where you dynamically construct the WHERE clause depending on how many fields the user enters data into. To search all the fields for one string, the syntax is simply OR instead of AND, including all the fields you want to search:

$sth = $dbh->prepare('SELECT foo,bar,baz FROM widget WHERE foo=? OR bar=? OR baz=?');
$sth->execute($str,$str,$str);

DBD::SQLite might suit you better.

cheers

tachyon
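A minimal sketch of the dynamic WHERE clause described above, building only the SQL text and the bind values so it runs without a database. The table name `widget` comes from the reply; the column names and `build_query` are hypothetical. Column names come from a fixed list because placeholders can only stand in for values, not identifiers.

```perl
use strict;
use warnings;

# Hypothetical column names; only these can ever appear in the SQL text.
my @COLUMNS = qw(firstname lastname email gender product comment subscribed);

# Build a parameterised query from whichever fields the user filled in.
# $join is 'AND' (every filled field must match) or 'OR' (any may match).
sub build_query {
    my ($join, %input) = @_;
    die "join must be AND or OR\n" unless $join eq 'AND' || $join eq 'OR';
    my (@clauses, @binds);
    for my $col (@COLUMNS) {
        # Skip fields the user left empty.
        next unless defined $input{$col} && length $input{$col};
        push @clauses, "$col = ?";
        push @binds,   $input{$col};
    }
    my $sql = 'SELECT ' . join(', ', @COLUMNS) . ' FROM widget';
    $sql .= ' WHERE ' . join(" $join ", @clauses) if @clauses;
    return ($sql, @binds);
}

# Data-entry form reused as a search form: match every field typed in.
my ($sql, @binds) = build_query('AND', lastname => 'Tester', gender => 'Female');
print "$sql\n";

# One string against every column: same builder, joined with OR.
my $str = 'Perl-Database';
my ($any_sql, @any_binds) = build_query('OR', map { ($_ => $str) } @COLUMNS);
print "$any_sql\n";

# With a real DBI handle you would then run:
#   my $sth = $dbh->prepare($sql); $sth->execute(@binds);
```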
Re^3: Searching a text file by CountZero (Bishop) on Nov 14, 2004 at 11:34 UTC
If you want to search for a string in any field, just concatenate all your fields into one field and write an SQL query that does a LIKE search (with a "%" before and after the search term) on that single field.

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
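The concatenate-and-LIKE idea can be sketched the same way, as a query builder that needs no database to run. Assumptions: the SQL-standard `||` operator and `COALESCE` are available (SQLite and PostgreSQL support both; MySQL treats `||` as logical OR unless the PIPES_AS_CONCAT mode is enabled), the table and column names and `build_like_query` are hypothetical, and `!` is chosen as the LIKE escape character so that `%` and `_` typed by the user match literally.

```perl
use strict;
use warnings;

# Hypothetical column names.
my @COLUMNS = qw(firstname lastname email gender product comment subscribed);

# Build one LIKE query over all columns concatenated into a single string.
sub build_like_query {
    my ($term) = @_;
    # Escape !, % and _ so the user's term is matched literally.
    (my $escaped = $term) =~ s/([!%_])/!$1/g;
    # COALESCE stops one NULL column from making the whole string NULL;
    # the '|' between columns stops a match from straddling two fields
    # (unless the term itself contains '|').
    my $concat = join q{ || '|' || }, map { "COALESCE($_, '')" } @COLUMNS;
    my $sql = 'SELECT ' . join(', ', @COLUMNS)
            . " FROM widget WHERE ($concat) LIKE ? ESCAPE '!'";
    return ($sql, "%$escaped%");
}

my ($sql, $pattern) = build_like_query('MySql');
print "$sql\n$pattern\n";
# With DBI: my $rows = $dbh->selectall_arrayref($sql, undef, $pattern);
```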
Re: Searching a text file (avoid slurping if you can) by grinder (Bishop) on Nov 14, 2004 at 11:54 UTC
You are slurping the entire file into memory before processing it. This uses more memory than necessary; you only need to read one line at a time from the file. Something like:

while( <DB> ) {
  chomp;
  RECORD: while (m/\G(.*?(?<!\\)(\\\\)*)\Q$dataseperator\E/gs) {
    ...
  }
}

It appears, though, that you are jumping through all sorts of hoops because you want to deal with multiline fields (the last-but-one field?). If that's the case, you should structure the file differently, to take advantage of Perl's strengths. For instance, you could end each record with a special token, like `%%` (taking care to escape any %% appearing in the fields as data, e.g. as `\%\%`). Once the datafile is in that format, you can set the input record separator (`$/`) to '%%', which will simplify your code considerably.

The routine also relies on a number of external variables: `$dataseperator`, `$templatetail`, `$datafile` and the like. Basic code hygiene suggests passing these in as parameters.

Finally, the routine does too much. Not only does it perform a search, it also prints output and (horrors!) calls exit to end the program. A better architecture would have the `search` routine only perform the search. The printing should be hoisted up a level into the calling code (even if that is itself another routine), and the exit call should be placed at the highest level of the code tree. (Either that, or rename the routine `print_header_search_results_and_footer_then_exit`.) At least people will then have fair warning of what happens when they call the routine. Or the maintenance programmer reading the code a few years later. In fact, especially the maintenance programmer reading the code a few years later.

- another intruder with the mooring of the heart of the Perl
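A sketch combining the two suggestions above, under stated assumptions about a hypothetical file format: every field still ends with `|`, every record ends with `%%`, fields contain no literal `|`, and a literal `%%` inside a field is stored as `\%\%`. Setting `local $/ = '%%'` makes each readline return one whole record, so a field may span several lines; keeping the trailing `|` also means a field ending in `%` is never directly followed by the terminator. The search routine takes its inputs as parameters and returns matches, leaving printing and exit to the caller. `read_records` and `search_records` are hypothetical names.

```perl
use strict;
use warnings;

# Read "%%"-terminated records (assumed format, see the lead-in).
sub read_records {
    my ($fh) = @_;
    local $/ = '%%';              # <$fh> now returns one whole record
    my @records;
    while (my $rec = <$fh>) {
        chomp $rec;               # chomp strips the current $/, i.e. "%%"
        $rec =~ s/^\n+//;         # drop newline(s) left after the previous "%%"
        next unless length $rec;  # skip the trailing newline at end of file
        # Each field is everything up to its terminating "|"; fields may
        # contain newlines because records are no longer line-based.
        my @fields = $rec =~ /([^|]*)\|/g;
        s/\\%\\%/%%/g for @fields;   # unescape "\%\%" back to "%%"
        push @records, \@fields;
    }
    return @records;
}

# Search only: no printing, no exit. Returns the records in which any
# field contains $search as a plain substring.
sub search_records {
    my ($search, @records) = @_;
    my @hits;
    for my $rec (@records) {
        push @hits, $rec if grep { index($_, $search) != -1 } @$rec;
    }
    return @hits;
}

my $text = "Test|Subject|test\@subject.com|Male|Perl-Database|good|on|%%\n"
         . "Blank|Worker|blank\@worker.com|Male|MySql-Database|the best!\n"
         . "really|off|%%\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @hits = search_records('MySql', read_records($fh));
close $fh;

# Presentation lives in the calling code, not in the search routine.
for my $rec (@hits) {
    print join(' / ', @$rec), "\n";
}
```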
Re: Searching a text file by TedPride (Priest) on Nov 15, 2004 at 07:29 UTC
Given that format, why not just do something like the following?

use strict;
use warnings;

my $str = 'MySql';
while (<DATA>) {
  # Skip the line entirely if the string isn't anywhere in it.
  next if index($_, $str) == -1;
  chomp;
  my @fields = split /\|/;
  for my $i (0 .. $#fields) {
    print "String \"$str\" found in field $i of line " . ($. - 1) . "\n"
      if index($fields[$i], $str) != -1;
  }
}
__DATA__
Test|Subject|test@subject.com|Male|Perl-Database|good|on|
My|Tester|my@tester.com|Female|Perl-Database|very good|on|
Blank|Worker|blank@worker.com|Male|MySql-Database|the best!|off|
Someone|Somewhere|someone@somewhere.com|Female|Mysql-Database|great|off|
Adding|Test|add@test.com|Male|Perl-Database|Great product!|on|

You use the data you find in a different way, of course, but the theory applies just the same. There's no point going through all the trouble of converting your fields if the string you're looking for isn't in the current line, and you shouldn't read the whole file in at once unless you know it's always going to be small (and in that case, a simple open-read-close is most efficient).

