PerlMonks  

Searching a text file

by monoxide (Beadle)
on Nov 14, 2004 at 02:03 UTC (#407648=perlquestion)

monoxide has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to search a text file by reading it into my script, tokenising it, and then searching each token for $search. For reference, there are 7 elements in the @database array.
    sub search {
        # Load the flatfile into a data structure.
        open(DB, $datafile);
        {local $/; $_ = <DB>}
        close(DB);

        # Start printing template
        print $templatehead;

        # Get ready for interpreting datafile
        $recnum = 0;
        RECORD: while (m/\G(.*?(?<!\\)(\\\\)*)\Q$dataseperator\E/gs) {
            # Get the next field and unescape it.
            $data = $1;
            $data =~ m/^\n?(.*?)\n?\\?$/s;
            $data = $1;
            $data =~ s@\\\\@\\@;
            push @data, $1;
            if ($#data >= $#database) {
                # Get what we are searching for.... the reason we are here.
                $search = &getparam("search"); #gets a CGI::param()
                foreach $field (@data) {
                    if(-1 != index($field,$search)) {
                        &printfield(@data);
                        @data = ();
                        $recnum++;
                        next RECORD;
                    }
                }
            }
        }
        print $templatetail;
        exit();
    }
An example of the data in $datafile is,
    Test|Subject|test@subject.com|Male|Perl-Database|good|on|
    My|Tester|my@tester.com|Female|Perl-Database|very good|on|
    Blank|Worker|blank@worker.com|Male|MySql-Database|the best!|off|
    Someone|Somewhere|someone@somewhere.com|Female|Mysql-Database|great|off|
    Adding|Test|add@test.com|Male|Perl-Database|Great product!|on|

Replies are listed 'Best First'.
Re: Searching a text file
by Zaxo (Archbishop) on Nov 14, 2004 at 02:10 UTC

    I'd approach this with DBI and one of the CSV drivers like DBD::CSV. That will save you a lot of code, and leave most of your search in the hands of DBI.

    After Compline,
    Zaxo

      Thanks Zaxo, but how would I write an SQL query that would let me search for a string in any field in the table? Also, I am not such a big fan of DBD::CSV.

        Why would you *want* to search for a string in any field? Your last field looks like a boolean (on/off), and your 4th field looks boolean too (male/female). If you are searching for emails, you would not really want to look there....

        Presumably you have a data entry form. This can also be a search form, just add a button that says search. When a user enters data in one or more fields and clicks search all you do is make a query like:

        $sth = $dbh->prepare('SELECT foo,bar,baz FROM widget WHERE foo=? AND bar=? AND baz=?');
        $sth->execute($foo,$bar,$baz);

        where you dynamically construct the WHERE clause depending on how many fields the user enters data into. To search *all* the fields for the one string, the syntax would simply use OR instead of AND and list all the fields you want to search:

        $sth = $dbh->prepare('SELECT foo,bar,baz FROM widget WHERE foo=? OR bar=? OR baz=?');
        $sth->execute($str,$str,$str);
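As a rough sketch of how the WHERE clause described above might be assembled, the following builds the SQL text and the bind values from whichever fields were filled in. The helper name build_where, the %form hash and its contents are hypothetical and only for illustration; the table and column names are the placeholder ones from the queries above. No database connection is needed to run it.

```perl
use strict;
use warnings;

# Build a parameterised WHERE clause from whichever fields the user filled in.
# $joiner is 'AND' (all entered fields must match) or 'OR' (any may match).
# build_where is a hypothetical helper, not part of DBI.
sub build_where {
    my ($criteria, $joiner, @allowed) = @_;
    my (@clauses, @binds);
    for my $column (@allowed) {            # only known column names reach the SQL
        my $value = $criteria->{$column};
        next unless defined $value && length $value;
        push @clauses, "$column=?";
        push @binds, $value;
    }
    my $where = @clauses ? ' WHERE ' . join(" $joiner ", @clauses) : '';
    return ($where, @binds);
}

my %form = (foo => 'Perl', bar => '', baz => 'on');   # hypothetical form input
my ($where, @binds) = build_where(\%form, 'AND', qw(foo bar baz));
my $sql = "SELECT foo,bar,baz FROM widget$where";
print "$sql\n";     # SELECT foo,bar,baz FROM widget WHERE foo=? AND baz=?
print "@binds\n";   # Perl on
# With a connected handle: my $sth = $dbh->prepare($sql); $sth->execute(@binds);
```

Taking the column names from a fixed list rather than from the form keeps user input out of the SQL text; only the values travel as bind parameters.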

        DBD::SQLite might suit you better.

        cheers

        tachyon

        If you want to search for a string in any field, just concatenate all your fields into one field and write an SQL query that does a "LIKE" search (with the appropriate "%" before and after the search term) on that single field.
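One way the concatenate-and-LIKE idea might look, as a sketch only: the helper build_like_query is hypothetical, the table and column names are the placeholders used earlier in the thread, and the "||" concatenation operator is an assumption that holds for SQLite and PostgreSQL but not for a default MySQL setup, which would normally use CONCAT().

```perl
use strict;
use warnings;

# Build a query that concatenates the given columns and runs one LIKE over the result.
# Assumes the "||" string concatenation operator is available in the database used.
sub build_like_query {
    my ($table, $search, @columns) = @_;
    # A separator between columns helps stop a match from spanning two adjacent fields.
    my $haystack = join(" || '|' || ", @columns);
    my $sql = "SELECT " . join(',', @columns) . " FROM $table WHERE $haystack LIKE ?";
    return ($sql, '%' . $search . '%');
}

my ($sql, $pattern) = build_like_query('widget', 'MySql', qw(foo bar baz));
print "$sql\n";
print "$pattern\n";
# With a connected handle: my $sth = $dbh->prepare($sql); $sth->execute($pattern);
```

Note that "%" and "_" inside the search term would act as wildcards here, and that in most databases a NULL column makes the whole concatenation NULL, so real data may need extra care.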

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Searching a text file (avoid slurping if you can)
by grinder (Bishop) on Nov 14, 2004 at 11:54 UTC

    You are slurping the entire file into memory before processing it. This uses more memory than necessary. You only need to read one line at a time from the file. Something like:

    while( <DB> ) {
        chomp;
        RECORD: while (m/\G(.*?(?<!\\)(\\\\)*)\Q$dataseperator\E/gs) {
            ...
        }
    }

    It appears that you are jumping through all sorts of hoops because you want to deal with multiline fields (the last-but-one field?).

    If that's the case, you should structure the file differently, to take advantage of Perl's strengths. For instance, you could end each record with a special token, like %% (taking care to escape out %% appearing in the fields as data: e.g. \%\%).

    Once you have the datafile in that format, you can set the input record separator ($/) to '%%', which will simplify your code considerably.
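A minimal sketch of reading such a file, assuming each record ends with %% followed by a newline. The sample data and the helper name read_records are made up for illustration, and an in-memory filehandle stands in for the real datafile. Unescaping \%\% inside fields is left out.

```perl
use strict;
use warnings;

# Hypothetical sample data: each record ends with the token "%%",
# and the first record contains a multiline field.
my $raw = "Test|Subject|good\nproduct|on|%%\nMy|Tester|very good|on|%%\n";

sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = '%%';            # each read now returns up to and including "%%"
    my @records;
    while (my $record = <$fh>) {
        chomp $record;          # chomp removes the current $/ value, i.e. "%%"
        $record =~ s/^\n//;     # drop the newline left over after the previous "%%"
        next unless length $record;
        push @records, $record;
    }
    close $fh;
    return @records;
}

my @records = read_records($raw);
print scalar(@records), " records\n";   # 2 records
```

Because $/ is set with local inside the routine, the rest of the program keeps reading line by line as usual.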

    The routine also appears to be relying on a number of external variables: $dataseperator, $templatetail, $datafile and the like. Basic code hygiene would suggest that you pass these variables in as parameters.

    Finally, the routine appears to be doing too much. Not only is it performing a search, it is also printing out stuff, and (horrors!) calling exit to end the program.

    A better architecture would have the search routine only performing a search. The printing should be hoisted up a level into the calling code (even if that itself is another routine) and the exit call should be placed at the highest level of the code tree. (Either that, or rename the routine print_header_search_results_and_footer_then_exit).

    At least people will then have fair warning of what happens when they call the routine. So will the maintenance programmer reading the code a few years later. In fact, especially the maintenance programmer reading the code a few years later.
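To make the suggested split concrete, here is a sketch under a few assumptions: single-line records as in the sample data, a hypothetical routine name search_records, and an in-memory filehandle in place of the real datafile. The routine receives everything it needs as parameters and only returns matches; printing stays with the caller, and nothing calls exit.

```perl
use strict;
use warnings;

# Search only: takes everything it needs as arguments and returns the matches.
sub search_records {
    my ($fh, $separator, $search) = @_;
    my @matches;
    while (my $line = <$fh>) {
        chomp $line;
        next if index($line, $search) == -1;    # cheap whole-line pre-check
        my @fields = split /\Q$separator\E/, $line;
        push @matches, \@fields if grep { index($_, $search) != -1 } @fields;
    }
    return @matches;
}

# The caller decides how to present the results.
my $data = "Test|Subject|test\@subject.com|Male|Perl-Database|good|on|\n"
         . "Blank|Worker|blank\@worker.com|Male|MySql-Database|the best!|off|\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @found = search_records($fh, '|', 'MySql');
close $fh;
print join(' / ', @$_), "\n" for @found;
```

With this shape the same routine can serve a CGI page, a command-line tool or a test script, since none of them is forced to accept its output format.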

    - another intruder with the mooring of the heart of the Perl

Re: Searching a text file
by TedPride (Priest) on Nov 15, 2004 at 07:29 UTC
    Given that format, why not just do something like the following?
    use strict;
    use warnings;

    my $str = 'MySql';
    while (my $line = <DATA>) {
        chomp $line;
        # Skip lines that cannot match before doing any splitting.
        next if index($line, $str) == -1;
        # Split into a named array; split in void context no longer fills @_.
        my @fields = split /\|/, $line;
        for my $i (0 .. $#fields) {
            print "String \"$str\" found in field $i of line " . ($. - 1) . "\n"
                if index($fields[$i], $str) != -1;
        }
    }
    __DATA__
    Test|Subject|test@subject.com|Male|Perl-Database|good|on|
    My|Tester|my@tester.com|Female|Perl-Database|very good|on|
    Blank|Worker|blank@worker.com|Male|MySql-Database|the best!|off|
    Someone|Somewhere|someone@somewhere.com|Female|Mysql-Database|great|off|
    Adding|Test|add@test.com|Male|Perl-Database|Great product!|on|
    You'd use the found data in a different way, of course, but the theory applies just the same. There is no point going through all the trouble of converting your fields if the string you're looking for isn't in the current line, and you shouldn't read the file all in at once unless you know it's always going to be small (and in that case, a simple open-read-close is most efficient).

Node Type: perlquestion [id://407648]
Approved by atcroft