PerlMonks  

RegEx for users who dont know RegEx

by dimar (Curate)
on Dec 23, 2004 at 17:07 UTC ( #417149=perlquestion )
dimar has asked for the wisdom of the Perl Monks concerning the following question:

Goal: There is a simple script that searches through flat text files based on a simple 'query string' supplied by the user. The goal is to allow the user to include 'wildcards' in her query string. The query syntax is deliberately minimal (i.e., a dumb flat search for a string, with no boolean operators, no filtering capabilities, and no letter-case distinctions).

Question: What is the best way to process a user-supplied search 'query' so that the user can include simple wildcard characters (e.g., not full-blown RegEx)? For example, if the user supplies:

    i*ation

The script should return all matching items in the flat file:

    invitation
    information
    Isolation
    InFlaTiOn
    IATION

But the script should not return:

    In our nation
    it requires concentration
    fiat ionizing

Is the best approach simply to take the user input, translate it into a full-blown RegEx, and then use that? Devising a RegEx that meets the requirement is no problem, but post-processing a user-supplied string into a RegEx seems like a potential red flag. Perhaps there is a better way to approach this.


Update -- modified example positive match that was intended to represent a false-positive match; re: The Mad Hatter

Replies are listed 'Best First'.
Re: RegEx for users who dont know RegEx
by ikegami (Pope) on Dec 23, 2004 at 17:23 UTC

    Something like this?

    ($reg_exp = $user_query) =~ s/(\W)/$1 eq '*' ? "\\S*" : "\\$1"/ge;

    It can be used as follows:

    # File as an array of lines:
    @matching_lines = grep { /$reg_exp/ } @lines_to_search;

    or

    # File in a scalar:
    @matching_lines = $file =~ /^(.*${reg_exp}.*)$/mg;
      ($reg_exp = $user_query) =~ s/(\W)/$1 eq '*' ? "\\S*" : "\\\$1"/ge;
      You have too many backslashes in the second part of that ternary (it results in invalid chars being replaced by a literal '\$1').
        Thanks. Fixed. Tested.
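
    For readers who want to try the corrected substitution, here is a minimal sketch that wraps it in a sub. The name wildcard_to_regex and the sample lines are hypothetical, not from the thread, and the /i flag is an addition to cover the "no letter-case distinctions" requirement in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn a user query into a
# compiled, case-insensitive regex. '*' becomes \S* (any run of
# non-whitespace); every other non-word character is backslash-escaped
# so that it only matches itself.
sub wildcard_to_regex {
    my ($user_query) = @_;
    (my $reg_exp = $user_query) =~ s/(\W)/$1 eq '*' ? "\\S*" : "\\$1"/ge;
    return qr/$reg_exp/i;
}

# Sample data borrowed from the question.
my @lines_to_search = ('invitation', 'Isolation', 'In our nation', 'fiat ionizing');

my $compiled = wildcard_to_regex('i*ation');
my @matching_lines = grep { /$compiled/ } @lines_to_search;
print "$_\n" for @matching_lines;
```

    With the sample data above this should print only "invitation" and "Isolation". Because every non-word character other than '*' is escaped, a query such as "a.b" matches a literal dot rather than any character.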
Re: RegEx for users who dont know RegEx
by The Mad Hatter (Priest) on Dec 23, 2004 at 17:18 UTC
    One solution is to strip out all characters in the user-supplied data that aren't explicitly allowed and then generate your regex based off of that.

    Update -- Try this sample code:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $query = shift;
    die "usage: $0 query-string\n" if not $query;

    print "Original query: '$query'\n";
    $query =~ s/[^\w\*]//g;
    print "Safe query: '$query'\n";
    $query =~ s/\*/\\w\*/g;
    print "Parsed query: '$query'\n";

    while (<DATA>) {
        print "match: $_" if /$query/i;
    }

    __DATA__
    invitation
    information
    Isolation
    InFlaTiOn
    IATION
    In our nation
    it requires concentration
    at ionizing radiation
    Note that "at ionizing radiation" matches because the iation in radiation matches. Did you just miss that, or should it not match?
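
    If the "radiation" hit is unwanted, one possible fix is to anchor the parsed query on word boundaries. This is only a sketch under the assumption that queries are meant to match whole words, which the original question does not state outright; the variable names are made up for the example.

```perl
use strict;
use warnings;

# The parsed query that the sample code above produces for the input 'i*ation'.
my $parsed = 'i\w*ation';

my @sample = ('invitation', 'IATION', 'at ionizing radiation');

# Unanchored: the pattern may start in the middle of a word,
# so the "iation" inside "radiation" counts as a hit.
my @loose = grep { /$parsed/i } @sample;

# Anchored: \b requires the match to start and end at a word boundary.
my @whole = grep { /\b$parsed\b/i } @sample;

print "loose: ", join(' | ', @loose), "\n";
print "whole: ", join(' | ', @whole), "\n";
```

    Here the unanchored version keeps all three lines, while the anchored version should drop "at ionizing radiation".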

      Why strip out the unsafe characters instead of escaping them, as I did below? Using your approach to search for "can't" will fail, for example.
        *shrugs* Stripping out unsafe characters just seems like a better idea; you know exactly what you're left with. If you want to allow single ticks or other characters, modifying the regex of allowed chars is easy. The code above is just an example, not a final product.
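
    To make the trade-off concrete, here is a small sketch comparing the two approaches on the "can't" query mentioned above. The sample line is invented, and the escaping step is a simplified stand-in for the substitution in ikegami's reply (it has no wildcard handling).

```perl
use strict;
use warnings;

my $user_query = "can't";

# Whitelist approach: drop everything except word characters and '*'.
(my $stripped = $user_query) =~ s/[^\w*]//g;

# Escaping approach: keep every character, but put a backslash
# in front of each non-word character.
(my $escaped = $user_query) =~ s/(\W)/\\$1/g;

my $line = "I can't find it";   # invented sample line
printf "stripped %-7s %s\n", $stripped, ($line =~ /$stripped/i ? 'match' : 'no match');
printf "escaped  %-7s %s\n", $escaped,  ($line =~ /$escaped/i  ? 'match' : 'no match');
```

    Stripping turns the query into "cant", which no longer finds the line; escaping keeps the apostrophe and still matches. Widening the whitelist character class would close that gap for apostrophes, as suggested above.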
Re: RegEx for users who dont know RegEx
by inman (Curate) on Dec 24, 2004 at 12:29 UTC
    Just quotemeta the search string, then substitute any characters that you want to use as wildcards. I have used * and ?.

    #! /usr/bin/perl -w
    use strict;
    use warnings;

    $/ = undef;
    my $text = <DATA>;
    my $count = 1;

    my $searchRegex = quotemeta $ARGV[0];
    $searchRegex =~ s/\\\*/\\S*/g;
    $searchRegex =~ s/\\\?/\\S{1}/g;
    print "Search RegEx = $searchRegex\n";

    $text =~ s/($searchRegex)\b/print $count++, " $1\n"/ige;

    __DATA__
    can't won't
    This is a selection of example passages to find
    We received an invitation from Mark to his party
    He provided information on how to get to his house.
    He has been in the isolation ward for much of the year
    due to some unfortunate bladder InFlaTiOn.
    Let's hope he gets better!
    IATION
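
    The script above prints from inside s///e, which also rewrites $text as a side effect. As a hedged variation on the same quotemeta idea, the sketch below collects the matches into a list instead. The sub name glob_to_regex and the shortened sample text are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: escape everything with quotemeta, then turn the
# escaped '*' and '?' back into wildcards.
sub glob_to_regex {
    my ($query) = @_;
    my $pattern = quotemeta $query;
    $pattern =~ s/\\\*/\\S*/g;   # escaped '*' -> any run of non-whitespace
    $pattern =~ s/\\\?/\\S/g;    # escaped '?' -> exactly one non-whitespace char
    return qr/$pattern/i;
}

# Shortened sample text, loosely based on the __DATA__ section above.
my $sample = "We received an invitation. He was in the isolation ward. can't won't";

# A global match in list context returns every captured hit
# and leaves $sample untouched.
my $search = glob_to_regex('i*ation');
my @found = $sample =~ /($search)\b/g;
print "$_\n" for @found;
```

    For this sample the list should contain "invitation" and "isolation". A query such as "can?t" matches "can't" but not "cant", since '?' stands for exactly one non-whitespace character.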
