regex: finding something followed explicitly by a dot

by rmexico (Novice)
on Feb 11, 2006 at 17:02 UTC (#529567=perlquestion)
rmexico has asked for the wisdom of the Perl Monks concerning the following question:

i'm parsing files and trying to find a given variable followed by a dot. so, the file that i'm parsing is this:
package blah.blah.blah;

import blah.Log;

public void addLeadType(LEAD_TYPE type) {
    _leadTypes.add(type);
}

// ** PROTECTED METHODS
// ** PRIVATE METHODS
// ** ACCESSORS
// ** INNER CLASSES
}
and the perl code that i'm using to find a word in that file:
#!/usr/bin/perl

undef $/;

sub fileContainsWord() {
    my $file = $_[0];
    my $word = $_[1];
    my $found = 0;
    open FILE, "<$file";
    while(<FILE>) {
        my $filecontents = $_;
        if($filecontents =~ /$word/smg) {
            $found = 1;
            last;
        }
    }
    close FILE;
    return $found;
}

my $file = "";
my $word = "LEAD_TYPE.";
my $flag = &fileContainsWord($file, $word);
print $flag . "\n";
for some reason, it's not picking up the trailing dot in $word, and $flag is coming back '1' even though the file has no "LEAD_TYPE." in it. why is this?

Replies are listed 'Best First'.
Re: regex: finding something followed explicitly by a dot
by brian_d_foy (Abbot) on Feb 11, 2006 at 17:15 UTC

    You get back 1 because you return $found from fileContainsWord.

    Remember that regex special characters in interpolated strings are still special, and a dot with the /s flag matches any character. You don't need /s unless you use the dot as a special character, and you don't need /m unless you are using the string anchors. The /g flag isn't doing you much good here either, since you're only matching the string once. To quote possible special characters, you can use the \Q sequence (or quotemeta beforehand).

    if($filecontents =~ /\Q$word/) {
    brian d foy
    Subscribe to The Perl Review
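
To make the difference concrete, here is a minimal sketch that runs the same interpolated pattern with and without \Q. The helper names matches_raw and matches_literal and the "LEAD_TYPE.HOT" sample string are made up for illustration; the declaration line is taken from the Java snippet in the question.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration: the same match, with and without \Q.
sub matches_raw     { my ($text, $word) = @_; return $text =~ /$word/   ? 1 : 0 }
sub matches_literal { my ($text, $word) = @_; return $text =~ /\Q$word/ ? 1 : 0 }

my $word        = "LEAD_TYPE.";
my $declaration = "public void addLeadType(LEAD_TYPE type) {";  # from the question
my $with_dot    = "LEAD_TYPE.HOT";                              # assumed sample input

# Unquoted: the dot is a wildcard and happily matches the space after LEAD_TYPE.
printf "raw on declaration:     %d\n", matches_raw($declaration, $word);

# Quoted: the dot must be a real dot, so the declaration no longer matches.
printf "literal on declaration: %d\n", matches_literal($declaration, $word);
printf "literal on dotted text: %d\n", matches_literal($with_dot, $word);
```

This should print 1, 0 and 1 in that order: only the unquoted pattern is fooled by the declaration line.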
Re: regex: finding something followed explicitly by a dot
by davido (Archbishop) on Feb 11, 2006 at 17:14 UTC

    The '.' is a special character inside of regular expressions, signifying 'anything except newline' (usually).

    You probably want to use quotemeta like this:

    my $word = quotemeta( $_[1] );

    This should 'escape' (with a leading '\' character) anything that could be interpreted as a special character.
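
A small sketch of what quotemeta hands back for the word in the question. The two test strings are assumptions chosen for illustration, not data from the original post.

```perl
use strict;
use warnings;

my $word   = "LEAD_TYPE.";
my $quoted = quotemeta($word);  # backslashes every ASCII character that is not a letter, digit, or underscore

print "$quoted\n";              # LEAD_TYPE\.

# The escaped string can now be interpolated into a pattern as a literal.
print "no match on 'LEAD_TYPE type'\n" unless "LEAD_TYPE type" =~ /$quoted/;
print "match on 'LEAD_TYPE.add'\n"     if     "LEAD_TYPE.add"  =~ /$quoted/;
```

Note that the underscore is left alone, since it counts as a word character; only the dot gains a backslash.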


Re: regex: finding something followed explicitly by a dot
by bart (Canon) on Feb 11, 2006 at 17:15 UTC
    You have a quotemeta problem: a dot (for one) is special in regexes. Replace
    if($filecontents =~ /$word/smg) {
    with
    if($filecontents =~ /\Q$word/smg) {
    and now you can do literal searches.
Re: regex: finding something followed explicitly by a dot
by m.att (Pilgrim) on Feb 11, 2006 at 17:19 UTC
    When you interpolate a variable in a regular expression, regex metacharacters are evaluated as if you'd used them in the regex. So, in your example, the '.' character is evaluated as matching 'any character'. You can either escape the regex metacharacters in the word you're searching for or use the \Q\E pair to automatically quote metacharacters, such as:

    /\Q$word\E/ or /\Q$word/

    (The \E can be omitted when the quoting should run to the end of the pattern; it's there so that you can quote just a subset of the regex.)
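
Here is a rough sketch of the case where \E earns its keep: quoting the interpolated word while leaving the rest of the pattern as a live regex. The helper suffix_after, the [A-Z]+ suffix and the sample strings are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the upper-case name that follows "$word" literally,
# or undef when the line does not contain such a sequence.
sub suffix_after {
    my ($line, $word) = @_;
    # \Q...\E quotes only $word; the [A-Z]+ after \E keeps its regex meaning.
    return $line =~ /\Q$word\E([A-Z]+)/ ? $1 : undef;
}

my $word = "LEAD_TYPE.";

for my $line ("LEAD_TYPE.HOT", "LEAD_TYPE type") {
    my $suffix = suffix_after($line, $word);
    printf "%-16s => %s\n", $line, defined $suffix ? $suffix : "(no match)";
}
```

Without the \E, the character class would be quoted too and the pattern would look for the literal text "[A-Z]+".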

Re: regex: finding something followed explicitly by a dot
by pKai (Priest) on Feb 11, 2006 at 17:27 UTC

    The content of $word is used as a regular expression string. So the dot is a wildcard for a single character.

    With $word = "LEAD_TYPE." the match is like $filecontents =~ /LEAD_TYPE./smg, which matches the line with the Java function declaration (the dot matches the space character).

    It follows that your sub returns $found = 1, which ends up as the value of $flag.

    For taking your input literally in the match see the quotemeta builtin sub or the \Q (+ \E) perlre metacharacters.

    You may also have a look into perlop to check that your usage of the s-, m- and g-modifiers has no influence on your result in the context in which you perform your match.
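
One way to see this for yourself is to capture the matched text. This is only a sketch: the line is the declaration from the question, and /g is left out on purpose because it changes what a list-context match returns.

```perl
use strict;
use warnings;

my $line = "public void addLeadType(LEAD_TYPE type) {";  # line from the question
my $word = "LEAD_TYPE.";

# Capture the matched text to see what the unquoted dot consumed.
my ($plain)   = $line =~ /($word)/;
my ($flagged) = $line =~ /($word)/sm;

printf "matched: [%s]\n", $plain;   # the dot swallowed the space after LEAD_TYPE
print "same result with /s and /m\n" if $plain eq $flagged;
```

The brackets in the output make the trailing space visible, and the second line confirms that /s and /m change nothing for this pattern and input.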

Re: regex: finding something followed explicitly by a dot
by ayrnieu (Beadle) on Feb 12, 2006 at 00:33 UTC
    Other people have already found your bug. I'd like to show you how I'd write your program.
    #! /usr/bin/env perl
    use strict;
    use warnings;
    BEGIN { die "usage: $0 <file> <word>" unless @ARGV==2 }

    sub file_has_word {
      my ($fn, $w) = @_;
      open my $f, $fn or die "$fn: $!";
      local $/ = undef;  # but why do you want this?
      return unless <$f> =~ /\Q$w/;
      1
    }

    print "found it!\n" if file_has_word(@ARGV);
    Or possibly: grep 'LEAD_TYPE\.'
      Why do you have your argument-checking 'die' inside a BEGIN block? The line would work just as well without the BEGIN block (once a ';' was added at the end).
        Habit: sometimes I'd like to avoid other BEGIN or module processing. As near-cargo-cult code goes, is it very bad?

Node Type: perlquestion [id://529567]
Approved by Arunbear