Re^6: Match typo

by vitoco (Friar)
on Sep 11, 2009 at 18:43 UTC

in reply to Re^5: Match typo
in thread Match typo

$has_publish is always false; change the pattern to /^publication/. I also think you should explode that pattern the same way as in $has_date.

Also, I would change the exploded pattern to a scoring one:

my $has_date    = grep /d/ + /a/ + /t/ + /e/ >= 3, @words;
my $has_publish = grep /p/+/u/+/b/+/l/+/i/+/c/+/a/+/t/+/o/+/n/ >= 8, @words;

This would accept any spelling that contains most of the word's letters (but it would also match other words from ikegami's list).
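
A minimal sketch of the scoring idea, wrapped in a hypothetical helper letter_score (not part of the thread's code) so the threshold is easy to experiment with. The sample words are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many of the given letters occur in $word.
# Each letter counts at most once, like /d/ + /a/ + /t/ + /e/.
sub letter_score {
    my ($word, @letters) = @_;
    return scalar grep { index($word, $_) >= 0 } @letters;
}

# Made-up sample words: "date", two typos of it, and one unrelated word.
my @samples = qw(date daet dtae tea);

for my $word (@samples) {
    my $score = letter_score($word, qw(d a t e));
    printf "%-5s score=%d %s\n", $word, $score, $score >= 3 ? 'match' : 'no match';
}
```

Note that "tea" also reaches the threshold of 3 (a, t and e), which is the false-positive risk mentioned above.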

The most important part for me: I didn't know that a match with the g option returns the matched strings in list context even without capturing parentheses. I need to read the docs again!!!
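
For reference, a small sketch of that behaviour: in list context, a match with the g modifier and no capturing parentheses returns every matched substring. The helper name split_words is only an illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lowercased "words" of a line.
# In list context, m//g without parentheses returns all matched substrings.
sub split_words {
    my ($line) = @_;
    my @found = lc($line) =~ /[a-z-]+/g;
    return @found;
}

my @found = split_words('The book was published on the date "20-08-2009".');
print join('|', @found), "\n";
```

Because the character class includes the hyphen, the two hyphens inside the date are returned as one-character "words" as well.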

BTW, why not search for the following?

while (<>) {
    print $_ if /\d\d-\d\d-\d\d/;
}

I know, I know... Other dates would also match. ;-)
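
If those false positives matter, one possible tightening (an assumption, not something proposed in the thread) is to require a four-digit year and word boundaries, following the DD-MM-YYYY shape of the example date. The helper name has_dmy_date is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line contains something shaped like DD-MM-YYYY.
# It checks the shape only, not whether the date is a valid calendar date.
sub has_dmy_date {
    my ($line) = @_;
    return $line =~ /\b\d\d-\d\d-\d{4}\b/ ? 1 : 0;
}

for my $line ('The book was published on the date "20-08-2009".', 'Part 12-34-56') {
    print "$line\n" if has_dmy_date($line);
}
```

This still accepts impossible dates such as 99-99-9999; it only narrows the shape.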

Re^7: Match typo
by ikegami (Pope) on Sep 11, 2009 at 19:21 UTC

    $has_publish is always false

    No. Using the OP's very own example,

    while (<DATA>) {
        chomp;
        my @words = lc($_) =~ /[a-z-]+/g;  # Needs improvement?
        my $has_date    = grep /^d/ && /a/ && /t/ && /e/, @words;
        my $has_publish = grep /^publish/, @words;
        print "'$_' matched\n" if $has_date && $has_publish;
    }
    __DATA__
    The book was published on the date "20-08-2009".

    'The book was published on the date "20-08-2009".' matched

    Also, I would change the exploded pattern to a scoring one:

    I like it, but I disagree with your decision to undo matching the first letter exactly.

    my $has_date    = grep /^d/ && (/a/+/t/+/e/) == 3, @words;
    my $has_publish = grep /^p/ && (/u/+/b/+/l/+/i/+/s/+/h/) >= 5, @words;
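
    To see how the two approaches differ, here is a rough comparison on a few made-up misspellings. The helper names strict_publish and loose_publish are illustrative only, and the loose threshold of 6 out of 7 letters is an assumption chosen for this sketch.

    ```perl
    use strict;
    use warnings;

    # First letter must match exactly, plus at least 5 of the 6 remaining letters.
    sub strict_publish {
        local $_ = shift;
        my $ok = /^p/ && (/u/ + /b/ + /l/ + /i/ + /s/ + /h/) >= 5;
        return $ok ? 1 : 0;
    }

    # No anchoring: at least 6 of the 7 letters anywhere (threshold is an assumption).
    sub loose_publish {
        local $_ = shift;
        my $ok = (/p/ + /u/ + /b/ + /l/ + /i/ + /s/ + /h/) >= 6;
        return $ok ? 1 : 0;
    }

    # Made-up sample words: the correct spelling and four typos.
    for my $word (qw(published pubilshed upblished spublished ublished)) {
        printf "%-11s strict=%d loose=%d\n",
            $word, strict_publish($word), loose_publish($word);
    }
    ```

    Only the anchored version rejects words whose first letter is displaced, such as "upblished" or "spublished".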

      I see, you meant this example from the OP, not that one.

      And I disagree with your decision to match the first letter exactly. I think that "the book was upblished on the date" may be considered a common typo, just like "the book wa spublished on the date" or "the book wasp ublished on the date" (i.e. two consecutive keystrokes swapped while typing).

        I never see anything like the first. People get the first letter right. But I hadn't thought of the space misplacement.

