http://www.perlmonks.org?node_id=1019068


in reply to matching the words

venky4289:

What part are you having difficulty with?

If you haven't started at all, I'd suggest converting the words with underscores in them into regular expressions, and then searching through your dictionary.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: matching the words
by venky4289 (Novice) on Feb 16, 2013 at 20:16 UTC
    I have stored all the dictionary words in one array, and all the words I need to search for in another. For example:

    @array1 = ("fil_", "t_xt");           # words that need to be filled in
    @array2 = ("file", "text", "fils");   # words from the dictionary file

    Now I need to match the elements of @array1 against @array2, and I should get this output:

    fil_ : file,fils
    t_xt : text

    Thanks for your replies.

      Yes, that helped! Here's one option that substitutes the "_" with ".+" for use in a regex (use "." if you want only one letter to match):

      use strict;
      use warnings;

      my @array1 = qw (fil_ t_xt _erl);
      my @array2 = qw (Merlin file text fils perl filled);

      for my $stem (@array1) {
          my $re = $stem;
          $re =~ s/_/.+/;
          /\b$re\b/ and print "$stem: $_\n" for @array2;
      }

      Output:

      fil_: file
      fil_: fils
      fil_: filled
      t_xt: text
      _erl: perl

      Only whole words are matched, because word boundaries (\b) are used in the regex. If they are omitted, the regex will also match substrings within the dictionary words.
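
      To make the effect of \b concrete, here is a minimal sketch (not part of the original reply) that runs the same pattern with and without the boundaries. The variable names @dictionary, @loose and @strict are made up for this illustration; the pattern is the stem "_erl" with "_" replaced by ".".

```perl
use strict;
use warnings;

# Same sample words as in the reply above.
my @dictionary = qw(Merlin file text fils perl filled);

# The stem "_erl" with "_" replaced by "." (exactly one character).
my $re = qr/.erl/;

# Without \b the pattern may match anywhere inside a word.
my @loose = grep { /$re/ } @dictionary;

# With \b on both sides the match has to cover a whole word.
my @strict = grep { /\b$re\b/ } @dictionary;

print "without \\b: @loose\n";
print "with \\b:    @strict\n";
```

      Without the boundaries "Merlin" is reported as well, because "Merl" occurs inside it; with them only "perl" remains.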

      Hope this helps!

      Update: Below is an updated version which adapts BrowserUk's preferred solution:

      use strict;
      use warnings;

      my @array1 = qw (fil_ t_xt _erl);
      my @array2 = qw (Merlin file text fils perl filled);
      my $words  = join ' ', @array2;

      for my $stem (@array1) {
          my $re = $stem;
          $re =~ s/_/./;
          print "$stem: $1\n" while $words =~ /\b($re)\b/g;
      }

      Output:

      fil_: file
      fil_: fils
      t_xt: text
      _erl: perl

      If you want the "_" to be matched by more than one letter in the dictionary words, change the substitution to $re =~ s/_/\\S+/;.
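
      The original question asked for one line per stem with the matches separated by commas. A possible variation on the code above, offered only as a sketch: the helper name stem_to_regex and the hash %matches are hypothetical, and the use of quotemeta and of a replacement for every "_" (not just the first) are my assumptions about what is wanted.

```perl
use strict;
use warnings;

my @stems      = qw(fil_ t_xt _erl);
my @dictionary = qw(Merlin file text fils perl filled);
my $words      = join ' ', @dictionary;

# Hypothetical helper: turn a stem such as "t_xt" into a compiled regex.
# Every "_" becomes "." (one character); the literal pieces are escaped
# with quotemeta so regex metacharacters in a stem are taken literally.
sub stem_to_regex {
    my ($stem) = @_;
    # The -1 limit keeps trailing empty fields, so "fil_" yields ("fil", "").
    my $pattern = join '.', map { quotemeta } split /_/, $stem, -1;
    return qr/\b($pattern)\b/;
}

my %matches;
for my $stem (@stems) {
    my $re = stem_to_regex($stem);
    # In list context, /g returns every captured word in one go.
    $matches{$stem} = [ $words =~ /$re/g ];
    print "$stem : ", join( ',', @{ $matches{$stem} } ), "\n";
}
```

      With the sample data this should print "fil_ : file,fils", "t_xt : text" and "_erl : perl", one per line.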

        Rather than loading the real words as an array, I'd load them as a single whitespace-delimited string.

        It is much faster (roughly 20 times in the benchmark below) to invoke the regex engine once to search for one word in a string containing hundreds or thousands of words than to invoke it hundreds or thousands of times to match against one word at a time:

        #! perl -slw
        use strict;
        use Benchmark qw[ cmpthese ];

        our $words = do{ local( @ARGV, $/ ) = 'words.txt'; <> };
        our @words = split ' ', $words;

        our $P //= 0.01;
        our @toLookFor = map {
            rand() > $P ? () : do {
                my $w = $_;
                my $p = int( rand length()-1 );
                $w =~ s[.{$p}\K.][.];
                $w;
            };
        } @words;

        printf "Looking for %d terms amongst %d words\n", scalar @toLookFor, scalar @words;

        cmpthese 1, {
            a => q[
                for my $re ( @toLookFor ) {
                    m[^$re$] #and print "a:$re :: $_"
                        for @words;
                }
            ],
            b => q[
                $words =~ m[\b($_)\b] #and print "b:$_ :: $1"
                    for @toLookFor;
            ],
        }

        __END__
        C:\test>junk42
        Looking for 1846 terms amongst 178691 words
          s/iter     a     b
        a   85.2    --  -95%
        b   3.94 2065%    --

        C:\test>junk42 -P=0.02
        Looking for 3564 terms amongst 178691 words
          s/iter     a     b
        a    166    --  -95%
        b   7.83 2022%    --

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.