 
PerlMonks  

matching the words

by venky4289 (Novice)
on Feb 16, 2013 at 19:28 UTC (#1019063=perlquestion)
venky4289 has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I have a file of dictionary words and another file in which some words are only partly written, e.g. fil_. I need to search the dictionary file to find the exact word that matches, with a letter filling the place of the _. Thanks in advance.

Re: matching the words
by roboticus (Canon) on Feb 16, 2013 at 19:49 UTC

    venky4289:

    What part are you having difficulty with?

    If you haven't started at all, I'd suggest converting the words with underscores in them into regular expressions, and then searching through your dictionary.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.
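The suggestion above (turn each underscore word into a regular expression, then search the dictionary) could be sketched roughly as follows. This is only an illustration: the helper name `stem_to_regex` and the sample word list are assumptions, not part of the thread, and each "_" is assumed to stand for exactly one character.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a stem such as "fil_" into an anchored regex
# in which every "_" stands for exactly one character.
sub stem_to_regex {
    my ($stem) = @_;
    # Split on "_" (keeping trailing empty fields), escape the literal
    # pieces, and rejoin them with "." as the single-character wildcard.
    my $pattern = join '.', map { quotemeta($_) } split /_/, $stem, -1;
    return qr/\A$pattern\z/;
}

my @dictionary = qw(file text fils filled);   # sample data, not a real word list
my $re   = stem_to_regex('fil_');
my @hits = grep { $_ =~ $re } @dictionary;
print "fil_: @hits\n";    # prints "fil_: file fils"
```

Anchoring with \A and \z keeps longer words such as "filled" from matching, and quotemeta guards against stems that happen to contain regex metacharacters.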

      I have stored all the dictionary words in one array, and all the words I need to search for in another. For example:

      @array1 = ("fil_", "t_xt");          # words that need to be filled in
      @array2 = ("file", "text", "fils");  # words in the dictionary file

      Now I need to match the elements of @array1 against @array2, and I should get this output:

      fil_ : file,fils
      t_xt : text

      Thanks for your replies.
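One possible way to produce exactly that grouped "stem : word,word" output is sketched below. The helper name `matches_for` is made up for illustration, and the sketch assumes each "_" stands for exactly one unknown letter.

```perl
use strict;
use warnings;

my @array1 = ('fil_', 't_xt');            # words to be filled in
my @array2 = ('file', 'text', 'fils');    # dictionary words

# Hypothetical helper: return the dictionary words that fit one stem,
# treating each "_" as exactly one unknown letter.
sub matches_for {
    my ($stem, @words) = @_;
    # quotemeta leaves "_" alone (it is a word character), so the
    # substitution afterwards only touches the blanks.
    (my $pattern = quotemeta $stem) =~ s/_/./g;
    return grep { /\A$pattern\z/ } @words;
}

for my $stem (@array1) {
    print "$stem : ", join(',', matches_for($stem, @array2)), "\n";
}
```

With the sample arrays this prints "fil_ : file,fils" and "t_xt : text", one stem per line.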

        Yes, that helped! Here's one option that substitutes the "_" with ".+" for use in a regex (use "." if you want only one letter to match):

        use strict;
        use warnings;

        my @array1 = qw(fil_ t_xt _erl);
        my @array2 = qw(Merlin file text fils perl filled);

        for my $stem (@array1) {
            my $re = $stem;
            $re =~ s/_/.+/;
            /\b$re\b/ and print "$stem: $_\n" for @array2;
        }

        Output:

        fil_: file
        fil_: fils
        fil_: filled
        t_xt: text
        _erl: perl

        Only whole words are matched, because word boundaries (\b) are used in the regex; if they are omitted, the regex will also match substrings within the dictionary words.
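To make the effect of \b concrete, here is a small sketch comparing the two forms. It uses "." (one letter) for the blank and two of the sample words from the code above; the variable names are illustrative only.

```perl
use strict;
use warnings;

my @words = qw(Merlin perl);     # two of the sample dictionary words
my $re    = '.erl';              # "_erl" with "_" replaced by "."

my @bounded   = grep { /\b$re\b/ } @words;   # whole words only
my @unbounded = grep { /$re/ }     @words;   # substrings allowed

print "with \\b:    @bounded\n";     # perl
print "without \\b: @unbounded\n";   # Merlin perl
```

Without the boundaries, "Merlin" is reported because it contains "Merl".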

        Hope this helps!

        Update: Below is an updated version which adapts BrowserUk's preferred solution:

        use strict;
        use warnings;

        my @array1 = qw(fil_ t_xt _erl);
        my @array2 = qw(Merlin file text fils perl filled);
        my $words  = join ' ', @array2;

        for my $stem (@array1) {
            my $re = $stem;
            $re =~ s/_/./;
            print "$stem: $1\n" while $words =~ /\b($re)\b/g;
        }

        Output:

        fil_: file
        fil_: fils
        t_xt: text
        _erl: perl

        If you want the "_" to be matched by more than one letter in the dictionary words, change the substitution to $re =~ s/_/\\S+/;.
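One caveat worth noting: the substitutions shown above have no /g modifier, so only the first "_" in a stem is replaced. If a stem can contain several blanks, something like the following sketch may be closer to what is wanted. The stem "t__t" and the word list here are hypothetical, chosen only to show the difference.

```perl
use strict;
use warnings;

my $stem  = 't__t';                                # hypothetical stem with two blanks
my $words = join ' ', qw(text test tilt toast);    # sample words

(my $first_only = $stem) =~ s/_/./;     # "t._t": the second "_" stays literal
(my $all        = $stem) =~ s/_/./g;    # "t..t": every "_" becomes "."

# In list context, a /g match with one capture group returns all captures.
my @first_hits = $words =~ /\b($first_only)\b/g;
my @all_hits   = $words =~ /\b($all)\b/g;

print "first only: @first_hits\n";   # no matches
print "all:        @all_hits\n";     # text test tilt
```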

Re: matching the words
by Kenosis (Priest) on Feb 16, 2013 at 19:50 UTC
    • How are the words stored in the dictionary file? Any data samples you can share?
    • How do the "_" 'words' occur in the other file(s)?
    • How do you expect to "... find the exact word which matches ..." if, e.g., multiple words in the dictionary file begin with "fil"?
    • What have you tried? Do you have any code samples you can share?

Node Type: perlquestion [id://1019063]
Approved by ansh batra