http://www.perlmonks.org?node_id=1001794

Sishanth has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a hash that contains thousands of words. I have a string, e.g. $str = 'Hi this is the sample string for string search'

I need to search for the string in the hash. The match need not be the entire string: the hash may contain only 'Hi', or 'Hi this is', or 'sample string'. My Perl script should match every combination of words from the given string against the hash. Any ideas?

Re: String Search
by BrowserUk (Patriarch) on Nov 01, 2012 at 11:09 UTC

    Just iterate the combinations of words from the string:

    #! perl -slw
    use strict;

    # my %hash = ...;
    my $str = 'Hi this is the sample string for string search';

    my @words = split ' ', $str;

    for my $start ( 0 .. $#words - 1 ) {
        for my $end ( $start .. $#words ) {
            print "Lookup: ", join ' ', @words[ $start .. $end ];
        }
    }
    __END__
    C:\test>1001794
    Lookup: Hi
    Lookup: Hi this
    Lookup: Hi this is
    Lookup: Hi this is the
    Lookup: Hi this is the sample
    Lookup: Hi this is the sample string
    Lookup: Hi this is the sample string for
    Lookup: Hi this is the sample string for string
    Lookup: Hi this is the sample string for string search
    Lookup: this
    Lookup: this is
    Lookup: this is the
    Lookup: this is the sample
    Lookup: this is the sample string
    Lookup: this is the sample string for
    Lookup: this is the sample string for string
    Lookup: this is the sample string for string search
    Lookup: is
    Lookup: is the
    Lookup: is the sample
    Lookup: is the sample string
    Lookup: is the sample string for
    Lookup: is the sample string for string
    Lookup: is the sample string for string search
    Lookup: the
    Lookup: the sample
    Lookup: the sample string
    Lookup: the sample string for
    Lookup: the sample string for string
    Lookup: the sample string for string search
    Lookup: sample
    Lookup: sample string
    Lookup: sample string for
    Lookup: sample string for string
    Lookup: sample string for string search
    Lookup: string
    Lookup: string for
    Lookup: string for string
    Lookup: string for string search
    Lookup: for
    Lookup: for string
    Lookup: for string search
    Lookup: string
    Lookup: string search
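
    The code above only prints the candidate phrases. A minimal sketch of the actual lookup might look like the following, assuming the phrases are stored as hash keys (the thread does not say). The sample %hash contents and the sub name find_phrases are made up for illustration. The outer loop here runs through $#words, so the last word on its own is also tried.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data; assumes the phrases are the hash KEYS.
    my %hash = map { $_ => 1 } ( 'Hi', 'Hi this is', 'sample string', 'not present' );

    # Return every run of consecutive words in $str that exists as a key
    # in the hash referenced by $lookup, in order of starting position.
    sub find_phrases {
        my ( $str, $lookup ) = @_;
        my @words = split ' ', $str;
        my @found;
        for my $start ( 0 .. $#words ) {
            for my $end ( $start .. $#words ) {
                my $phrase = join ' ', @words[ $start .. $end ];
                push @found, $phrase if exists $lookup->{$phrase};
            }
        }
        return @found;
    }

    my @found = find_phrases( 'Hi this is the sample string for string search', \%hash );
    print "Found: $_\n" for @found;
    ```

    For a string of n words this tries n * ( n + 1 ) / 2 phrases, each a single hash probe, so the size of the hash hardly matters.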

    Where words (or word combinations) appear twice in the string, they will be looked up twice, but that will be faster than de-duplicating the combinations.

    Whether that is a problem will depend on whether you consider the same word or phrase appearing in different places duplicates or not; and what you are doing with the information you are generating.
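
    If repeated phrases should be kept apart rather than treated as duplicates, one option is to record where each match starts. This is only a sketch under the same assumption as before (phrases are hash keys); the sample %hash and the name phrase_positions are hypothetical.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data; assumes the phrases are the hash KEYS.
    my %hash = map { $_ => 1 } ( 'string', 'Hi this' );

    # Map each matching phrase to the list of word indexes at which it starts.
    # A phrase seen twice gets two entries; use "keys" alone to ignore repeats.
    sub phrase_positions {
        my ( $str, $lookup ) = @_;
        my @words = split ' ', $str;
        my %positions;
        for my $start ( 0 .. $#words ) {
            for my $end ( $start .. $#words ) {
                my $phrase = join ' ', @words[ $start .. $end ];
                push @{ $positions{$phrase} }, $start if exists $lookup->{$phrase};
            }
        }
        return \%positions;
    }

    my $pos = phrase_positions( 'Hi this is the sample string for string search', \%hash );
    for my $phrase ( sort keys %$pos ) {
        print "$phrase starts at word index ", join( ', ', @{ $pos->{$phrase} } ), "\n";
    }
    ```

    Word indexes count from 0, so 'string' is recorded at both 5 and 7 in the sample sentence.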


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      The previous answer shows how to start to address the problem raised, but I get the impression that the question which has been asked is the wrong question.

      Does the hash contain these 'thousands of words' as a key or a value? It would have been useful if the original question could have contained a small hash showing the data layout.
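
      To show why the layout matters, here is a small sketch of both possibilities. Both hashes and their contents are invented for illustration; the original question does not show the real data.

      ```perl
      use strict;
      use warnings;

      # Layout A (assumed): the phrases are the keys.
      my %phrase_as_key = ( 'Hi' => 1, 'sample string' => 2 );

      # Layout B (assumed): the phrases are the values, stored under some other key.
      my %phrase_as_value = ( id1 => 'Hi', id2 => 'sample string' );

      # Layout A: a direct exists test does the job.
      my $in_a = exists $phrase_as_key{'sample string'} ? 1 : 0;

      # Layout B: invert the hash once, then look up as in layout A.
      # Note: if two keys share the same value, only one survives the inversion.
      my %by_phrase = reverse %phrase_as_value;
      my $in_b      = exists $by_phrase{'sample string'} ? 1 : 0;
      my $id        = $by_phrase{'sample string'};

      print "Layout A found: $in_a; layout B found: $in_b (stored under $id)\n";
      ```

      With the phrases as values and no inversion, every lookup would mean scanning all the values, which gets slow with thousands of entries.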

      Perhaps the real question should explain why you have such a hash and why you're looking for string matches?

        By posting your (perfectly valid) point as a reply to me, rather than the OP, you have probably ensured that he will never see it.

