
How can I use regex to copy each word of a string into a hash?

by cei (Monk)
on Jan 16, 2001 at 06:50 UTC

cei has asked for the wisdom of the Perl Monks concerning the following question:

Say I want to use a regex, instead of split, to build a hash of words in a string. Maybe I want to use /(\w['\w-]*)/g as my word finder. $string is my string. How do I iterate through $string using the regex to build the hash?

Replies are listed 'Best First'.
Re: How can I use regex to copy each word of a string into a hash?
by Ovid (Cardinal) on Jan 16, 2001 at 07:12 UTC
    Are you sure you don't mean an array? How would you break the words up into keys and values? If you just want to stuff values into the keys of the hash (with '1' as the value), the basic form looks like this:
    my $sentence = 'This is a test';
    my %hash = map { $_ => 1 } ($sentence =~ /\w+/g);
    If you need a count, you could do this:
    my $sentence = 'This is a test';
    my %hash;
    for ($sentence =~ /\w+/g) {
        $hash{$_}++;
    }
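A sketch of the same frequency count using the word pattern from the question, assuming $string holds the text. In scalar context a /g match resumes where the previous one stopped, so a while loop visits each word in turn. The sample text and the %count name are illustrative only.

```perl
use strict;
use warnings;

# Sample text; stands in for the $string from the question.
my $string = "It's a test, and it's a well-known test";

my %count;
# In scalar context, /g resumes after the previous match,
# so the loop body runs once per word.
while ($string =~ /(\w['\w-]*)/g) {
    $count{lc $1}++;    # lc folds "It's" and "it's" together
}

printf "%-12s %d\n", $_, $count{$_} for sort keys %count;
```

Drop the lc if "It's" and "it's" should be counted separately.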
      A frequency count is exactly the direction I was headed. Perl Cookbook gave an example for "Processing Every Word in a File", but it wasn't clear to me how to apply it to a string instead of a file.

      thanks

Re: How can I use regex to copy each word of a string into a hash?
by I0 (Priest) on Jan 16, 2001 at 11:43 UTC
    I'm not quite sure what you're asking for. Did you want something like this?
    @hash{$string =~ /(\w['\w-]*)/g} = ();
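A short sketch of what the slice assignment above leaves behind, assuming a made-up sample $string: every matched word becomes a key whose value is undef, so membership has to be tested with exists rather than by truth.

```perl
use strict;
use warnings;

my $string = "Don't panic, don't run";   # illustrative sample text
my %hash;

# Hash slice: the list-context match supplies the keys;
# assigning an empty list leaves every value undef.
@hash{ $string =~ /(\w['\w-]*)/g } = ();

for my $word ("panic", "towel") {
    print "$word: ", (exists $hash{$word} ? "seen" : "not seen"), "\n";
}
```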
Re: How can I use regex to copy each word of a string into a hash?
by ColonelPanic (Friar) on Jan 16, 2001 at 18:29 UTC
    Keep in mind that split will generally be a more efficient solution, whenever it is possible.

    When's the last time you used duct tape on a duct? --Larry Wall
      True, but if I'm trying to index English text, the oddities of punctuation may make a split impractical. Unless you've got a suggestion on an iterative split?
        If you just want words, why bother with punctuation at all? Just do a massive s/\W+/ /g on the string beforehand and you'll get a big list of words, separated by spaces. I suppose those damn apostrophes will cause you pain, and you want "it's" to differ from "its". It's unclear whether or not capitalization matters: is "BASIC" a different word from "basic"? What about "Smith" versus "smith"?

        Anyway, I'd probably write something like this:
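        local $/ = undef;    # slurp mode: read the whole file in one go
        $_ = <MY_FILE>;
        my %hash = ();
        # grep drops the empty leading field split returns when the text starts with punctuation.
        # Use $hash{lc $_}++ instead if case should not matter.
        $hash{$_}++ foreach grep { length } split /[^\w']+/;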
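For a string rather than a filehandle, the split-based approach might look like the sketch below. The sample text is made up. It shows two things the post alludes to: splitting on \W+ breaks "It's" into "It" and "s", while [^\w']+ keeps it whole, and a separator at the very start of the string yields an empty leading field that has to be filtered out.

```perl
use strict;
use warnings;

my $text = q{"It's odd," she said. Its name? BASIC, not basic.};

# Splitting on \W+ treats the apostrophe as a separator.
my @plain = grep { length } split /\W+/, $text;

# Keeping ' out of the separator class preserves "It's".
my @apos  = grep { length } split /[^\w']+/, $text;

my %count;
$count{lc $_}++ for @apos;    # lc makes "BASIC" and "basic" one word

print "plain: @plain\n";
print "apos:  @apos\n";
```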


        -Ted
How can I use "How can I use regex to copy each word of a string into a hash?"?
by frankus (Priest) on Jan 16, 2001 at 16:05 UTC

    I'm really boring. I take bits of code for my repository; this looks useful for repeated word searching in a string*, by using a cache.

    I'm off to write functionality (and benchmark) so the first search uses regex: subsequent searches the hash.

     

    *Provided you only need to test for existence.
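One way the caching idea might be sketched: the first lookup scans the string with the regex and stores the words as hash keys; later lookups for the same string only consult the hash. The names has_word and %cache are hypothetical, not taken from the thread, and as the footnote says, this only answers existence questions.

```perl
use strict;
use warnings;

my %cache;    # string => { word => undef, ... }

# Hypothetical helper: true if $word occurs in $string.
sub has_word {
    my ($string, $word) = @_;
    # Build the word set only on the first call for this string.
    $cache{$string} ||= do {
        my %seen;
        @seen{ $string =~ /(\w['\w-]*)/g } = ();
        \%seen;
    };
    return exists $cache{$string}{$word};
}

my $s = "the quick brown fox";
print has_word($s, "fox")  ? "yes\n" : "no\n";   # regex runs here
print has_word($s, "wolf") ? "yes\n" : "no\n";   # hash lookup only
```

Keying the cache on the full string is the simplest choice for a sketch; for long texts a shorter identifier would likely be a better key.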

    --
    
    Brother Frankus.
