
Re: Count number of words on a web page?

by JSchmitz (Canon)
on Feb 09, 2010 at 13:39 UTC (#822190)

in reply to Count number of words on a web page?

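#!/usr/bin/perl
use strict;
use warnings;

$/ = "";    # paragraph mode: read one paragraph per iteration
# "$* = 1" dropped: $* is no longer supported as of Perl 5.10
my %wordcount;
while (<>) {
    #s/-\n//g;                             # optionally rejoin hyphenated line breaks
    tr/A-Z/a-z/;                           # fold to lower case
    my @words = split(/\W*\s+\W*/, $_);    # split into words
    foreach my $word (@words) {
        next unless length $word;          # skip an empty leading field
        $wordcount{$word}++;               # count the words
    }
}
foreach my $word (sort keys %wordcount) {
    printf "%8d\t\t%s\n", $wordcount{$word}, $word;
}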

Hope that helps


Replies are listed 'Best First'.
Re^2: Count number of words on a web page?
by cdarke (Prior) on Feb 09, 2010 at 16:26 UTC
    Use word boundaries (\b).
    $_ = 'The,quick,brown;foxy. Does a lot,of:jumping!';
    my @words = split(/\W*\s+\W*/, $_);    # split into words
    print 'Number of words: '.@words."\n";
    gives 4 words, because that pattern only splits where there is whitespace. However
    my @words = split(/\b\W*/, $_); # split into words
    gives 9 words, since \b\W* also splits at punctuation between words.
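For anyone who wants to try the word-boundary split on its own, here is a minimal sketch. The helper names words_in and count_words are made up for this example and do not come from the thread. It also filters out fields that contain no word characters, because text that starts with punctuation would otherwise yield a field such as "!". Note that this pattern splits contractions like "don't" into two words.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a string with the
# word-boundary pattern and return the words, lower-cased.
sub words_in {
    my ($text) = @_;
    # Keep only fields containing a word character; this drops a
    # punctuation-only field when the text begins with punctuation.
    return grep { /\w/ } split /\b\W*/, lc $text;
}

# Hypothetical helper: tally how often each word occurs.
sub count_words {
    my ($text) = @_;
    my %count;
    $count{$_}++ for words_in($text);
    return \%count;
}

my $sample = 'The,quick,brown;foxy. Does a lot,of:jumping!';
my @words  = words_in($sample);
print 'Number of words: ' . @words . "\n";

# Same report format as the script at the top of the thread.
my $count = count_words('the cat and the hat');
printf "%8d\t\t%s\n", $count->{$_}, $_ for sort keys %$count;
```

To use it on a web page you would still need to fetch the page and strip the markup first; that part is left out here.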
Re^2: Count number of words on a web page?
by jdlev (Scribe) on Feb 09, 2010 at 13:49 UTC
    I love it when a program comes together - jdhannibal
