PerlMonks  

Re: Analysing five years of blogging

by biosysadmin (Deacon)
on Nov 03, 2004 at 01:33 UTC (#404808)


in reply to Analysing five years of blogging

Being a BioGeek, you must surely know about Markov chains. Perhaps you could use that volume of writing to generate word-to-word transition statistics and see how well the resulting chain performs at writing articles similar to Tom's.
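As a rough illustration of the idea, here is a minimal first-order Markov chain sketch in Python. The function names `build_chain` and `generate` and the sample sentence are made up for this example; a real run would feed in the blog archive and would probably want better tokenization.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length=20, seed=None):
    """Walk the chain from `start`, picking each next word in proportion
    to how often it followed the current word in the source text."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Hypothetical sample text standing in for the blog archive.
sample = "the cat sat on the mat and the cat ran"
chain = build_chain(sample)
print(generate(chain, "the", length=8, seed=1))
```

Repeated followers are stored more than once on purpose, so `rng.choice` samples them by observed frequency without a separate probability table.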

Other options include simple word frequency counts, and possibly analysis of the domains to which he links (just a simple frequency count based on the second-level domain name might be interesting).
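Both counts are short to sketch. The following Python is only an illustration: the helper names `word_frequencies` and `link_domains` are invented here, and taking the last two host labels is a naive stand-in for the second-level domain (it gets hosts such as `example.co.uk` wrong).

```python
import re
from collections import Counter
from urllib.parse import urlparse

def word_frequencies(text):
    """Count lowercase word occurrences, using a simple letters-and-apostrophes tokenizer."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def link_domains(urls):
    """Count links per domain, keeping only the last two labels of each host name."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname or ""
        labels = host.split(".")
        if len(labels) >= 2:
            counts[".".join(labels[-2:])] += 1
    return counts

# Hypothetical inputs for illustration.
print(word_frequencies("The cat and the hat").most_common(1))
print(link_domains([
    "http://www.perlmonks.org/a",
    "http://perlmonks.org/b",
    "http://search.cpan.org/",
]))
```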

Unfortunately I'm working on my thesis, so I don't have spare time to play with more data. Best of luck with the project. :)


Re^2: Analysing five years of blogging
by perlcapt (Pilgrim) on Nov 03, 2004 at 13:12 UTC
    I went to Wikipedia to find out about Markov chains. Being neither a mathematician nor a bioinformaticist, I could not see any obvious application of Markov chains to readability. It looked to me as though they are more useful for modeling than for analysis.

    I'm not asking you to write up an example of how they could be used in this context, but just to elaborate (for us non-scholars).

    I did find the Gunning Fog Index, which looks as though it might be a reasonable method of establishing a readability index. That article mentions the Flesch algorithms for scoring the reading level of material. Are there any Perl Monks with a background in educational psychology?

    P.S. This is too interesting a topic to lose in a flame war.

    Update:

    These latter index methods are available in the CPAN module Lingua::EN::Fathom. I haven't had a chance to read the whole POD yet, but it mentions the Fog, Flesch and Kincaid indices.
    perlcapt
    -ben
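
For readers curious what a Fog score involves, here is a small Python sketch of the usual Gunning Fog formula, 0.4 * (words per sentence + 100 * complex words / words). It is not how Lingua::EN::Fathom computes it. The syllable counter is a crude vowel-run heuristic assumed for this example, and the usual exclusions (proper nouns, common suffixes) are ignored.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count runs of vowels (a heuristic, not a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """Approximate Gunning Fog index, treating words of 3+ estimated syllables as complex."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

# Hypothetical sample text.
print(gunning_fog("The cat sat. The dog ran."))
```

Higher scores indicate harder text; the number is meant to approximate the years of schooling needed to follow the text on a first reading.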
