PerlMonks  

OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?)

by Bod (Deacon)
on Jun 14, 2021 at 12:01 UTC


in reply to Re^3: How to count the vocabulary of an author?
in thread How to count the vocabulary of an author?

Well, I have a PhD in mathematical linguistics

Wow! - Genuinely impressive.

Can I ask your opinion on Hemingway Editor?
I use it extensively in producing content for our business marketing, blogs, etc. However, I have started writing something that performs a similar task but is more tailored to our needs. For example, in marketing the ratio of first-person to second-person pronouns is (thought to be) important. My version makes extensive use of Lingua::EN::Fathom.
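
To make the pronoun-ratio idea concrete, here is a minimal sketch in plain Perl. It is not the actual tool and it does not use Lingua::EN::Fathom; the pronoun lists and the sub names `pronoun_counts` and `pronoun_ratio` are assumptions made up for this illustration. As far as I recall, Lingua::EN::Fathom's `unique_words` method returns per-word counts that could feed the same calculation, but check the module's documentation before relying on that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pronoun lists for this sketch; extend or trim as needed.
my %FIRST  = map { $_ => 1 } qw(i me my mine myself we us our ours ourselves);
my %SECOND = map { $_ => 1 } qw(you your yours yourself yourselves);

# Count first- and second-person pronouns in a block of text.
sub pronoun_counts {
    my ($text) = @_;
    my ($first, $second) = (0, 0);

    # A word is a run of letters, optionally with apostrophe parts ("you'll").
    for my $word (lc($text) =~ /[a-z]+(?:'[a-z]+)*/g) {
        (my $base = $word) =~ s/'.*//;    # "we're" -> "we", "you'll" -> "you"
        $first++  if $FIRST{$base};
        $second++ if $SECOND{$base};
    }
    return ($first, $second);
}

# Ratio of first- to second-person pronouns.
# Returns undef when the text has no second-person pronouns.
sub pronoun_ratio {
    my ($first, $second) = pronoun_counts(@_);
    return $second ? $first / $second : undef;
}

my $copy = "We built our editor for you. You'll love how it checks your writing.";
my ($first, $second) = pronoun_counts($copy);
printf "first=%d second=%d\n", $first, $second;
```

Stripping everything after the apostrophe is a deliberate simplification so that contractions count towards their pronoun; it will also fold possessives such as "editor's" into "editor", which is harmless here because only the pronoun lists are looked up.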

My attempt is not very far developed and I'd love some informed input before I go much further.


Replies are listed 'Best First'.
Re: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?)
by choroba (Archbishop) on Jun 14, 2021 at 13:42 UTC
    In fact, the idea is craftily clever. Their stemmer and parser can only stem and parse simple sentences, so if they can't process a sentence with sufficiently high certainty, they flag it as too complex :-)

    I don't know what technology they use in the editor. Also, I quit academia almost ten years ago, so things might have moved a bit since I worked on similar stuff.

    But generally, English is one of the easier languages to process. Its morphology is simple (almost no declension, simple conjugation) and the training data for statistical methods are huge.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        Ever tried saying it to an average English speaker?

        The lack of grammar keeps related phrases closer to each other which helps parsing a lot.

        For free word order languages, grammar seems to help, but due to homonymy (or homography) you usually don't have a solid foundation to base the grammar on.

        The most advanced systems nowadays are based on machine learning, so there's no grammar involved at all; you just need large training data.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      I don't know what technology they use in the editor

      JavaScript, apparently...
      There is an explanation here
