
Re: Estimating Vocabulary

by belg4mit (Prior)
on Mar 27, 2002 at 03:36 UTC (#154575)

in reply to Estimating Vocabulary

Well, I suppose that depends on your definition of a word: are am, are, is and was each separate words? Also, IIRC the English language is purported to have a lexicon on the order of 320,000 words*. The average American vocabulary has been in steady decline since the early twentieth century, at which point I believe it was on the order of several thousand words*. A few things to consider:
  • dictionaries may contain archaic forms
  • does your dictionary contain proper nouns? do you care?
  • the content of the language is not evenly distributed across the lexicon, e.g. there is a single word (sans modifiers) for "love" but a plethora for shades of blue.
  • * I shall attempt to find evidence to support these figures. There is an enlightening thread on the subject, but then again it is Usenet... Apparently this is a hotly contested topic.

    perl -pe "s/\b;([st])/'\1/mg"

    Re: Re: Estimating Vocabulary
    by YuckFoo (Abbot) on Mar 27, 2002 at 04:06 UTC
      Good points all, belg4mit.

      * If the sample is large enough, archaic words will appear in it in the right proportion; it'll work itself out.

      * I had already removed proper nouns, i.e. any word containing an uppercase letter. I should have noted that, but again I'm not sure it matters with a large enough sample.

      * I'm not sure how words should really be counted, and I'm still looking for a reference myself. For my purposes, I am counting run, runs, ran and running as distinct words.

      I'm just looking for a ballpark number. As a ballpark, it seems reasonable that if the boy consistently knows 20-25% of the words in the sample, he knows roughly 20-25% of the words in $DICT.

      If anyone has pointers to real vocabulary development numbers and counting methods, I'd like to get 'em.
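The sampling argument above can be sketched in Perl. This is a minimal sketch under stated assumptions, not YuckFoo's actual code: the names filter_words, sample_words and estimate_vocabulary are hypothetical, the counts in the example (50 of 200 sampled words known, a 100,000-entry list) are made up for illustration, and the rough 95% interval uses the normal (Wald) approximation for a binomial proportion, which is an addition not found in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helpers (not from the original thread).

# Keep only entries with no uppercase letter, which drops proper nouns
# (and acronyms). Inflected forms such as run/runs/ran/running are left
# alone, so each one counts as a separate word.
sub filter_words {
    my @words = @_;
    return grep { !/[A-Z]/ } @words;
}

# Simple random sample of $n words, drawn without replacement.
sub sample_words {
    my ($n, @words) = @_;
    my @shuffled = shuffle(@words);
    $n = @shuffled if $n > @shuffled;    # cannot sample more than we have
    return @shuffled[0 .. $n - 1];
}

# Scale the fraction known in the sample up to the whole word list and
# attach a rough 95% interval via the Wald approximation:
#   p +/- 1.96 * sqrt(p * (1 - p) / n), clipped to [0, 1].
sub estimate_vocabulary {
    my ($known, $sample_size, $dict_size) = @_;
    my $p      = $known / $sample_size;
    my $margin = 1.96 * sqrt($p * (1 - $p) / $sample_size);
    my $low    = $p - $margin < 0 ? 0 : $p - $margin;
    my $high   = $p + $margin > 1 ? 1 : $p + $margin;
    return ($p * $dict_size, $low * $dict_size, $high * $dict_size);
}

# Made-up example: 50 of 200 sampled words known (25%), and an assumed
# 100,000 entries left after filtering.
my ($est, $low, $high) = estimate_vocabulary(50, 200, 100_000);
printf "estimate %.0f words (roughly %.0f to %.0f)\n", $est, $low, $high;
```

In practice the word list would come from whatever $DICT file is in use (read line by line and chomped), the sampled words would be shown to the boy one at a time, and his "known" answers tallied as $known. The interval makes the "large enough sample" point concrete: for a fixed proportion it shrinks with the square root of the sample size, so quadrupling the sample roughly halves its width. The Wald approximation is crude for very small samples or proportions near 0 or 1.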

