Database
Q. What is the format of the WordNet database?
A. The WordNet database is stored as plain ASCII text in eight files: an index file and a data file for each of the four syntactic categories (noun, verb, adjective, adverb). The WordNet search code also uses some additional files, but these are not strictly part of the database. All WordNet file formats are described in Section 5 of the WordNet manual; the wndb(5) page describes the format of the database files.
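To give a feel for the format, here is a minimal sketch that parses one line of an index file. It assumes the index line layout described in wndb(5) (lemma, pos, synset_cnt, p_cnt, p_cnt pointer symbols, sense_cnt, tagsense_cnt, then the synset offsets), and that header lines begin with two spaces. The names IndexEntry and parse_index_line are hypothetical, and the sample line and its offsets are made up rather than taken from the real database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    lemma: str                 # multi-word lemmas use underscores, e.g. nest_egg
    pos: str                   # n, v, a or r
    synset_cnt: int            # number of synsets the lemma belongs to
    ptr_symbols: List[str]     # pointer types the lemma has in any sense
    sense_cnt: int
    tagsense_cnt: int
    synset_offsets: List[str]  # byte offsets into the matching data file


def parse_index_line(line: str) -> Optional[IndexEntry]:
    """Parse one index-file line (a sketch based on the wndb(5) layout)."""
    # Header lines (copyright, license) at the top of the file start with two spaces.
    if line.startswith("  "):
        return None
    fields = line.split()
    p_cnt = int(fields[3])
    i = 4 + p_cnt  # position of sense_cnt, after the variable-length pointer list
    return IndexEntry(
        lemma=fields[0],
        pos=fields[1],
        synset_cnt=int(fields[2]),
        ptr_symbols=fields[4:i],
        sense_cnt=int(fields[i]),
        tagsense_cnt=int(fields[i + 1]),
        synset_offsets=fields[i + 2:],
    )


# Made-up sample line; the offsets are placeholders, not real database values.
entry = parse_index_line("nest_egg n 2 2 @ ~ 2 0 00000001 00000002")
```

Each offset points into the data file for the same part of speech, where the synset itself (its words, pointers, and gloss) is stored.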
Q. Why doesn't the word I searched for appear in its example sentence?
A. WordNet is organized around synonym sets (synsets): groups of words that are roughly synonymous in a given context. The gloss (definition) and the example sentences belong to the synset, so they are shared by all of its members. This is why, for example, the gloss for "insure" contains the example sentence "This nest egg will ensure a nice retirement for us": "insure" and "ensure" are in the same synset.
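A small in-memory model makes the sharing visible. This is a sketch, not WordNet's own data structure: Synset and build_index are hypothetical names, and apart from the example sentence quoted above, the member list and gloss wording are simplified for illustration. Because the gloss and examples are stored once per synset, looking up either member returns the same object.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    words: List[str]  # all members share the gloss and examples below
    gloss: str
    examples: List[str] = field(default_factory=list)


def build_index(synsets: List[Synset]) -> Dict[str, List[Synset]]:
    """Map each word to every synset it belongs to."""
    index: Dict[str, List[Synset]] = {}
    for synset in synsets:
        for word in synset.words:
            index.setdefault(word, []).append(synset)
    return index


# Illustrative data: member list and gloss are simplified, not copied from WordNet.
SYNSETS = [
    Synset(
        words=["ensure", "insure"],
        gloss="make certain of",
        examples=["This nest egg will ensure a nice retirement for us"],
    ),
]
INDEX = build_index(SYNSETS)

# Looking up "insure" yields the shared synset, whose example uses "ensure".
shared = INDEX["insure"][0]
```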
Q. Where do you get the definitions for WordNet? (short answer)
A. Our lexicographers write them.
Q. Where do you get the definitions for WordNet? (long answer)
A. From the foreword to WordNet: An Electronic Lexical Database, pp. xviii-xix:
People sometimes ask, "Where did you get your words?" We began in 1985 with the words in Kučera and Francis's Standard Corpus of Present-Day Edited English (familiarly known as the Brown Corpus), principally because they provided frequencies for the different parts of speech. We were well launched into that list when Henry Kučera warned us that, although he and Francis owned the Brown Corpus, the syntactic tagging data had been sold to Houghton Mifflin. We therefore dropped our plan to use their frequency counts (in 1988 Richard Beckwith developed a polysemy index that we use instead). We also incorporated all the adjective pairs that Charles Osgood had used to develop the semantic differential. And since synonyms were critically important to us, we looked words up in various thesauruses: for example, Laurence Urdang's little "Basic Book of Synonyms and Antonyms" (1978), Urdang's revision of Rodale's "The Synonym Finder" (1978), and Robert Chapman's 4th edition of "Roget's International Thesaurus" (1977) -- in such works, one word quickly leads on to others. Late in 1986 we received a list of words compiled by Fred Chang at the Naval Personnel Research and Development Center, which we compared with our own list; we were dismayed to find only 15% overlap.
So Chang's list became input. And in 1993 we obtained the list of 39,143 words that Ralph Grishman and his colleagues at New York University included in their common lexicon, COMLEX; this time we were dismayed that WordNet contained only 74% of the COMLEX words. But that list, too, became input. In short, a variety of sources have contributed; we were not well disciplined in building our vocabulary. The fact is that the English lexicon is very large, and we were lucky that our sponsors were patient with us as we slowly crawled up the mountain.
Q. Can WordNet generate plural forms and other inflected forms?
A. No. The morphological component of the WordNet library works in one direction only: it reduces inflected forms to base forms, but it cannot generate inflected forms from base forms. Besides a list of irregular forms (e.g., "children" to "child"), it uses a sequence of simple rules, stripping common English endings until it finds a word form present in WordNet. It also assumes its input is a valid inflected form, so it will take "childes" to "child", even though "childes" is not a word.
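The reduction process can be sketched as follows. This is a simplified illustration, not the library's code: it uses only the noun suffix rules as listed in morphy(7WN), a tiny hypothetical lexicon, and a hypothetical exception table standing in for WordNet's irregular-forms list. Note that there is no inverse function. Because the rules trust their input, a non-word such as "childs" is still reduced to "child".

```python
from typing import List

# Hypothetical mini-lexicon of noun base forms (not real WordNet data).
NOUN_LEXICON = {"child", "box", "church", "man", "city", "glass"}

# Hypothetical stand-in for WordNet's list of irregular noun forms.
NOUN_EXCEPTIONS = {"children": "child", "mice": "mouse"}

# (suffix, replacement) pairs for nouns, in the order listed in morphy(7WN).
NOUN_RULES = [
    ("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
    ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y"),
]


def base_forms(word: str) -> List[str]:
    """Reduce a presumed inflected noun to base forms found in the lexicon."""
    word = word.lower()
    results: List[str] = []
    # 1. Irregular forms are looked up directly.
    if word in NOUN_EXCEPTIONS:
        results.append(NOUN_EXCEPTIONS[word])
    # 2. The word may already be a base form.
    if word in NOUN_LEXICON and word not in results:
        results.append(word)
    # 3. Strip each matching ending; keep only candidates present in the lexicon.
    for suffix, replacement in NOUN_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in NOUN_LEXICON and candidate not in results:
                results.append(candidate)
    return results
```

For example, base_forms("boxes") tries "boxe" (no match) and then "box" (a match), while base_forms("childs") happily returns "child" because the rules never check that the input is a real word.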
Q. Why is WordNet missing words such as "of", "an", "the", "and", "about", "above", and "because"?
A. WordNet contains only "open-class words": nouns, verbs, adjectives, and adverbs. Closed-class words, such as determiners, prepositions, pronouns, conjunctions, and particles, are therefore excluded.
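In practice this means that text should be filtered to open-class words before looking anything up. The sketch below assumes the input has already been tagged with Penn Treebank part-of-speech tags (an assumption; WordNet itself does no tagging) and maps them to the letters n, v, a, and r that the WordNet database files use for the four categories. The function names are hypothetical, and the prefix mapping is a common heuristic rather than anything defined by WordNet.

```python
from typing import List, Optional, Tuple


def wordnet_pos(penn_tag: str) -> Optional[str]:
    """Map a Penn Treebank tag to a WordNet POS letter, or None if closed-class."""
    if penn_tag.startswith("NN"):
        return "n"  # nouns
    if penn_tag.startswith("VB"):
        return "v"  # verbs
    if penn_tag.startswith("JJ"):
        return "a"  # adjectives
    if penn_tag.startswith("RB"):
        return "r"  # adverbs
    return None     # determiners, prepositions, pronouns, conjunctions, ...


def open_class_tokens(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only tokens whose part of speech WordNet covers."""
    result = []
    for word, tag in tagged:
        pos = wordnet_pos(tag)
        if pos is not None:
            result.append((word, pos))
    return result


tagged = [("The", "DT"), ("children", "NNS"), ("ran", "VBD"), ("quickly", "RB"),
          ("because", "IN"), ("it", "PRP"), ("was", "VBD"), ("late", "JJ")]
kept = open_class_tokens(tagged)
```

Here "The", "because", and "it" are dropped, since WordNet has no entries for them.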
Q. Why is WordNet missing pronouns?
A. See the previous question: pronouns are closed-class words, and WordNet contains only open-class words.
Q. What are the unique beginners, and what is in noun.Tops?
A. WordNet's noun ontology has just one top node, 'entity'. The other entries in the noun.Tops file are high-level entries in the ontology.
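Having a single top node means that following hypernym ("is a kind of") links upward from any noun eventually reaches 'entity'. The sketch below shows this on a made-up, heavily abbreviated hierarchy: the real paths contain many more intermediate synsets, and a few real synsets have more than one hypernym, which a single-parent dictionary cannot represent. The names HYPERNYM, path_to_root, and roots are hypothetical.

```python
from typing import Dict, List, Set

# Abbreviated, made-up child -> parent links; real WordNet paths are longer.
HYPERNYM: Dict[str, str] = {
    "dog": "animal",
    "animal": "organism",
    "organism": "physical_entity",
    "idea": "abstraction",
    "physical_entity": "entity",
    "abstraction": "entity",
}


def path_to_root(node: str) -> List[str]:
    """Follow hypernym links upward until a node with no parent is reached."""
    path = [node]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def roots(links: Dict[str, str]) -> Set[str]:
    """Return the nodes that have no hypernym (the top nodes)."""
    nodes = set(links) | set(links.values())
    return {n for n in nodes if n not in links}
```

With this data, path_to_root("dog") ends at "entity", and roots(HYPERNYM) finds exactly one top node.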
Q. WordNet is missing a word that is a noun, verb, adjective, or adverb.
A. Please tell us.
