Here you will find the answers to many commonly asked questions about WordNet. Subtopics are found in the menu to the left.
Using WordNet
-
-
WordNet is organized by the concept of synonym sets (synsets), groups of words that are roughly synonymous in a given context. The glossary definition and the example sentences are shared among all synonyms in a given synset. This is why you'll find, for example, in the definitional gloss for "insure" the example sentence: "This nest egg will ensure a nice retirement for us".
-
-
For instructions on how to set an environment variable in Windows, search Google for "site:microsoft.com Windows environment variables" or see a page about environment variables.
The variables you'll want to set for WordNet are:
Name          Value                      Example (Linux/*nix)          Example (Windows)
WNHOME        WordNet's home directory   /usr/local/WordNet-3.0        C:\Program Files\WordNet\2.1
WNSEARCHDIR   WordNet's dict directory   /usr/local/WordNet-3.0/dict   C:\Program Files\WordNet\2.1\dict
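As a rough illustration of how a program might pick up these two variables, here is a minimal Python sketch. The function name resolve_wordnet_dirs and the fallback order (WNSEARCHDIR first, then WNHOME plus "dict", then a default home directory) are assumptions made for this example, not a description of the WordNet library's own code.

```python
import os

def resolve_wordnet_dirs(env=None, default_home="/usr/local/WordNet-3.0"):
    """Return (home, search_dir) for a WordNet installation.

    Hypothetical helper: the fallback order below is an assumption
    made for illustration only.
    """
    env = os.environ if env is None else env
    # Use WNHOME if it is set and non-empty, else the default location.
    home = env.get("WNHOME") or default_home
    # Use WNSEARCHDIR if set, else assume the dict directory under home.
    search_dir = env.get("WNSEARCHDIR") or os.path.join(home, "dict")
    return home, search_dir

if __name__ == "__main__":
    home, search_dir = resolve_wordnet_dirs()
    print("WNHOME      =", home)
    print("WNSEARCHDIR =", search_dir)
```

Passing a plain dictionary instead of the real environment makes it easy to check the behaviour without changing any system settings.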
Database
-
-
The WordNet database is stored in an ASCII format consisting of eight files, two for each syntactic category. Additional files are used by the WordNet search code but are not strictly part of the database. All WordNet file formats are described in Section 5 of the WordNet manual. The page wndb(5) describes the format of the database files.
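To make the "eight files, two for each syntactic category" layout concrete, the following Python sketch lists the index and data file expected for each category and reports which ones are missing from a given dict directory. The file names follow the index.pos / data.pos naming described in wndb(5); the helper names are hypothetical.

```python
import os

# The four syntactic categories, as they appear in the file suffixes.
PARTS_OF_SPEECH = ("noun", "verb", "adj", "adv")

def expected_database_files():
    """Return the eight database file names: index.* and data.* per category."""
    return [f"{kind}.{pos}" for pos in PARTS_OF_SPEECH for kind in ("index", "data")]

def missing_database_files(dict_dir):
    """Return the expected database files that are not present in dict_dir."""
    return [
        name
        for name in expected_database_files()
        if not os.path.isfile(os.path.join(dict_dir, name))
    ]

if __name__ == "__main__":
    # The path below is only the example location used elsewhere in this FAQ.
    print(missing_database_files("/usr/local/WordNet-3.0/dict"))
```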
-
-
WordNet is organized by the concept of synonym sets (synsets), groups of words that are roughly synonymous in a given context. The glossary definition and the example sentences are shared among all synonyms in a given synset. This is why you'll find, for example, in the definitional gloss for "insure" the example sentence: "This nest egg will ensure a nice retirement for us".
-
-
Our lexicographers write them.
-
-
From the foreword to WordNet: An Electronic Lexical Database, pp. xviii-xix:
People sometimes ask, "Where did you get your words?" We began in 1985 with the words in Kučera and Francis's Standard Corpus of Present-Day Edited English (familiarly known as the Brown Corpus), principally because they provided frequencies for the different parts of speech. We were well launched into that list when Henry Kučera warned us that, although he and Francis owned the Brown Corpus, the syntactic tagging data had been sold to Houghton Mifflin. We therefore dropped our plan to use their frequency counts (in 1988 Richard Beckwith developed a polysemy index that we use instead). We also incorporated all the adjectives pairs that Charles Osgood had used to develop the semantic differential. And since synonyms were critically important to us, we looked words up in various thesauruses: for example, Laurence Urdang's little "Basic Book of Synonyms and Antonyms" (1978), Urdang's revision of Rodale's "The Synonym Finder" (1978), and Robert Chapman's 4th edition of "Roget's International Thesaurus" (1977) -- in such works, one word quickly leads on to others. Late in 1986 we received a list of words compiled by Fred Chang at the Naval Personnel Research and Development Center, which we compared with our own list; we were dismayed to find only 15% overlap.
So Chang's list became input. And in 1993 we obtained the list of 39,143 words that Ralph Grishman and his colleagues at New York University included in their common lexicon, COMLEX; this time we were dismayed that WordNet contained only 74% of the COMLEX words. But that list, too, became input. In short, a variety of sources have contributed; we were not well disciplined in building our vocabulary. The fact is that the English lexicon is very large, and we were lucky that our sponsors were patient with us as we slowly crawled up the mountain.
-
-
No. The morphological component of the WordNet library is unidirectional. Along with a set of irregular forms (e.g. children - child), it uses a sequence of simple rules, stripping common English endings until it finds a word form present in WordNet. Furthermore, it assumes its input is a valid inflected form. So, it will take "childes" to "child", even though "childes" is not a word.
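The behaviour described above can be sketched in a few lines of Python. This is a toy model only: the lexicon, the exception list and the suffix rules below are invented for illustration and are not WordNet's actual rule tables.

```python
# Toy data, all hypothetical: a tiny lexicon, one irregular form,
# and a short ordered list of (suffix, replacement) rules.
LEXICON = {"child", "dog", "pony"}
EXCEPTIONS = {"children": "child"}
RULES = [("s", ""), ("es", ""), ("ies", "y")]

def base_form(word):
    """Map an inflected form to a base form found in LEXICON, or None."""
    if word in LEXICON:
        return word
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # Try each rule in order; accept the first candidate in the lexicon.
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in LEXICON:
                return candidate
    return None

if __name__ == "__main__":
    for w in ("children", "dogs", "ponies", "childes"):
        print(w, "->", base_form(w))
```

Because the rules only strip endings and never check that the input is a real word, the non-word "childes" is still reduced to "child". The sketch also shows why the process is one-way: nothing in it can generate "children" from "child".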
-
-
WordNet only contains "open-class words": nouns, verbs, adjectives, and adverbs. Thus, excluded words include determiners, prepositions, pronouns, conjunctions, and particles.
-
-
See above
-
-
WordNet is an ontology with just one top node for nouns, 'entity'. Other entries in the noun.Tops file are high level entries in the ontology.
-
-
Please, tell us.
Installation
-
-
This is a problem with InstallShield, we think. For now, the workaround is to move the installer WordNet-2.1.exe into an empty folder and try again.
-
-
The error should be similar to the following:
/bin/install: `wnutil.3WN.html' and `/usr/local/WordNet-3.0/doc/html/wnutil.3WN.html' are the same file
make[3]: *** [install-htmlDATA] Error 1
make[3]: Leaving directory `/usr/local/WordNet-3.0/doc/html'
make[2]: *** [install-am] Error 2
make[2]: Leaving directory `/usr/local/WordNet-3.0/doc/html'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/usr/local/WordNet-3.0/doc'
make: *** [install-recursive] Error 1
The build process is intended to be run from a temporary directory which is different from the directory to which WordNet will be installed. So, don't extract WordNet-3.0.tar.bz2 to /usr/local/WordNet-3.0 if you intend to install it to the default location. Instead, extract it to a temporary location (e.g. your home directory). Once WordNet is installed successfully, you can remove the directory.
-
-
If you receive the error:
wishwn: error while loading shared libraries: libtk.so.0: cannot open shared object file: No such file or directory
you need to create some symbolic links. Some Linux distributions no longer provide the backward-compatibility links that are necessary for running WordNet. The commands will be similar to:
cd /usr/lib
ln -s libtk.so libtk.so.0
ln -s libtcl.so libtcl.so.0
Note that the first argument to each ln command may require a version number (e.g. ln -s libtk84.so libtk.so.0).
-
-
For the browser to function properly, it must know where you installed WordNet. The installer is supposed to set up some environment variables which tell the browser where to find the WordNet files. If these variables are not set properly, it will by default try: %PROGRAM%\WordNet\(version) (where %PROGRAM% is C:\Program Files on U.S. English systems). The variables that need to be set are (assuming you install to D:\Other\WordNet):
WNHOME = D:\Other\WordNet
WNSEARCHDIR = D:\Other\WordNet\dict
-
You may need to set your PATH variable to include WordNet's /bin directory. (Also see: "I installed to a non-default location, and get error messages." and "How do I set an environment variable?")
For Application Developers
-
-
The (ASCII) database format is well-documented. See WordNet documentation index, specifically WordNet man page: wndb.5WN.
-
-
WordNet provides a C API to use WordNet from a C program. The API documentation is available online and is distributed with the main WordNet packages. Interfaces for many other languages are available via our related projects page.
-
-
In addition to the word being searched for, the query string contains parameters which specify the display options and changes to the default display options. It also contains parameters that describe the level of detail to use for each synset or relation listed in the results.
In a query like this (when you search for the word "quintillion"):
-
The 's' parameter is the word being searched for.
-
The 'sub' parameter reports which submit button was clicked, i.e. Search WordNet or Change. It is left empty if no button was clicked, for example when one of the hyperlinks was clicked instead.
-
The 'o0', 'o1', 'o2', etc. parameters are the display options. By default o0 (Example Sentences) and o1 (Glosses) are set to 1 and the others are not set.
-
The 'c' parameter specifies the option that should be toggled. Choosing a display option from the drop-down list sets the c parameter to the correct value.
History list
-
The 'h' parameter is a history list (a list of digits [0-3]* like 10000). Each digit corresponds to a synset or relation hyperlink in the results. A 1, 2 or 3 indicates that the synset or relation is expanded, while a 0 indicates that it is not.
- The 'i' parameter is the last index to be expanded in the history list, in addition to what is already specified using h.
- And 'j' is the index in the history list to jump to.
For example: http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o7=&o5=&o1=1&o6=&o4=&o3=&s=trope&i=4&h=10000#ch=10000 will display the five items associated with the word trope with the first list item (the synset S: (n) trope, figure of speech, figure...) expanded and the next four compressed. The i=4 says that the item with index 4 (the relation derivationally related form) should be expanded as well.
Clicking the derivationally related form link will, in turn, load a new page (http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o7=&o5=&o1=1&o6=&o4=&o3=&s=trope&h=10000&j=4#c) where this item is collapsed (j=4 is the index to jump to).
Recursive search
-
The 'r' or recursive search parameter can be 1 or not set at all.
For example, clicking on the full hyponym option on http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o7=&o5=&o1=1&o6=&o4=&o3=&s=trope&i=0&h=0#c will load a new page displaying all the hyponyms of the synset, as well as the hyponyms' hyponyms. This is because r is set to 1 in the new URL. Removing the r parameter or the 1 will collapse the second level of hyponyms.
The order of the parameters doesn't matter.
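The parameters described in this section can be pulled apart with a few lines of Python. The sketch below uses only the standard library; the function name describe_query and the keys of the dictionary it returns are hypothetical, chosen just for this example, and the interpretation of each parameter follows the descriptions given here.

```python
from urllib.parse import urlsplit, parse_qs

def describe_query(url):
    """Summarise the search parameters of a WordNet web interface URL."""
    # keep_blank_values keeps parameters such as "o2=" that are present but unset.
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)

    def first(name):
        values = params.get(name)
        return values[0] if values else None

    # Display options are the o<digit> parameters whose value is "1".
    options_on = sorted(
        name
        for name, values in params.items()
        if name.startswith("o") and name[1:].isdigit() and values[0] == "1"
    )
    return {
        "word": first("s"),
        "options_on": options_on,
        # One digit per synset or relation hyperlink; 0 means not expanded.
        "history": [int(digit) for digit in (first("h") or "")],
        "expand_index": int(first("i")) if first("i") else None,
        "jump_index": int(first("j")) if first("j") else None,
        "recursive": first("r") == "1",
    }

if __name__ == "__main__":
    url = ("http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o7=&o5=&o1=1"
           "&o6=&o4=&o3=&s=trope&i=4&h=10000#ch=10000")
    print(describe_query(url))
```

Since the parameters are read into a dictionary, their order in the URL has no effect on the result.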
For Linguistics
-
-
WordNet senses are ordered using sparse data from semantically tagged text. The order of the senses is given simply so that some of the most common uses are listed above others (and those for which there is no data are randomly ordered). The sense numbers and ordering of senses in WordNet should be considered random for research purposes.
-
-
Frequency counts are based on the number of senses a word has.
-
-
One method is to use Ted Pedersen's open source Perl module, WordNet::Similarity. It provides a number of measures of semantic similarity and semantic relatedness based on WordNet. Given two synsets, it will return a numeric score showing their degree of similarity/relatedness according to various measures that all rely on WordNet in different ways.
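To give a feel for what such a numeric score can look like, here is a small Python sketch of one simple path-based idea: the score is the inverse of the number of nodes on the shortest path between two concepts in a hypernym hierarchy. The tiny hierarchy, the function names and the scoring choice are all assumptions made for illustration; this is not the WordNet::Similarity API and does not use real WordNet data.

```python
# Hypothetical toy hierarchy: each concept maps to its single hypernym.
TOY_HYPERNYMS = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "animal": "entity",
}

def hypernym_chain(node, hypernyms):
    """Return the node followed by its successive hypernyms up to the top."""
    chain = [node]
    while chain[-1] in hypernyms:
        chain.append(hypernyms[chain[-1]])
    return chain

def path_similarity(a, b, hypernyms=TOY_HYPERNYMS):
    """Return 1 / (nodes on the shortest path from a to b), or 0.0 if unconnected."""
    steps_from_a = {node: i for i, node in enumerate(hypernym_chain(a, hypernyms))}
    for steps_from_b, node in enumerate(hypernym_chain(b, hypernyms)):
        if node in steps_from_a:
            edges = steps_from_a[node] + steps_from_b
            return 1.0 / (edges + 1)
    return 0.0

if __name__ == "__main__":
    print(path_similarity("dog", "cat"))
    print(path_similarity("dog", "canine"))
```

Identical concepts score 1.0, and the score shrinks as the path between two concepts gets longer.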
Known Problems
-
-
This was accidentally left in the Windows port!
Other
-
-
No. The morphological component of the WordNet library is unidirectional. Along with a set of irregular forms (e.g. children - child), it uses a sequence of simple rules, stripping common English endings until it finds a word form present in WordNet. Furthermore, it assumes its input is a valid inflected form. So, it will take "childes" to "child", even though "childes" is not a word.
-
-
Go here