Semantically Tagged Glosses
Word forms from the definitions ("glosses") in WordNet's synsets are manually linked to the context-appropriate sense in WordNet. Thus, the glosses are a sense-disambiguated corpus and WordNet version 3.0 is the dictionary against which the corpus was annotated.
Release Contents
This release, once extracted, comprises three subdirectories:
| /WordNet-3.0/glosstag/merged | WordNet glosses in merged format |
| /WordNet-3.0/glosstag/standoff | WordNet glosses in standoff format |
| /WordNet-3.0/glosstag/dtd | DTD describing the markup for the merged annotations |
When using this freely available resource, we ask that you refer to it as the "Princeton WordNet Gloss Corpus."
Readme
Readme File
Download
Statistics
Tokenized text (word and collocation forms)
| Types | Tokens |
| 47334 | 1621129 |
Multi-word forms (globs)
| man | auto | all |
| 7168 | 45967 | 53135 |
Taggable lemmas (potential lemmas)
| Types | Tokens |
| 55561 | 1504077 |
Sense tags (sense keys on sense tags)
| Kind | Types | Tokens |
| man | 33862 | 339969 |
| auto | 26139 | 118856 |
| all | 59250 | 458825 |
Taggable tokens (word forms and globs)
| Kind | wf | glob | all |
| man | 317812 | 12687 | 330499 |
| auto | 82238 | 36618 | 118856 |
| un | 202881 | 3830 | 206711 |
| ignore | 457502 | 0 | 457502 |
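The figures above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the numbers are copied from the statistics, while the dictionary layout and variable names are assumptions made for the example.

```python
# Taggable-token counts copied from the statistics above.
# The dictionary layout is an assumption made for this example.
taggable = {
    "man":    {"wf": 317812, "glob": 12687, "all": 330499},
    "auto":   {"wf": 82238,  "glob": 36618, "all": 118856},
    "un":     {"wf": 202881, "glob": 3830,  "all": 206711},
    "ignore": {"wf": 457502, "glob": 0,     "all": 457502},
}

# Each "all" figure should equal wf + glob for the same kind.
row_ok = {kind: row["wf"] + row["glob"] == row["all"]
          for kind, row in taggable.items()}

# The glob column (man + auto + un) can be compared with the
# multi-word form total reported separately.
multiword_all = 53135
glob_total = sum(taggable[kind]["glob"] for kind in ("man", "auto", "un"))

# Sense-tag tokens: man + auto should equal all.
sense_tokens = {"man": 339969, "auto": 118856, "all": 458825}
sense_ok = sense_tokens["man"] + sense_tokens["auto"] == sense_tokens["all"]

print(row_ok, glob_total == multiword_all, sense_ok)
```

All three checks hold for the published figures, which suggests the totals were derived from the per-kind counts.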
Key
| wf | word form |
| man | manually-inserted sense tag or collocation |
| auto | automatically generated sense tag or collocation |
| un | taggable item that has not been tagged |
| ignore | stoplist item |
| glob | collocation/multi-word term |
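As a rough illustration of how the categories in the key could be tallied from a merged file, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample fragment is hypothetical: the element names (`synset`, `gloss`, `wf`, `glob`) and the `tag` attribute are assumptions based on the abbreviations in the key, not a copy of the released markup. The DTD in /WordNet-3.0/glosstag/dtd is the authoritative description.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: element and attribute names are assumptions,
# not taken from the release. Check the DTD before relying on them.
SAMPLE = """
<synset id="example">
  <gloss desc="wsd">
    <wf tag="ignore">a</wf>
    <wf tag="man">domesticated</wf>
    <glob tag="auto"><wf>guinea</wf> <wf>pig</wf></glob>
    <wf tag="un">kept</wf>
  </gloss>
</synset>
"""

def count_tags(xml_text):
    """Count (element name, tag value) pairs for wf and glob elements."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for elem in root.iter():
        # Only elements that carry a tag attribute are counted, so the
        # untagged word forms inside a glob are skipped here.
        if elem.tag in ("wf", "glob") and "tag" in elem.attrib:
            counts[(elem.tag, elem.get("tag"))] += 1
    return counts

counts = count_tags(SAMPLE)
for (kind, tag), n in sorted(counts.items()):
    print(kind, tag, n)
```

For a real file, `ET.parse(path).getroot()` could replace `ET.fromstring`, with the element and attribute names adjusted to match the DTD.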
Disclaimer
While standoff annotation has many benefits, particularly the ability to isolate annotations of choice, it is not a well-supported format. Our standoff encoding is based heavily on the ANC format but is not identical to it, because our markup is necessarily different. Therefore, some tools that work with the ANC data may work with ours, but not all. We are supplying the data in this format as a service to users who are accustomed to working with standoff annotations and who will build or modify software to work with it. We do not support the ANC standoff annotation format or any software that uses or manipulates it, and we do not provide any tools ourselves. The standoff annotations do not contain more, or better, information than the merged files. The annotations they contain are identical to the merged data, just expressed in a different form. If you have any doubts about which format to use, use the merged files.
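To make the distinction concrete, the sketch below shows the general idea behind the two styles: in a merged representation the annotation travels with each token, while in a standoff representation the text is stored once and annotations point into it by character offset. This is a generic illustration only. It does not reproduce the ANC-based encoding used in this release, and all names in it are hypothetical.

```python
# Merged-style data: each token carries its own annotation.
merged = [("a", "ignore"), ("small", "un"),
          ("domesticated", "man"), ("mammal", "auto")]

def to_standoff(tokens):
    """Return (text, spans), where spans are (start, end, tag) offsets."""
    parts, spans, pos = [], [], 0
    for word, tag in tokens:
        if parts:
            pos += 1  # account for the single space between tokens
        spans.append((pos, pos + len(word), tag))
        parts.append(word)
        pos += len(word)
    return " ".join(parts), spans

def to_merged(text, spans):
    """Rebuild the merged-style token list from text plus offsets."""
    return [(text[start:end], tag) for start, end, tag in spans]

text, spans = to_standoff(merged)

# Round trip: both forms carry the same information.
assert to_merged(text, spans) == merged

# Standoff makes it easy to isolate one kind of annotation.
manual_only = [span for span in spans if span[2] == "man"]
print(text)
print(manual_only)
```

The round trip mirrors the point made above: neither form holds more information than the other, and standoff mainly makes it easier to select a subset of annotations.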
Acknowledgment
This work was sponsored by ARDA/DTO through the AQUAINT Program.
WordNet 3.0 © Princeton University 2006