This report discusses the design of The Reader's interface and the technical issues behind its implementation, as a model for the design of similar sophisticated yet intuitive on-line aids.
The Reader began as a project named The WIRE -- WordNet in a Reading Environment. It was initially designed by Daniel Markham and Patricia Gildea as a UNIX-based program. This program was a proof of concept: it showed that WordNet could be integrated into an educational environment, giving students access to its powerful capabilities in an efficient and, most importantly, easy-to-use manner. Over time, the name of the program was changed to "The Reader", and the project was continued by Joshua Schechter who, under the guidance of George Miller and others at the Cognitive Science Laboratory, redesigned and rewrote the program, taking it beyond the proof-of-concept stage. The Reader is still under development, with ongoing work to make it more intuitive, powerful, and efficient.
The current version of The Reader, version 2.2, is platform independent (i.e., it works on UNIX, MS Windows, and MacOS machines). It is written in Tcl/Tk and C. The version described here is the latest as of November 1997.
The Pager takes these concordances and outputs a set of files, Reader files, for use with The Reader program proper. The Reader files are another form of semantic concordance, specially designed for use with The Reader. They represent the text, typeset into distinct page images, and also contain all of the information of a ConText file along with extensive further indexing information. The Reader file format has been carefully designed to represent, usefully and efficiently, the information needed for the sophisticated navigation and searching features of The Reader interface.
The Reader proper is the interface between the students and the on-line text. It is what allows a student to interact with the information stored in the Reader files. It displays the text as a book does, one page at a time, and allows simple book-like navigation through the text. It enables the student to retrieve sense information for each semantically tagged word, as well as to interact with WordNet for further semantic queries. The Reader also contains sophisticated navigation and searching features, allowing a student to move easily throughout the text and to perform contextual sense-based searches. It is designed to be easy to use, allowing students to access computationally sophisticated features in an intuitive and educationally helpful manner.
All design decisions reflect the application of these principles.
These principles directly affected the design of the look and feel of the interface. It was decided that the most obvious and easy-to-use interface for representing textual information was the book, and thus that a consistently book-like interface would make The Reader as easy as possible for students to use. The main window of The Reader therefore shows a page of text: about twenty-five lines, along with the page number and arrows that allow the student to "turn the page" to the next or previous page. Simply reading the text can be done in a completely obvious way, without any interference from the advanced features of The Reader. Similarly, the basic task of looking up the sense of a word is carried out by simply pointing the cursor at the word in question and clicking on it.
It was decided to write The Reader in Tcl/Tk and C. Tcl/Tk is specifically designed for rapid interface design. Tcl is a cross-platform scripting language, which enables the rapid construction and testing of applications. It is also easily extensible through the addition of commands written by the programmer in C. Tk is the visual interface component of Tcl/Tk and provides the display of text, buttons, and other visual "widgets" in an application. The use of Tcl/Tk allowed the easy construction, extension, and debugging of The Reader, and provided a consistent and aesthetic visual look.
The C language was used for all computationally-expensive algorithmic processes needed for the Reader. C is an efficient, widely known, and cross-platform language. Its use allowed The Reader to be fast and efficient, without any long delays during the processing of information needed for search and word-sense queries.
The internal structure of The Reader was also designed in a very modular way, to allow easy extension. Commands are grouped logically by function (navigation, searching, word lookup, etc.). This allows functionality to be added and to evolve: new software modules can be added and old ones modified very easily, with all changes local to a module.
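The grouping of commands by function can be pictured as a small dispatch table. The following is only a sketch under assumptions: the type `command`, the module and command names, and the stub handlers are all hypothetical and are not taken from The Reader's source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type: each command takes one argument string
   and returns 0 on success. */
typedef int (*command_fn)(const char *arg);

typedef struct {
    const char *module;   /* functional group, e.g. "navigation" */
    const char *name;     /* command name within that group */
    command_fn  run;      /* handler implementing the command */
} command;

/* Stub handlers standing in for real module code. */
static int next_page(const char *arg)   { (void)arg; return 0; }
static int find_word(const char *arg)   { return (arg != NULL && arg[0] != '\0') ? 0 : -1; }
static int lookup_word(const char *arg) { return (arg != NULL && arg[0] != '\0') ? 0 : -1; }

/* Adding a feature means adding one row and one handler in one module. */
static const command command_table[] = {
    { "navigation", "next_page",   next_page   },
    { "searching",  "find_word",   find_word   },
    { "lookup",     "lookup_word", lookup_word },
};

/* Return the table entry for `name`, or NULL if no such command exists. */
const command *find_command(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof command_table / sizeof command_table[0]; i++) {
        if (strcmp(command_table[i].name, name) == 0)
            return &command_table[i];
    }
    return NULL;
}
```

With a layout like this, a caller only needs `find_command("find_word")->run("adopted")`, and a change to searching stays inside the searching handlers.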
On any page of text, The Reader displays about twenty-five lines. On the first page of each chapter, it displays a chapter header. In the upper left corner of the page, there are arrows for going a page forward or backward in the text, as well as the current page number. On the upper left corner is a simple graphic which, when pressed, emulates the look of a page turned down, so a student can bookmark the current page (and go back to it using the "Move" window). If a student clicks on any content word, the word is highlighted and a box appears showing the WordNet definition of the word in context. This box, the "Lookup Box", also contains a button for displaying all senses of the word in question, so the student can see which senses of the word are not meant here. The "Lookup Box" also displays specially prepared comments on sections of text, primarily explaining metaphors, which are incorporated into the Reader files as a pedagogical tool.
Flanking the top and bottom of the page of text are several buttons. On the top, from left to right, are five buttons. The "Quit" button exits the program quickly. The "Move" button calls up the "Move" window. The "Find" button calls up the "Find" window. Next to it is the "Unclutter" button, which removes all but the main window. Finally, the "About" button shows basic information about The Reader program.
On the bottom of the page are three buttons. The "Lookup Word" button brings up a version of the WordNet browser for easy exploration of the lexicon. The "Tags" button toggles the highlighting of all content words, making it easy to see which words have senses associated with them. Words with a single sense in context are highlighted in blue, while the few words with multiple senses in context are highlighted in red (gray and black on gray-scale monitors). Finally, the "Comments" button toggles the underlining of words that have comments associated with them.
In the "Find" window, once a word is typed in and a form of search is selected, the program displays the number of successful matches to the word, as well as the number of the closest match after the current location in the text. The main window automatically updates to display this match, and the "Lookup Box" updates to show its sense in context. From the "Find" window, the user can then move forward and backward through the matching words using the "Next" and "Previous" buttons. The user may also search for new words or perform new types of searches.
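The "closest match after the current location" behaviour can be sketched as a binary search over match positions kept in text order. This is an illustration under assumptions: the `position` type, the function names, the choice to accept a match exactly at the current location, and the clamping at either end are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical location of a word in the text. */
typedef struct { int page, line, column; } position;

/* Order two positions as they occur in the text. */
static int compare_positions(const position *a, const position *b)
{
    if (a->page != b->page)     return a->page < b->page ? -1 : 1;
    if (a->line != b->line)     return a->line < b->line ? -1 : 1;
    if (a->column != b->column) return a->column < b->column ? -1 : 1;
    return 0;
}

/* Index of the first match at or after `here`, or -1 if there is none.
   `matches` must already be sorted in text order. */
int closest_match_after(const position *matches, size_t count, position here)
{
    size_t lo = 0, hi = count;

    assert(matches != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (compare_positions(&matches[mid], &here) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < count ? (int)lo : -1;
}

/* "Next" and "Previous" step through the matches and stop at either end. */
int next_match(int current, size_t count)
{
    return current + 1 < (int)count ? current + 1 : current;
}

int previous_match(int current)
{
    return current > 0 ? current - 1 : current;
}
```

The match count shown to the user is then simply `count`, and the index returned by `closest_match_after` selects the match that the main window jumps to.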
Five Reader files correspond to each text: the catalog file, the comment file, the data file, the index file, and the titlepage file. Each filename is in seven-three form for cross-compatibility reasons. For example, the five files for The Red Badge of Courage are rbc.cat, rbc.cmt, rbc.dat, rbc.idx, and rbc.ttl.
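Given this naming scheme, the five filenames for a text can be derived from a single base name. The sketch below assumes only the extensions listed above; the function name `reader_filename` and the numbering of the files are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extensions in the order the files are listed above:
   catalog, comment, data, index, titlepage. */
static const char *const reader_extensions[] = { "cat", "cmt", "dat", "idx", "ttl" };
enum { READER_FILE_COUNT = 5 };

/* Write "<base>.<ext>" for file number `which` (0..4) into `out`.
   Returns 0 on success, -1 if `which` is out of range or `out` is too small. */
int reader_filename(const char *base, int which, char *out, size_t out_size)
{
    int written;

    assert(base != NULL && out != NULL);
    if (which < 0 || which >= READER_FILE_COUNT)
        return -1;
    written = snprintf(out, out_size, "%s.%s", base, reader_extensions[which]);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

Calling `reader_filename("rbc", 3, name, sizeof name)` fills `name` with `rbc.idx`.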
adopted % 68 2 46 % 154 22 37
The catalog file is in alphabetical order. It is used quite extensively to allow the numerous search features of The Reader to be carried out quickly and efficiently.
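Alphabetical order is what makes fast lookup possible: a word can be located by binary search instead of a linear scan. The following sketch assumes that the catalog lines are already in memory, that each line begins with its word followed by a space (as in the sample line), and that the file is sorted in plain byte order; the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compare a search word with the headword of a catalog line, where the
   headword is the text before the first space. */
static int compare_headword(const void *key, const void *element)
{
    const char *word = (const char *)key;
    const char *line = *(const char *const *)element;
    size_t head_len = strcspn(line, " ");
    size_t word_len = strlen(word);
    size_t shorter = word_len < head_len ? word_len : head_len;
    int order = strncmp(word, line, shorter);

    if (order != 0)
        return order;
    if (word_len == head_len)
        return 0;
    return word_len < head_len ? -1 : 1;
}

/* Return the catalog line whose headword equals `word`, or NULL.
   `lines` must be sorted by headword. */
const char *find_catalog_line(const char *const *lines, size_t count, const char *word)
{
    const char *const *hit;

    assert(lines != NULL && word != NULL);
    hit = bsearch(word, lines, count, sizeof lines[0], compare_headword);
    return hit != NULL ? *hit : NULL;
}
```

A lookup of "adopted" would return the sample line, whose remaining fields can then be read to find each occurrence.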
10 13 28 - 10 13 39
The comment file is used for the display and easy insertion of comments into the text. Teachers can use it to provide students with more points to ponder or questions to answer.
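A line such as the sample above can be read as a span with a start and an end position. The sketch below assumes, without confirmation from the file format, that each group of three numbers is a page, a line, and a column; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed meaning of each number triple: page, line, column. */
typedef struct { int page, line, column; } text_position;
typedef struct { text_position start, end; } comment_span;

/* Parse "P L C - P L C" into `span`. Returns 0 on success, -1 on bad input. */
int parse_comment_span(const char *text, comment_span *span)
{
    int fields;

    assert(text != NULL && span != NULL);
    fields = sscanf(text, "%d %d %d - %d %d %d",
                    &span->start.page, &span->start.line, &span->start.column,
                    &span->end.page, &span->end.line, &span->end.column);
    return fields == 6 ? 0 : -1;
}
```

Under that reading, the sample line marks a comment that covers columns 28 through 39 of line 13 on page 10.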
{chapter 1}
{page 1}
2 6 <%>
2 10 <%1:07:00::>
2 15
2 22 <%4:02:00::>
2 34 <%>
2 39 <%>
2 43 <%1:17:01::*1:17:00::>
2 48 <,> <%>
2 50 <%>
3 0 <%>
3 4 <%5:00:00:retreating:00>
3 13
3 18
3 27 <%>
3 30 <%1:14:00::>
3 35
3 45 <%>
3 49 <%>
4 0 <%>
4 4 <%1:17:00::>
4 9 <,> <%>
4 11
4 18 <.> <%>
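Entry lines like those in the sample above can be read with a few lines of C. The sketch rests on several assumptions that the sample does not confirm: the two leading numbers are taken to be a line and a column on the page, the text after `%` is taken to be sense information (it resembles the tail of a WordNet sense key), and `*` is taken to separate alternative senses. All names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    int line;         /* first number (assumed: line on the page) */
    int column;       /* second number (assumed: column on that line) */
    int sense_count;  /* number of senses found in the "<%...>" tag */
} data_entry;

/* Parse one entry line. Returns 0 on success, -1 if the line is not an
   entry (for example a "{page 1}" marker) or the tag is unterminated. */
int parse_data_entry(const char *text, data_entry *entry)
{
    const char *tag;
    const char *close;
    const char *p;

    assert(text != NULL && entry != NULL);
    if (sscanf(text, "%d %d", &entry->line, &entry->column) != 2)
        return -1;
    entry->sense_count = 0;

    tag = strstr(text, "<%");
    if (tag == NULL)
        return 0;                 /* no sense tag on this line */
    close = strchr(tag, '>');
    if (close == NULL)
        return -1;
    if (close == tag + 2)
        return 0;                 /* "<%>": tag present but empty */

    entry->sense_count = 1;
    for (p = tag + 2; p < close; p++) {
        if (*p == '*')
            entry->sense_count++; /* assumed separator between senses */
    }
    return 0;
}
```

Under these assumptions, the line `2 43 <%1:17:01::*1:17:00::>` yields two senses, while `2 48 <,> <%>` yields none.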
The index file speeds up The Reader considerably. It allows The Reader to get quickly to any page or chapter of the book, without reading through all of the preceding pages. It also allows the program to quickly find all of the comments on that page, without needing to look through all of the preceding comments.
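The general idea of jumping straight to a page can be sketched with a table of byte offsets. This is not the actual index file format, which is not reproduced here: the sketch simply scans a file for `{page N}` markers like those in the sample above, records where each one starts, and later seeks directly to a requested page. The function names and the fixed line buffer are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Record the byte offset of every "{page N}" marker line in `data`.
   offsets[N] receives the offset for page N; unseen pages are set to -1.
   Returns the highest page number indexed, or -1 on a read error.
   Lines are assumed to be shorter than the 256-byte buffer. */
int build_page_index(FILE *data, long *offsets, int max_pages)
{
    char line[256];
    int highest = 0;
    int page;
    int i;
    long start;

    assert(data != NULL && offsets != NULL && max_pages > 0);
    for (i = 0; i < max_pages; i++)
        offsets[i] = -1L;

    rewind(data);
    for (;;) {
        start = ftell(data);
        if (start < 0)
            return -1;
        if (fgets(line, sizeof line, data) == NULL)
            break;
        if (sscanf(line, "{page %d}", &page) == 1 && page > 0 && page < max_pages) {
            offsets[page] = start;
            if (page > highest)
                highest = page;
        }
    }
    return highest;
}

/* Position `data` at the start of `page` without reading earlier pages.
   Returns 0 on success, -1 if the page is not in the index. */
int seek_to_page(FILE *data, const long *offsets, int max_pages, int page)
{
    assert(data != NULL && offsets != NULL);
    if (page <= 0 || page >= max_pages || offsets[page] < 0)
        return -1;
    return fseek(data, offsets[page], SEEK_SET);
}
```

Storing such offsets in a separate file means the scan happens once, when the files are prepared, rather than every time a student turns to a distant page.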
{title}
The Red Badge of Courage
{author}
Stephen Crane
{publish}
Originally Published 1895
{info}
Semantic concordance prepared by the
Cognitive Science Laboratory, Princeton University
Copyright 1995, 1996 Princeton University
{end}
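The titlepage layout shown above, with a `{name}` marker line followed by the text of that section, is simple to read back. The sketch below assumes the whole file is already in memory with `\n` line endings and that a section runs until the next line beginning with `{`; the function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the text of section `section` (for example "title") into `out`.
   Returns 0 on success, -1 if the section is missing or `out` is too small.
   Note: the marker is found with a plain substring search, so it is
   assumed not to occur in the middle of a line. */
int titlepage_section(const char *text, const char *section, char *out, size_t out_size)
{
    char marker[64];
    const char *start;
    const char *end;
    size_t length;
    int n;

    assert(text != NULL && section != NULL && out != NULL);
    n = snprintf(marker, sizeof marker, "{%s}\n", section);
    if (n < 0 || (size_t)n >= sizeof marker)
        return -1;

    start = strstr(text, marker);
    if (start == NULL)
        return -1;
    start += n;

    if (*start == '{') {
        end = start;                      /* empty section */
    } else {
        end = strstr(start, "\n{");       /* next marker line */
        if (end == NULL)
            end = start + strlen(start);
    }

    length = (size_t)(end - start);
    if (length >= out_size)
        return -1;
    memcpy(out, start, length);
    out[length] = '\0';
    return 0;
}
```

For the sample above, asking for `author` would yield `Stephen Crane`, and asking for `info` would yield all three lines of the notice.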
The C code is also modularized. All functions are grouped according to the type of job they perform -- global/book processing, page processing, comment processing, word lookup, bookmarking, WordNet communications, as well as several others that are more specialized. These C functions are compiled into a Tcl interpreter which runs the Tcl/Tk modules discussed above.
The only difference in the use of the program on different platforms is the way in which the user starts it. On UNIX, a typed command runs a shell script which starts the Tcl interpreter. On an IBM-compatible machine running Windows, a shortcut runs a batch file which starts the interpreter. MacOS will be similar, although its start-up mechanism has yet to be designed.
The second major addition to The Reader that would be very useful is network use. Changing The Reader to conform to a client-server model would allow a centralized repository of texts (for each school, for example) and the large-scale, unified collection of statistical data on Reader usage for research. Such a system would also allow teachers and researchers greater control over the use of texts. A preliminary version for Reader file queries has already been written in C. A client program has yet to be implemented, however.