Important Note: This is an unofficial fork of GNU CIDE.
To accompany the GNU version of the set of files (CIDE.*) containing
the electronic version of the
Collaborative International Dictionary of English.
(called also GCIDE)
* * * * * * * * * * * * * * * * * * * * * * * * * * * *
This document describes the GNU version of the Collaborative International Dictionary of English. It is organized into a series of chapters, introduced by headings beginning with a single asterisk. A chapter may have sections, which are marked with two asterisks. For those readers who use Emacs, this structure corresponds to its "Outline mode", which will be enabled automatically upon loading this file.
The chapter Introduction describes the structure of this package. The chapter Structure of the Dictionary describes the dictionary structure in general. An overview of the markup tags is provided in the chapter Tags. A detailed information about dictionary markup can be obtained from a set of ancillary files included in this package, which are described in the chapter Ancillary Files.
The chapter Dictionary Lookup describes how to use GNU Dico for reading this dictionary. Finally, other versions of the Webster dictionary are listed in the chapter Other Versions of the Dictionary.
The dictionary was derived from the
Webster's Revised Unabridged Dictionary
Version published 1913
by the C. & G. Merriam Co.
Springfield, Mass.
Under the direction of
Noah Porter, D.D., LL.D.
and has been supplemented with some of the definitions from
WordNet, a semantic network created by
the Cognitive Science Department
of Princeton University
under the direction of
Prof. George Miller
and is being proof-read and supplemented by volunteers from around the world. This is an unfunded project, and future enhancement of this dictionary will depend on the efforts of volunteers willing to help build this free resource into a comprehensive body of general information. New definitions for missing words or words senses and longer explanatory notes, as well as images to accompany the articles are needed. More modern illustrative quotations giving recent examples of usage of the words in their various senses will be very helpful, since most quotations in the original 1913 dictionary are now well over 100 years old.
GCIDE is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.
GCIDE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this copy of GCIDE; see the file COPYING. If not, see http://www.gnu.org/licenses/.
This dictionary is maintained by Patrick Cassidy and Sergey Poznyakoff.
Please send bug reports and suggestions to gcide@gnu.org.
When the archive is unpacked, the main dictionary text of the GCIDE will be found in 26 files named "CIDE.*", where the asterisk indicates which letter of the alphabet begins the words in each file. For example, file "CIDE.B" contains words beginning with the letter "B". Additional information about the tagging conventions and special character symbols are contained in ancillary files in this directory (see below the section entitled "ANCILLARY FILES"). The main body of the 1913 dictionary was essentially identical to the edition published in 1890, and was republished in 1913 with an appendix containing "New Words". The new words of that appendix have been integrated into the main file in this version. However, it is important to keep in mind that the definitions in this dictionary are in most cases over 100 years old. Use them with caution!
At the bottom of each paragraph in this dictionary, there is a bracketed and tagged "source" indicated. This tells from where the definition or other text in that paragraph came, as follows:
[<source>1913 Webster</source>]
= From the original 1890 dictionary.
[<source>Webster 1913 Suppl.</source>]
= From the 1913 "New Words" supplement to the Webster.
[<source>WordNet 1.5</source>]
= From the WordNet on-line semantic network.
[<source>Century Dict. 1906.</source>]
= From the Century Dictionary published in 1906, especially from
the "proper Names" supplement (volume IX).
published
[<source>XXX</source>]
= Added by one of the volunteers.
The original definitions have been tagged and in some cases reformatted or slightly rearranged. If substantive information is added from a second source, usually the additional source is also noted, as in:
[<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]
This version is tagged with SGML-like tags of the form <pos>...</pos>
so
that the original typography (italics, bold, block quotes) can be
reproduced. A list of the most important tags for fields in the dictionary
is given below. The tags also serve the more important function of allowing
the information content to be conveniently imported into computer programs
or databases. The set of tags used is described in the accompanying file
"tagset.txt". NOTE the paragraph tags <p>...</p>
do not always nest
properly with certain other tags, such as <note>
and <cs>
("collocation
section"), which in some cases span multiple paragraphs. If you are using a
tag parser which detects improper nesting, you should first either delete
the paragraph tags or convert them to non-tag symbols, or, if possible, set
the parser to ignore the <p>...</p>
tags.
The unusual characters (such as Greek or the European accented characters, as well as special characters used in the pronunciations) are described in the accompanying file "webfont.txt". Some information on the pronunciation system used may be found by viewing the file "pronunc.jpg", and additional explanations of pronunciation are in the file "pronunc.txt".
Each paragraph of the original text is enclosed within tags of the form
<p>...</p>
. Within these paragraphs there are no line breaks, and some of
the paragraphs are over 12,000 characters long, which may prove too long to
be handled by some editors. At some points, embedded line breaks within a
"paragraph" are marked by a <br/
"entity". The file can therefore be
converted, if necessary, to a form with shorter lines, and subsequently
reconverted back to the form having one line per paragraph.
If additional line breaks are added, then in order to remove the line breaks and reconstruct the original paragraphs, so that the page width can be adjusted, perform the following manipulations:
(1) convert each line break to a space.
(2) convert the string "</p> " (</p> followed by two spaces)
to </p> followed by two line breaks.
(3) convert the string "<br/ " (<br/ followed by one space)
to <br/ followed by one line break.
A more sophisticated formatting of spaces within paragraphs may require the use of the fully-tagged master files. If you have a need for these files, contact Patrick Cassidy: cassidy@micra.com.
This version is only a first typing, and has numerous typographic errors, including errors in the field-marks. In addition, the user must keep in mind that this text is very old and will contain numerous obsolete, inaccurate, and perhaps offensive statements, which are included solely because this work is intended to reproduce accurately this historically interesting classic reference work. This text should not be relied upon as an accurate source of information, as in many cases it represents the state of knowledge around 1890. The text is provided "as is", and the user must accept responsibility for all consequences of its use. Please refer to the header of each file and the GNU public license. If these conditions of use are unacceptable, please do not use these texts.
This electronic dictionary is also made available as a potential starting point for development of a modern comprehensive encyclopedic dictionary, to be accessible freely on the internet, and developed by the efforts of all individuals willing to help build a large and freely available knowledge base. A large number of collaborators are needed to bring this dictionary to a more accurate, more modern, and more useful state. Anyone willing to assist in any way in constructing such a knowledge base should contact its maintainers at gcide@gnu.org.
Most important tags used in the GCIDE:
<hw> tags the headword
<pr> pronunciation
<pos> part of speech
<ety> etymology
<ets> "source" word within an <ety> field, usually foreign words
<fld> field of knowledge (e.g. Med. = medicine)
<def> definition
<cs> collocation section (containing word combinations)
<col> collocation entry (word combination)
<cd> collocation definition
<as> illustrations of usage (within a <def>. . . </def> field)
<au> authority for a definition, or author of a quotation
<q> illustrative quotation -- in block quote format
<au> author of an illustrative <q> quotation
<altname> alternative name for the headword -- essentially a synonym
<asp> alternative spelling of the headword
<syn> list of synonyms for the headword
<p> paragraph
<b> bold type
<it> italic type
For other tags, see the file "tagset.txt"
In addition to the main text of the dictionary, additional explanatory material about this version of the dictionary is available in the ancillary files:
-
COPYING
The license terms for distributing and modifying this dictionary.
-
INFO
Short information about the dictionary. It is used by GNU Dico dictionary server. The first line of this file provides a short database description. The entire file is sent as the response to SHOW INFO command.
-
abbrevn.lst
List of the abbreviations used in the dictionary.
-
authors.lst
List of authors whose works are quoted in the dictionary.
-
pronunc.txt
Description of the special markup used in this dictionary to represent pronunciations.
-
pronunc.jpg
A copy of the dictionary page describing the pronunciation symbols used in the original work.
-
symbols.jpg
This file lists original pronunciation symbols with the corresponding markup entities used in this version.
-
tagset.txt
Description of the markup tags.
-
titlepage.png
A copy of the original title page.
-
webfont.txt
Description of the special escape sequences used in this dictionary. This file also explains the Greek transliteration syntax used in it.
The dictionary home page http://gcide.gnu.org.ua provides an on-line lookup facility.
The GNU Dico project contains a browser for GCIDE, called "gcider". It provides a windowing interface allowing user to search for matching headwords using several match strategies and browse their definitions. See http://dico.gnu.org.ua/gcider.html.
The package also provides a loadable module which allows to use GCIDE with the dictionary server.
See http://www.gnu.org.ua/software/dico for more information, including links to download and documentation.
There are several other derivative versions of this dictionary on the internet, in some cases reformatted or provided with an interface. Those that I am aware of are:
This version of GCIDE is available online at the GNU Dico web site:
http://dicoweb.gnu.org.ua/?db=gcide
The site provides extensive search facilities.
In the etext96 directory of Project Gutenberg, there is a version of the original 1913 dictionary, which is in the public domain. The main files are labeled pgw050*.*. The tags for that version are a subset of those used in this GNU version.
This group has created a program to index and search this dictionary. The program can be downloaded and used locally, but at present is available only in a Unix-compatible executable version. See their web site at http://www.dict.org.
Mark Olsen and Gavin LaRowe at the University of Chicago have converted the original 1913 dictionary to HTML and have provided an interface allowing search of the headwords. When the supplemented version has developed sufficiently to warrant the effort, a similar searchable version may be posted there as well. The search page is at:
http://humanities.uchicago.edu/forms_unrest/webster.form.html
That page will provide links to other ARTFL projects and contact information for the ARTFL group, who alone can provide information about the HTML version or interface.