


                    WORDS (LATIN) DOCUMENTATION 1.97                    


WORDS 1.97 - 
LATIN-to-ENGLISH DICTIONARY PROGRAM
--------------------------------------------------------------------------


                        WORDS (Latin) Version 1.97                        

INSTALLATION
SUMMARY
INTRODUCTION
OPERATIONAL DESCRIPTION
 Program Operation
 Examples
 Signs and Abbreviations in Meaning
BRIEF PROGRAM DESCRIPTION
 Trimming of uncommon results
 Special Cases
 Uniques
 Tricks
 Codes in Inflection Line
 Help for Parameters
 Program source code
GUIDING PHILOSOPHY
 Purpose 
 Method
 Word Meanings
 Proper Names
 Letter Conventions
 Dictionary Codes
 AGE
 AREA
 GEO
 FREQ
 SOURCE
 Dictionary Conventions
 Evolution of the Dictionary
 Testing
 Current Status and Future Plans
WRITING DICT.LOC AND UNIQUES
 DICT.LOC
 UNIQUES
DEVELOPERS AND REHOSTING
--------------------------------------------------------------------------


                               INSTALLATION                               
 


The WORDS program, with its accompanying data files, should run on any 
machine for which it is adapted, with any monitor.  Simply download the 
self-extracting EXE files or the compressed file for the appropriate 
system and execute/decompress it in your chosen subdirectory on the hard 
disk, creating the necessary files.  Then call/run WORDS.  

See the particular page for each specific system.  
Intel PC Systems
DOS
Windows 95/NT/98
Linux



                                 SUMMARY                                  
 


This program (WORDS.EXE for the PC - DOS, Windows 95/98/NT or LINUX 
console versions) takes keyboard input or a file of Latin text lines and 
provides an analysis of each word individually.  It uses the data files 
INFLECT.SEC, UNIQUES.LAT, ADDONS.LAT, STEMFILE.GEN, INDXFILE.GEN, and 
DICTFILE.GEN, and possibly .SPE and DICT.LOC.  

The dictionary contains over 30000 entries, as would be counted in an 
ordinary dictionary.  This expands to almost twice that number of 
individual stems (the count that the program may display at startup), and,
through additional word construction with hundreds of prefixes and 
suffixes, may generate more, leading to many hundreds of thousands of 
'words' that can be formed by declension and conjugation.  This version of
WORDS provides a tool to help in translations for the Latin student.  It 
is now a large dictionary by any measure and can be helpful to advanced 
users.  The dictionary will continue to grow - slowly.  


                               INTRODUCTION                               



I am no expert in Latin; indeed, my training is limited to a couple of 
years in high school 50 years ago.  But I always felt that Latin, as 
presented after two millennia, was a scientific language.  It had the 
interesting property of inflection: words were constructed in a logical 
manner.  I admired this feature, but could never remember the vocabulary 
well enough when it came time to exercise it on tests.  

I decided to automate an elementary-level Latin vocabulary list.  As a 
first stage, I produced a computer program that will analyze a Latin word 
and give the various possible interpretations (case, person, gender, 
tense, mood, etc.), within the limitations of its dictionary.  This might 
be the first step to a full parsing system, but, although just a 
development tool, it is useful by itself.  

Please remember that this is only a computer exercise in automating a 
Latin dictionary.  I am not a Latin scholar and anything in the program or
documentation is filtered by me from reading Latin dictionaries.  Please 
let no one go to his teacher and cite me as an authority.  

While developing this initial implementation, based on different sources, 
I learned (or re-learned) something that I had overlooked at the 
beginning.  Latin courses, and even very large Latin dictionaries, are put
together under very strict ground rules.  Some dictionary might be based 
exclusively on 'Classical' (200 BC - 200 AD) texts; it might have every 
word that appears in every surviving writing of Cicero, but nothing much 
before or since.  Such a dictionary will be inadequate for translating 
medieval theological or scientific texts.  In another example, one 
textbook might use Caesar as its main source of readings (my high school
texts did), while another might avoid Caesar and all military writings 
(either for pacifist reasons, or just because the author had taught Caesar
for 30 years and had grown bored with going over the same material, year 
after year).  One can imagine that the selection of words in such 
different texts would differ considerably; moreover, even with the same 
words, the meanings attached would be different.  This presents a problem 
in the development of a dictionary for general use.  

One could produce a separate dictionary for each era and application or a 
universal dictionary with tags to indicate the appropriate application and
meaning for each word.  With such a tag arrangement one would not be 
offered inappropriate or improbable interpretations.  The present system 
has such a mechanism, but it is not yet exploited.  

The Version 1.97 dictionary may be found to be of fairly general use for 
the student; it has the easy words that every text uses.  It also has a 
goodly number of adverbs, prepositions, and conjunctions, which are not as
sensitive to application as are the nouns and verbs.  The system also 
tests a few hundred prefixes and suffixes, if the raw word cannot be 
found.  This allows an interpretation of many words which would otherwise 
be marked unknown.  The result of this analysis is fairly straightforward 
in most cases, accurate but esoteric in some others.  Some constructions 
are recognized Latin words, and some are perfectly reasonable words which 
may never have been used by Cicero or Caesar but might have been used by 
Augustine or a monk of Jarrow.  For about 1 in 10 constructed words the 
result has no relation to the normal dictionary meaning.  

BE WARNED!  The program will go to great lengths if all tricks are 
invoked.  If you get a word formed with an enclitic, prefix, suffix, and 
syncope, be very suspicious!  It may well be right, but look carefully.  
(Try siquempiamque!) 

The final try is to look at the input as two words run together.  In 
most cases this works out, and is especially useful for late Latin number 
usage.  However, this algorithm may go very wrong.  If it is not obviously
right, it is probably incorrect.  
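
The two-word idea can be pictured with a short Python sketch.  This 
is only an illustration and not the program's own code: the function 
name split_two_words and the tiny KNOWN_WORDS set are hypothetical, 
and a plain set lookup stands in for the full parse that each half 
would really need.  

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for a real dictionary lookup (assumption).
KNOWN_WORDS: Set[str] = {"viginti", "quinque", "et"}


def split_two_words(text: str, known: Set[str] = KNOWN_WORDS) -> List[Tuple[str, str]]:
    """Return every cut of text into two pieces that are both known words."""
    text = text.lower()
    return [
        (text[:i], text[i:])
        for i in range(1, len(text))      # try every interior cut point
        if text[:i] in known and text[i:] in known
    ]


print(split_two_words("vigintiquinque"))
```

Every accepted cut is reported, so a short or common second half can 
easily produce a false match, which is one reason to distrust results 
that are not obviously right.  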

With this facility, and a 30000 word dictionary, trials on some tested 
classical texts and the Vulgate Bible give hit rates of far better than 
99%, excluding proper names (there are very few proper names in this 
dictionary).  (I am an old soldier and seem to have in the dictionary 
every possible word for attack or destroy.  The system is near perfect for
Caesar.) The question arises, what hit rate can be expected for a general 
dictionary.  Classical Latin dictionaries have no references to the 
terminology of Christian theology.  The legal documents and deeds of the 
Middle Ages are a challenge of jargon and abbreviations.  These areas 
require special knowledge and vocabulary, but even there the ability to 
handle the non-specialized words is a large part of the effort.  

The development system allows the inclusion of specialized vocabulary (for
instance a SPEcial dictionary for specialized words not wanted in most 
dictionaries), and the opportunity for the user to add additional words to
a DICT.LOC.  

It was initially expected that there would be special dictionaries for 
special applications.  That is why there is the possibility of a SPECIAL 
dictionary.  Now the general dictionary is coded by AGE and application 
AREA.  Thus special words used initially/only by St Thomas Aquinas would 
be Medieval (AGE code F) and Ecclesiastical (AREA code E).  Eventually 
there needs to be a filter that allows one, upon setting parameters for 
Medieval and Ecclesiastical, to push those words over others.  Right now 
there is not enough non-classical vocabulary to support such a 
scheme.  The problem is that one needs a really complete classical 
dictionary before one can assure that new entries are uniquely Medieval, 
that they are not just classical words that appear in a Medieval text.  
And the update is only into the D's.  So the situation is that the 
mechanism is there, but not sufficient data.  Nevertheless that is exactly
the application I had in mind when I set out to do the program.  

One can set a parameter to exclude medieval words if there is a classical 
word answering the same parse.  Likewise, the program can ignore rare 
meanings if there is a common meaning for the parse.  

The program is probably much larger than is necessary for the present 
application.  It is still in development but some effort has now been put 
into optimization.  Nevertheless there is lots of room for speeding it up.


This is a free program, which means it is proper to copy it and pass it on
to your friends.  Consider it a developmental item for which there is no 
charge.  However, it is Copyrighted (c), so please don't sell it as your 
own without at least telling me.  Permission will be freely given.  

This version is distributed without obligation, but the developer would 
appreciate comments and suggestions.  


William A Whitaker 
PO Box 3036 
McLean VA 22103-3036 
USA 
whitaker@erols.com


                         OPERATIONAL DESCRIPTION                          
 


This write up is rudimentary and assumes that the user is experienced with
computers.  

The WORDS program, Version 1.97, with its accompanying data files, should 
run on a PC in DOS/Windows 95/98/NT, with any monitor.  Simply download the 
self-extracting EXE file and execute it in your chosen subdirectory to 
UNZIP the files into a subdirectory of a hard disk.  Then call WORDS.  

There are a number of files associated with the program.  These must be in
the subdirectory of the program, and the program must be run from that 
subdirectory.  

    * WORDS.EXE is the executable program.  

    * INFLECT.SEC holds the encoded inflection records.  

    * STEMFILE.GEN contains the stems of the GENERAL dictionary.  

    * MEANFILE.GEN contains the meanings of the GENERAL dictionary 
    entries.  

    * INDXFILE.GEN contains a set of indexes into the DICTFILE.  

    * There may also be a set of files for a SPECIAL (.SPE) dictionary
    of the same structure as the GENERAL dictionary, but there is no 
    SPECIAL dictionary in the present distribution.  

    * A LOCAL dictionary may also be used.  This is a limited 
    dictionary of a different form, human readable and writeable.  The
    knowledgeable user can augment and modify it on-line.  It would 
    consist of the file DICT.LOC.  

    * UNIQUES.LAT contains certain words which regular processing does
    not get.  

    * ADDONS.LAT contains the set of prefixes, suffixes and enclitics 
    (-que, -ve) and the like.  

    * Other files may be generated by the program, so run it in a 
    configuration that allows the creation of files.  

All these files are necessary to run the program (except the optional 
dictionaries SPE and LOC).  This excess of files is a consequence of 
the present developmental nature of the program.  The files are very 
simple, almost human-readable.  Presumably, a later version could 
condense and encode them.  Nevertheless, beyond the original COPY, the
user need not worry about them.  

Additionally, there are files that the program may produce on request.
All of these share the name WORD, with various extensions, and they 
are all ASCII text files which can be viewed and processed with an 
ordinary editor.  The casual user probably does not want to get 
involved with these.  WORD.OUT will record the whole output, WORD.UNK 
will list only words the program is unable to interpret.  These 
outputs are turned on through the PARAMETERS mechanism.  

PARAMETERS may be set while running the program by inputting a line 
containing a '#' mark as the only (or first) character.  
Alternatively, WORD.MOD contains the MODES that can be set by 
CHANGE_PARAMETERS.  If this file does not exist, default modes will be
used.  The file may be produced or changed when changing parameters.  
It can also be modified, if the user is sufficiently confident, with 
an editor, or deleted, thereby reverting to defaults.  

(There is another set of developer's parameters which may be set in 
some versions with the input of '!'.  These MODES may be changed and 
saved in a file WORD.MDV.  These are not normal user facilities; 
probably no one but the developer would be interested.  In any 
specific release these facilities may, or may not, work.  They are 
just mentioned here in case they ever come up accidentally, and to 
point out that there are other capabilities, actual and possible, 
which may be invoked if there is a special need.) 

WORD.OUT is the file produced if the user requests, in 
CHANGE_PARAMETERS, output to a file.  This output can be used for 
later manipulation with a text editor, especially when the input was a
text file of some length.  If the parameter UNKNOWNS_ONLY is set, the 
output serves as a sort of a Latin spell checker.  Those words it 
cannot match may just not be in the dictionary, but alternatively they
may be typos.  A WORD.UNK file of unknowns can be generated.  




Program Operation 


To start the program, in the subdirectory that contains all the files,
type WORDS.  A setup procedure will execute, processing files.  Then 
the program will ask for a word to be keyed in.  Input the word and 
give a return (ENTER).  Information about the word will be displayed.  

One can input a whole line at a time, but only one line since the 
return at the end of line will start the processing.  If the results 
would fill more than a computer screen, the output is halted until the
user responds to the 'MORE' message with a return.  A file containing 
a text, a series of lines, can be input by keying in the character 
'@', followed (with no spaces) by the DOS name of the file of text.  
This input file need not be in the program subdirectory, just use the 
full DOS path and name of the file.  This is usually accompanied with 
the setting of the parameter switches to create and write to an output
file, WORD.OUT.  

One can have a comment in the file, a terminal portion of a line that 
is not parsed.  This could be an English meaning, a source where the 
word was found, an indication that it may have been miscopied, etc.  A
comment begins with a double dash [--] and continues to the end of the
line.  The '--' and everything after on that line is ignored by the 
program.  
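
As an illustration of the comment rule (a sketch only, not the 
program's own code; the function name strip_comment is hypothetical), 
the text to be parsed is whatever precedes the first '--' on a line: 

```python
def strip_comment(line: str) -> str:
    """Return the part of an input line that precedes any '--' comment."""
    text, _sep, _comment = line.partition("--")  # split at the first '--'
    return text.rstrip()


print(strip_comment("arma virumque cano -- Aeneid 1.1"))
```

A line with no '--' is returned unchanged apart from trailing 
whitespace, and a line that starts with '--' yields an empty string.  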

A '#' character input will permit the user to set modes to prevent the
process from trying prefixes and suffixes to get a match on an item 
unknown to the dictionary, put output to a file, etc.  Going into the 
CHANGE_PARAMETERS, the '?' character calls help for each entry.  

Two successive returns with no text will terminate the program (except
in text being read from an @ disk file.) 
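
The special first characters described above can be summarized in a 
small Python sketch.  The function classify_input and its labels are 
hypothetical and chosen only for illustration; the sketch shows the 
conventions for '#', '!', '@' and an empty line, not how the program 
itself is written.  

```python
from typing import Tuple


def classify_input(line: str) -> Tuple[str, str]:
    """Classify one input line by the conventions described in the text."""
    text = line.strip()
    if not text:
        return ("blank", "")                 # two in a row end the session
    if text.startswith("#"):
        return ("parameters", "")            # enter CHANGE_PARAMETERS
    if text.startswith("!"):
        return ("developer_parameters", "")  # may not work in every release
    if text.startswith("@"):
        return ("file", text[1:])            # file name follows with no space
    return ("words", text)                   # ordinary Latin text to analyze


print(classify_input("@C:\\LATIN\\TEXT.TXT"))
print(classify_input("amo amas amat"))
```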

One can also call WORDS with the input on the command line

    WORDS amo amas amat

which will cause it to execute for that input and then terminate.  
This is for a quick word or two.  

Another mode of operation is to provide an input and an output file.  

    WORDS INFILE OUTFILE

with names of your choice (full path names if not operating all in the
same subdirectory).  The program will read as input the INFILE and 
write the output to the OUTFILE (as though it were WORD.OUT).  It will
then await further input from the user.  It terminates with a return.  
If the parameters are not legal file names, the program will assume 
they are Latin words to be processed as command line input.  

Examples

Following are annotated examples of output.  Examination of these will
give a good idea of the system.  The present version may not match 
these examples exactly - things are changing - but the principle is 
there.  A recent modification is the output of dictionary forms or 
'principal parts' (shown below for some examples).  

=>agricolarum
agricol.arum       N      1  1 GEN  P M P
agricola, agricolae   N M
farmer


This is a simple first declension noun, and a unique interpretation.  
The '1 1' means it is first declension, with variant 1. This is an 
internal coding of the program, and may not correspond exactly with 
the grammatical numbering.  The 'N' means it is a noun.  It is the 
form for genitive (GEN), plural (1st 'P').  The stem is masculine (M) 
and represents a person (2nd 'P').  The stem is given as 'agricol' and
the ending is 'arum'.  The stem is normal in this case, but is a 
product of the program, and may not always correspond to conventional 
usage.  
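
For anyone post-processing WORD.OUT, the fields of a noun line like 
the one above can be pulled apart as in the following Python sketch.  
It assumes the whitespace-separated layout shown in this example; the 
names NounParse and parse_noun_line are hypothetical, and other parts 
of speech have different columns that the sketch does not handle.  

```python
from typing import NamedTuple


class NounParse(NamedTuple):
    stem: str
    ending: str
    part: str        # part of speech, 'N' for a noun
    declension: int  # internal coding, e.g. 1
    variant: int     # internal coding, e.g. 1
    case: str        # e.g. GEN
    number: str      # S or P
    gender: str      # M, F, N, C ...
    kind: str        # e.g. P for person, T for thing


def parse_noun_line(line: str) -> NounParse:
    """Split one noun inflection line into its fields."""
    word, part, decl, var, case, number, gender, kind = line.split()
    if part != "N":
        raise ValueError("this sketch handles noun lines only")
    stem, _dot, ending = word.partition(".")  # 'agricol.arum' -> stem, ending
    return NounParse(stem, ending, part, int(decl), int(var),
                     case, number, gender, kind)


print(parse_noun_line("agricol.arum       N      1  1 GEN  P M P"))
```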

=>feminae
femin.ae           N      1  1 GEN  S F P
femin.ae           N      1  1 DAT  S F P
femin.ae           N      1  1 NOM  P F P
femin.ae           N      1  1 VOC  P F P
femina, feminae
woman


This word has several possible interpretations in case and number 
(Singular and Plural).  The gender is Feminine.  Presumably, the user 
can examine the adjoining words and reduce the set of possibilities.  
Maybe the program will take care of this in some future version.  

=>cornu
corn.u             N      4  2 NOM  S N T
corn.u             N      4  2 DAT  S N T
corn.u             N      4  2 ACC  S N T
corn.u             N      4  2 ABL  S N T
cornu, cornus
horn (of an animal); horn, trumpet; wing of an attacking army


Here is an example of another declension and a second variant.  The 
Masculine (-us) nouns of the declension (fructus) are '4 1' and the 
Neuter (-u) nouns are coded as '4 2'.  This word is neuter (2nd N) and
represents a thing (T).  

=>ego
ego                PRON   5  1 NOM  S C PERS     
I, me; myself


A pronoun is much like a noun.  The gender is common (C), that is, it 
may be masculine or feminine.  It is a personal (PERS) pronoun.  

=>illud
ill.ud             PRON    6  1 NOM S N ADJECT                
ill.ud             PRON    6  1 ACC S N ADJECT                
that; those (pl.); also DEMONST


Here we have an adjectival (ADJECT) and demonstrative (DEMONST) 
pronoun.  

=>hic
hic                ADV    POS                                 
here, in this place                                                      
h.ic               PRON    3  1 NOM S M ADJECT                
this; these (pl.); also DEMONST


In this case there is an adjectival/demonstrative pronoun, or it may be
an adverb.  The POS means that the comparison of the adverb is 
positive.  

=>bonum
bon.um             N      2  2 NOM  S N T
bon.um             N      2  2 ACC  S N T
good thing, profit, advantage; goods (pl.), possessions                  
bon.um             ADJ    1  1 NOM  S N POS   
bon.um             ADJ    1  1 ACC  S M POS   
bon.um             ADJ    1  1 ACC  S N POS   
bon.um             ADJ    1  1 VOC  S N POS   
good, honest, brave, noble; better; best


Here we have an adjective, but it might also be a noun.  The 
interpretation of the adjective says that it is POSitive, but note 
that there are meanings for COMParative and SUPERlative also on the 
line.  Check the comparison value before deciding.  

=>facile
facile             ADV    POS   
easily, readily                                                          
facil.e            ADJ    3  2 NOM  S N POS   
facil.e            ADJ    3  2 ACC  S N POS   
facil.e            ADJ    3  2 VOC  S N POS   
easy, easy to do, without difficulty, ready, quick, good natured, courteous


Here is an adjective or an adverb.  Although they are related in 
meaning, they are different words.  

=>acerrimus
acerrim.us         ADJ    3  2 NOM  S M SUPER 
sharp, bitter, pointed, piercing, shrill; sagacious, keen; severe, vigoro


Here we have an adjective in the SUPERlative.  The meanings are all 
POSitive and the user must add the -est by himself.  

=>optime
optim.e            ADJ    1  1 VOC  S M SUPER 
good, honest, brave, noble; better; best                                 
optime             ADV    SUPER 
well, very, quite, rightly, agreeably, cheaply, in good, style; better; best


Here is an adjective or an adverb; both are SUPERlative.  

=>monuissemus
monu.issemus       V       2  1 PLUP ACTIVE  SUB  1 P X       
remind, advise, warn; teach; admonish; foretell


Here is a verb for which the form is PLUPerfect, ACTIVE, SUBjunctive, 
1st person, Plural.  It is 2nd conjugation, variant 1. 

=>amat
am.at              V       1  1 PRES ACTIVE  IND  3 S X       
amo, amare, amavi, amatus
love, like; fall in love with; be fond of; have a tendency to


Another regular verb, PRESent, ACTIVE, INDicative.  

=>amatus
amat.us            VPAR    1  1 NOM S M PERF PASSIVE PPL X    
amo, amare, amavi, amatus
love, like; fall in love with; be fond of; have a tendency to


Here we have the PERFect, PASSIVE ParticiPLe, in the NOMinative, 
Singular, Masculine.  

=>amatu
amat.u             SUPINE  1  1 ABL S X                       
amo, amare, amavi, amatus
love, like; fall in love with; be fond of; have a tendency to


Here is the SUPINE of the verb in the ABLative Singular.  

=>orietur
ori.etur           V       3  4 FUT  PASSIVE IND  3 S DEP     
rise, arise; spring from, appear; be descended; begin, proceed, originate



For DEPondent verbs the passive form is to be translated as if it were
active voice.  

=>ab
ab                 PREP   ABL 
by, from, away from


Here is a PREPosition that takes an ABLative object.  

=>sine
sin.e              V       3  1 PRES ACTIVE  IMP  2 S X       
allow, permit                                                            
sine               PREP   ABL 
without


Here is a PREPosition that might also be a Verb.  

=>contra
contra             PREP   ACC 
against, opposite; facing; contrary to, in reply to                      
contra             ADV    POS   
in opposition, in turn; opposite, on the contrary


Here is a PREPosition that might also be an ADVerb.  This is a very 
common situation, with the meanings being much the same.  

=>et
et                 CONJ   
and, and even; also, even;  (et ... et = both ... and)


Here is a straight CONJunction.  

=>vae
vae                INTERJ 
alas, woe, ah; oh dear;  (Vae, puto deus fio.)


Here is a straight INTERJection.  

=>septem
septem             NUM     2  0 X   X X CARD       7          
seven


An additional provision is the attempt to recognize and display the 
value of Roman numerals, even combinations of appropriate letters that
do not parse conventionally to a value but may be ill-formed Roman 
numerals.  

=>VII
vii                NUM     2  0 X   X X CARD       7          
   7  as a ROMAN NUMERAL
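
One simple way to put a value on any string of Roman numeral letters, 
including ill-formed ones, is the usual subtractive rule: a letter 
counts negatively when a larger letter follows it.  The Python sketch 
below uses that rule purely as an illustration; the text does not say 
what rule the program applies, and the function name roman_value is 
hypothetical.  

```python
from typing import Optional

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_value(text: str) -> Optional[int]:
    """Value of a string of Roman numeral letters, or None if other letters occur."""
    letters = text.upper()
    if not letters or any(ch not in ROMAN_VALUES for ch in letters):
        return None
    total = 0
    for i, ch in enumerate(letters):
        value = ROMAN_VALUES[ch]
        following = ROMAN_VALUES[letters[i + 1]] if i + 1 < len(letters) else 0
        # A smaller letter directly before a larger one is subtracted.
        total += -value if value < following else value
    return total


print(roman_value("VII"))
print(roman_value("IIII"))
```

With this rule 'VII' gives 7, and the unconventional 'IIII' still 
gives 4, while a word containing any other letter gives no value.  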


Generally, the meaning is given for the base word, as is usual for 
dictionaries.  For the verb, it will be a present meaning, even when 
the tense given is perfect.  For an adjective, the positive meaning is
given, even if a comparative or superlative form is shown.  This is 
also so when a word is constructed with a suffix, thus an adverb 
constructed from its adjective will show the base adjective meaning 
and an indication of how to make the adverb in English.  The user must
make the proper interpretation.  

In some cases an adjective will be found that is a participle of a 
verb that is also found.  The participle meaning, as inferred by the 
user from the verb meaning, is not superseded by the explicit 
adjective entry, but supplemented by it with possible specialized 
meanings.  
 
 

Signs and Abbreviations in Meaning

, [comma] is used to separate meanings that are similar.  The 
philosophy has been to list a number of synonyms just to key the 
reader in making his translation.  There is no rigor in this.  

; [semicolon] is used to separate sets of meanings that differ in 
intent.  This is just a general tendency and is not rigorously 
enforced.  

/ [solidus] means 'or' or gives an alternative word.  It sometimes 
replaces the comma and is often used to compress the meaning into a 
short line.  

(...) [parentheses] set off an optional word or modifier, e.g., 
'(nearly) white' means 'white' or 'nearly white', and '(matter in) 
dispute' means either the matter in dispute or the dispute itself.  They are 
also used to set off an explanation, further information about the 
word or meaning, or an example of a translation or a word combination.


?  [question mark] in a meaning implies a doubt about the 
interpretation, or even about the existence of the word at all.  For 
the purposes of this program, it does not matter much.  If the dubious
word does not exist, no one will ask for it.  If it appears in his 
text, the reader is warned that the interpretation may be questionable
to some degree, but is what is available.  May indicate somewhat more 
doubt than (perh.).  

~ [tilde] stands for the stem or word in question.  Usually it does 
not have an ending affixed, as is the convention in other 
dictionaries, but represents the word with whatever ending is proper.  
It is just a space saving shorthand or abbreviation.  

=> in meaning this indicates a translation example.  

abb.  abbreviation.  

(Dif) - [Deferrari] is used to indicate an additional meaning taken 
from A Latin-English Dictionary of St. Thomas Aquinas by Roy J. 
Deferrari.  This is singled out because of the importance of Aquinas.  
The reference is to be applied from the last semicolon before the 
mark.  It is likely that the meaning diverges from the base by being 
medieval and ecclesiastical, but not so overwhelming as to deserve a 
separate entry.  

(Douay) is used to designate those words for which the meaning has 
been derived or modified by examination of the Douay translation of 
the Latin Vulgate Bible of St Jerome.  

(eccl.) ecclesiastical - designating a special church meaning in a 
list of conventional meanings, an additional meaning not sufficient to
justify a separate entry with an ecclesiastical code.  

esp.  [especially] - indicates a significant association, but is only 
advisory.  

(King James) or (KJames) is used to designate those words for which 
the meaning has been derived or modified by examination of the King 
James Bible in connection with the Latin Vulgate Bible of St Jerome.  

(KLUDGE) This indicates that the particular form is distorted in order
to make it come out correctly.  This usually takes the form of a 
special conjugational form applied to a few words, not applicable to 
other words of the same conjugation or declension.  The user can 
expect the form and meaning to be correct, but the numerical coding 
will be odd.  

(L+S) [Lewis and Short] is used to indicate that the meaning starting 
from the previous semicolon is information from Lewis and Short 'A 
Latin Dictionary' that differs from, or significantly expands on, the 
meaning in the 'Oxford Latin Dictionary' (OLD) which is the baseline 
for this program.  This is not to imply that the meaning listed is 
otherwise taken directly from the OLD, just that it is not 
inconsistent with OLD, but the L+S information is either inconsistent 
(likely OLD knows better) or includes meanings appropriate for late 
Latin writers beyond the scope of OLD.  The 
program is just warning the reader that there may be some difference.  
There are cases in which this indication occurs in entries that have 
Lewis and Short as the source.  In those cases, the basic word is in 
OLD but the entry is a variant form or spelling not cited there.  
There are cases where OLD and L+S give somewhat different spellings 
and meanings for the 'same' word (same in the sense that both 
dictionaries point to the same citation).  In these cases a 
combination of meanings is given for both entries with the (L+S) code
distinction and the entries of different spelling or declension have 
the SOURCE coded.  

(OLD) [Oxford Latin Dictionary] is used to indicate an additional 
meaning taken from the Oxford Latin Dictionary in an entry that is 
otherwise attributed.  While it is usually true that if a classical 
word has other than OLD as the listed source then it does not appear 
in that form in OLD, this is not always the case.  On occasion some 
other dictionary gives a much better or more complete and 
understandable definition and the honor of source is thereto given.  

(PASS) [passive] - indicates a special, unexpected meaning for the 
passive form of the verb, not easily associated with the active 
meaning.  

perh.  [perhaps] - denotes an additional uncertainty, but not as 
strong as (?).  

(pl.) [plural] means that the Latin word is believed by scholars to be
used (almost) always in the plural form, with the meaning stated, even
though that meaning in English may be singular.  If it appears in the 
beginning of the meaning, before the first comma, it applies to all 
the meanings.  If it appears later, it applies only to that and later 
meanings.  For the purpose of this program, this is only advisory.  
While it is used by some tools to find the expected dictionary entry, 
the program does not exclude a singular form in the output.  While it 
may be true that in good, classical Latin it is never used in the 
singular, this does not mean that some text somewhere might not use 
the singular, nor that it is uncommon in later Latin.  

prob.  [probably] - denotes some uncertainty, but not as much as 
(perh.).  

pure Latin ...  indicates a pure Latin term for a word which is 
derived from another language (almost certainly Greek).  

(rude) - indicates that this meaning was used in a rude, vulgar, 
coarse, or obscene manner, not what one should hear in polite company.
Such use is likely from graffiti or epigrams, or in plays in which the
dialogue is to indicate that the characters are low or crude.  
Meanings given by the program for these words are more polite, and the
user is invited to substitute the current street language or obscenity
of his choice to get the flavor of text.  

(sg.) [singular] means that the Latin word is believed by scholars to 
be used always in the singular.  If it appears in the beginning of the
meaning, before the first comma, it applies to all the meanings.  If 
it appears later, it applies only to that and later meanings.  For the
purpose of this program, this is only advisory.  

usu.  [usually] is weakly advisory.  (usu.  pl.) is even weaker than 
pl.  and may imply that the pl.  tendency occurred only during certain
periods.  

w/ means 'with'.  
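
The comma and semicolon conventions at the head of this list can be 
applied mechanically, as in the Python sketch below.  It is only an 
illustration with hypothetical function names.  It assumes that 
separators inside parentheses belong to an explanation and should not 
split the meaning, and, since the conventions are not rigorously 
enforced, the grouping it produces is only a rough guide.  

```python
from typing import List


def split_outside_parens(text: str, sep: str) -> List[str]:
    """Split text on sep, ignoring any sep that lies inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def sense_groups(meaning: str) -> List[List[str]]:
    """Semicolons separate groups of differing intent; commas separate synonyms."""
    return [split_outside_parens(group, ",")
            for group in split_outside_parens(meaning, ";")]


print(sense_groups("horn (of an animal); horn, trumpet; wing of an attacking army"))
```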




                      BRIEF PROGRAM DESCRIPTION                       
 


An effect of the program is to derive the structure and meaning of 
individual Latin words.  A procedure was devised to: 

    * examine the ending of a word, 

    * compare it with the standard endings, 

    * derive the possible stems that could be consistent, 

    * compare those stems with a dictionary of stems, 

    * eliminate those for which the ending is inconsistent with the 
    dictionary stem (e.g., a verb ending with a noun dictionary item),
    

    * if unsuccessful, it tries with a large set of prefixes and 
    suffixes, and various tackons (e.g., -que), 

    * finally it tries various 'tricks' (e.g., 'ae' may be replaced by
    'e', 'inp' by 'imp', syncope, etc.), 

    * and it reports any resulting matches as possible 
    interpretations.  

With the input of a word, or several words in a line, the program 
returns information about the possible accidence, if it can find an 
agreeable stem in its dictionary.  

=>amo
am.o               V       1  1 PRES ACTIVE  IND  1 S X       
love, like; fall in love with; be fond of; have a tendency to
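

The first steps of the procedure listed above (strip a known ending, 
look up the remaining stem, and discard combinations whose parts of 
speech disagree) can be sketched in Python.  This is a toy 
illustration, not the program's source: the tables ENDINGS and STEMS, 
the record layouts and the function analyze are all hypothetical, and 
the prefix, suffix, tackon and 'trick' stages are left out.  

```python
from typing import List, NamedTuple, Tuple


class Ending(NamedTuple):
    text: str
    part: str   # part of speech the ending belongs to
    info: str   # remaining inflection codes


class Stem(NamedTuple):
    text: str
    part: str
    meaning: str


# Tiny hypothetical tables standing in for the inflection and stem files.
ENDINGS = [
    Ending("o", "V", "PRES ACTIVE IND 1 S"),
    Ending("at", "V", "PRES ACTIVE IND 3 S"),
    Ending("arum", "N", "GEN P"),
    Ending("ae", "N", "GEN S"),
]
STEMS = [
    Stem("am", "V", "love, like"),
    Stem("agricol", "N", "farmer"),
    Stem("femin", "N", "woman"),
]


def analyze(word: str) -> List[Tuple[str, str, str]]:
    """Return (stem.ending, codes, meaning) for every consistent match."""
    word = word.lower()
    results = []
    for ending in ENDINGS:
        if not word.endswith(ending.text):
            continue
        candidate = word[: len(word) - len(ending.text)]
        for stem in STEMS:
            # Keep a match only if stem and ending agree on part of speech.
            if stem.text == candidate and stem.part == ending.part:
                results.append((f"{stem.text}.{ending.text}",
                                f"{ending.part} {ending.info}",
                                stem.meaning))
    return results


print(analyze("amo"))
print(analyze("amae"))  # verb stem with a noun ending: eliminated
```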


To support this method, an INFLECT.SEC data file was constructed 
containing possible Latin endings encoded by a structure that 
identifies the part of speech, declension, conjugation, gender, 
person, number, etc.  This is a pure computer encoding for a 'brute 
force' search.  No sophisticated knowledge of Latin is used at this 
point.  Rules of thumb (e.g., the fact, always noted early in any 
Latin course, that a neuter noun has the same ending in the nominative
and accusative, with a final -a in the plural) are not used in the 
search.  However, it is convenient to combine several identical 
endings with a general encoding (e.g., the endings of the perfect 
tenses are the same for all verbs, and are so encoded, not replicated 
for every conjugation and variant).  

Many of the distinguishing differences identifying conjugations come 
from the voiced length of stem vowels (e.g., between the present, 
imperfect and future tenses of a third conjugation I-stem verb and a 
fourth conjugation verb).  These aural differences, the features that 
make Latin 'sound right' to one who speaks it, are lost entirely in 
the analysis of written endings.  

The endings for the verb conjugations are the result of trying to 
minimize the number of individual endings records, while yet keeping 
the structure of the inflections data file fairly readable.  There is 
no claim that the resulting arrangement is consonant with any 
grammarian's view of Latin, nor should it be examined from that 
viewpoint.  While it started from the conjugations in text books, it 
can only be viewed as some fuzzy intermediate step along a path to a 
mathematically minimal number of encoded verb endings.  Later versions
of the program might improve the system.  

There are some egregious liberties taken in the encoding.  With the 
inclusion of two present stems, the third conjugation I-stem verbs may
share the endings of the regular third conjugation.  The fourth 
conjugation has disappeared altogether, and is represented internally 
as a somewhat modified variant of the third conjugation (3, 4), but 
this is replaced in the user output by 4 1. There is an artificial 
fifth conjugation for esse and others, a sixth for eo, and a seventh 
for other irregularities.  

As an example, a verb ending record has the structure: 

    * PART the part code for a verb = V 

    * CONjugation consisting of two parts: 

    * WHICH a conjugation identifier - range 0..9 

    * VAR a variant identifier, on WHICH - range 0..9 

    * TENSE an enumeration type - range PRES..FUTP + X 

    * VOICE an enumeration type - range ACTIVE..PASSIVE + X 

    * MOOD an enumeration type - range IND..PPL + X 

    * PERSON person, first to third - range 1..3 + 0 

    * NUMBER an enumeration type - range S..P + X 

    * KIND enumeration type of verb - range TO_BE..PERFDEF + X 

    * KEY which stem to be used - range 1..4 

    * SIZE number of characters - range 0..9 

    * ENDING the ending as a string of SIZE characters 

Thus, the entry for the ending appropriate to 'amo' is: 

V 1 1 PRES IND ACTIVE 1 S X 1 o


KIND is not often used with the verb endings, but is part of the 
record for convenience elsewhere.  For verbs, the KIND has not yet 
been exploited significantly, except for DEP and IMPERS.  

The rest of the elements are straightforward and generally use the 
abbreviations that are common in any Latin text.  An X or 0 represents
the 'don't know' or 'don't care' for enumeration or numeric types.  
Details are documented below in the CODES section.  
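
As an illustration only, the verb ending record described above could 
be modelled like this in Python.  The class name VerbEnding and the 
choice to derive SIZE from ENDING are assumptions made for the sketch; 
the enumeration values are those listed in the CODES section, and the 
actual program is written in Ada with its own record layout.  

```python
from dataclasses import dataclass

# Enumeration values as listed in the CODES section of this document.
TENSES = ("X", "PRES", "IMPF", "FUT", "PERF", "PLUP", "FUTP")
VOICES = ("X", "ACTIVE", "PASSIVE")
MOODS = ("X", "IND", "SUB", "IMP", "INF", "PPL")
NUMBERS = ("X", "S", "P")


@dataclass(frozen=True)
class VerbEnding:
    which: int   # conjugation identifier, 0..9
    var: int     # variant identifier, 0..9
    tense: str
    voice: str
    mood: str
    person: int  # 1..3, or 0 for don't know / don't care
    number: str
    kind: str
    key: int     # which of the four stems to use, 1..4
    ending: str

    def __post_init__(self):
        # Reject any field outside the ranges given in the list above.
        checks = [
            0 <= self.which <= 9,
            0 <= self.var <= 9,
            self.tense in TENSES,
            self.voice in VOICES,
            self.mood in MOODS,
            0 <= self.person <= 3,
            self.number in NUMBERS,
            1 <= self.key <= 4,
            len(self.ending) <= 9,
        ]
        if not all(checks):
            raise ValueError(f"field out of range: {self}")

    @property
    def size(self):
        """SIZE is derived here from ENDING rather than stored."""
        return len(self.ending)


# The ending appropriate to 'amo': first conjugation, stem 1, ending 'o'.
amo_ending = VerbEnding(1, 1, "PRES", "ACTIVE", "IND", 1, "S", "X", 1, "o")
```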

A verb dictionary record has the structure: 

    * STEMS for a verb there are 4 stems 

    * PART the part code for a verb = V 

    * WHICH a conjugation identifier - range 0..9 

    * VAR a variant identifier - range 0..9 

    * KIND enumeration type of verb - range TO_BE..PERFDEF + X 

    * MEANING text for English translations (up to 80 characters) 

Thus, an entry corresponding to 'amo amare amavi amatus' is: 

am am amav amat 
V 1 1 X            X X X X X 
like, love


(The dangling X X X X X are used to encode information about the time 
in which this word is found and the subject area.  There is not yet 
enough detail in the dictionary to allow much exploitation of this 
information.) 
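
A reader who wants to process such a three-line entry might split it 
as sketched below.  This is a hedged Python illustration based only on 
the 'amo' example shown above; VerbEntry and parse_verb_entry are 
hypothetical names, and the real dictionary files may differ in 
detail.  

```python
from dataclasses import dataclass


@dataclass
class VerbEntry:
    stems: list    # the four verb stems
    part: str      # part code, 'V' for a verb
    which: int     # conjugation identifier
    var: int       # variant identifier
    kind: str      # verb KIND, X if not set
    flags: list    # the trailing codes (time, subject area, ...)
    meaning: str


def parse_verb_entry(text):
    """Split a three-line entry: stems, codes, meaning."""
    lines = [line for line in text.splitlines() if line.strip()]
    stem_line, code_line, meaning_line = lines
    codes = code_line.split()
    return VerbEntry(
        stems=stem_line.split(),
        part=codes[0],
        which=int(codes[1]),
        var=int(codes[2]),
        kind=codes[3],
        flags=codes[4:],
        meaning=meaning_line.strip(),
    )


entry = parse_verb_entry(
    "am am amav amat \n"
    "V 1 1 X            X X X X X \n"
    "like, love\n"
)
print(entry.stems, entry.which, entry.var, entry.meaning)
```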

Endings may not uniquely determine the stem, and therefore the right 
meaning.  'portas' could be the accusative plural of 'gate', or the 
second person, singular, present indicative active of 'carry'.  In 
both cases the stem is 'port'.  All possibilities are reported.  

portas 
port.as V 1 1 PRES IND ACTIVE 2 S X 
carry, bring 

port.as N 1 1 ACC P F T 
gate, entrance; city gates; door; avenue;


And note that the same stem (port) has other uses, for 'portus', 
'harbor'.  

portum 
port.um N 4 1 ACC S M T 
port, harbor; refuge, haven, place of refuge


PLEASE NOTE: It is certainly possible for the program to find a valid 
Latin construction that fits the input word and to have that 
interpretation be entirely wrong in the context.  It is even possible 
to interpret a number, in Roman numerals, as a word!  (But the number 
would be reported also.) 

For the case of defective verbs, the process does not necessarily have
to be precise.  Since the purpose is only to translate from Latin, 
even if there are unused forms included in the algorithm, these will 
not come up in any real Latin text.  The endings for the verb 
conjugations are the result of trying to minimize the number of 
individual endings records, while keeping the structure of the base 
INFLECTIONS data file fairly readable.  

In general the program will try to construct a match with the 
inflections and the dictionaries.  There are a number of specific 
checks to reject certain mathematically correct combinations that do 
not appear in the language, but these checks are relatively few.  The 
philosophy has been to allow a generous interpretation.  A remark in a
text or dictionary that a particular form does not exist must be 
tempered with the realization that the author probably means that it 
has not been observed in the surviving classical literature.  This 
body of reference is minuscule compared to the total use of Latin, 
even limited to the classical period.  Who is to say that further 
texts would not turn up such a form, even if it might not have been 
approved of by Cicero?  It is also possible that such reasonable, if 
'improper', constructs might occur in later writings by less 
educated, or just different, authors.  Certainly English shows this 
sort of variation over time.  

If the exact stem is not found in the dictionary, there are rules for 
the construction of words which any student would try.  The simplest 
situation is a known stem to which a prefix or suffix has been 
attached.  The method used by the program (if DO_FIXES is on) is to 
try any fixes that fit, to see if their removal results in an 
identifiable remainder.  Then the meaning is mechanically constructed 
from the meaning of the fix and the stem.  The user may need to 
interpret with a more conventional English usage.  This technique 
improves the performance significantly.  However, in about 40% of the 
instances in which there is a hit, the derivation is correct but the 
interpretation takes some imagination.  In something less than 10% of 
the cases, the inferred fix is just wrong, so the user must take some 
care to see if the interpretation makes any sense.  
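
The prefix half of this method might look like the following Python 
sketch.  The tables PREFIXES, STEMS and ENDINGS and the function 
try_prefixes are invented for the illustration; the program's actual 
fixes come from its ADDONS data and are handled in Ada.  

```python
# Toy tables, invented for illustration.
PREFIXES = {"re": "again, back", "prae": "before"}
STEMS = {"port": "carry, bring"}
ENDINGS = ("o", "as", "at")


def try_prefixes(word):
    """Strip each prefix that fits and look for an identifiable remainder."""
    hits = []
    for prefix, prefix_meaning in PREFIXES.items():
        if not word.startswith(prefix):
            continue
        remainder = word[len(prefix):]
        for ending in ENDINGS:
            if not remainder.endswith(ending):
                continue
            stem = remainder[: -len(ending)]
            if stem in STEMS:
                # The meaning is assembled mechanically from fix and stem.
                meaning = f"{prefix_meaning} + {STEMS[stem]}"
                hits.append((prefix, stem, ending, meaning))
    return hits


print(try_prefixes("reporto"))
```

The mechanically joined meaning ('again, back + carry, bring') shows 
why the user often has to supply the conventional English sense.  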

This method is complicated by the tendency for prefixes to be modified
upon attachment (ab+fero => aufero, sub+fero => suffero).  The 
program's 'tricks' take many such instances into account.  Ideally, 
one should look inside the stem for identifiable fragments.  One would
like to start with the smallest possible stem, and that is most 
frequently the correct one.  While it is mathematically possible that 
the stem of 'actorum' is 'actor' with the common inflection 'um', no 
intuitive first semester Latin student would fail to opt for the 
genitive plural 'orum', and probably be right.  To first order, the 
procedure ignores such hints and reports this word in both forms, as 
well as a verb participle.  However, it can use certain generally 
applicable rules, like the superlative characteristic 'issim', to 
further guess.  

In addition, there is the capability to examine the word for such 
common techniques as syncope, the omission of the 've' or 'vi' in 
certain verb perfect forms (audivissem => audissem).  
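
One naive way to picture the syncope check, in Python: put the dropped 
'vi' or 've' back at each position and keep whatever the rest of the 
analysis can resolve.  KNOWN_FORMS and restore_syncope are 
hypothetical stand-ins; the program itself may well restrict the test 
to perfect stems rather than trying every position.  

```python
# Hypothetical set of full forms that the main analysis could resolve.
KNOWN_FORMS = {"audivissem", "amaverunt"}


def restore_syncope(word, known=KNOWN_FORMS):
    """Re-insert 'vi' or 've' at each position; keep forms that are known."""
    candidates = set()
    for i in range(1, len(word)):
        for dropped in ("vi", "ve"):
            candidate = word[:i] + dropped + word[i:]
            if candidate in known:
                candidates.add(candidate)
    return sorted(candidates)


print(restore_syncope("audissem"))
```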

If the dictionary cannot identify a matching stem, it may be possible 
to derive a stem from 'nearby' stems (an adverb from an adjective is 
the most common example) and infer a meaning.  If all else fails, a 
portion of the possible dictionary stem can be listed, from which the 
user can draw in making a guess.  


Trimming of uncommon results

Trimming now means something.  If the TRIM_OUTPUT parameter is set, 
and specific parameters are set in the MDEV, the program will 
deprecate those possible forms which come from archaic or medieval 
(non-classical) stems or inflections, and also stems or inflections 
which are relatively uncommon.  It will still report such forms if no 
classical/common solutions are found.  The default is set for this, 
expecting that most users are students and unlikely to encounter rare 
forms.  Other users can set the parameters appropriately for their 
situation.  
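
The trimming rule can be pictured with a short Python sketch.  The 
tuple layout and the labels 'Archaic', 'Medieval' and 'uncommon' are 
assumptions chosen for the example, not the program's internal codes.  

```python
def trim(results):
    """Trim a list of (form, age, freq) tuples as described above."""
    preferred = [
        r for r in results
        if r[1] not in ("Archaic", "Medieval") and r[2] != "uncommon"
    ]
    # Fall back to the untrimmed list when nothing classical/common fits.
    return preferred if preferred else list(results)


mixed = [
    ("form A", "Classical", "common"),
    ("form B", "Medieval", "common"),
    ("form C", "Classical", "uncommon"),
]
print(trim(mixed))
```

Entries whose age and frequency have not yet been set would pass 
through unchanged in such a scheme, which matches the behavior 
described in the next paragraph.  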

This capability is preliminary.  It is just becoming useful in that 
the factors are set for about half the dictionary entries.  There are 
still a large number of entries and inflections that are not set and 
will continue to be reported until determination of rarity is made.  


Special Cases

Some adjectives have no conventional positive forms (either missing or
undeclined), or the POS forms have more than one COMP/SUPER.  In these
few cases, the individual COMP or SUPER form is entered separately.  
Since it is not directly connected with a POS form, and only the POS 
forms have different numbered declensions, the special form is given a
declension of (0, 0).  An additional consequence is that the 
dictionary form in output is only for the COMP/SUPER, and does not 
reflect all comparisons.  

Uniques

There are some irregular situations which are not convenient to handle
through the general algorithms.  For these a UNIQUES file and 
procedure were established.  The number of these special cases is less 
than one hundred, but may increase as new situations arise, and 
decrease as algorithms provide better coverage.  The user will not see
much difference, except in that no dictionary forms are available for 
these unique words.  

Tricks

There are a number of situations in Latin writing where certain 
modifications or conventions regularly are found.  While often found, 
these are not the normal classical forms.  If a conventional match is 
not found, the program may be instructed to TRY_TRICKS.  Below is a 
partial list of current tricks.  

    * The syncopated form of the perfect often drops the 'v' and loses
    the vowel.  

    * An initial 'a' followed by a double letter often is used for an 
    'ad' prefix, likewise an initial 'ad' prefix is often replaced by 
    an 'a' followed by a double letter.  

    * An initial 'i' followed by a double letter often is used for an 
    'in' prefix, likewise an initial 'in' prefix is often replaced by 
    an 'i' followed by a double letter.  

    * A leading 'inp' could be an 'imp'.  

    * A leading 'obt' could be an 'opt'.  

    * An initial 'har...' or 'hal...' may be rendered by an 'ar' or 
    'al', likewise the dictionary entry may have 'ar'/'al' and the 
    trial word begin with 'ha...'.  

    * An initial 'c' could be a 'k', or the dictionary entry uses 'c' 
    for 'k'.  

    * A nonterminal 'ae' is often rendered by an 'e'.  

    * An initial 'E' can replace an 'Ae'.  

    * An 'iis...' beginning some forms of 'eo' may be contracted to 
    'is...'.  

    * A nonterminal 'ii' is often replaced by just 'i'; including 
    'ji', since in this program and dictionary all 'j' are made 'i'.  

    * A 'cl' could be a 'cul'.  

    * A 'vul' could be a 'vol'.  

    * and many others, including a procedure to try to break the input
    word into two.  
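
A few of the letter replacements above can be sketched in Python as a 
table of substitutions that generates candidate spellings to be run 
against the dictionary.  The table TRICKS covers only four of the 
listed cases, and trick_candidates is a hypothetical name; the 
program's own implementation is in Ada and handles many more cases.  

```python
# (text found in the input, replacement to try, applies only at the start?)
TRICKS = [
    ("inp", "imp", True),
    ("obt", "opt", True),
    ("cl", "cul", False),
    ("vul", "vol", False),
]


def trick_candidates(word):
    """Return alternative spellings, one substitution at a time."""
    out = []
    for old, new, leading_only in TRICKS:
        if leading_only:
            if word.startswith(old):
                out.append(new + word[len(old):])
        else:
            # Try the replacement at every place the pattern occurs.
            start = word.find(old)
            while start != -1:
                out.append(word[:start] + new + word[start + len(old):])
                start = word.find(old, start + 1)
    return out


print(trick_candidates("periclum"))
```

Each candidate would then go through the normal ending and stem 
search, which is why combinations of tricks become expensive.  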

Various manipulations of 'u' and 'v' are possible: 'v' could be 
replaced by 'u', as in the new Oxford Latin Dictionary; a leading 'U' 
could be replaced by 'V', checking capitalization; or all 'U's could 
have been replaced by 'V', as in stone cutting.  Previous versions had 
various kludges attempting to calculate the correct interpretation.  
They were surprisingly good, but philosophically baseless and 
certainly failed in a number of cases.  The present version simply 
considers 'u' and 'v' as the same letter in parsing the word.  
However, the dictionary entries make the distinction and this is 
reflected in the output.  
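
Treating 'u' and 'v' as one letter for matching, while keeping the 
dictionary spelling for output, might be sketched like this in Python. 
The names fold, DICTIONARY, INDEX and lookup are invented for the 
illustration, and lowercasing the input is an added assumption.  

```python
from collections import defaultdict


def fold(word):
    """Matching key in which 'u' and 'v' are the same letter."""
    return word.lower().replace("v", "u")


# Toy entries; the dictionary keeps the u/v distinction.
DICTIONARY = ["vinum", "uva", "servus"]

INDEX = defaultdict(list)
for entry in DICTIONARY:
    INDEX[fold(entry)].append(entry)


def lookup(word):
    """Return dictionary spellings that match word, ignoring u/v."""
    return INDEX.get(fold(word), [])


print(lookup("uinum"), lookup("VVA"), lookup("seruus"))
```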

Various combinations of these tricks are attempted, and each try that 
results in a possible hit is run against the full dictionary, which 
can make these efforts time consuming.  That is a good reason to make 
the dictionary as large as possible, rather than counting on a smaller
number of roots and doing the maximum word formation.  

Finally, while the program could succeed on a word that requires two 
or three of these tricks to work in combination, there are limits.  
Some words for which all the modifications are supported will fail, if
there are just too many.  In fact, it is probably better that that be 
the case, otherwise one will generate too many false positives.  
Testing so far does not seem to show excessive zeal on the part of the
program, but the user should examine the results, especially when 
several tricks are involved.  

There is a basic conflict here.  At the state of the 1.97 dictionary 
there are so few words that both fail the main program and are caught 
by tricks that this option has been defaulted to No.  However, one 
could argue that there will be very few occasions for trying TRICKS, 
so that the cost is minimal.  Unfortunately the degree of completeness
of the dictionary for classical Latin does not carry over to medieval 
Latin.  With the hope that the program will become more useful in that
area, the default has been changed back to Yes, reflecting the 
philosophy early in the development for classical Latin.  



Codes in Inflection Line

For completeness, the enumeration codes used in the output are listed 
here as Ada statements.  Simple numbers are used for person, 
declension, conjugations, and their variants.  Not all the facilities 
implied by these values are developed or used in the program or the 
dictionary.  This list is only for Version 1.97.  Other versions may 
be somewhat different.  This may make their dictionaries incompatible 
with the present program.  

NOTE: in print dictionaries certain information is conveyed by font 
encoding, e.g., the use of bold face or italics.  There is no system 
independent method of displaying such on computers (although 
individual programs can handle these, each in its own unique way).  
WORDS uses capital letters to express such differences, which method 
is system independent in present usage.  


  type PART_OF_SPEECH_TYPE is (
          X,         --  all, none, or unknown
          N,         --  Noun
          PRON,      --  PRONoun
          PACK,      --  PACKON -- artificial for code
          ADJ,       --  ADJective
          NUM,       --  NUMeral
          ADV,       --  ADVerb
          V,         --  Verb
          VPAR,      --  Verb PARticiple
          SUPINE,    --  SUPINE
          PREP,      --  PREPosition
          CONJ,      --  CONJunction
          INTERJ,    --  INTERJection
          TACKON,    --  TACKON -- artificial for code
          PREFIX,    --  PREFIX --  here artificial for code
          SUFFIX     --  SUFFIX --  here artificial for code
                                );                                   

  type GENDER_TYPE is (
          X,         --  all, none, or unknown
          M,         --  Masculine
          F,         --  Feminine
          N,         --  Neuter
          C          --  Common (masculine and/or feminine)
                       );

  type CASE_TYPE is (
          X,         --  all, none, or unknown
          NOM,       --  NOMinative
          VOC,       --  VOCative
          GEN,       --  GENitive
          LOC,       --  LOCative
          DAT,       --  DATive
          ABL,       --  ABLative
          ACC        --  ACCusative
                     );
  
  type NUMBER_TYPE is (
          X,         --  all, none, or unknown
          S,         --  Singular
          P          --  Plural
                       );

  type COMPARISON_TYPE is (
          X,         --  all, none, or unknown
          POS,       --  POSitive
          COMP,      --  COMParative
          SUPER      --  SUPERlative
                           );   

  type TENSE_TYPE is (
          X,         --  all, none, or unknown
          PRES,      --  PRESent
          IMPF,      --  IMPerFect
          FUT,       --  FUTure
          PERF,      --  PERFect
          PLUP,      --  PLUPerfect
          FUTP       --  FUTure Perfect
                      );                        
  
  type VOICE_TYPE is (
          X,         --  all, none, or unknown
          ACTIVE,    --  ACTIVE
          PASSIVE    --  PASSIVE
                      );       
  
  type MOOD_TYPE is (
          X,         --  all, none, or unknown
          IND,       --  INDicative
          SUB,       --  SUBjunctive
          IMP,       --  IMPerative
          INF,       --  INFinitive
          PPL        --  ParticiPLe
                     );                    

  type NOUN_KIND_TYPE is (
          X,            --  unknown, nondescript
          S,            --  Singular 'only'
          M,            --  plural or Multiple 'only'
          A,            --  Abstract idea
          N,            --  proper Name
          L,            --  Locale, name of country/city
          P,            --  a Person
          T,            --  a Thing
          W             --  a place Where
                           ); 

  type PRONOUN_KIND_TYPE is (
          X,            --  unknown, nondescript
          PERS,         --  PERSonal
          REL,          --  RELative
          REFLEX,       --  REFLEXive
          DEMONS,       --  DEMONStrative
          INTERR,       --  INTERRogative
          INDEF,        --  INDEFinite
          ADJECT        --  ADJECTival
                             ); 

  type VERB_KIND_TYPE is (
          X,         --  all, none, or unknown
          TO_BE,     --  only the verb TO BE (esse)
          TO_BEING,  --  compounds of the verb to be (esse)
          GEN,       --  verb taking the GENitive
          DAT,       --  verb taking the DATive  
          ABL,       --  verb taking the ABLative
          TRANS,     --  TRANSitive verb
          INTRANS,   --  INTRANSitive verb
          IMPERS,    --  IMPERSonal verb (implied subject 'it', 'they', 'God')
                     --  agent implied in action, subject in predicate
          DEP,       --  DEPonent verb
                     --  only passive form but with active meaning 
          SEMIDEP,   --  SEMIDEPonent verb (forms perfect as deponent) 
                     --  (perfect passive has active force)
          PERFDEF    --  PERFect DEFinite verb  
                     --  having only perfect stem, but with present force
                          );             

 type NUMERAL_KIND_TYPE is (
         X,          --  all, none, or unknown
         CARD,       --  CARDinal
         ORD,        --  ORDinal
         DIST,       --  DISTributive
         ADVERB      --  numeral ADVERB
                            );


The KIND_TYPEs represent various aspects of a word which may be useful
to some program, not necessarily the present one.  They were put in 
for various reasons, and later versions may change the selection and 
use.  Some of the KIND flags are never used.  In some cases more than 
one KIND flag might be appropriate, but only one is selected.  Some 
seemed to be a good idea at one time, but have not since proved out.  
The lists above are just for completeness.  

NOUN KIND is used in trimming (when set) the output and removing 
possibly spurious cases (locative for a person, but preserving the 
vocative).  

VERB KIND allows examples (when set) to give a more reasonable 
meaning.  A DEP flag allows the example to reflect active meaning for 
passive form.  It also allows the dictionary form to be constructed 
properly from stems.  TRANS/INTRANS were included to allow a further 
program a hint as to what kind of object it should expect.  This flag 
is only now being fixed during the update.  There are some verbs 
which, although mostly used in one way, might be either.  These are 
assigned X rather than breaking into two entries.  This would be of no
particular use at this point since it would not allow the object to be
determined.  GEN/DAT/ABL flags have related function, but are almost 
absent.  TO_BE is used to indicate that a form of esse may be part of 
a compound verb tense with a participle.  TO_BEING indicates a verb 
related to esse (e.g., abesse) which has no object, nor is it used to 
form compounds.  IMPERS is used to weed out person and forms 
inappropriate to an impersonal verb, and to insert a special meaning 
distinct from a general form associated with the same verb stem.  

NUMERAL KIND is used by the program in constructing the meaning line.  

Help for Parameters


One can CHANGE_PARAMETERS by inputting a '#' [number sign] character 
(ANSI 35) as the input word, followed by a return.  (Note that this 
has changed from early versions in which '?' was used.) Each parameter
is listed and the user is offered the opportunity to change it from 
the current value by answering Y or N (any case).  For each parameter 
there is some explanation or help.  This is displayed by inputting a 
'?' [question mark], followed by a return.  HINT: While going down the
list if one has made all the changes desired, one need not continue to
the end.  Just enter a space and then give a return.  The program will
interpret this as an illegal entry (not Y or N) and will cancel the 
rest of the list, while retaining any changes made to that point.  

The various help displays are listed here: 


TRIM_OUTPUT_HELP
    This option instructs the program to remove from the output list of   
    possible constructs those which are least likely.  There is now a fair
    amount of trimming, killing LOC and VOC plus removing Uncommon and    
    non-classical (Archaic/Medieval) when more common results are found   
    and this action is requested (turn it off in MDV (!) parameters).     
    When a TRIM has been done, the output is followed by an asterisk (*). 
    There certainly is no absolute assurance that the items removed are   
    not correct, just that they are statistically less likely.            
                          Since little is now done, the default is Y(es)  


HAVE_OUTPUT_FILE_HELP
    This option instructs the program to create a file which can hold the 
    output for later study, otherwise the results are just displayed on   
    the screen.  The output file is named WORD.OUT.
    This means that one run will necessarily overwrite a previous run,    
    unless the previous results are renamed or copied to a file of another
    name.  This is available if the METHOD is INTERACTIVE, no parameters. 
    The default is N(o), since this prevents the program from overwriting 
    previous work unintentionally.  Y(es) creates the output file.        

WRITE_OUTPUT_TO_FILE_HELP
    This option instructs the program, when HAVE_OUTPUT_FILE is on, to    
    write results to the file WORD.OUT.
    This option may be turned on and off during running of the program,   
    thereby capturing only certain desired results.  If the option        
    HAVE_OUTPUT_FILE is off, the user will not be given a chance to turn  
    this one on.  Only for INTERACTIVE running.         Default is N(o).  

DO_UNKNOWNS_ONLY_HELP
    This option instructs the program to only output those words that it  
    cannot resolve.  Of course, it has to do processing on all words, but 
    those that are found (with prefix/suffix, if that option is on) will  
    be ignored.  The purpose of this option is to allow a quick look to   
    determine if the dictionary and process is going to do an acceptable  
    job on the current text.  It also allows the user to assemble a list  
    of unknown words to look up manually, and perhaps augment the system  
    dictionary.  For those purposes, the system is usually run with the   
    MINIMIZE_OUTPUT option, just producing a list.  Another use is to run 
    without MINIMIZE to an output file.  This gives a list of the input   
    text with the unknown words, by line.  This functions as a spelling   
    checker for Latin texts.  The default is N(o).                        

WRITE_UNKNOWNS_TO_FILE_HELP
    This option instructs the program to write all unresolved words to a  
    UNKNOWNS file named WORD.UNK.
    With this option on, the file of unknowns is written, even though     
    the main output contains both known and unknown (unresolved) words.   
    One may wish to save the unknowns for later analysis, testing, or to  
    form the basis for dictionary additions.  When this option is turned  
    on, the UNKNOWNS file is written, destroying any file from a previous 
    run.  However, the write may be turned on and off during a single run 
    without destroying the information written in that run.               
    This option is for specialized use, so its default is N(o).           


IGNORE_UNKNOWN_NAMES_HELP
    This option instructs the program to assume that any capitalized word 
    longer than three letters is a proper name.  As no dictionary can be  
    expected to account for many proper names, many such occur that would 
    be called UNKNOWN.  This contaminates the output in most cases, and   
    it is often convenient to ignore these spurious UNKNOWN hits.  This   
    option implements that mode, and calls such words proper names.       
    Any proper names that are in the dictionary are handled in the normal 
    manner.                                The default is Y(es).          

IGNORE_UNKNOWN_CAPS_HELP
    This option instructs the program to assume that any all caps word    
    is a proper name or similar designation.  This convention is often    
    used to designate speakers in a discussion or play.  No dictionary can
    claim to be exhaustive on proper names, so many such occur that would 
    be called UNKNOWN.  This contaminates the output in most cases, and   
    it is often convenient to ignore these spurious UNKNOWN hits.  This   
    option implements that mode, and calls such words names.  Any similar 
    designations that are in the dictionary are handled in the normal     
    manner, as are normal words in all caps.    The default is Y(es).     

DO_COMPOUNDS_HELP
    This option instructs the program to look ahead for the verb TO_BE (or
    iri) when it finds a verb participle, with the expectation of finding 
    a compound perfect tense or periphrastic.  This option can also be a  
    trimming of the output, in that VPAR that do not fit (not NOM) will be
    excluded, possible interpretations are lost.  Default choice is Y(es).
    This processing is turned off with the choice of N(o).                

DO_FIXES_HELP
    This option instructs the program, when it is unable to find a proper 
    match in the dictionary, to attach various prefixes and suffixes and  
    try again.  This effort is successful in about a quarter of the cases 
    which would otherwise give UNKNOWN results, or so it seems in limited 
    tests.  For those cases in which a result is produced, about half give
    easily interpreted output; many of the rest are etymologically true,  
    but not necessarily obvious; about a tenth give entirely spurious     
    derivations.  The user must proceed with caution.                     
    The default choice is Y(es), since the results are generally useful.  
    This processing can be turned off with the choice of N(o).            

DO_TRICKS_HELP
    This option instructs the program, when it is unable to find a proper 
    match in the dictionary, and after various prefixes and suffixes, to  
    try every dirty Latin trick it can think of, mainly common letter     
    replacements like cl -> cul, vul -> vol, ads -> ass, inp -> imp, etc. 
    Together these tricks are useful, but may give false positives (>10%).
    They provide for recognized variants in classical spelling.  Most of  
    the texts with which this program will be used have been well edited  
    and standardized in spelling.  Now, moreover,  the dictionary is being
    populated to such a state that the hit rate on tricks has fallen to a 
    low level.  It is very seldom productive, and it is always expensive. 
    The only excuse for keeping it as default is that now the dictionary  
    is quite extensive and misses are rare.         Default is now Y(es). 

DO_DICTIONARY_FORMS_HELP
    This option instructs the program to output a line with the forms     
    normally associated with a dictionary entry (NOM and GEN of a noun,   
    the four principal parts of a verb, M-F-N NOM of an adjective, ...).  
    This occurs when there is other output (i.e., not with UNKNOWNS_ONLY).
    The default choice is N(o), but it can be turned on with a Y(es).     

SHOW_AGE_HELP
    This option causes a flag, like '<Late>' to appear for inflection or  
    form in the output.  The AGE indicates when this word/inflection was  
    in use, at least from indications in dictionary citations.  It is     
    just an indication, not controlling, useful when there are choices.   
    No indication means that it is common throughout all periods.         
    The default choice is Y(es), but it can be turned off with a N(o).    

SHOW_FREQUENCY_HELP
    This option causes a flag, like '<rare>' to appear for inflection or  
    form in the output.  The FREQ indicates the relative usage of the     
    word or inflection, from indications in dictionary citations.  It is  
    just an indication, not controlling, useful when there are choices.   
    No indication means that it is common throughout all periods.         
    The default choice is Y(es), but it can be turned off with a N(o).    

DO_EXAMPLES_HELP
    This option instructs the program to provide examples of usage of the 
    cases/tenses/etc. that were constructed.  The default choice is N(o). 
    This produces lengthy output and is turned on with the choice Y(es).  

DO_ONLY_MEANINGS_HELP
    This option instructs the program to only output the MEANING for a    
    word, and omit the inflection details.  This is primarily used in     
    analyzing new dictionary material, comparing with the existing.       
    However it may be of use for the translator who knows most all of     
    the words and just needs a little reminder for a few.                 
    The default choice is N(o), but it can be turned on with a Y(es).     

DO_STEMS_FOR_UNKNOWN_HELP
    This option instructs the program, when it is unable to find a proper 
    match in the dictionary, and after various prefixes and suffixes, to  
    list the dictionary entries around the unknown.  This will likely     
    catch a substantive for which only the ADJ stem appears in dictionary,
    an ADJ for which there is only a N stem, etc.  This option should     
    probably only be used with individual UNKNOWN words, and off-line     
    from full translations, therefore the default choice is N(o).         
    This processing can be turned on with the choice of Y(es).            


SAVE_PARAMETERS_HELP
    This option instructs the program, to save the current parameters, as 
    just established by the user, in a file WORD.MOD.  If such a file     
    exists, the program will load those parameters at the start.  If no   
    such file can be found in the current subdirectory, the program will  
    start with a default set of parameters.  Since this parameter file is 
    human-readable ASCII, it may also be created with a text editor.  If  
    the file found has been improperly created, is in the wrong format, or
    otherwise uninterpretable by the program, it will be ignored and the  
    default parameters used, until a proper parameter file is written by
    the program.  Since one may want to make temporary changes during a   
    run, but revert to the usual set, the default is N(o).                
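
The load-or-fall-back behavior described above can be sketched as follows.
This is only an illustration: the actual layout of WORD.MOD is not given
here, so the one-option-per-line "NAME Y" / "NAME N" layout, the function
name load_parameters, and the short DEFAULTS table are all assumptions
made for the example.

```python
from pathlib import Path

# A few defaults taken from the help text above; the real set is larger.
DEFAULTS = {
    "DO_EXAMPLES": False,
    "DO_ONLY_MEANINGS": False,
    "DO_STEMS_FOR_UNKNOWN": False,
}


def load_parameters(path, defaults=DEFAULTS):
    """Return parameters read from `path`, or a copy of `defaults` on any problem."""
    try:
        text = Path(path).read_text(encoding="ascii")
    except (OSError, UnicodeDecodeError):
        # No file, unreadable file, or not plain ASCII: start with defaults.
        return dict(defaults)
    params = dict(defaults)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        # Hypothetical layout "<NAME> <Y|N>"; anything else voids the whole file.
        if len(fields) != 2 or fields[0] not in defaults or fields[1] not in ("Y", "N"):
            return dict(defaults)
        params[fields[0]] = fields[1] == "Y"
    return params
```

A file that cannot be interpreted is ignored as a whole rather than
partially applied, which matches the description above.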

 

There is also a set of DEVELOPER_PARAMETERS that are unlikely to be of
interest to the normal user.  Some of these facilities may be 
disconnected or not work for other reasons.  They are mostly for 
use in the development process.  These may be changed or examined 
in a similar change procedure by inputting a '!' [exclamation sign] 
character, followed by a return.  

                     
HAVE_STATISTICS_FILE_HELP
    This option instructs the program to create a file which can hold     
    certain statistical information about the process.  The file is       
    overwritten at each new invocation of the program, so old data must
    be explicitly saved if it is to be retained.  The statistics are in
    TEXT format.  The statistics file is named WORD.STA.
    This information is only of development use, so the default is N(o).  

WRITE_STATISTICS_FILE_HELP
    This option instructs the program, with HAVE_STATISTICS_FILE, to put  
    derived statistics in a file named WORD.STA. 
    This option may be turned on and off while running the program,
    thereby capturing only certain desired results.  The file is reset at 
    each invocation of the program, if the HAVE_STATISTICS_FILE is set.   
    If the option HAVE_STATISTICS_FILE is off, the user will not be given 
    a chance to turn this one on.                Default is N(o).         


SHOW_DICTIONARY_HELP
    This option causes a flag, like 'GEN>' to be put before the meaning   
    in the output.  While this is useful for certain development purposes,
    it forces off a few characters from the meaning, and is really of no  
    interest to most users.                                               
    The default choice is N(o), but it can be turned on with a Y(es).     

SHOW_DICTIONARY_LINE_HELP
    This option causes the number of the dictionary line for the current  
    meaning to be output.  This is of use to no one but the dictionary    
    maintainer.  The default choice is N(o).  It is activated by Y(es). 

SHOW_DICTIONARY_CODES_HELP
    This option causes the codes for the dictionary entry for the current 
    meaning to be output.  This may not be useful to any but the most     
    involved user.  The default choice is N(o).  It is activated by Y(es).

DO_PEARSE_CODES_HELP
    This option causes special codes to be output flagging the different  
    kinds of output lines.  01 for forms, 02 for dictionary forms, and    
    03 for meaning. The default choice is N(o).  It is activated by Y(es).
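
A program that post-processes the output could use these codes to sort
lines by kind.  The sketch below assumes that the two-digit code is the
first blank-delimited token on the line; that layout, and the function
name classify_line, are assumptions for illustration, not a description
of the actual output format.

```python
# Codes as listed in the help text above.
PEARSE_KINDS = {"01": "form", "02": "dictionary form", "03": "meaning"}


def classify_line(line):
    """Split a coded output line into (kind, text); kind is None if uncoded."""
    code, _, rest = line.partition(" ")
    if code in PEARSE_KINDS:
        return PEARSE_KINDS[code], rest.strip()
    return None, line.strip()
```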

DO_ONLY_INITIAL_WORD_HELP
    This option instructs the program to only analyze the initial word on 
    each line submitted.  This is a tool for checking and integrating new 
    dictionary input, and will be of no interest to the general user.     
    The default choice is N(o), but it can be turned on with a Y(es).     

FOR_WORD_LIST_CHECK_HELP
    This option works in conjunction with DO_ONLY_INITIAL_WORD to allow   
    the processing of scanned dictionaries or text word lists.  It accepts
    only the forms common in dictionary entries, like NOM S for N or ADJ,
    or PRES ACTIVE IND 1 S for V.  It is to be used only with
    DO_ONLY_INITIAL_WORD.
    The default choice is N(o), but it can be turned on with a Y(es).     

UPDATE_LOCAL_DICTIONARY_HELP
    This option instructs the program to invite the user to input a new   
    word to the local dictionary on the fly.  This is only active if the  
    program is not using an (@) input file!  If an UNKNOWN is discovered, 
    the program asks for STEM, PART, and MEAN, the basic elements of a    
    dictionary entry.  These are put into the local dictionary right then,
    and are available for the rest of the session, and all later sessions.
    The use of this option requires a detailed knowledge of the structure 
    of dictionary entries, and is not for the average user.  If the entry 
    is not valid, reloading the dictionary will raise an exception, and
    the invalid entry will be rejected, but the program will continue     
    without that word.  Any invalid entries can be corrected or deleted   
    off-line with a text editor on the local dictionary file.  If one does
    not want to enter a word when this option is on, a simple RETURN at   
    the STEM=> prompt will ignore and continue the program.  This option  
    is only for very experienced users and should normally be off.        
                                              The default is N(o).        
          ------    NOT AVAILABLE IN THIS VERSION   -------               

UPDATE_MEANINGS_HELP
    This option instructs the program to invite the user to modify the    
    meaning displayed on a word translation.  This is only active if the  
    program is not using an (@) input file!  These changes are put into   
    the dictionary right then and permanently, and are available from
    then on, in this session, and all later sessions.   Unfortunately,
    these changes will not survive the replacement of the dictionary by a
    new version from the developer.  Changes can only be recovered by
    considerable processing by the developer, and should be left there.
    This option is only for experienced users and should remain off.      
                                              The default is N(o).        
          ------    NOT AVAILABLE IN THIS VERSION   -------               

DO_ONLY_FIXES_HELP
    This option instructs the program to ignore the normal dictionary
    search and to go directly to attaching various prefixes and suffixes
    before processing.  This is a pure research tool.  It allows one to
    examine the coverage of pure stems and dictionary primary compositions.
    This option is only available if DO_FIXES is turned on.               
    This is entirely a development and research tool, not to be used in   
    conventional translation situations, so the default choice is N(o).   
    This processing can be turned on with the choice of Y(es).            


DO_FIXES_ANYWAY_HELP
    This option instructs the program to do both the normal dictionary    
    search and then process for the various prefixes and suffixes too.    
    This is a pure research tool allowing one to consider the possibility 
    of strange constructions, even in the presence of conventional        
    results, e.g., alte => deeply (ADV), but al+t+e => wing+ed (ADJ VOC)  
    (If multiple suffixes were supported this could also be wing+ed+ly.)  
    This option is only available if DO_FIXES is turned on.               
    This is entirely a development and research tool, not to be used in   
    conventional translation situations, so the default choice is N(o).   
    This processing can be turned on with the choice of Y(es).            
          ------    PRESENTLY NOT IMPLEMENTED    ------                   

USE_PREFIXES_HELP
    This option instructs the program to implement prefixes from ADDONS   
    whenever and wherever FIXES are called for.  The purpose of this      
    option is to allow some flexibility while the program is running to
    select various combinations of fixes, to turn them on and off,        
    individually as well as collectively.  This is an option usually      
    employed by the developer while experimenting with the ADDONS file.   
    This option is only effective in connection with DO_FIXES.            
    This is primarily a development tool, so the conventional user should 
    probably maintain the default choice of Y(es).

USE_SUFFIXES_HELP
    This option instructs the program to implement suffixes from ADDONS   
    whenever and wherever FIXES are called for.  The purpose of this      
    option is to allow some flexibility while the program is running to
    select various combinations of fixes, to turn them on and off,        
    individually as well as collectively.  This is an option usually      
    employed by the developer while experimenting with the ADDONS file.   
    This option is only effective in connection with DO_FIXES.            
    This is primarily a development tool, so the conventional user should 
    probably maintain the default choice of Y(es).

USE_TACKONS_HELP
    This option instructs the program to implement TACKONS from ADDONS    
    whenever and wherever FIXES are called for.  The purpose of this      
    option is to allow some flexibility while the program is running to
    select various combinations of fixes, to turn them on and off,        
    individually as well as collectively.  This is an option usually      
    employed by the developer while experimenting with the ADDONS file.   
    This option is only effective in connection with DO_FIXES.            
    This is primarily a development tool, so the conventional user should 
    probably maintain the default choice of Y(es).


DO_MEDIEVAL_TRICKS_HELP
    This option instructs the program, when it is unable to find a proper 
    match in the dictionary, and after various prefixes and suffixes, and 
    trying every Classical Latin trick it can think of, to go to a few that
    are usually only found in medieval Latin, replacements of caul -> col,
    st -> est, z -> di, ix -> is, nct -> nt.  It also tries some things   
    like replacing doubled consonants in classical with a single one.     
    Together these tricks are useful, but may give false positives (>20%).
    This option is only available if the general DO_TRICKS is chosen.     
    If the text is late or medieval, this option is much more useful than 
    tricks for classical.  The dictionary can never contain all spelling  
    variations found in medieval Latin, but some constructs are common.   
    The default choice is N(o), since the results are iffy, medieval only,
    and expensive.  This processing is turned on with the choice of Y(es).

DO_SYNCOPE_HELP
    This option instructs the program to postulate that syncope of        
    perfect stem verbs may have occurred (e.g., aver -> ar in the perfect),
    and to try various possibilities for the insertion of a removed 'v'.
    To do this it has to fully process the modified candidates, which can
    have a considerable impact on the speed of processing a large file.
    However, this trick seldom produces a false positive, and syncope is
    very common in Latin (first year texts excepted).  Default is Y(es).  
    This lengthy processing is turned off with the choice of N(o).        
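
The "insertion of a removed 'v'" can be illustrated with a small sketch.
It covers only the simplest case, a lost 'v' between two adjacent vowels
(audii -> audivi); the fuller aver -> ar contraction is not handled.  The
function name and the vowel set are assumptions for the example, and each
candidate would still have to pass the full recognition process.

```python
VOWELS = set("aeiou")


def restore_v_candidates(word):
    """Return spellings with a 'v' inserted between each adjacent vowel pair."""
    candidates = []
    for i in range(1, len(word)):
        if word[i - 1] in VOWELS and word[i] in VOWELS:
            candidates.append(word[:i] + "v" + word[i:])
    return candidates
```

For audii this produces two candidates, of which only audivi would
survive a dictionary check.  Every candidate costs a full parse, which is
why the option is described as lengthy.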



INCLUDE_UNKNOWN_CONTEXT_HELP
    This option instructs the program, when writing to an UNKNOWNS file,  
    to put out the whole context of the UNKNOWN (the whole input line on  
    which the UNKNOWN was found).  This is appropriate for processing     
    large text files in which it is expected that there will be relatively
    few UNKNOWNS.    The main use at the moment is to provide display     
    of the input line on the output file in the case of UNKNOWNS_ONLY.    

OMIT_ARCHAIC_HELP
    THIS OPTION CAN ONLY BE ACTIVE IF WORDS_MODE(TRIM_OUTPUT) IS SET!
    This option instructs the program to omit inflections and dictionary  
    entries with an AGE code of A (Archaic).  Archaic results are rarely  
    of interest in general use.  If there is no other possible form, then 
    the Archaic (roughly defined) will be reported.  The default is Y(es).


OMIT_MEDIEVAL_HELP
    THIS OPTION CAN ONLY BE ACTIVE IF WORDS_MODE(TRIM_OUTPUT) IS SET!
    This option instructs the program to omit inflections and dictionary  
    entries with AGE codes of E or later, those not in use in Roman times.
    While later forms and words are a significant application, most users 
    will not want them.  If there is no other possible form, then the     
    Medieval (roughly defined) will be reported.   The default is Y(es).  

OMIT_UNCOMMON_HELP
    THIS OPTION CAN ONLY BE ACTIVE IF WORDS_MODE(TRIM_OUTPUT) IS SET!
    This option instructs the program to omit inflections and dictionary  
    entries with FREQ codes indicating that the selection is uncommon.    
    While these forms are a significant feature of the program, many users
    will not want them.  If there is no other possible form, then the     
    uncommon (roughly defined) will be reported.   The default is Y(es).  
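
All three OMIT options share one rule: drop the flagged results, unless
that would leave nothing to report.  A minimal sketch of that rule
follows.  The result records (dictionaries with an "age" key) and the
names trim_results and is_archaic are invented for the example; only the
AGE code A for Archaic comes from the text above.

```python
def trim_results(results, omit):
    """Drop results for which omit(result) is true, unless nothing would remain."""
    kept = [r for r in results if not omit(r)]
    # If every result was flagged, report them all rather than nothing.
    return kept if kept else list(results)


def is_archaic(result):
    """True for a result carrying the AGE code A (Archaic)."""
    return result["age"] == "A"
```

With one archaic and one ordinary result, only the ordinary one is kept;
with archaic results alone, they are reported after all.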

DO_I_FOR_J_HELP
    This option instructs the program to modify the output so that the j/J
    is represented as i/I.  The consonant i was written as j in cursive in
    Imperial times and called i longa, and often rendered as j in medieval
    times.  The capital is usually rendered as I, as in inscriptions.     
    If this is NO/FALSE, the output will have the same character as input.
    The program default, and the dictionary convention, is to retain the j.
    Reset if this is unsuitable for your application. The default is N(o).
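
As an illustration only, the effect of this option on a line of output
can be sketched with a character translation table.  The names I_FOR_J
and i_for_j are assumptions for the example.

```python
# Map j/J to i/I; every other character passes through unchanged.
I_FOR_J = str.maketrans({"j": "i", "J": "I"})


def i_for_j(text, enabled=False):
    """Return text with j/J shown as i/I when the option is enabled."""
    return text.translate(I_FOR_J) if enabled else text
```

With the option off (the default), the text is returned exactly as given.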

DO_U_FOR_V_HELP
    This option instructs the program to modify the output so that the u  
    is represented as v.  The consonant u was written sometimes as uu.
    The pronunciation was as current w, and important for poetic meter.
    With the printing press came the practice of distinguishing consonant
    u with the character v, which was common for centuries.  The practice
    of using only u has been adopted in some 20th century publications
    (OLD), but it is confusing to many modern readers.  The capital is
    commonly V in any case, as it was and is in inscriptions (easier to
    chisel).
    If this is NO/FALSE, the output will have the same character as input.
    The program default, and the dictionary convention, is to retain the v.
    Reset if this is unsuitable for your application. The default is N(o).



PAUSE_IN_SCREEN_OUTPUT_HELP
    This option instructs the program to pause in output on the screen    
    after about 16 lines so that the user can read the output, otherwise  
    it would just scroll off the top.  A RETURN/ENTER gives another page. 
    If the program is waiting for a return, it cannot take other input.   
    This option is active only for keyboard entry or command line input,  
    and only when there is no output file.  It is moot if only single word
    input or brief output.                 The default is Y(es).          


NO_SCREEN_ACTIVITY_HELP
    This option instructs the program not to keep a running screen of the 
    input.  This is probably only to be used by the developer to calibrate
    run times for large text file input, removing the time necessary to   
    write to screen.                       The default is N(o).           
 
    

MINIMIZE_OUTPUT_HELP
    This option instructs the program to minimize the output.  This is a  
    somewhat flexible term, but the use of this option will probably lead 
    to less output.                        The default is Y(es).          


SAVE_PARAMETERS_HELP
    This option instructs the program to save the current parameters, as
    just established by the user, in a file WORD.MDV.  If such a file     
    exists, the program will load those parameters at the start.  If no   
    such file can be found in the current subdirectory, the program will  
    start with a default set of parameters.  Since this parameter file is 
    human-readable ASCII, it may also be created with a text editor.  If  
    the file found has been improperly created, is in the wrong format, or
    otherwise uninterpretable by the program, it will be ignored and the  
    default parameters used, until a proper parameter file is written by
    the program.  Since one may want to make temporary changes during a   
    run, but revert to the usual set, the default is N(o).    

 


Program source code

The program is written in Ada, and is machine independent.  Ada source
code is available for compiling onto other machines.  


                          GUIDING PHILOSOPHY                          



Purpose 

The dictionary is intended as a help to someone who knows roughly 
enough Latin for the document under study.  It gives the accidence and
meanings possible for an input Latin word.  It is for someone reading 
Latin text.  There is no English-to-Latin mode.  

This is a translation dictionary.  Mostly it provides individual words
in English that correspond to, and might be used in a translation of, 
words in Latin text.  The program assumes a fair command of English.  
This is in contrast to a conventional same-language desktop dictionary
which would explain the meanings of words in the same language.  The 
distinction may be obvious but it is important.  A Latin dictionary in
medieval times would have explanations in Latin of Latin words.  

There are various approaches to the preparation of a dictionary.  The 
most scholarly might be to select only proper and correct entries, 
only correct derivations, grammar, and spelling.  This would be a 
dictionary for one who wished to write 'correct' Latin.  (Correct 
being defined as the way Cicero, or your favorite writer or 
grammarian, used it.) The current project has a different goal.  This 
program is successful if a word found in text is given an appropriate 
meaning, whether or not that word is spelled in the generally approved
way, or is 'good Latin'.  Thus the program includes various words and 
forms that may have been rejected by recent scholars, but still appear
in some texts.  Philosophically, this program deals with Latin as it 
was, not as it should have been.  I make no corrections to Cicero, 
which some might have been tempted to do if producing an academic 
dictionary instead of a program.  Moreover I make no corrections of St
Jerome.  If your copy of the Vulgate has a particular spelling, that 
may be recognized by the program, either through a TRICK or as a 
dictionary entry that I have generated.  

A philosophical difference from many dictionary projects is that this 
one has no firm model of the user or application.  It is not limited 
to classical Latin, or to 'good practice', or to common words, or to 
words appearing in certain texts.  As a result there will be a lot of 
chaff in the output.  Some of this may be trimmed out automatically if
desired, but it is there and available.  

However inadequately, I hope to document decisions that went into the 
arrangement of the program and dictionary.  I am surprised that there 
is little or no such information available to the user of published 
dictionaries.  If others generate similar products, or use the data 
from this one, they can do so in knowledge of how and why processes 
and forms were constructed.  

I make few value judgments and those are mechanical, not scholarly, 
and are documented herein.  Nevertheless some may be arbitrary, in 
spite of good intentions.  

Method

The program subtracts possible endings from an input word and 
searches a list of stems, trying to make a match.  If no exact match 
is possible, it tries various modifications, beginning with prefixes 
and suffixes, and eventually involving various regular spelling 
variations (or 'tricks') common in classical and medieval Latin.  
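
As a rough illustration of the first step (subtract an ending, look up
what remains), consider the sketch below.  The STEMS and ENDINGS tables
are invented toy data, not the contents of the program's stem and
inflection files, and the function name parse is an assumption.  A real
parse must also check that the ending is valid for the stem's part of
speech, which this sketch omits.

```python
# Toy data for illustration only.
STEMS = {"am": "V: love", "puell": "N: girl", "aqu": "N: water"}
ENDINGS = ["", "a", "ae", "am", "o", "as", "at", "amus"]


def parse(word):
    """Return (stem, ending, entry) for every split whose stem is listed."""
    hits = []
    for ending in ENDINGS:
        if word.endswith(ending):
            # Remove the candidate ending; "" leaves the word whole.
            stem = word[: len(word) - len(ending)]
            if stem in STEMS:
                hits.append((stem, ending, STEMS[stem]))
    return hits
```

An empty result is the point at which prefixes, suffixes, and finally
tricks would be tried.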

A choice was made that the base was classical Latin as defined by the 
Oxford Latin Dictionary (OLD).  Their primary time period is 
arbitrary/roughly 100 BC to 100 AD.  

The classical form of words is taken as the base.  Modifications are 
made in such a way as to correct to this base.  Further additions to local 
dictionaries should keep this in mind.  Modifications are made to the 
input words, not to the dictionary stems.  It could be done the other 
way, but the present situation was initially much easier.  There are 
some consequences of this approach.  For instance, it is easy to 
remove an 'h' from an input word to match with a stem.  It is much 
more difficult (but not impossible) to add 'h' in all possible 
positions to check against stems.  
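
The asymmetry can be made concrete with a small sketch (function names
are assumptions for the example).  Removing 'h' from the input yields a
single candidate to look up, while adding an 'h' yields one candidate per
position, each of which would need its own search.

```python
def without_h(word):
    """Input-side modification: one candidate, with every 'h' removed."""
    return word.replace("h", "")


def with_h_added(word):
    """Stem-side modification: one candidate per possible position of 'h'."""
    return [word[:i] + "h" + word[i:] for i in range(len(word) + 1)]
```

For a five-letter word the second function already returns six
candidates, and that is for a single added 'h' only.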

It would be possible to match most words with a relatively smaller 
list of stems (or roots) and generous application of word 
construction.  This approach is not followed.  One difficulty is that 
while words may be constructed correctly, and the underlying meaning 
may be found from this construction, the common usage may be obscured 
by a formal interpretation of the parts.  In practice this occurs in 
20-40% of the cases.  This method is still very useful in approaching 
a word for which there has been no dictionary interpretation, but it 
puts a considerable burden on the normal user.  Further, in about 10% 
of constructions, the result is just wrong.  

In normal usage, if the program finds a simple match, it does not go 
further and consider what constructed words might also be valid.  (One
can override and force prefix/suffix construction with a switch, but 
one might not want to force all possible tricks.) 

For instance, if there is an adjective that matches, a corresponding 
identically spelled, logically valid noun will not be reported unless 
it is explicitly found in the dictionary, even though it could be 
constructed or inferred from the adjective or constructed with a 
suffix from a verb in the dictionary.  

An exception to this is that enclitics (e.g., -que) are always 
considered.  Coloque can be a verb or collo-que.  The latter is in 
Virgil and should not be omitted.  Verb syncope is also favored.  In 
the vast majority of cases, if there is a possible syncope it is the 
correct parse.  This is given preference over word construction with 
suffix.  Audii is syncope of audivi, but it could also be aud-i-i.  
The latter is considered very unlikely.  
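
The "always considered" rule for enclitics can be sketched as producing
two readings of a word, the whole word and the word with the enclitic
split off, each of which is then parsed.  Only -que is named above; the
-ne and -ve entries, like the function name, are assumptions added for
the example.

```python
# -que is the example given in the text; -ne and -ve are assumed here.
ENCLITICS = ("que", "ne", "ve")


def enclitic_readings(word):
    """Return (base, enclitic) readings: the whole word first, then any split."""
    readings = [(word, "")]
    for enclitic in ENCLITICS:
        if word.endswith(enclitic) and len(word) > len(enclitic):
            readings.append((word[: -len(enclitic)], enclitic))
    return readings
```

Both readings are kept even when the whole word already matches, which is
the exception to the usual stop-at-first-match behavior.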

There are a large number of paths and possibilities.  Choices have 
been made in the code that result in the exclusion of some.  It is 
hoped that they were the best choices.  The method was constructed by 
taking a number of primary procedures and combining/assembling them in
such a way as to give reasonable parses for a number of test cases.  
Basically, this is hacking, but it might be considered an empirical 
starting point from which one could construct a logical rationale.  

Therefore, the philosophy is to populate the stem list as densely as 
possible.  Even easily resolved differences are included redundantly 
(adligo as well as alligo; ad- accounts for most of the duplicates).  
The advantage is that while regular single-letter modifications are 
fairly easy, and two letter differences are possible (but more 
expensive), further deviations are problematical.  The better 
populated the stem list, the better the chance of a result.  

Even in easy cases the overpopulation is helpful.  Antebasis is easily
parsed as ante-basis ('pedestal before', which is reasonable), but 
inclusion as a separate word allows the additional information that it
is the hindmost pillar of the pedestal of a ballista.  

The stem list is also populated with variants suggested by different 
sources.  The problem is that the remains of classical Latin have gone
through many monks along the way.  These copyists may have made simple
mistakes (typos!), or have made what they thought were proper 
corrections (spell checkers!).  And twenty centuries later scholars 
work hard to reassemble the best Latin to present in the dictionary.  
But a particular document in the form presented to the reader may 
have a variety of spellings for exactly the same word in the same 
referenced passage (Pliny's Natural History is often subject to this 
problem).  (It may even be that modern texts and dictionaries have 
misprints!) All forms found in various dictionaries can be included, 
with the exception of those explicitly labeled 'misread' (and the 
argument probably could mandate their inclusion also).  However, a 
single example of a variant in one case will not be included as a 
dictionary entry.  If such a word is sufficiently important, if it is 
used frequently or by several authors, it will be entered as a UNIQUE.


Lewis and Short seem to be more willing than the more recent Oxford 
Latin Dictionary to raise a few examples of variation to an entry (at 
least an alternate).  Generally, I make an entry if some dictionary 
does so.  But within an entry I generate additional possible stems not
noted elsewhere, e.g., I expand first declension verbs with '-av' 
perfect stems, even though no example exists in classical Latin.  This
is often the practice in other dictionaries also.  

It is often the practice in paper dictionaries to double up on an 
entry that may be either adjective or noun, usually by leading with 
the adjective and mentioning its use as a noun.  A much larger set of 
adjective/noun pairs is favored with separate entries.  It is the 
philosophy of this program to make separate entries whenever there is 
an example in any reference dictionary.  This might facilitate the task
of a larger translation program which would handle phrases or 
sentences.  However there has been no effort to explicitly generate 
such pair expansion if there is no precedent, and the user must still 
recognize the possibility of unexpanded multiple possibilities for 
substantives.  

An argument against a large stem list is that it increases the storage
required (but this is extremely modest by current standards) and 
increases processing time for search of the stems (this is far offset 
by the processing which would be required to construct or analyze 
words working from a smaller stem list).  

Additional parts of verbs are included (first conjugation is easily 
filled out, even eccentric verbs if they are compounds of known 
parts), although they may not have been found in any well known texts.
Cases can be logically constructed that are 'missing' in classical 
Latin.  Verbs with prefix can be expanded when the base is known.  
That a form has not been found in surviving copies of classical 
documents does not mean that it was not on the lips of every centurion
and his girl friend, or that it might not find its way into medieval 
texts.  

In some cases there are good reasons not to do the mathematical 
expansion, and these are pointedly avoided.  There is no mechanical 
generation of, for instance, conl- words for every coll- word, unless 
there is some citation or reasonable rationale.  They may be paired in
almost every case, but, for instance, collis and collyra are not.  
However, forms that are mentioned in dictionaries explicitly, or 
implicitly by being derived from words having variant forms, are 
included in order to reduce the dependence on 'tricks'.  OLD has a 
conp- for almost every comp- (except derivatives from como).  Rare 
exceptions seem to be rare words for which few examples (or only one) 
exist.  Even in some of these cases, OLD (mechanically?) gives two 
forms.  L+S follows the same pattern, except for words of late Latin 
(which would not be found in OLD).  It is presumed that the general 
practice in later times was always to use comp-, and the program 
dictionary follows that.  There are many acc-/adc- pairs, but OLD has 
a fair number of acc- words without mention of a corresponding adc-, 
and so the possible generation of these words has been resisted.  If 
an example turns up in text, the appropriate trick procedure should 
suffice.  

One suspects that some amount of analytical expansion is present even 
in the best dictionaries.  Otherwise how can one explain four 
alternate spellings for a word which apparently only appears in a 
single inscription?  

Adjectives from participles are included if an entry is found in some 
reference dictionary.  In some cases the adjective has a special 
meaning not obvious from the verb.  The program will return both the 
adjective and the participle with its verb meaning.  The user should 
give some additional consideration to the adjective meaning in this 
case.  If the adjective is marked rare while the verb is common, it is
likely there is reference to a special meaning.  

Tricks are expensive in processing time.  Each possible modification 
is made, then the resulting word goes through the full recognition 
process.  If it passes, that is reported as the answer.  If it fails, 
another trick is tried.  This is effective if very few words get this 
far.  It is expected that application of single tricks will solve most
of the resolvable difficulties.  It would be impractical to 
mechanically apply several tricks in series to a word.  If the 
dictionary is heavily and redundantly populated, tricks are rarely 
necessary (and therefore not an overall processing burden) and largely
successful (if the input word is a valid, but unusual, 
variant/construction).  
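
The control flow described in this paragraph, one trick at a time with a
full recognition pass after each, might look like the sketch below.  All
names are hypothetical: recognize stands in for the full recognition
process, LEXICON is a one-word toy dictionary, and the two tricks are
simplified versions of modifications mentioned elsewhere in this
document.

```python
def drop_h(word):
    """Trick: remove every 'h' from the input word."""
    return word.replace("h", "")


def z_to_di(word):
    """Trick: the medieval replacement listed as z -> di, applied as written."""
    return word.replace("z", "di")


def recognize_with_tricks(word, recognize, tricks):
    """Try the word as given, then one trick at a time; tricks are never chained."""
    result = recognize(word)
    if result:
        return word, result
    for trick in tricks:
        candidate = trick(word)
        if candidate != word:  # skip tricks that change nothing
            result = recognize(candidate)
            if result:
                return candidate, result
    return None


# Toy stand-in for the dictionary search.
LEXICON = {"arena": "sand"}
```

Here recognize_with_tricks("harena", LEXICON.get, [drop_h, z_to_di])
succeeds on the first trick.  Each trick always starts from the original
input, so the cost grows with the number of tricks rather than with their
combinations.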

Further, a conventional dictionary, especially one that wishes to set 
a standard for proper language, excludes words that may not meet 
criteria of propriety, slang, misspellings, etc.  This may place the 
onus on the reader to convert words.  A computer dictionary ought to 
relieve the reader as much as possible.  The present program may be a 
long way from complete, but its goal is to strive for that.  

Word Meanings

The meanings listed are generally those in the 
literature/dictionaries.  In the case of common words, there is 
general agreement among authors.  Some uncommon words display 
convoluted interpretations.  

Generally, the meaning is given for the base word, as is usual for 
dictionaries.  For the verb, it will be a present meaning, even when 
the tense input is perfect.  For an adjective, the positive meaning is
given, even if a comparative or superlative form is shown.  This is 
also so when a word is constructed with a suffix, thus an adverb 
constructed from its adjective will show the base adjective meaning 
and an indication of how to make the adverb in English.  

For the level of usage for this program, and for convenience in 
coding, the meaning field has been fixed at 80 characters.  It is 
possible to have multiple 80 character lines for an entry, but this is 
only necessary for the most common words.  In order to conserve space,
extraneous helpers like 'a', 'the', 'to', which sometimes appear in 
dictionary definitions, are generally omitted.  The solidus ('/') is 
used both to separate equivalent English meanings and to conserve 
space.  

I have taken it upon myself to add some interpretations and synonyms, 
and propose common usage for otherwise complex descriptive 
definitions.  The idea is to prompt the reader, expecting that the 
text may not be that from which some dictionary copied the meaning 
(from some 18th century translator!).  

Where available, the Linnaean or 'scientific Latin' name is given in 
parentheses, mostly for plants.  This is not a classical Latin name, 
but a modern designation.  Similarity of this designation to some 
Latin word may not be historically significant.  

The spelling of the English meanings is US (plow not plough, color not
colour, and English corn is rendered as grain or wheat), in spite of 
the fact that most of the Latin dictionaries that I have are British 
and use British spelling.  The reason for this is (besides uniformity 
in the program) that there is much computer processing and checking of
the dictionary data, including spell-checking of the English.  (This 
is not to say that everything is correct, but it is much better than 
it would be without the computer checking.) All my programs speak US 
English, so I can count on it.  Only some are available in UK English,
and I do not have all of those versions.  

Latin dictionaries seem to be locked into the early 19th century.  The
English terms seem stilted, even by current British usage.  This is 
probably because much work in translation was started then and later 
work tended to copy from the previous dictionaries.  While this 
dictionary has done some modernization, some of the previous 
obscurities have been preserved.  This was done in order that certain 
machine processes could compare the results of automatic translation 
with existing published work.  

In addition, I have given US meanings to some terms that seem to be 
literally translated from the Latin (or German!) (a person who 
steals/drives off cattle is a rustler in the US).  

Most dictionaries have an etymological approach; they are driven by 
the derivation of words, giving separate entries to words that may be 
identical in spelling but different in derivation.  But they
can lump entirely different, even contradictory, meanings in a single 
entry if there is some common derivation.  Philosophically, this 
dictionary is usually not sensitive to derivations, but sometimes 
supports multiple entries for vastly different meanings, application 
areas, or eras.  

Proper Names

Only a very few proper names are included, many just for test 
purposes, others that users have requested.  The number of proper 
names is almost limitless but very few are applicable to a particular 
document, and if it is an obscure document it is unlikely that the 
names would be found in any dictionary.  

Meaning for proper names may cite a likely example of a person with 
that name.  This is just an example; there are lots of others with 
that name.  

There is a switch (defaulted to Yes) that allows the program to assume
that any capitalized unknown word is a proper name, and to ignore it.  
Also, one can make up a local dictionary of names for one's particular
application.  

Letter Conventions

U and/or V

Strictly speaking, Latin did not have a V, just a consonant U, or a U 
character that was easier in capitals (the way Latin was written by 
the Romans) to write or chisel in stone as V. However, most modern 
texts and dictionaries (with the important exception of the OLD) make 
the distinction with two characters (u and v).  It appeared most 
appropriate in a computer context (never destroy information) to make 
the distinction and follow the common practice.  So all dictionary 
entries maintain the V/v.  However, an input word following the U 
convention will be found.  At an earlier version, an algorithm was 
kludged to convert where necessary.  While this worked in most cases, 
there were difficulties.  The present system processes the dictionary 
and the input word as though U and V were the same letter, although 
the basic dictionary maintains the distinction and the output reflects
this.  There is no longer any need for the user to set modes for this 
process.  

I and/or J

A similar situation arises with I, and its consonant form, J. In this 
instance, the common practice is to use only I, but there are many 
counter-examples, in both texts and dictionaries.  (Lewis + Short uses J, 
but OLD does not.) Because of common practice, the program started out
as pure-I dictionary with conversion of J-to-I on input.  It remained 
that way through many versions, in spite of the logical inconsistency 
with U-V.  The technique worked perfectly, but eventually the 
aesthetic of consistency won out and the U/V technique described above
was extended to I/J.  As yet, almost all dictionary entries are pure-I,
but the mechanism is in place to use J in both dictionary and input.  
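
The U/V and I/J treatment described above can be pictured with a 
short sketch.  This is not the code of the program; the names 
fold_key and lookup are hypothetical, and the stem list is made up 
for the illustration.  The idea shown is only that comparison is done 
on a folded key while the dictionary spelling is what gets returned.  

```python
# Sketch only: fold_key and lookup are hypothetical names, not taken
# from the WORDS source.

def fold_key(word: str) -> str:
    """Return a comparison key in which U/V and I/J count as one letter."""
    return word.lower().replace("v", "u").replace("j", "i")


def lookup(word: str, stems: list[str]) -> list[str]:
    """Return matching stems, spelled the way the dictionary has them."""
    key = fold_key(word)
    return [stem for stem in stems if fold_key(stem) == key]


# Illustrative stems only; the real stem files are far larger.
stems = ["vin", "iuven", "uv"]
print(lookup("uin", stems))    # U-convention input finds the V-spelled stem
print(lookup("juven", stems))  # J input finds the pure-I stem
```

Because only the key is folded, the output keeps whatever spelling the 
dictionary entry carries, which matches the behavior described above.  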

W

There are examples of W in some medieval Latin.  I have not yet 
directly faced this, and have no words in the dictionary with W. 
However, the W problem is not analogous to U/V.  While W sometimes 
could correspond to V or UU, in most cases it is a valid letter, 
reflecting a Germanic origin of the word.  It will be treated as a 
real letter, and tricks employed where useful.  

Dictionary Codes

Several codes are associated with each dictionary entry (presently 
AGE, AREA, GEO, FREQ, SOURCE).  These were provided against the 
possibility of the program using them to make a better interpretation.
For the most part, this information is of little additional help to 
the reader, but it is carried in codes because it is not available to 
the program in any other way.  Some of these codes, like the KIND code
for nouns, may be used, others may not.  The program is still in 
development and these are put in to experiment with a possible 
capability.  Later versions may use them, omit them, or provide 
others.  

The program covers a combination of time periods and applications 
areas.  This is certainly not the way in which dictionaries are 
usually prepared.  Usually there is a clear limit to the time or area 
of coverage, and with good reason.  A computer dictionary may have 
capabilities that mitigate those reasons.  Time or area can be coded 
into each entry, so that one could return only classical words, even 
though matching medieval entries existed.  (The program has that 
capability now, but it is not yet clear how to apply it.) 

There is some measure of period and frequency that can be used to 
discriminate between identical forms, but if there is only one 
possible match to an input word, it will be displayed no matter its 
era or rarity.  The user can choose to display age and frequency 
warnings associated with stems and meanings, but the present default 
is not to.  
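
The rule in the paragraph above (discriminate only between competing 
matches, never suppress a lone match) might be sketched as follows.  
The Match record, the choice of "wanted" ages and "rare" frequencies, 
and the fallback when every match would be dropped are all 
assumptions made for the illustration, not the program's actual 
logic.  

```python
from dataclasses import dataclass


@dataclass
class Match:
    meaning: str
    age: str   # AGE code such as "X", "C" or "F"
    freq: str  # FREQ code such as "A" or "F"


# Assumptions for this sketch: which codes count as wanted or rare.
WANTED_AGES = {"X", "C"}
RARE_FREQS = {"E", "F", "I", "M", "N"}


def trim(matches):
    """Drop out-of-period or rare matches, but never the only match."""
    if len(matches) <= 1:
        return list(matches)
    kept = [m for m in matches
            if m.age in WANTED_AGES and m.freq not in RARE_FREQS]
    # Assumption: if everything would be dropped, show everything.
    return kept or list(matches)
```

A single medieval, very rare match is returned unchanged; of two 
competing matches, only the classical, common one survives.  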

So far these codes have not been of much use, especially since the 
only significant exercises have been with classical Latin.  Other 
situations may change this.  Perhaps the only impact now is for those 
words which have different meanings in different applications or 
periods.  For these the warning may be useful.  Otherwise, if there is
only one interpretation for a word, that is given.  

Rare and age specific inflection forms are also displayed, but there 
is a warning associated with each such.  

AGE

The designation of time period is very rough.  It is presently based 
on dictionary information.  If the quotes cited are in the 4th 
century, and none earlier, then the word is assumed to be late Latin, 
and one might conclude that it was not current earlier.  One flaw in 
this argument could be that the citation given was just the best 
illustration from a large number covering a wide period.  On the other
hand, the word could have been well known in classical times but did 
not appear in any surviving classical writings.  In such a case, it is
reasonable to warn the reader of Cicero that this is not likely the 
correct interpretation for his example.  This capability is still 
developmental, and its usefulness is still an open question.  

If there is a classical citation, the word could be designated as 
classical, but unless there is some reason to conclude otherwise, it 
is expected that classical words are valid for use in all periods (X),
are universal for well considered (published) Latin.  

A designation of Early (B) means that there are no classical 
citations, except for poetry, in which the poet is invoking the past 
(or just straining for meter).  Obsolete words occur similarly in 
English literature and poetry.  

Much which is designated late or medieval may be vulgar Latin, in 
common use in classical times but not thought suitable for literary 
works.  

In all periods the target is Latin.  Archaic Latin, for purposes of 
the program, is still Latin, not Etruscan or Greek.  Medieval Latin is
that which was written by scholars as the universal Latin, not 
versions of early French or Italian.  

  type AGE_TYPE is (
   X,   --              --  In use throughout the ages/unknown -- the default
   A,   --  archaic     --  Very early forms obsolete by classical times
   B,   --  early       --  Early Latin, pre-classical, used for effect/poetry
   C,   --  classical   --  Limited to classical (~200 BC - 200 AD)
   D,   --  late        --  Late, post-classical, early Christian (3-6)
   E,   --  later       --  Latin not in use in Classical/Roman times (7-10)
   F,   --  medieval    --  Spanning E and G, including late medieval (11-15)
   G,   --  modern      --  Latin not in use before 16th century (16-18)
   H    --  neo         --  Coined recently, words for new things (19-20)
                             );


AREA

While the reader can make his own interpretation of the area of 
application from the given meaning, there may be some cases in which 
the program can also use that information (which it can only get from 
a direct coding).  This has not yet been used in the program, but the 
possibility exists.  If the reader were doing a medical text, then 
higher priority should be given to words coded B, if a farming book, 
then A coded words should be given preference.  
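
Since this use of AREA is only a possibility, the following is 
purely a sketch of what "higher priority" could mean: matches coded 
with the area of the text are moved to the front, and the rest keep 
their order.  The function name prefer_area and the (meaning, area) 
pairs are hypothetical.  

```python
def prefer_area(matches, area):
    """Put matches coded with `area` first; keep the others in order.

    `matches` is a list of (meaning, area_code) pairs.  Python's sort
    is stable, so entries with equal keys keep their relative order.
    """
    return sorted(matches, key=lambda m: m[1] != area)


# Illustrative data only, not real dictionary entries.
candidates = [("general sense", "X"), ("medical sense", "B"),
              ("farming sense", "A")]
print(prefer_area(candidates, "B"))
```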

The area need not apply to all the meanings, just that there is some 
part of the meaning that is specialized to or applies specifically to 
that area and so is called out.  

  type AREA_TYPE is (
          X,      --  All or none
          A,      --  Agriculture, Flora, Fauna, Land, Equipment, Rural
          B,      --  Biological, Medical, Body Parts  
          D,      --  Drama, Music, Theater, Art, Painting, Sculpture
          E,      --  Ecclesiastic, Biblical, Religious
          G,      --  Grammar, Rhetoric, Logic, Literature, Schools                     
          L,      --  Legal, Government, Tax, Financial, Political, Titles
          P,      --  Poetic
          S,      --  Science, Philosophy, Mathematics, Units/Measures
          T,      --  Technical, Architecture, Topography, Surveying
          W,      --  War, Military, Naval, Armor
          Y       --  Mythology
                      );


GEO

This code was included to enable the program to distinguish between 
different usages of a word depending on where it was used or what 
country was the subject of the text.  This is a dual usage, origin or 
subject.  

  type GEO_TYPE is (
          X,      --  All or none
          A,      --  Africa      
          B,      --  Britain     
          C,      --  China       
          D,      --  Scandinavia 
          E,      --  Egypt       
          F,      --  France, Gaul
          G,      --  Germany     
          H,      --  Greece      
          I,      --  Italy, Rome
          J,      --  India       
          K,      --  Balkans     
          N,      --  Netherlands
          P,      --  Persia      
          Q,      --  Near East   
          R,      --  Russia              
          S,      --  Spain, Iberia       
          U,      --  Eastern Europe      
          Y       --  Mythology
                     );


FREQ

There is an indication of relative frequency for each entry.  These 
codes also apply to inflections, with somewhat different meaning.  If 
there were several matches to an input word, this key may be used to 
sort the output, or to exclude rare interpretations.  The first 
problem is to provide the score.  The initial method is to grade each 
word by how much column space is allocated to it in the Oxford Latin 
Dictionary, or the number of citations, on the assumption that many 
citations mean a word is common.  This is not the intent of the 
compilers of existing dictionaries, but it is almost the only 
indication of frequency that can be inferred from most dictionaries.  
In many cases it seems to be a reasonable guess, certainly for those 
most common words, and for those that are very rare.  With the 
understanding that adjustments can be made when additional information
is available, the initial numeric criteria are: 


A   full column or more, more than 50 citations
B   half column, more than 20 citations
C   more than 5 citations
D   4-5 citations
E   2-3 citations
F   only 1 citation
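
The citation thresholds in this table can be written as a small 
function.  This is a sketch under stated assumptions: it uses only 
the citation count (the column-space criterion is ignored), and it 
returns X for a word with no citations, which the table does not 
cover.  The name freq_from_citations is hypothetical.  

```python
def freq_from_citations(citations: int) -> str:
    """Map a citation count to an initial FREQ code (A..F)."""
    if citations > 50:
        return "A"
    if citations > 20:
        return "B"
    if citations > 5:
        return "C"
    if citations >= 4:
        return "D"
    if citations >= 2:
        return "E"
    if citations == 1:
        return "F"
    return "X"  # assumption: no citations means unknown/unspecified


for n in (60, 21, 6, 5, 3, 1):
    print(n, freq_from_citations(n))
```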



In the case of late Latin in Lewis and Short, these frequencies may be
significant underestimates, since the volume of applicable texts 
considered seems to be much smaller than for classical Latin resulting
in fewer opportunities for citations.  Nevertheless, barring 
additional information, the system is generally followed.  

For the situation where there are several slightly different spellings
given for a word, they all are given the same initial frequency.  The 
theory is that the spelling is the author's choice while the frequency 
is attached to the word no matter how it is spelled.  I presume that 
for a specific text the author always spells the word the same way, 
that there is no distribution of spellings within an individual text.  One 
exception to this rule is the case where a variant spelling is cited 
only for inscriptions.  There may be some significance to this and a 
FREQ of I is assigned.  The logic of this choice is debatable.  
However, for some variations there is clearly a difference in 
application and this can be reflected in the frequency code.  
Likewise, there are situations wherein words of the same spelling but 
different meanings may have different frequencies.  This may help to 
select the most likely interpretation.  

One has a check against the frequency list of Diederich for the most 
common, and those are probably the only ones that matter.  But the 
frequency depends on the application, and it should be possible to run
a new set of frequencies if one had a reasonable volume of applicable 
text.  The mechanical verification of word frequency codes is a 
long-term goal of the development, but must wait until the dictionary 
data is complete.  

Inscription and Graffiti are designations of frequency only in that 
the only citations found were of that nature.  One might suppose that 
if literary examples were known they would have been used.  So one 
might expect that such words would not be found in a student's text.  
There is no implication that they were not common in the spoken 
language.  

A very special case has been created for 'N' words, words for which 
the only dictionary citation is Pliny's Natural History.  It seems, 
from reading of dictionaries, that this work may be the only source 
for these words, that they do not appear in any other surviving texts.
They are usually names for animals, plants or stones, many without 
identification.  Such words may appear only in Lewis and Short and the
Oxford Latin Dictionary, the unabridged Latin classical dictionaries.  
These words are omitted from most other Latin dictionaries and, 
although they fall in the classical period and are from a very well 
known writer, there is no mention of the omission.  So there may be an
argument to disparage these words, unless one is reading Pliny.  

Most of these words are of Greek origin (although that is also true 
for much of Latin).  For many, the dictionaries report different forms
or declensions for the word giving the same citation.  Often one 
dictionary will give a Greek-like form (-os, -on) where another gives 
a Latinized form (-us).  There is no consistency.  Both OLD and L+S 
disagree on Latin and Greek forms, with no overwhelming favoritism to 
one form attached to either dictionary.  This may be a reflection of 
the fact that the dictionaries grew over a long time with several 
editors, many workers, and no rigid enforcement of standards.  

There is another problem that is found chiefly in connection with 
Pliny-type words.  Since the literature is very sparse on examples, it
is often uncertain whether a particular usage is appropriately listed 
as a noun, as an adjective, or as adjective used as a substantive.  
The present dictionary, in blessed innocence, records all forms 
without bias.  

  type FREQUENCY_TYPE is (
    X,    --              --  Unknown or unspecified
    A,    --  very freq   --  Very frequent, in all Elementary Latin books
    B,    --  frequent    --  Frequent, in top 10 percent           
    C,    --  common      --  For Dictionary, in top 10,000 words
    D,    --  lesser      --  For Dictionary, in top 20,000 words
    E,    --  uncommon    --  2 or 3 citations
    F,    --  very rare   --  Only one citation in OLD or L+S
    I,    --  inscription --  Presently not much used
    M,    --  graffiti    --  Presently not much used
    N     --  Pliny       --  Things that may appear only in Pliny
                      );


For inflections, the same type is used with different weights 

    X,    --              --  Unknown or unspecified
    A,    --  most freq   --  Very frequent, the most common
    B,    --  sometimes   --  sometimes, a not unusual variant
    C,    --  uncommon    --  occasionally seen
    D,    --  infrequent  --  recognizable variant, but unlikely
    E,    --  rare        --  for a few cases, very unlikely
    F,    --  very rare   --  singular examples, 
    I,    --              --  Presently not used
    M,    --              --  Presently not used
    N     --              --  Presently not used



SOURCE

Source is the dictionary or grammar which is the source of the 
information, not the Cicero or Caesar text in which it is found.  

For a number of entries, X is now given as Source.  This is primarily 
for the vocabulary (about 13000 words) which was in place before the 
Source parameter was put in, and which has not been updated.  In fact,
they are from no particular Source, just general vocabulary picked up 
in various texts and readings.  Although, during the dictionary update
beginning in 1998, all entries are being checked against sources, it 
may be improper to credit (blame?) a Source when that was not the 
origin of the entry, remembering that the actual entries are of my 
generation entirely and may not correspond exactly to any other view.  
However, in the second pass (as far as it has progressed) all 
classical entries have been verified with the Oxford Latin Dictionary 
(OLD).  (By that I mean that I have checked, not to imply that I have 
not made errors.) This does not mean that the entry necessarily agrees
with the OLD, but that I read the OLD entry with great respect and put
down what I did anyway.  Newer entries, added in this process, and 
those checked later in the process, if found in the OLD, have the O 
code.  Words added from Lewis and Short, but not in OLD, have the S 
code, etc.  All entries for which there is a Source will be found in 
some form in that Source, but the details of the interpretation of 
declension and meaning are mine.  They may not necessarily be found as 
primary entries, or even directly referenced, but they will have been
constructed from information in that source.  (For instance, "adp see 
app" may generate more adp words than are directly mentioned in the 
bulk of the dictionary.) 

There should be no expectation, nor is there any claim, that the 
result of the program is exactly that from the cited Source.  Each 
entry is my responsibility alone, and there are significant 
differences and elaborations.  However, in each case where there is a 
Source, the reader can find the basis from which the program data was 
derived.  If I have done a proper job, he will not often be surprised.


The list of sources goes far beyond what has been directly used so 
far.  There should be no expectation at this point in the development 
that all these sources have even been used.  They are listed as I have
copies and as they might be consulted.  They are encoded so that the 
program might recognize and process the source should it come up.  I 
have sought and received permission for those which have been 
extensively used.  Others have only been used for an occasional check 
(fair use) or have denied me permission (Brill for Niermeyer).  

  type SOURCE_TYPE is (
       X,      --  General or unknown or too common to say
       A,      --  Allen + Greenough, New Latin Grammar, 1888 (A+G)
       B,      --  C.H.Beeson, A Primer of Medieval Latin, 1925 
       C,      --  Charles Beard, Cassell's Latin Dictionary 1892 (CAS)       
       D,      --  J.N.Adams, Latin Sexual Vocabulary, 1982
       E,      --  L.F.Stelten, Dictionary of Eccles. Latin, 1995
       F,      --  Roy J. Deferrari, Dictionary of St. Thomas Aquinas, 1960 (DeF)
       G,      --  Gildersleeve + Lodge, Latin Grammar 1895 (G+L)
       H,      --  Harrington/Pucci/Elliott, Medieval Latin 2nd Ed 1997 
       I,      --  Leverett, F.P., Lexicon of the Latin Language, Boston 1845
       J,      --  C.C./C.L. Scanlon Latin Grammar/Second Latin, TAN 1976
       K,      --  W. M. Lindsay, Short Historical Latin Grammar, 1895
       L,      --  Lewis, C.T., Elementary Latin Dictionary 1891
       M,      --  Latham, Revised Medieval Word List, 1980
       N,      --  Lynn Nelson, Wordlist
       O,      --  Oxford Latin Dictionary, 1982 (OLD)
       P,      --  Souter, A Glossary of Later Latin to 600 A.D., Oxford 1949
       Q,      --  Other, unspecified dictionaries
       R,      --  Plater and White, A Grammar of the Vulgate, Oxford 1926
       S,      --  Lewis and Short, A Latin Dictionary, 1879 (L+S)
       T,      --  Found in a translation  --  no dictionary reference
       U,      --  Du Cange            
       V,      --  Vademecum in opus Saxonis - Franz Blatt
       W,      --  My personal guess   
       Y,      --  Niermeyer, Mediae Latinitatis Lexicon Minus
       Z       --  Sent by user --  no dictionary reference
         
               --  Consulted but used only indirectly
               --  Liddell + Scott Greek-English Lexicon
                       );


Dictionary Conventions

There are a few special conventions in setting codes.  

Proper Names 

Proper names are often identified by the AGE in which the person 
lived, not the age of the text in which he is referenced, the AREA of 
his fame or occupation, and the GEO from which he hailed.  This refers
to some most-likely person of this name.  A name may be shared by 
others in different ages.  Thus Jason, the Argonaut, is Archaic, Myth,
Greek (A Y H).  (It is not likely that a Latin text would refer to a 
TV star.) Tertullian, an early 3rd century Church Father from 
Carthage, author of the first Christian writings in Latin, is Late, 
Ecclesiastic, Africa (D E A).  Jupiter is (A E I), which is a bit 
sloppy since he is present later.  Today he may be a myth, but then he
was a god.  But even gods are not eternal (X) in language, and an 
initial place is found for them.  Place names are likewise coded, 
although with less confidence.  
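
The three-letter shorthand used above, such as (A Y H), can be 
expanded mechanically from the AGE, AREA and GEO tables.  The sketch 
below does that; the labels are abbreviated from the type 
declarations in this document, and the function name describe is 
hypothetical.  

```python
# Labels abbreviated from the AGE_TYPE, AREA_TYPE and GEO_TYPE tables.
AGE = {"X": "all ages/unknown", "A": "archaic", "B": "early",
       "C": "classical", "D": "late", "E": "later", "F": "medieval",
       "G": "modern", "H": "neo"}
AREA = {"X": "all or none", "A": "agriculture", "B": "biological/medical",
        "D": "drama/arts", "E": "ecclesiastic", "G": "grammar",
        "L": "legal", "P": "poetic", "S": "science", "T": "technical",
        "W": "war", "Y": "mythology"}
GEO = {"X": "all or none", "A": "Africa", "B": "Britain", "C": "China",
       "D": "Scandinavia", "E": "Egypt", "F": "France, Gaul",
       "G": "Germany", "H": "Greece", "I": "Italy, Rome", "J": "India",
       "K": "Balkans", "N": "Netherlands", "P": "Persia",
       "Q": "Near East", "R": "Russia", "S": "Spain, Iberia",
       "U": "Eastern Europe", "Y": "Mythology"}


def describe(codes: str) -> tuple[str, str, str]:
    """Expand an 'AGE AREA GEO' triple such as 'A Y H' into labels."""
    age, area, geo = codes.upper().split()
    return AGE[age], AREA[area], GEO[geo]


print(describe("A Y H"))  # Jason
print(describe("D E A"))  # Tertullian
print(describe("A E I"))  # Jupiter
```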

Vertical Bar 

While not visible to the user, the dictionary contains certain 
meanings starting with a vertical bar (|).  This is a code used to 
identify meanings that run beyond the conventional 80 characters.  One 
or more vertical bars leading the meaning allows tools to recognize 
that they are additional meanings to an entry already encountered, 
usually the entry immediately before, when the file is sorted for that purpose.
This is only of concern to those dealing with the raw dictionary who 
have asked.  
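
For those working with the raw data, a tool might fold such 
continuation meanings back into their entry along the following 
lines.  This is a sketch, not one of the actual auxiliary programs: 
the record layout (a stems string paired with a meaning string) is 
simplified, the continuation text is invented for the example, and a 
continuation is assumed to follow its entry directly.  

```python
def merge_continuations(records):
    """Attach meanings that start with '|' to the preceding entry.

    `records` is an iterable of (stems, meaning) pairs in file order.
    Returns a list of (stems, [meaning, ...]) pairs.
    """
    merged = []
    for stems, meaning in records:
        text = meaning.lstrip("|")
        if meaning.startswith("|") and merged:
            merged[-1][1].append(text)      # continuation of previous entry
        else:
            merged.append((stems, [text]))  # a new entry
    return merged


# Illustrative records; the second meaning line is made up.
records = [
    ("aqu aqu", "water;"),
    ("aqu aqu", "|rain, rainfall;"),
    ("am am amav amat", "love;"),
]
for stems, meanings in merge_continuations(records):
    print(stems, meanings)
```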

Evolution of the Dictionary

The stem list was originally put together from what might be called 
'common knowledge', those words that most Latin texts have.  The first
version had about 5000 dictionary entries, giving up to 95% coverage 
of simple classical texts.  This grew to about 13000 entries with 
specific additions when gaps were found.  With this number it was 
possible to get better than a 99% hit rate on Caesar (an area from 
which the dictionary was built).  Parse of other works fell to 95-97%,
which may be mathematically attractive but leaves a lot to be desired 
in a dictionary, since a translator is usually familiar with the vast 
bulk of the language and just needs help on the obscure words.  Having
just the common words is not enough, indeed not much help at all.  So 
an attempt is made to make the dictionary as complete as possible.  
All possible spellings found in dictionaries are included.  

Starting with the 13000, the expansion project beginning in 1998 
sought to verify the existing words and supplement with any new found 
ones.  Thus all classical Latin words are consistent with the OLD (not
to say taken from, because most were not, but checked against).  Any 
significant deviation is indicated, either as from another source, or 
in the definition itself.  

L+S is used for later Latin and to check OLD work.  This started with 
the thought that if a word was in L+S but not in OLD it must be later 
Latin, beyond the range of OLD.  I was surprised at how many words 
with classical citations were in L+S but not in OLD, and how many are 
of different spelling.  

The refinement is proceeding one letter at a time, as is the tradition
for all great dictionaries.  First stage refinement has proceeded 
through COQ.  

Testing

The program has been run against a few common classical texts.  
Initially this was mostly a check of the process and reliability of 
the program.  It is now possible to run real texts and get valid 
statistics.  Relatively few texts have been run multiple times in 
order to understand exactly where failure occurred and to regression 
test the solutions.  Such testing has taken place on texts totaling 
well over a million words.  The best results come from those which 
have been run the most times.  Caesar and the Vulgate are essentially 
without unknowns (excluding proper names), Suetonius and Virgil are 
at the 0.1% level, Varro and Pliny have somewhat more than 1% unknowns
due to their specialized vocabulary.  While this is a mechanical test 
and does not assure that the form and meaning reported by the program 
is always correct, the actual number of misses found by limited 
detailed examination is vanishingly small.  

The hardest test is against another dictionary.  While getting a 97+% 
hit rate on long classical texts, a run against a large dictionary 
might fall to 85-90%, the missing words being in those letters which 
the update has not reached.  This is to be expected, since we both 
have the 10000 most common words and have made somewhat different 
additions beyond that.  So large electronic wordlists are a check on 
the program, and are reserved for that purpose, not simply 
incorporated as such.  

The Latin Word List of Lynn Nelson is an excellent benchmark, more so 
because of its medieval content.  



Current Status and Future Plans

The present phase of refinement has incorporated the Oxford Latin 
Dictionary and Lewis and Short entries into the letter D (about a fourth).  
Periodically, when I need a change of task, I run a major author 
(primarily from the Packard Humanities Institute CD ROM) to check the 
effectiveness of the code.  I may then include some words which turn 
up frequently as unknowns, but this is done as the spirit moves me.  
Smaller sections of later authors may also be processed, giving some 
growth in medieval Latin entries.  Recently I have worked the Vulgate 
of St. Jerome.  

I will continue to refine the dictionary and the program.  The major 
goal is to complete the inclusion of OLD and L+S, and this may take 
years.  Along the way, and later, I will expand to medieval Latin.  I 
am not so unrealistic as to believe that I will 'finish', indeed, this
is a hobby and there is no advantage to finishing.  

An eventual outcome would be to have some institution, with real Latin
capability, provide an exhaustive and authoritative program of this 
nature.  Until then, I and other individuals will make available our 
programs.  



                   WRITING DICT.LOC AND UNIQUES.LAT                   


To make the dictionary files used by the program is not difficult, but
it takes several auxiliary programs for checking and ordering which 
are best handled by one center.  These are available to anyone who 
needs them, but it is better that any general additions to the 
dictionary be handled centrally so that they can be included in the 
public release for everyone.  

However, it is possible for a user to enhance the dictionary for 
special situations.  This may be accomplished either by providing new 
dictionary entries in a DICT.LOC file, those to be processed in the 
regular manner, or by adding a unique (single case/number/gender/...) 
form in a text file called UNIQUES.  

DICT.LOC

A dictionary entry for WORDS (in the simplest, editable form as read 
in a DICT.LOC) is 


aqu   aqu
N    1 1 F T     X X X X X 
water;



For a noun there are two stems.  The definition of "stem" is inherent 
in the coding of inflections in the program.  Different grammars have 
different definitions.  There is no formal connection with any other 
usage.  

To these stems are applied, as appropriate, the endings 


         S       P
NOM      a       ae
GEN      ae      arum
DAT      ae      is
ACC      am      as
ABL      a       is



Or rather, the input word is analyzed for possible endings, and when 
these are subtracted a match is sought with the dictionary stems.  A 
file (INFLECTS.LAT) gives all the endings.  
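
The ending-subtraction idea can be sketched with the first 
declension table above and the single stem aqu.  This is an 
illustration only: the real program reads its endings and stems from 
its data files, and the names FIRST_DECLENSION, STEMS and analyze are 
hypothetical.  

```python
# Endings from the first declension table: ending -> (case, number) list.
FIRST_DECLENSION = {
    "a":    [("NOM", "S"), ("ABL", "S")],
    "ae":   [("GEN", "S"), ("DAT", "S"), ("NOM", "P")],
    "am":   [("ACC", "S")],
    "arum": [("GEN", "P")],
    "is":   [("DAT", "P"), ("ABL", "P")],
    "as":   [("ACC", "P")],
}

# A one-entry stand-in for the dictionary stems.
STEMS = {"aqu": "water;"}


def analyze(word):
    """Strip each possible ending and look the remainder up as a stem."""
    word = word.lower()
    results = []
    for ending, forms in FIRST_DECLENSION.items():
        if word.endswith(ending):
            stem = word[: len(word) - len(ending)]
            if stem in STEMS:
                for case, number in forms:
                    results.append((stem, ending, case, number, STEMS[stem]))
    return results


for row in analyze("aquae"):
    print(row)
```

A form such as aquae yields three readings (GEN S, DAT S, NOM P), 
which is why the program reports several inflection lines for one 
input word.  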

In this example, the first line 


aqu   aqu

contains the two noun stems for the word found in printed dictionaries
as 


aqua, -ae



The second line 


N    1 1 F T     X X X X X 

says it is a noun (N), of the first declension, first variant, is 
feminine (F), and is a thing (T), as opposed to a person, location, 
etc.  The X X X X X represents coding about the age in which it is 
applicable, the geographic and application area of the word, its 
frequency of use, and the dictionary source of the entry.  None of 
this is necessary in a DICT.LOC although something must be filled in 
and X X X X X is always satisfactory.  

The last line is the English definition.  It can be as long as 80 
characters.  


water;



The case and exact spacing of the stems and codes is unimportant, as 
long as they are separated by at least one blank.  
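
To make the three-line layout concrete, here is a sketch of how a 
noun entry of this form could be read.  It is not the program's 
reader: the NounEntry record and parse_noun are hypothetical, only 
nouns are handled, and the normalization of case (stems to lower 
case, codes to upper case) is an assumption based on the statement 
that case is unimportant.  

```python
from dataclasses import dataclass


@dataclass
class NounEntry:
    stems: list[str]
    declension: int
    variant: int
    gender: str
    kind: str
    codes: list[str]  # AGE, AREA, GEO, FREQ, SOURCE
    meaning: str


def parse_noun(text: str) -> NounEntry:
    """Parse a three-line DICT.LOC style noun entry."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    stem_line, code_line, meaning = lines[0], lines[1], lines[2]
    fields = code_line.upper().split()  # any run of blanks separates
    if fields[0] != "N" or len(fields) != 10:
        raise ValueError("not a noun entry of the expected form")
    return NounEntry(
        stems=stem_line.lower().split(),
        declension=int(fields[1]),
        variant=int(fields[2]),
        gender=fields[3],
        kind=fields[4],
        codes=fields[5:10],
        meaning=meaning.strip(),
    )


entry = parse_noun("aqu   aqu\nN    1 1 F T     X X X X X \nwater;\n")
print(entry)
```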

The PART_OF_SPEECH_TYPE values that you are most interested in are 
(X, N, ADJ, ADV, V).  X is always a valid entry.  It stands for none, 
or all, or unknown.  0 has the same function for numeric types.  

The others in the type (PRON, PACK, VPAR, SUPINE, PREP, CONJ, INTERJ, 
NUM, TACKON, PREFIX, SUFFIX) are either less interesting or 
artificial, used only internally to the code.  

A noun or a verb has a DECN_RECORD consisting of two small integers.  
The first is the declension/conjugation, and the second is a variant 
within that.  

N 1 1 is the conventional first declension.  But there are variants 
(6, 7, 8) which model Greek-like declensions.  (Greek-like variants 
start at 6.) 

N 2 1 is the regular -us, -i second declension.  

N 2 2 is the regular -um, -i neuter form.  

There is a N 2 3 for 'r' forms like puer, pueri.  In this case there 
is the possibility of a difference in stems (ager, agri has stems 
coded as ager, agr).  

Again there are Greek-like variants (6, 7, 8, 9).  

N 3 1 is regular third declension (lex, legis -> lex, leg) for 
masculine and feminine.  

N 3 2 is for neuter (iter, itineris -> iter, itiner).  

Variants 3 and 4 are for I-stems.  And so it goes.  

Each noun has a GENDER_TYPE (X, M, F, N, C).  X for unknown (something
I avoid for gender - guess if you have to) or all genders (useful in 
the code but not in a dictionary), and C for common (M + F).  

There is also a 


NOUN_KIND_TYPE (X,            --  unknown, nondescript
                N,            --  proper Name
                L,            --  Locale, country, city
                W,            --  a place Where
                P,            --  a Person type
                T)            --  a Thing

which you probably do not care about either.  Most entries will be 
Thing.  

Other codes are enumerated in the body of this document.  

Verbs are done likewise, but there are four stems, as described below.
An example is


am  am  amav  amat
V 1 1 X    X X X A O
love;



Now comes the hard part.  When starting from a dictionary one has all 
the information to decide the values.  Just having a single instance 
of the word lacks a lot.  Consider some examples from a user.  

elytris is surely from the Greek for sheath.  The question is how 
Latinized did it get.  I suspect that by the 17th century it was 
completely Latinized.  Even in classical times there was very little 
left in the way of Greek forms (< 1%).  So let us guess elythris (or 
-es), elythris (N 3 3) but it could be a Greek-like form (N 3 9).  I 
do not even know what case I started with, if NOM, then it must be 
-is, -is, if GEN then -es, -is is reasonable.  Then again, if it is 
DAT P we might have a N 1 1. 

All this seems very uncertain, and, in the absence of a real 
dictionary entry, it is.  However you can make the choices such that 
the result (the output of the code) matches exactly what you have.  If
you have more information, lots of examples, the uncertainty shrinks.  
If you have just a single isolated example, there are limits.  (But if
you do 100 and have more information about some, you can make better 
guesses about the rest.) 
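
One way to test a guess is to check whether the guessed stems, joined 
with known endings, reproduce the form actually seen.  The sketch 
below is in Python and is only an illustration.  SAMPLE_ENDINGS is a 
small textbook subset of third declension I-stem endings chosen for 
this example; it is an assumption and is not read from the program's 
inflection data.  The name matching_forms is hypothetical.  

```python
from typing import List, Tuple

# Assumption: a small textbook subset of third declension I-stem endings.
# Each item is (which stem, ending, label); this is NOT INFLECTS.LAT.
SAMPLE_ENDINGS: List[Tuple[int, str, str]] = [
    (1, "", "NOM S"),
    (2, "is", "GEN S"),
    (2, "i", "DAT S"),
    (2, "em", "ACC S"),
    (2, "e", "ABL S"),
    (2, "es", "NOM P"),
    (2, "ium", "GEN P"),
    (2, "ibus", "DAT P"),
]


def matching_forms(word: str, stems: List[str]) -> List[str]:
    """Return the labels of every sample ending that reproduces word."""
    hits = []
    for stem_no, ending, label in SAMPLE_ENDINGS:
        if stem_no <= len(stems) and stems[stem_no - 1] + ending == word:
            hits.append(label)
    return hits


# Stems as coded in this document for hostis, hostis (N 3 3).
print(matching_forms("hostium", ["hostis", "host"]))
print(matching_forms("hostis", ["hostis", "host"]))
```

An empty result means the guessed stems cannot produce the word with 
these endings, so another declension or variant should be tried.  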

Next we need a gender.  It may not make much difference (if M or F, or
C) in this case, but sometimes it matters.  You might be able to 
figure that out from the text.  

It is a thing (T), but X will work for your purposes.  For the rest, X
X X X X works fine.  

So we have 


elythris   elythr
N   3 3  F T       X X X X X 
elytra, wing cover of beetles



sat, I happen to know, is an abbreviated form of satis, so it is easy.  
If you want the adverb form, as you indicate: 


sat
ADV POS     X X X X X 
sufficiently, adequately; quite, well enough; fairly, (moderately)



Adverbs have a comparison parameter (X, POS, COMP, SUPER).  Most will 
be POS.  

It also is an indeclinable (N 9 9) substantive: 


sat
N 9 9 N T     X X X X X 
enough, sufficient; enough and some to spare; one of sufficient power



deplanata seems to be a 1-2 declension adjective, the -us, -a, -um 
form.  It also seems to be derived from the verb deplanto (V 1 1) - 
break off/sever (branch/shoot).  


deplanat   deplanat
ADJ 1 1 POS     X X X X X
broken off/severed (branch/shoot); (flattened)



Adjectives have a DECN and a comparison.  

The following were not at the time in the dictionary, but were in the 
OLD.  


alat  alat
ADJ 1 1 POS     X X X X X 
winged, having wings; having a broad/expanded margin

(punct - ul - at  -> hole/prick/puncture - small - having)

punctulat   punctulat
ADJ 1 1 POS    X X X X X 
punctured; having small holes/pricks/stabs/punctures


appendiculat   appendiculat
ADJ 1 1 POS    X X X X X 
appendiculate; having/fringed by small appendages/bodies

acetabul   acetabul
N 2 2 N T     X X X X X 
small cup (vinegar), 1/8 pint; cupped part (plant); sucker; socket, (cavity)

ruf  ruf
ADJ   1 1 POS     X X X X X 
red (various); tawny; red-haired (persons); strong yellow/moderate orange

testace   testace
ADJ  1 1 POS    X X X X X 
bricks; resembling bricks (esp. color); having hard covering/shell (animals)



This one had no classical correspondence.  


brunne   brunne
ADJ 1 1  POS     X X X X X 
brown



There is one other remark.  It is probably wise to include in the 
definition a more complete English meaning.  Just saying appendiculat 
-> appendiculate is not as interesting as it might be.  

All the inflections are in a file called INFLECTS.LAT, now a part of 
the general distribution of source code and data files.  


Here is a quick reference for the most common types.  


--  All first declension nouns  - N 1 1 
--  Ex: aqua aquae  =>  aqu aqu


--  Second declension nouns in "us"  - N 2 1 
--  Ex: amicus amici  =>  amic amic

--  Second declension neuter nouns - N 2 2 
--  Ex: verbum verbi  =>  verb verb

--  Second declension nouns in "er" whether or not the "er" is in base - N 2 3 
--  Ex: puer pueri  =>  puer puer
--  Ex: ager agri   =>  ager agr

--  Early (BC) 2nd declension nouns in ius/ium (not filius-like)  - N 2 4 
--  for the most part formed GEN S in 'i', not 'ii'   --  G+L 33 R 1
--  Dictionaries often show as ...(i)i
--  N 2 4 uses GENDER discrimination to reduce to single VAR
--  Ex: radius rad(i)i  => radi radi        M
--  Ex: atrium atr(i)i  =>  atri atri       N


--  Third declension M or F nouns whose stems end in a consonant - N 3 1 
--  Ex: miles militis  =>  miles milit
--  Ex: lex legis  =>  lex leg
--  Ex: frater fratris  =>  frater fratr
--  Ex: soror sororis  =>  soror soror
--  All third declension that have the endings -udo, -io, -tas, -x 
--  Ex: pulcritudo pulcritudinis  =>  pulcritudo pulcritudin
--  Ex: legio legionis  =>  legio legion    
--  Ex: varietas varietatis  =>  varietas varietat
--  Ex: radix radicis  =>  radix  radic     

--  Third declension  N nouns with stems ending in a consonant - N 3 2 
--  Ex: nomen nominis  =>  nomen nomin
--  Ex: iter itineris =>  iter itiner
--  Ex: tempus temporis  =>  tempus  tempor

--  Third declension nouns  I-stems (M + F)     - N 3 3 
--  Ex: hostis hostis  =>  hostis host 
--  Ex: finis finis  =>  finis fin
--  Consonant i-stems
--  Ex: urbs urbis  =>  urbs urb         
--  Ex: mons montis  =>  mons mont
--  Also use this for present participles (-ns) used as substantives in M + F

--  Third declension nouns  I-stems (N)    - N 3 4 
--  Ex: mare maris  =>  mare mar                       --  ending in "e"
--  Ex: animal animalis  =>  animal animal             --  ending in "al"
--  Ex: exemplar exemplaris  =>  exemplar exemplar     --  ending in "ar"
--  Also use this for present participles (-ns) used as substantives in N     


--  Fourth declension nouns M + F in "us"  - N 4 1 
--  Ex: passus passus  =>  pass pass
--  Ex: manus manus  =>  man man

--  Fourth declension nouns N in "u"  - N 4 2 
--  Ex: genu genus  =>  gen gen
--  Ex: cornu cornus  =>  corn corn


--  All fifth declension nouns  - N 5 1 
--  Ex: dies diei  =>  di di
--  Ex: res rei  =>  r r
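
The noun rules above amount to stripping a fixed ending from the 
nominative and the genitive.  A minimal sketch in Python follows; it 
is not taken from the WORDS source.  The table NOUN_RULES and the 
function noun_stems are hypothetical names, and only the types listed 
above are covered (N 2 4 and the Greek-like variants are left out).  

```python
from typing import Dict, Optional, Tuple

# (declension, variant) -> (nominative ending, genitive ending) to strip.
# None means the whole nominative is kept as the first stem.
NOUN_RULES: Dict[Tuple[int, int], Tuple[Optional[str], str]] = {
    (1, 1): ("a", "ae"),
    (2, 1): ("us", "i"),
    (2, 2): ("um", "i"),
    (2, 3): (None, "i"),
    (3, 1): (None, "is"),
    (3, 2): (None, "is"),
    (3, 3): (None, "is"),
    (3, 4): (None, "is"),
    (4, 1): ("us", "us"),
    (4, 2): ("u", "us"),
    (5, 1): ("es", "ei"),
}


def strip_suffix(word: str, suffix: str) -> str:
    """Remove suffix from word, failing loudly if it is not there."""
    if not word.endswith(suffix):
        raise ValueError(f"{word!r} does not end in {suffix!r}")
    return word[: len(word) - len(suffix)]


def noun_stems(nom: str, gen: str, decl: int, var: int) -> Tuple[str, str]:
    """Derive the two stems from nominative and genitive singular."""
    nom_end, gen_end = NOUN_RULES[(decl, var)]
    stem1 = nom if nom_end is None else strip_suffix(nom, nom_end)
    stem2 = strip_suffix(gen, gen_end)
    return stem1, stem2


print(noun_stems("ager", "agri", 2, 3))
print(noun_stems("iter", "itineris", 3, 2))
```

The declension and variant still have to be chosen by the user; the 
sketch only does the mechanical part once that choice is made.  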



--  Adjectives will mostly only be POS and have only the first two stems
--  ADJ X have four stems, zzz stands for any unknown/non-existent stem

--  Adjectives of first and second declension (-us in NOM S M)  - ADJ 1 1 
--  Two stems for POS, third is for COMP, fourth for SUPER
--  Ex: malus mala malum  => mal mal pei pessi 
--  Ex: altus alta altum  => alt alt alti altissi

--  Adjectives of first and second declension (-er) - ADJ 1 2 
--  Ex: miser misera miserum  =>  miser miser miseri miserri
--  Ex: sacer sacra sacrum  =>  sacer sacr zzz  sacerri     --  no COMP
--  Ex: pulcher pulchri  =>  pulcher pulchr pulchri pulcherri


--  Adjectives of third declension - one ending  - ADJ 3 1 
--  Ex: audax (gen) audacis  =>  audax audac audaci audacissi
--  Ex: prudens prudentis  =>  prudens prudent prudenti prudentissi

--  Adjectives of third declension - two endings   - ADJ 3 2 
--  Ex: brevis breve  =>  brev brev brevi brevissi
--  Ex: facilis facile  =>  facil facil facili facilli

--  Adjectives of third declension - three endings  - ADJ 3 3 
--  Ex: celer celeris  celere  =>  celer celer celeri celerri
--  Ex: acer acris acre  =>  acer acr acri acerri



--  Verbs are mostly TRANS or INTRANS, but X works fine
--  Deponent verbs must have DEP
--  Verbs have four stems
--  The first stem is the first principal part (dictionary entry) - less 'o'
--  For 2nd conj, the 'e' is omitted, for 3rd conj i-stem, the 'i' is included
--  Third principal part always ends in 'i', this is omitted in stem
--  Fourth part in dictionary ends in -us (or -um), this is omitted
--  DEP verbs omit (have zzz) the third stem


--  Verbs of the first conjugation  --  V 1 1
--  Ex: voco vocare vocavi vocatus  =>  voc voc vocav vocat
--  Ex: porto portare portavi portatus  =>  port port portav portat


--  Verbs of the second conjugation   -  V 2 1
--  The characteristic 'e' is in the inflection, not carried in the stem
--  Ex:  moneo monere monui monitum  =>  mon mon monu monit
--  Ex:  habeo habere habui habitus  =>  hab hab habu habit
--  Ex:  deleo delere delevi deletus  =>  del del delev delet
--  Ex:  iubeo iubere iussi iussus  =>   iub iub iuss iuss
--  Ex:  video videre vidi visus  =>  vid vid vid vis


--  Verbs of the third conjugation, variant 1  - V 3 1  
--  Ex: rego regere rexi rectum  =>  reg reg rex rect
--  Ex: pono ponere posui positus  =>  pon pon posu posit
--  Ex: capio capere cepi captus  => capi cap cep capt   --  I-stem too w/KEY


--  Verbs of the fourth conjugation are coded as a variant of third - V 3 4
--  Ex: audio audire audivi auditus  =>  audi aud audiv audit


--  Verbs like to be - coded as V 5 1   
--  Ex: sum esse fui futurus  =>  s . fu fut
--  Ex: adsum adesse adfui adfuturus  =>  ads ad adfu adfut
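
The verb rules can be sketched the same way.  This is a hedged Python 
illustration, not the method used inside WORDS.  The names 
INFINITIVE_ENDINGS and verb_stems are hypothetical, and the sketch 
covers only V 1 1, V 2 1, V 3 1 and V 3 4 with all four principal 
parts present; deponents and verbs like sum (V 5 1) are not handled.  

```python
from typing import Dict, Tuple

# Infinitive ending removed to get the second stem (an assumption drawn
# from the examples above, for these four types only).
INFINITIVE_ENDINGS: Dict[Tuple[int, int], str] = {
    (1, 1): "are",
    (2, 1): "ere",
    (3, 1): "ere",
    (3, 4): "ire",
}


def strip_suffix(word: str, suffix: str) -> str:
    """Remove suffix from word, failing loudly if it is not there."""
    if not word.endswith(suffix):
        raise ValueError(f"{word!r} does not end in {suffix!r}")
    return word[: len(word) - len(suffix)]


def verb_stems(pp1: str, pp2: str, pp3: str, pp4: str,
               conj: int, var: int) -> Tuple[str, str, str, str]:
    """Derive the four stems from the four principal parts."""
    stem1 = strip_suffix(pp1, "o")          # first part less 'o'
    if conj == 2:
        stem1 = strip_suffix(stem1, "e")    # 2nd conjugation drops the 'e'
    stem2 = strip_suffix(pp2, INFINITIVE_ENDINGS[(conj, var)])
    stem3 = strip_suffix(pp3, "i")          # third part less 'i'
    if not pp4.endswith(("us", "um")):
        raise ValueError(f"{pp4!r} does not end in -us or -um")
    stem4 = pp4[:-2]                        # fourth part less -us/-um
    return stem1, stem2, stem3, stem4


print(verb_stems("capio", "capere", "cepi", "captus", 3, 1))
print(verb_stems("moneo", "monere", "monui", "monitum", 2, 1))
```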



UNIQUES

There are a few Latin words that cannot be represented with the scheme
of stems and endings used by the program.  For these very few cases, 
the program invokes a unique procedure.  The file UNIQUES.LAT contains 
a list of such words and is read in at the loading of the program.  
This is a simple ASCII text file which the user can augment.  It is 
expected that there will be very few occasions to do so, indeed, the 
tendency has been that better processing has allowed uniques to be 
removed.  If a user finds an important word that should be included, 
please communicate that to the author.  

The UNIQUES record is essentially the form as one might have it in 
output if the word was processed normally.  In addition there are some
additional fields that the program presently expects.  While these 
could be eliminated, it is convenient for the program not to make the 
UNIQUES a special case.  So a noun form 


N 3 1 ACC S F T  

is followed by two zeros and an X


N 3 1 ACC S F T  0 0                            X

and then the "five X's" or, more properly, the dictionary codes.  


N 3 1 ACC S F T  0 0                            X        X  X  X  B  O



These pro forma codes are absolutely necessary, but have no further 
impact.  

The program is written in Ada and uses Ada techniques.  Ada is 
designed for high reliability systems (there is no claim that WORDS 
was developed with all the other safeguards that that implies!) and, 
as a consequence, is unforgiving.  The exact form is required.  If you 
want to be sloppy you have to deliberately program that in.  

The following examples, and an examination of the UNIQUES.LAT file, 
should allow the user to insert any unique necessary.  


requiem
N 3 1 ACC S F T  0 0                            X        X  X  X  B  O
rest (from labor), respite; intermission, pause, break; amusement, hobby;
bobus              
N 3 1 DAT P C T  0 0                            X        X  X  X  C  X
ox, bull; cow; cattle (pl.)
quicquid
PRON 1 6 NOM S N INDEF   0 0                    X        X  X  X  B  X
whatever, whatsoever; everything which; each one; each; everything; anything
mavis
V     6 2 PRES  ACTIVE  IND  2 S X 0 0          X        X  X  X  B  X
prefer
cette   
V    3 1 PRES ACTIVE IMP  2 P TRANS    0 0      X        X  X  X  B  O
give/bring here!/hand over, come (now/here); tell/show us, out with it! behold!
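
Before adding records it can help to check that each one has the 
expected shape.  The following is a minimal sketch in Python, not the 
reader used by WORDS (which is in Ada and stricter about exact form).  
It assumes records of three non-blank lines, and that the second line 
ends with the three pro forma fields followed by the five dictionary 
codes.  The name parse_uniques is hypothetical.  

```python
from typing import Dict, List

SAMPLE = """\
requiem
N 3 1 ACC S F T  0 0                            X        X  X  X  B  O
rest (from labor), respite; intermission, pause, break; amusement, hobby;
mavis
V     6 2 PRES  ACTIVE  IND  2 S X 0 0          X        X  X  X  B  X
prefer
"""


def parse_uniques(text: str) -> List[Dict[str, object]]:
    """Split UNIQUES-style text into records of word, form, codes, meaning."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected records of three lines each")
    records = []
    for i in range(0, len(lines), 3):
        tokens = lines[i + 1].split()
        if len(tokens) < 9:
            raise ValueError(f"form line too short for {lines[i]!r}")
        pro_forma = tokens[-8:-5]           # the two zeros and an X
        if pro_forma[:2] != ["0", "0"]:
            raise ValueError(f"missing the two zeros for {lines[i]!r}")
        records.append({
            "word": lines[i],
            "form": tokens[:-8],            # part of speech and inflection
            "codes": tokens[-5:],           # the five dictionary codes
            "meaning": lines[i + 2],
        })
    return records


for rec in parse_uniques(SAMPLE):
    print(rec["word"], rec["form"], rec["codes"])
```

A check like this only catches missing fields.  It does not verify 
column positions, which the program itself may also depend on.  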






----------------------------------------------------------------------


                       DEVELOPERS AND REHOSTING                       




This page points to the Ada source code for a Latin-to-English 
dictionary program for the PC, and the dictionary material sufficient 
to re-host.  
Instructions below.  

The WORDS System

Rehosting package.  There is a WORDSALL.ZIP zip of all the 
Ada source files for WORDS, and support programs and data to generate 
the necessary dictionaries and inflections for re-hosting the WORDS 
Latin-to-English word parsing/translation system on any machine with 
an Ada 95 compiler.  (It can be made to work with Ada 83 also by 
replacing one routine.) 

This is a console program (keyboard entry), without fancy Windows GUI, 
and is thereby system independent.  
----------------------------------------------------------------------


WORDSALL contains the Ada source files for WORDS 


strings_package.ads
strings_package.adb
latin_file_names.ads
latin_file_names.adb
config.ads
preface.ads
word_parameters.ads
developer_parameters.ads
preface.adb
put_stat.adb
word_parameters.adb
inflections_package.adb
inflections_package.ads
dictionary_package.ads
dictionary_package.adb
addons_package.ads
addons_package.adb
uniques_package.ads
word_support_package.ads
latin_debug.ads
word_support_package.adb
latin_debug.adb
word_package.ads
line_stuff.ads
line_stuff.adb
developer_parameters.adb
tricks_package.ads
word_package.adb
tricks_package.adb
list_package.ads
list_sweep.adb
dictionary_form.adb
put_example_line.adb
list_package.adb
parse.adb
words.adb



three supporting programs


makedict.adb
makestem.adb
makeinfl.adb



and DOS ASCII data files for them to act upon to produce WORDS data 
files


DICTLINE.GEN
STEMLIST.GEN
INFLECTS.LAT



the other WORDS DOS ASCII supporting files


ADDONS.LAT
UNIQUES.LAT



----------------------------------------------------------------------


The process is to download the WORDSALL.ZIP and unzip into a suitable 
subdirectory.  (If the zip form is unsuitable for your system, I can 
provide the files in an uncompressed form.) The wordy file names are 
for compliance with the restrictions of the GNAT system.  They may be 
renamed, and I can provide an alternative.  However, the long file 
names demand an UNZIP that preserves them, if GNAT is to be used.  For
example, in a GNAT environment (one would maximally optimize the main 
program): 


gnatmake -O3 words
gnatmake makedict
gnatmake makestem
gnatmake makeinfl



This produces executables for WORDS, MAKEDICT, MAKESTEM, and MAKEINFL.
Executing the latter three against the input respectively of 


DICTLINE.GEN
STEMLIST.GEN
INFLECTS.LAT 



(when they ask for DICTIONARY say G) produces 


DICTFILE.GEN
STEMFILE.GEN
INDXFILE.GEN
INFLECTS.SEC



Along with ADDONS.LAT and UNIQUES.LAT, this is the set of data for 
WORDS.  

The only major problem that has appeared on porting so far is that one
must be careful of file names.  Note that the data files are 
capitalized, source files are not.  
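
A quick check of the file names after unpacking and generation can 
save some trouble.  This is a minimal sketch in Python, separate from 
the WORDS distribution.  The list REQUIRED is taken from the data set 
named above, and the function name missing_data_files is hypothetical.  
The comparison is case-sensitive on purpose.  

```python
import os
import tempfile
from typing import Iterable, List

# Data files named in this section as the set needed by WORDS.
REQUIRED = [
    "DICTFILE.GEN",
    "STEMFILE.GEN",
    "INDXFILE.GEN",
    "INFLECTS.SEC",
    "ADDONS.LAT",
    "UNIQUES.LAT",
]


def missing_data_files(directory: str,
                       required: Iterable[str] = REQUIRED) -> List[str]:
    """Return required names not present with exactly that capitalization."""
    present = set(os.listdir(directory))    # names exactly as stored
    return [name for name in required if name not in present]


# Demonstration in a scratch directory: one good name, one lowercase name.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("DICTFILE.GEN", "uniques.lat"):
        open(os.path.join(tmp, name), "w").close()
    print(missing_data_files(tmp))
```

In the demonstration UNIQUES.LAT is reported as missing because the 
file was stored in lower case, which is the kind of slip that tends to 
appear when an unzip tool changes the case of names.  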
----------------------------------------------------------------------


For comments mail to whitaker@erols.com




