TERMINOLOGY:

Throughout New Yanoff and its documentation we use the following
conventions:  "article" refers to a Usenet/NNTP message, "email" refers
to an email/SMTP message, "message" is generic including both/either
NNTP and/or SMTP messages, "Yanoff" refers to all versions of Yanoff,
"GPL Yanoff" refers to the original open-source Yanoff, "Yanoff-" refers
to the totally demoware (free) version of the private-branch (our)
Yanoff, "Yanoff+" refers to the trialware (pay-to-keep) version of the
private-branch (our) Yanoff, and "New Yanoff" refers to both "Yanoff-"
and "Yanoff+".  We also abbreviate the word "Newsgroup" as "NG" and the
word "Message-ID" as "MID".

HISTORY:

In mid-June 1999, pioneer Matthias Jordan released a mostly-complete,
relatively-stable, perfectly useable, open-source (under GPL) newsreader
called Yanoff with the hopes that others would carry on his work and
mature the app.  Unfortunately, that never really happened (the great
exception being the wonderful Conduit that was created by Jan-Pascal van
Best, but that was an original work, not an expansion of Yanoff itself).
Several years passed and while Yanoff's popularity increased, it never
spurred any further development.  A "power user" decided he would jump
in and fix some of the things that bugged him.  After much discussion
and a brief experiment with "Ransom-Ware" as a method to generate
revenue from his improvements, all parties agreed to a licensing plan
that would benefit everyone.

THE DEAL:

In exchange for an unconditional license of the single-point-in-time
fork of the GPL source code, we agreed to pay a royalty to the Free
Software Foundation (FSF) on behalf of Matthias.  In addition we agreed
to pass along bug reports (and fixes) to Matthias so that anyone
interested in fixing the GPL release might do so.  Beyond the contracted
obligations, we have decided to release two updated versions.  One,
Yanoff+, is a super-powered update with all of the known bugs of GPL
(original) Yanoff fixed.  Yanoff+ is "Trialware" in that after 15 days,
it disables many features until it is unlocked by installing a license
key.  The other, Yanoff-, is TOTALLY free and includes most (if not all)
of the features of GPL Yanoff plus all the bug fixes and many of the
features of Yanoff+.  However, the most powerful and tempting features
are not present so as to provide incentive for people to purchase a
license for Yanoff+ (a complete list of differences is in the FAQ).

This way everybody wins.  The GPL version will live on and anybody who
wants to jump in and fix it up still can.  Meanwhile EVERYBODY has a
bug-fixed, feature-enhanced version (Yanoff-) that is totally FREE.
Those people who want all the bells and whistles have the opportunity to
buy the super-enhanced version (Yanoff+).  The FSF gets some revenue
which benefits everyone (they do GREAT work and we use their tools).
And lastly we, the developers, get compensation for the work and cost of
adding the increased value to the product.

Yanoff+ may be used for 15 days with all but a small handful of features
enabled (those few are only for registered users; see the FAQ), at which
point it reverts to the same functionality as Yanoff-.  If you are new
to Yanoff, we suggest you start with Yanoff- and, once you are familiar
with how it works, try out the expanded features of Yanoff+.  That way
you won't waste any of your 15-day trial period on the initial basic
learning curve.

HOW IT WORKS:

Here's how all versions of Yanoff work.

(Re)Defining Servers:

At least 1 SMTP (email) server is (re)defined by the user at server #0.
At least 1 NNTP (Usenet) server is also defined.  There are 2 "dummy"
servers defined the first time Yanoff is run and the user must modify
these for his servers.  Then, using some other method to determine what
newsgroups exist (Outlook Express or groups.google.com are 2 good ways),
the user "Subscribe"s to the newsgroups he desires.

NewsArts.pdb; message storage:

All messages (or portions thereof) are stored in a single message
database (NewsArts.pdb), referred to below as the article database.  It
contains both polled (incoming) and user-created (outgoing) messages;
they are all stored in the same format and in the same place (with a
flag reflecting which type each one is).
Each subscribed newsgroup (and each of the !Drafts, !!Outgoing, !!!Sent,
and !!!!Lost newsgroups) has its own private database
(NewsGroup-<#>.pdb) whose every entry contains a reference to an entry
in the article database.  More than 1 newsgroup may reference the same
article (each article has a multi-index counter indicating the number of
newsgroups which reference it).
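
The layout described above can be pictured with a small Python sketch.
It is only an illustration, not Yanoff's actual record format; the names
Article, ref_count and newsgroup_dbs are assumptions made for the
example.

```python
from dataclasses import dataclass

@dataclass
class Article:
    """One record in the single message database (NewsArts.pdb)."""
    message_id: str
    newsgroups: list[str]      # contents of the "Newsgroups:" header
    outgoing: bool = False     # flag: user-created rather than polled
    ref_count: int = 0         # the "multi-index counter"

# The message database: a list of records, addressed by position.
articles: list[Article] = [
    Article("<a1@example.invalid>", ["comp.sys.palmtops", "alt.test"]),
]

# One private database per newsgroup; every entry is a reference
# (here, simply an index) into the message database.
newsgroup_dbs: dict[str, list[int]] = {
    "comp.sys.palmtops": [0],
    "alt.test": [0],
}

# The crossposted article is stored once but referenced twice.
articles[0].ref_count = sum(0 in refs for refs in newsgroup_dbs.values())
```

The point of the design is that the article body exists once no matter
how many newsgroups point at it; only the small references multiply.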

NewsGroup*.pdb; Newsgroup DBs:

Messages first get saved into the article database (polling or creating)
and a second stage (called indexing) adds entries to the appropriate
newsgroup database(s) referencing the article (and increments the
article's multi-index counter).  When creating messages, these 2 stages
happen immediately together but when polling, first all articles are
polled, then all articles are indexed.
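
A minimal sketch of the two stages, assuming articles are plain
dictionaries and newsgroup databases are lists of article numbers (both
are simplifications chosen for the example; the function names are
hypothetical):

```python
def index_article(art_no, articles, newsgroup_dbs):
    """Stage 2: reference article `art_no` from each subscribed newsgroup
    named in its Newsgroups header and bump its multi-index counter."""
    article = articles[art_no]
    for group in article["newsgroups"]:
        if group in newsgroup_dbs:
            newsgroup_dbs[group].append(art_no)
            article["ref_count"] += 1

def poll(incoming, articles, newsgroup_dbs):
    """Polling: first save every article, then index every article."""
    first_new = len(articles)
    for headers in incoming:                          # stage 1: save
        articles.append({**headers, "ref_count": 0})
    for art_no in range(first_new, len(articles)):    # stage 2: index
        index_article(art_no, articles, newsgroup_dbs)
```

If polling is interrupted between the two loops, the saved articles are
left unindexed, which is one way the databases can fall out of sync.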

DB Synchronicity:

These 2 (sets of) databases should remain in sync at all times but there
is always potential for them to fall out of sync.  There are 3 types of
asynchronicity:  unindexed messages, unsubscribed articles and bogus
message references.  There are several tools to handle these
possibilities.  The best and most time-consuming is "Re/Index-All".
This destroys all newsgroup databases and recreates them by reindexing
every single article.

One additional benefit of "Re/Index-All" is that it reassigns the
numbers on the end of the "NewsGroup-<#>" DBs so that they match the
order they appear on the first, main screen (the first one in the list
gets #1, and so on).  Initially, they are assigned as each newsgroup was
subscribed (i.e.  !Drafts got #1, !!Outgoing got #2 because they are
automatically created; the first user-subscribed newsgroup got #3 and so
on).  After a "Re/Index-All", the newsgroup names map directly to the
"NewsGroup-<#>" DBs.  If the order is later changed, ("Rearrange NGs"),
the NewsGroup-<#> numbers do not change so they will again not directly
map.
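
The rebuild and renumbering can be sketched as follows.  This is an
illustration under simplifying assumptions (dictionary articles, list
databases, a hypothetical reindex_all function); handling of the
"!!!!Lost" newsgroup is left out.

```python
def reindex_all(articles, display_order):
    """Discard every newsgroup database and rebuild it from the articles.

    `display_order` is the list of subscribed newsgroup names as shown on
    the main screen; the rebuilt databases are numbered to match it.
    """
    dbs = {name: [] for name in display_order}
    for art_no, article in enumerate(articles):
        article["ref_count"] = 0
        for group in article["newsgroups"]:
            if group in dbs:
                dbs[group].append(art_no)
                article["ref_count"] += 1
    # NewsGroup-<#> numbering now follows the on-screen order, from 1.
    db_names = {name: f"NewsGroup-{n}"
                for n, name in enumerate(display_order, 1)}
    return dbs, db_names
```

Every article is visited once per rebuild, which is why the operation
takes so long on a large article database.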

The "Re/Index-All" is a very long-duration operation and many times, a
shorter, quicker operation will suffice if one is certain one
understands what problems do and do not exist.

Unindexed Messages:

An unindexed message is one which exists in the article database but is
not referenced by any newsgroup databases.  The "Set Next Re/Index"
option looks for a group of messages in the article database which are
not referenced by any newsgroup database.  It does this by
scanning all newsgroup databases keeping track of the highest message
number referenced.  If this number is less than the number of messages
in the article database, then those messages above that number are,
obviously, not referenced.  Once the "Set Next Re/Index" is run, a
"Re/Index-Unindexed" operation can be initiated to index the unindexed
messages into their newsgroups.  The "search" button on the "Re/Index"
dialog combines these 2 steps into 1.  If unindexed messages are found,
an automatic "Re/Index-Unindexed" will be performed; if not, it will do
nothing.
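
The scan performed by "Set Next Re/Index" might look like this sketch,
assuming articles are numbered from 0 in the order they were saved (an
assumption for the example; the function name is hypothetical):

```python
def first_unindexed(article_count, newsgroup_dbs):
    """Return the number of the first unindexed article, or None.

    Scans every newsgroup database for the highest article number
    referenced; anything above it in the article database is unindexed.
    """
    highest = -1
    for refs in newsgroup_dbs.values():
        for art_no in refs:
            highest = max(highest, art_no)
    nxt = highest + 1
    return nxt if nxt < article_count else None
```

Note that this only detects a block of unindexed messages at the end of
the article database; a gap in the middle would go unnoticed, and that
is the case "Re/Index-All" exists for.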

Unsubscribed Articles:

An unsubscribed article is one which somehow exists in the article
database but whose "Newsgroups:" header does not contain any subscribed
newsgroups.  In this case the article will be indexed into the
"!!!!Lost" newsgroup.  This special newsgroup is auto-subscribed when it
is needed (if it does not already exist) and auto-unsubscribed when all
articles inside it have been deleted.  In GPL Yanoff, such an article
will never appear and will be inaccessible, with 1 exception:  it will
be purged appropriately by the "Purge Old Articles" function.  This
situation can be created with the following steps.  Subscribe to a
newsgroup and download some articles.  Abort the index operation (for
GPL Yanoff, this can only be done by resetting the device).  Then
unsubscribe the newsgroup and continue indexing (for GPL Yanoff, this
means a "Re/Index-All").

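The handling of unsubscribed articles described above can be sketched
like this.  The function names and the dictionary-of-lists layout are
assumptions for the example, not Yanoff's real code.

```python
LOST = "!!!!Lost"

def index_with_lost(art_no, header_groups, newsgroup_dbs):
    """Index an article, falling back to "!!!!Lost" when none of the
    newsgroups in its header is subscribed."""
    subscribed = set(newsgroup_dbs) - {LOST}
    hits = [g for g in header_groups if g in subscribed]
    for group in hits or [LOST]:
        # setdefault auto-subscribes "!!!!Lost" the first time it is needed
        newsgroup_dbs.setdefault(group, []).append(art_no)

def delete_from_lost(art_no, newsgroup_dbs):
    """Delete a reference from "!!!!Lost"; drop the group once it is empty."""
    newsgroup_dbs[LOST].remove(art_no)
    if not newsgroup_dbs[LOST]:
        del newsgroup_dbs[LOST]          # auto-unsubscribe
```
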
Bogus Message References:

A bogus message reference is a reference in a newsgroup database to a
message which does not exist in the article database.  The "Fix NG
Corruption" operation looks for and eliminates such references.  If a
newsgroup is found to contain any bogus references and it is using
thread caching, this operation will delete the thread cache and flag it
to be automatically recreated.  If the re-cache operation is canceled,
the newsgroup will have its caching preference turned off.  This must
be done because we have no way to know to which thread the missing
message belonged (we can't check its Subject nor its Message-ID nor its
References because it no longer exists) so the thread cache is also
corrupt.  A thread cache must be 100% accurate or it is useless and will
result in threading mistakes.
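
A sketch of what "Fix NG Corruption" does, assuming the set of existing
article numbers is available as valid_articles and thread caches are
kept in a dictionary keyed by newsgroup name (both are assumptions made
for the example):

```python
def fix_ng_corruption(newsgroup_dbs, valid_articles, thread_caches):
    """Drop references to articles that no longer exist.

    Returns the names of newsgroups whose thread cache was discarded and
    therefore needs to be rebuilt.
    """
    needs_recache = []
    for name, refs in newsgroup_dbs.items():
        kept = [n for n in refs if n in valid_articles]
        if len(kept) != len(refs):
            newsgroup_dbs[name] = kept
            # A cache built with the missing message cannot be trusted.
            if thread_caches.pop(name, None) is not None:
                needs_recache.append(name)
    return needs_recache
```

Newsgroups without bogus references keep their caches untouched, so the
expensive re-cache step is only paid where corruption was found.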

Indexing and Thread Caching:

So what is thread caching?  Well let's start with threading.  Threading
is the operation of ordering messages so that they are in genealogical
order.  Every article has a unique Message-ID.  Furthermore, if it is a
followup to another message, it should also have a References header
which, at a minimum, should identify its root ancestor (the first
message in the thread) and its parent.  Armed with genealogical data we
should be able to order messages in a thread so that those which are
responses to previous messages are indexed after their ancestors.
Threading allows groups of related messages to be represented as a
time-sequenced conversation (AKA a "thread").

Obviously threading requires that every message already in a newsgroup
be examined to see if it is related to any newly-arriving
(being-indexed) messages.  A message of a new thread must check every
currently indexed message before figuring out that it is not related to
any of them.  This is the most costly case because all messages must be
examined.  Even if it turns out it is a member of a pre-existing thread,
we don't know where the thread begins so every message must be checked
until we find the first relative.  The more messages that are already
indexed in a newsgroup, the longer these examinations take.  Very
quickly this becomes intolerably slow.
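
To make the cost concrete, here is a sketch of uncached threading.  It
is not Yanoff's algorithm: for simplicity it always scans the whole
newsgroup and treats any message that is an ancestor, or that shares an
ancestor, as a relative.  The function name and tuple layout are
assumptions.

```python
def naive_insert_position(messages, new_refs):
    """Scan the whole newsgroup for relatives of a newly arriving message.

    `messages` is a list of (message_id, references) tuples in display
    order; `new_refs` is the new message's References list.  Returns the
    position just after the last relative, or the end if there is none.
    """
    related = set(new_refs)
    last_relative = None
    for pos, (mid, refs) in enumerate(messages):   # every message examined
        if mid in related or related.intersection(refs):
            last_relative = pos
    return len(messages) if last_relative is None else last_relative + 1
```

Each insertion costs one pass over the newsgroup, so indexing N messages
costs on the order of N*N comparisons.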

This is where thread caching saves the day.  For a relatively small cost
in RAM, Yanoff+ (not Yanoff-, nor GPL Yanoff) will maintain a sorted
list of threads which currently exist in the newsgroup along with the
position of the bottom-most message and the total number of messages in
each thread.  When a message is indexed, a single search into the sorted
thread cache will tell us not only whether there is a thread (if not,
just add the message at the bottom and add a new thread to the cache)
but, if so, also where the thread begins (or rather, where it ends) and
how long it is.  Thread caching turns a threading operation whose cost
grows quadratically with the number of messages into a nearly linear
one.
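
A thread cache along these lines can be sketched with a binary search
over a sorted key list.  The class below is an illustration only; the
name ThreadCache, its fields, and the simple pass that adjusts positions
after an insertion are assumptions, not a description of Yanoff+
internals.

```python
import bisect

class ThreadCache:
    """Sorted list of thread keys with each thread's last position and size."""

    def __init__(self):
        self.keys = []     # sorted thread keys (e.g. root Message-ID)
        self.last = []     # position of each thread's bottom-most message
        self.count = []    # number of messages in each thread

    def place(self, key, total_messages):
        """Return the position at which a new message with `key` belongs."""
        i = bisect.bisect_left(self.keys, key)       # single binary search
        if i < len(self.keys) and self.keys[i] == key:
            pos = self.last[i] + 1                   # just below its thread
            # Threads displayed below the insertion point move down by one.
            self.last = [p + 1 if p >= pos else p for p in self.last]
            self.last[i] = pos
            self.count[i] += 1
        else:
            pos = total_messages                     # new thread: bottom
            self.keys.insert(i, key)
            self.last.insert(i, pos)
            self.count.insert(i, 1)
        return pos
```

Used as `msgs.insert(cache.place(key, len(msgs)), key)` for each
arriving message, this keeps the members of a thread adjacent without
ever scanning the messages themselves; only the (much shorter) list of
threads is touched.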

There are 2 types of thread caching:  Subject and Reference.  The
question really is, "What constitutes a thread?"  This is an issue
because many users and programs do not properly set (or even
deliberately discard) the "References:" header.  This carelessly
anti-social behavior is actually amazingly common.  If this is the case,
how can one tell whether a message is part of a thread?  One must
compare the "Subject:" header.  The best threading will result when
both types of caching are used but Subject-only caching is usually very
acceptably accurate (Reference-only caching will probably never do as
good a job but it is an option regardless).
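
The two notions of "same thread" can be expressed as two key functions.
This is a hedged sketch: the exact normalization Yanoff applies to
subjects is not documented here, so stripping leading "Re:" prefixes and
folding case is an assumption, as are the function names.

```python
import re

_PREFIX = re.compile(r"^\s*(re\s*:\s*)+", re.IGNORECASE)

def subject_key(subject):
    """Thread key from "Subject:": drop leading "Re:" prefixes, fold case."""
    return _PREFIX.sub("", subject).strip().lower()

def reference_key(message_id, references):
    """Thread key from "References:": the root ancestor, else the
    message's own ID when the header is empty or missing."""
    ids = references.split()
    return ids[0] if ids else message_id
```

A followup whose "References:" header was discarded gets its own
reference_key and so looks like a new thread, while subject_key still
groups it with the original; that is why Subject caching copes better
with badly behaved posting software.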

Crossposting and the History DB:

Now that we've broached the topic of threading and Message-IDs, let's
talk about crossposting.  Crossposting is when an article's "Newsgroups:"
header contains more than 1 newsgroup.  If a user is subscribed to 10
newsgroups and a single article has all 10 of them in its "Newsgroups:"
header, it is silly to download (wasting time) or store (wasting RAM) 10
copies of that same article.  Instead what is done is that before an
article is downloaded, its Message-ID is checked against an alphabetized
list of previously-downloaded articles stored in another database
(NewsHistory.pdb).  The "Check MIDs" Poll Preference controls whether
this is done.  If the article is not found, it is downloaded and its
Message-ID is added to the database.  The "Store MIDs" Poll Preference
controls whether this is done.  Each user must decide which is more
wasteful of RAM, duplicating articles or maintaining the history
database.  If RAM is not a concern, then do use the history facility as
it provides time savings during polling.
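
The interaction of the two Poll Preferences can be sketched as follows,
with the history database modeled as a sorted Python list of Message-IDs
(an assumption for the example; should_download is a hypothetical name):

```python
import bisect

def should_download(mid, history, check_mids=True, store_mids=True):
    """Decide whether to fetch an article, consulting the sorted MID history."""
    i = bisect.bisect_left(history, mid)
    known = i < len(history) and history[i] == mid
    if check_mids and known:
        return False                   # already downloaded via another NG
    if store_mids and not known:
        history.insert(i, mid)         # keep the list alphabetized
    return True
```

With "Check MIDs" off, a crossposted article is fetched again for every
newsgroup that carries it; with "Store MIDs" off, nothing new is ever
remembered, so later checks cannot find it.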

Obviously, the history database will grow ever larger unless old entries
are purged.  The "Purge MID" operation does exactly this and MUST be
performed on a regular basis to keep RAM waste to a minimum.  Turning
off both poll preferences mentioned earlier completely disables this
functionality which eliminates the history database but allows the
bloating of the article database (if crossposting is common in
subscribed newsgroups).

An interesting feature of purging is that it will also purge any entries
which (will) "happen" in the future.  So if you find you want to remove
some entries from the history database so that you can go back and
repoll articles recently polled, just set the date on the PDA back to
the appropriate day and purge.  All entries with birthdays later than
"today" will also be purged in addition to those older than the
timeframe specified.  A similar function is the "Repopulate
NewsHistory".  If somehow the entire MID history DB gets purged or
deleted, a partial recreation can be achieved by the use of this
function.  It takes every currently existing article and inserts its
MID into the history DB.
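
The purge rule, including the treatment of "future" entries, can be
sketched like this.  Storing each entry's birthday as a date and
treating both ends of the window as inclusive are assumptions made for
the example.

```python
from datetime import date, timedelta

def purge_mids(history, today, keep_days):
    """Keep only entries born within the last `keep_days` days up to `today`.

    `history` maps Message-ID to the date the entry was stored.  Entries
    older than the window and entries dated after `today` are dropped.
    """
    oldest = today - timedelta(days=keep_days)
    return {mid: born for mid, born in history.items()
            if oldest <= born <= today}
```

Calling it with `today` set to an earlier date reproduces the trick
described above: everything stored after that date counts as "future"
and is purged, so those articles can be polled again.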

Note that for conduit users, NONE of this threading/history discussion
applies because those articles arrive on the PDA pre-threaded by the
software on the host computer.  Also note that the host computer does a
nearly perfect job of threading (being that it has, essentially,
infinite battery and CPU power) whereas threading by Yanoff on the PDA
takes several reasonable shortcuts which improve speed but also allow
some errors.  Yanoff's threading algorithm is very much a
work-in-progress.  Whenever any process other than New Yanoff updates
the articles (e.g. the conduit), the thread caches become invalidated
and will be automatically deleted.
