Manually Indexing Authorities

This document is a last resort for when an MIS cannot index an authority because of network problems. On request, Cornell will make the bib records needed for indexing available for sites you are having difficulty with, provided we are able to retrieve them ourselves.

  1. Get the bibliographic records for the site the MIS is unable to index.

    On special request, we will make these available via FTP from Cornell (ftp.cs.cornell.edu, in pub/Dienst/bibs).

    Otherwise, attempt to connect to the site directly, using the following URL as a guide (substitute the site's host name and port number for site and port):

    http://site:port/Dienst/Index/2.0/Bibliography
    

    Save the results to a single file.
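    If you prefer to script this step, a minimal Python sketch follows. It assumes only that the site answers the Bibliography URL above over plain HTTP; the function names are hypothetical. The response is saved unmodified, because the next step works on the bib file exactly as retrieved.

```python
import urllib.request


def bibliography_url(site: str, port: int) -> str:
    """Build the Dienst Index/2.0 Bibliography URL for one site."""
    return f"http://{site}:{port}/Dienst/Index/2.0/Bibliography"


def fetch_bibliography(site: str, port: int, out_path: str,
                       timeout: float = 60.0) -> int:
    """Save the site's bib records, unmodified, to out_path.

    Returns the number of bytes written. Network failures surface as
    urllib.error.URLError or a timeout error, which is exactly the
    problem this procedure works around, so callers should catch them.
    """
    url = bibliography_url(site, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    # Write raw bytes so the file is byte-for-byte what the site returned.
    with open(out_path, "wb") as out:
        out.write(data)
    return len(data)
```

    For example, fetch_bibliography("dienst.example.org", 80, "example.bib") would save the records to example.bib; the host and file name here are placeholders.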

  2. Insert the bib records into your database.

    The Utilities/bin directory contains a program called 'split-bib-file.pl'. Run it on the single bib file you just retrieved:

    split-bib-file.pl [file] 0 force
    

    The 0 turns off verbose mode, and the force option ensures that existing bib records are replaced by those in the new bib file.
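    When several bib files need loading, it can help to drive the command from a script. The Python sketch below assumes split-bib-file.pl is executable and lives in Utilities/bin relative to the current directory (pass a different utilities_bin otherwise); the helper names are hypothetical.

```python
import os
import subprocess
from typing import List


def split_bib_command(bib_file: str,
                      utilities_bin: str = "Utilities/bin") -> List[str]:
    """Build the argv for split-bib-file.pl.

    "0" turns verbose mode off; "force" makes the records in bib_file
    replace existing bib records.
    """
    script = os.path.join(utilities_bin, "split-bib-file.pl")
    return [script, bib_file, "0", "force"]


def split_bib_file(bib_file: str,
                   utilities_bin: str = "Utilities/bin") -> None:
    """Run split-bib-file.pl on one bib file.

    check=True raises subprocess.CalledProcessError on a non-zero exit
    status; errors the script only prints will not be caught.
    """
    subprocess.run(split_bib_command(bib_file, utilities_bin), check=True)
```

    Loading several files is then a plain loop over split_bib_file, one call per file.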

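  3. Determine how you would like to index the records: either rebuild the entire database, or index only the records you have just added.
  4. If you are indexing only the reports you have added, extract their docids.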

    The Backup_Server [MIS] directory contains a program called 'GetDocids.pl', which extracts the docids from the bibliographic record file:

    GetDocids.pl [file] [docids file] 
    

    It writes the docids, one per line, to the file you specify as [docids file].

    You will pass this list of docids to 'build-inverted-indexes.pl' in step 5.
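    GetDocids.pl is the tool to use here. To sanity-check its output, the Python sketch below illustrates the same idea: read the bib file and write one docid per line. It assumes each record carries a line of the form ID:: <docid>, as in RFC 1807 bibliographic records; that field is an assumption, not documented behavior of GetDocids.pl, so compare it with your own bib file first. The function names are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: each record contains a line "ID:: <docid>" (RFC 1807 style).
# Adjust this pattern if your bib file marks docids differently.
ID_LINE = re.compile(r"^ID::\s*(\S+)$")


def extract_docids(lines: Iterable[str]) -> List[str]:
    """Return the docids in file order, dropping repeats."""
    seen = set()
    docids = []
    for line in lines:
        match = ID_LINE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            docids.append(match.group(1))
    return docids


def write_docids(bib_path: str, docids_path: str) -> int:
    """Write one docid per line to docids_path; return the count written."""
    # latin-1 maps every byte to a character, so unusual bytes in old
    # bib files cannot raise a decode error.
    with open(bib_path, encoding="latin-1") as bib:
        docids = extract_docids(bib)
    with open(docids_path, "w", encoding="latin-1") as out:
        out.writelines(docid + "\n" for docid in docids)
    return len(docids)
```

    Dropping repeated docids is a choice of this sketch, so that a record listed twice is not indexed twice.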

  5. Build the database with the new records.

    If you are rebuilding the entire database, run the appropriate 'build-inverted-indexes.pl' command.

    If you are indexing only the new bibliographic records, pass the docids file to 'build-inverted-indexes.pl' with the -f option:

    build-inverted-indexes.pl -f [docids file] 
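    The incremental command can also be run from a script. This Python sketch covers only the -f form shown above, because the options for a full rebuild are not given here. It assumes build-inverted-indexes.pl is on your PATH (or that you pass its full path as script); the empty-file check is a choice of this sketch, not a documented requirement, and the function names are hypothetical.

```python
import os
import subprocess
from typing import List


def incremental_index_command(docids_file: str,
                              script: str = "build-inverted-indexes.pl") -> List[str]:
    """Build the argv that indexes only the docids listed in docids_file."""
    return [script, "-f", docids_file]


def index_new_records(docids_file: str,
                      script: str = "build-inverted-indexes.pl") -> None:
    """Run the incremental build; raise if there is nothing to index."""
    # An empty docids file means step 4 found no docids, which is worth
    # investigating before running the indexer.
    if os.path.getsize(docids_file) == 0:
        raise ValueError(f"{docids_file} is empty; no docids to index")
    subprocess.run(incremental_index_command(docids_file, script), check=True)
```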
    
  6. Update database status.

    The backup server keeps track of when it last checked an authority for new records.

    If you would like the e-mail reports that the MIS sends to show that records were indexed for the authority you just indexed manually (instead of reporting "No Records Indexed"), do the following: