Author
Listed:
- Claudio Todeschini
- Michael P. Farrell
Abstract
An Expert System is presented that can identify errors in the intellectual decisions made by indexers when categorizing documents into an a priori category scheme. The system requires the compilation of a Knowledge Base that incorporates in statistical form the decisions on the linking of indexing and categorization derived from a preceding period of the bibliographic database. New input entering the database is checked against the Knowledge Base, using the descriptor indexing assigned to each record, and the system computes a value for the match of each record with the particular category chosen by the indexer. This category match value is used as a criterion for identifying those documents that have been erroneously categorized. The system was tested on a large sample of almost 26,000 documents, representing all the literature falling into ten of the subject categories of the Energy Data Base during the five year period 1980–1984. The Energy Data Base is a large bibliographic database covering the world's energy‐related literature. For valid comparisons among categories, the Knowledge Base must be constructed with an approximately equal number of unique descriptors for each subject category. The system identified those items with high probability of having been erroneously categorized. These items, constituting up to 5% of the sample, were evaluated manually by subject specialists for correct categorization and then compared with the results of the Expert System. Of those pieces of literature deemed by the system to be erroneously categorized, about 75% did indeed belong to a different category. This percentage, however, is dependent on the level at which the threshold on the category match value is set. With a lower threshold value, the percentage can be raised to 90%, but this is accompanied by a lowering of the absolute number of wrongly categorized records caught by the system. 
The Expert System can be considered as a first step to a complete semiautomatic categorization system requiring human intervention only in poorly indexed pieces of literature. It is also self‐improving, since in an operational environment the Knowledge Base would be routinely updated, using the most recent period of the database from which erroneously categorized items would have been eliminated by the previous version of the Knowledge Base; hence, each new version will produce better grounds for decision making. © 1989 John Wiley & Sons, Inc.
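The checking procedure outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the category match value, so the score below (the mean share of past records in the chosen category that carried each of the record's descriptors) is an assumption made for illustration only, as are the function names, the toy data, and the threshold. It is not the authors' method.

```python
from collections import Counter, defaultdict


def build_knowledge_base(records):
    """Tally, per category, how many past records carried each descriptor.

    `records` is an iterable of (category, descriptors) pairs.
    Returns (counts, totals): descriptor counts per category and
    the number of records seen per category.
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for category, descriptors in records:
        totals[category] += 1
        counts[category].update(set(descriptors))  # count each descriptor once per record
    return counts, totals


def category_match(descriptors, category, counts, totals):
    """Hypothetical match value in [0, 1]; the paper's actual formula is not given here."""
    descriptors = set(descriptors)
    if not descriptors or totals[category] == 0:
        return 0.0
    shares = (counts[category][d] / totals[category] for d in descriptors)
    return sum(shares) / len(descriptors)


def flag_suspects(new_records, counts, totals, threshold):
    """Return the records whose match with their assigned category falls below the threshold."""
    return [
        (category, descriptors)
        for category, descriptors in new_records
        if category_match(descriptors, category, counts, totals) < threshold
    ]


# Toy data, invented for illustration (not taken from the Energy Data Base).
history = [
    ("solar", ["photovoltaic cells", "solar collectors"]),
    ("solar", ["solar collectors", "heat storage"]),
    ("coal", ["coal mining", "combustion"]),
    ("coal", ["combustion", "fly ash"]),
]
counts, totals = build_knowledge_base(history)

new_input = [
    ("solar", ["solar collectors", "heat storage"]),  # descriptors fit the category
    ("solar", ["combustion", "fly ash"]),             # descriptors fit "coal" instead
]
suspects = flag_suspects(new_input, counts, totals, threshold=0.25)
print(suspects)
```

In this sketch, lowering the threshold flags fewer records, which mirrors the trade-off the abstract reports: a larger share of flagged items is truly miscategorized, but fewer wrongly categorized records are caught overall.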
Suggested Citation
Claudio Todeschini & Michael P. Farrell, 1989.
"An expert system for quality control in bibliographic databases,"
Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 40(1), pages 1-11, January.
Handle:
RePEc:bla:jamest:v:40:y:1989:i:1:p:1-11
DOI: 10.1002/(SICI)1097-4571(198901)40:13.0.CO;2-A