Author
Casimir Borkowski
Abstract
Natural language seems to contain various special‐purpose subsystems, e.g., personal titles, personal names, dates, street addresses, place names—each with its own structure which relative to the total structure of language is rather simple. An ability to identify automatically words and word strings belonging to various special‐purpose linguistic subsystems (akin to some thesaurus classes) may prove to be very useful since they play an important role in the making of indexes and in various systems for extracting and distributing information. This article describes some of the main problems involved in automatic identification in newspaper texts of words and word strings belonging to two important linguistic subsystems, viz., personal titles and names; lists some of the major rules of an algorithm designed to perform this task; presents statistics concerning the algorithm's accuracy and exhaustiveness obtained in manual application of the algorithm to texts; and suggests some applications for computer programs capable of recognizing personal titles and names. The results obtained indicate that an automatic system capable of accurate and exhaustive identification of personal titles and names in texts requires recognition procedures which are rather complex. It is therefore suggested that along with researching and developing methods for high‐quality automatic classification of words in texts, it may be advisable to set up efficient procedures for manual classification and tagging of words in texts, and automatic extraction of data from texts which were recognized either manually or automatically. 
Such action seems appropriate since automatic extraction of information from manually recognized texts would probably constitute a valuable service, and, when automatic procedures for identifying dates, personal names, personal titles, trade names, company names, chemical formulas, numbers and measure words, and so forth become competitive with manual ones, the data‐processing profession will already be in possession of operational computer programs capable of extracting data from recognized texts.
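The abstract does not reproduce the article's rules, so the following is only an illustrative sketch of the general idea of title-triggered name recognition, not the algorithm the article describes. The title list `TITLES`, the function name `find_titled_names`, and the "capitalized words after a title" heuristic are all assumptions made for this example.

```python
import re

# Hypothetical title lexicon (an assumption; the article's own list is not given here).
TITLES = ("Mr.", "Mrs.", "Dr.", "Sen.", "Gov.", "President", "Senator")

# A title not preceded by a word character, followed by one or more
# capitalized words that are treated as the candidate personal name.
_PATTERN = re.compile(
    r"(?<!\w)(?P<title>" + "|".join(re.escape(t) for t in TITLES) + r")"
    r"\s+(?P<name>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)


def find_titled_names(text):
    """Return (title, name) pairs found by the title-trigger heuristic."""
    return [(m.group("title"), m.group("name")) for m in _PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "Sen. John Smith met Dr. Mary Jones in Chicago."
    print(find_titled_names(sample))
```

Even this toy heuristic hints at why the abstract calls the required recognition procedures rather complex: in a phrase such as "President Johnson Monday said", it would wrongly treat "Johnson Monday" as the name, and it misses any name that appears without a title.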
Suggested Citation
Casimir Borkowski, 1967.
"An experimental system for automatic identification of personal names and personal titles in newspaper texts,"
American Documentation, Wiley Blackwell, vol. 18(3), pages 131-138, July.
Handle:
RePEc:bla:amedoc:v:18:y:1967:i:3:p:131-138
DOI: 10.1002/asi.5090180305