
Automatic Generation of Association Thesaurus Based on Domain-Specific Text Collection

Authors
  • Aliya Nugumanova

    (East Kazakhstan State Technical University)

  • Dinara Issabaeva

    (Kumash Nurgaliev College)

  • Yerzhan Baiburin

    (East Kazakhstan State Technical University)

Abstract

This paper examines a distributional approach to the automatic generation of associative thesauri for a specific domain. The approach rests on the assumption that an associative link between two domain terms is determined by the statistics of their co-occurrence in thematically related texts. Its advantage is that it works from raw material alone (for example, a collection of documents from the domain) and requires no additional knowledge about the domain. Because it relies only on statistics and takes neither lexical nor semantic information into account, it can cover a large lexical space of terms. This is also the source of its main shortcoming: it produces an excessive number of "unnecessary" links between words, links that carry little information from a practical point of view. To address this problem, the paper proposes combining methods of distributional statistics, latent semantic analysis and graph theory.
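As an illustration of the co-occurrence assumption only, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization and document-level co-occurrence, and it scores each term pair with a Pearson chi-square statistic on a 2x2 contingency table (the chi-square test appears among the paper's keywords, but how the paper applies it is not stated here). The function names `chi_square` and `association_links` are hypothetical, and the latent semantic analysis and graph steps are omitted.

```python
from itertools import combinations

# Chi-square critical value for 1 degree of freedom at p = 0.05.
CHI2_CRITICAL_95 = 3.841


def chi_square(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # A term that occurs in every document or in none carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def association_links(docs, threshold=CHI2_CRITICAL_95):
    """Return {(term_a, term_b): chi2} for positively associated term pairs.

    Assumption: two terms co-occur when they appear in the same document.
    """
    doc_sets = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    links = {}
    for a, b in combinations(vocab, 2):
        n11 = sum(1 for s in doc_sets if a in s and b in s)
        n10 = sum(1 for s in doc_sets if a in s and b not in s)
        n01 = sum(1 for s in doc_sets if a not in s and b in s)
        n00 = len(doc_sets) - n11 - n10 - n01
        score = chi_square(n11, n10, n01, n00)
        # Keep only pairs that co-occur more often than chance would predict.
        if score >= threshold and n11 * n00 > n10 * n01:
            links[(a, b)] = score
    return links


if __name__ == "__main__":
    toy_docs = ["matrix vector"] * 4 + ["graph edge"] * 4
    for pair, score in association_links(toy_docs).items():
        print(pair, round(score, 3))
```

On a realistic collection a filter of this kind still passes many weakly informative pairs, which is the shortcoming the abstract describes.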

Suggested Citation

  • Aliya Nugumanova & Dinara Issabaeva & Yerzhan Baiburin, 2014. "Automatic Generation of Association Thesaurus Based on Domain-Specific Text Collection," Proceedings of International Academic Conferences 0201861, International Institute of Social and Economic Sciences.
  • Handle: RePEc:sek:iacpro:0201861

    Download full text from publisher

    File URL: https://iises.net/proceedings/10th-international-academic-conference-vienna/table-of-content/detail?cid=2&iid=68&rid=1861
    File Function: First version, 2014
    Download Restriction: no

    More about this item

    Keywords

    LSA; thesaurus; chi-square test; graph;

    JEL classification:

    • C80 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - General
