Authors listed:
- Sándor Dominich
- Tamás Kiezer
Abstract
The vector space model of information retrieval is one of the classical and widely applied retrieval models. Paradoxically, it has been characterized by a discrepancy between its formal framework and its implementable form. The underlying concepts of the vector space model are mathematical: linear space, vector, and inner product. However, in the vector space model, the mathematical meaning of these concepts is not preserved; they are used as mere computational constructs or metaphors. Thus, the vector space model does not actually follow formally from the mathematical concepts on which it has been claimed to rest. This problem has been recognized for more than two decades, but no proper solution has emerged so far. The present article proposes a solution to this problem. First, the concept of retrieval is defined based on mathematical measure theory. Then, retrieval is particularized using fuzzy set theory. As a result, the retrieval function is conceived as the cardinality of the intersection of two fuzzy sets. This view makes it possible to build a connection to linear spaces. It is shown that the classical and the generalized vector space models, as well as the latent semantic indexing model, gain a correct formal background with which they are consistent. At the same time, it becomes clear that the inner product is not a necessary ingredient of the vector space model, and hence of Information Retrieval (IR). The Principle of Object Invariance is introduced to handle this situation. Moreover, this view makes it possible to consistently formulate new retrieval methods: retrieval in a linear space with a general basis, entropy-based retrieval, and probability-based retrieval. It is also shown that IR may be viewed as integral calculus, and it thus gains a very compact and elegant mathematical notation. IR may thus also be conceived as an application of mathematical measure theory.
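The abstract's central construction, retrieval as the cardinality of the intersection of two fuzzy sets, can be illustrated with a small sketch. The abstract does not say which operators the article uses, so the sketch assumes the sigma-count cardinality (the sum of membership degrees, which on a finite term universe is the integral of the membership function with respect to the counting measure) and two standard intersection operators: the minimum and the algebraic product. All function names (`sigma_count`, `intersect_min`, `intersect_product`, `retrieval_score`, `inner_product`) and the toy weights are hypothetical, not taken from the article.

```python
from typing import Callable, Dict

# A finite fuzzy set over index terms: term -> membership degree in [0, 1].
# Terms absent from the dict are taken to have membership 0.
FuzzySet = Dict[str, float]
Intersection = Callable[[FuzzySet, FuzzySet], FuzzySet]


def sigma_count(a: FuzzySet) -> float:
    """Sigma-count cardinality: the sum of membership degrees.

    On a finite universe this is the integral of the membership function
    with respect to the counting measure.
    """
    return sum(a.values())


def intersect_min(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Standard fuzzy intersection (minimum t-norm)."""
    # Terms missing from either set have membership 0, so they drop out.
    return {t: min(a[t], b[t]) for t in a.keys() & b.keys()}


def intersect_product(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Algebraic-product fuzzy intersection (product t-norm)."""
    return {t: a[t] * b[t] for t in a.keys() & b.keys()}


def retrieval_score(doc: FuzzySet, query: FuzzySet,
                    intersect: Intersection = intersect_min) -> float:
    """Hypothetical retrieval function: cardinality of the intersection."""
    return sigma_count(intersect(doc, query))


def inner_product(a: FuzzySet, b: FuzzySet) -> float:
    """Classical vector space similarity on the same term weights."""
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


if __name__ == "__main__":
    # Toy weights (hypothetical), chosen as exact binary fractions.
    doc = {"measure": 0.75, "retrieval": 0.5, "fuzzy": 0.25}
    query = {"retrieval": 1.0, "fuzzy": 0.5}
    print(retrieval_score(doc, query))                     # 0.75
    print(retrieval_score(doc, query, intersect_product))  # 0.625
    print(inner_product(doc, query))                       # 0.625
```

With the product intersection, the score coincides with the inner product of the two weight vectors; with the minimum it does not. This offers one informal way to read the abstract's point that the inner product is a particular choice rather than a necessary ingredient of retrieval.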
Suggested Citation
Sándor Dominich & Tamás Kiezer, 2007.
"A measure theoretic approach to information retrieval,"
Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 58(8), pages 1108-1122, June.
Handle: RePEc:bla:jamist:v:58:y:2007:i:8:p:1108-1122
DOI: 10.1002/asi.20586