Authors
- Christopher C. Yang
- Nan Liu
Abstract
Navigating through hyperlinks within a Web site to look for information from one of its Web pages without the support of a site map can be inefficient and ineffective. Although the content of a Web site is usually organized with an inherent structure like a topic hierarchy, which is a directed tree rooted at a Web site's homepage whose vertices and edges correspond to Web pages and hyperlinks, such a topic hierarchy is not always available to the user. In this work, we studied the problem of automatic generation of Web sites' topic hierarchies. We modeled a Web site's link structure as a weighted directed graph and proposed methods for estimating edge weights based on eight types of features and three learning algorithms, namely decision trees, naïve Bayes classifiers, and logistic regression. Three graph algorithms, namely breadth‐first search, shortest‐path search, and directed minimum‐spanning tree, were adapted to generate the topic hierarchy based on the graph model. We have tested the model and algorithms on real Web sites. It is found that the directed minimum‐spanning tree algorithm with the decision tree as the weight learning algorithm achieves the highest performance with an average accuracy of 91.9%.
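As a rough illustration of the graph-model idea in the abstract, the sketch below builds a topic hierarchy with a shortest-path search, one of the three graph algorithms named there. It is not the authors' implementation. The function name `shortest_path_hierarchy`, the toy site, and the edge costs are all hypothetical, and it assumes each hyperlink already carries a non-negative cost where a lower value means the link is more likely a parent-to-child edge. In the paper those weights are learned from link features, a step this sketch does not attempt.

```python
import heapq

def shortest_path_hierarchy(links, root):
    """Return {page: parent} for the shortest-path tree rooted at `root`.

    `links` maps each page to {target_page: cost}. Costs are assumed to be
    non-negative; a lower cost means the hyperlink is more likely to be a
    parent-to-child edge of the hierarchy (an assumption of this sketch).
    """
    dist = {root: 0.0}
    parent = {root: None}
    heap = [(0.0, root)]
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist[page]:
            continue  # stale queue entry, a shorter path was already found
        for target, cost in links.get(page, {}).items():
            nd = d + cost
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                parent[target] = page
                heapq.heappush(heap, (nd, target))
    return parent

# Hypothetical four-page site; the costs are made up for illustration.
site = {
    "/": {"/products": 0.1, "/about": 0.2, "/products/widget": 2.0},
    "/products": {"/products/widget": 0.1, "/": 1.5},
    "/about": {"/products": 1.0, "/": 1.5},
    "/products/widget": {"/": 1.5, "/about": 1.8},
}
hierarchy = shortest_path_hierarchy(site, "/")
for page, par in hierarchy.items():
    print(page, "<-", par)
```

Every page reachable from the homepage ends up with exactly one parent, so the result is a directed tree rooted at the homepage. In the toy site, the direct homepage link to `/products/widget` is costly, so that page is attached under `/products` instead. The abstract reports that the directed minimum-spanning tree variant performed best; that algorithm minimizes the total weight of the tree rather than each page's path cost from the root, and it is not shown here.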
Suggested Citation
Christopher C. Yang & Nan Liu, 2009.
"Web site topic‐hierarchy generation based on link structure,"
Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 60(3), pages 495-508, March.
Handle:
RePEc:bla:jamist:v:60:y:2009:i:3:p:495-508
DOI: 10.1002/asi.20990