








These lecture slides, from the computer and information sciences, cover the following points: Reducing Human Interactions, Web Directory Searches, Handle Information-Finding, Optimal Hotlinks, Hierarchical Approach, Index Tree, Set of Hotlinks, Library Index Systems, Root of Tree, Greedy Strategy.
Typology: Slides









The task is to find information in a large collection of data. There are two basic ways to handle information-finding. One is a "flat" approach, which views the information as a nonhierarchical structure and provides a query language to extract the relevant data from the database. An example of this approach on the web is the Google search engine. The other method is based on a hierarchical index to the database according to a taxonomy of categories. An example of such an index on the web is Yahoo.
A drawback of the hierarchical approach is that reaching an item may require traversing many links from the root of the index. A partial solution to this problem is currently used on the web, and consists of a list of "hot" pointers which appears in the top level of the index tree and leads directly to the most popular items. We refer to a link from a hotlist to its destination as a hotlink. This approach is not scalable, in the sense that only a small number of items can appear in such a list. In the current article we study a generalization of this "hotlist" approach, which allows us to have such lists at multiple levels of the index tree, not just the top level. The resulting structure is termed a hotlink-enhanced index structure (or enhanced structure for short). This article also addresses the optimization problem faced by the index designer, namely, to find a set of hotlinks that minimizes the expected number of links (either tree edges or hotlinks) traversed by a user from the root to a leaf.
There are many applications for such hotlink-enhanced index structures. A partial list includes:
- a web index (such as Yahoo) with multilevel hotlists, in which the access statistics are influenced by the accesses of all users to various sites;
- a personalized web index, in which the browser records personalized statistics;
- large library index systems;
- application menu trees, which are currently designed by the application developer or statically customized to the needs of a user. By adding hotlists, the application can learn the usage pattern of the user and adjust to changes in this pattern;
- file systems, in which files in the static tree structure of the file system can be augmented by links to frequently accessed subdirectories or files.
Given a tree T with n nodes representing an index, a hotlink is an edge that does not belong to the tree. The hotlink starts at some node v and ends at (or leads to) some node u that is a descendant of v. We assume without loss of generality that u is not a child of v. Each leaf x of T has a weight p(x), representing the proportion of the user's visits to that leaf out of all of the user's visits. Hence, if normalized, p(x) can be interpreted as the probability that a user wants to access the leaf x. Another parameter of the problem is an integer K, specifying an upper bound on the number of hotlinks that may start at any given node (there is no a priori limit on the number of hotlinks that lead to a given node).
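The constraints in this definition can be checked mechanically. The following is a minimal sketch, assuming the tree is given as a `parent` dictionary and the hotlinks as `(v, u)` pairs; both encodings and the function name are hypothetical and do not come from the slides. Because the text assumes u is never a child of v, the sketch treats a hotlink to a child as invalid.

```python
from collections import defaultdict

def is_valid_hotlink_set(parent, hotlinks, K):
    """Check a hotlink set against the definition above.

    parent:   dict mapping each non-root node to its parent (assumed encoding)
    hotlinks: iterable of (v, u) pairs, each a hotlink from v to u
    K:        maximum number of hotlinks that may start at any node
    """
    out_degree = defaultdict(int)
    for v, u in hotlinks:
        # u must lie strictly below v and must not be a child of v.
        if u == v or parent.get(u) == v:
            return False
        # Walk up from u; we must meet v before running off the root.
        node = parent.get(u)
        while node is not None and node != v:
            node = parent.get(node)
        if node != v:
            return False
        # At most K hotlinks may start at v (incoming links are unbounded).
        out_degree[v] += 1
        if out_degree[v] > K:
            return False
    return True
```

For a small tree with root `r`, children `a` and `b`, and `c`, `d` below `a`, a hotlink `("r", "c")` passes the check, while `("r", "a")` (a child) and `("b", "c")` (not a descendant) do not.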
Let S be a set of hotlinks constructed on the tree (obeying the bound of K outgoing hotlinks per node) and let $D_S(v)$ denote the greedy path (including hotlinks) from the root to node v. The expected number of operations needed to get to an item is

$$f(T, p, S) = \sum_{x \in \mathrm{leaves}(T)} p(x)\,|D_S(x)|,$$

where $|D_S(x)|$ is the number of links on the greedy path to the leaf x. The problem of optimizing this function is referred to as the hotlink enhancement problem. Two different static problems arise, according to whether the probability distribution p is known to us in advance. Assuming a known distribution, our goal is to find a set of hotlinks S which minimizes $f(T, p, S)$ and achieves the optimal cost $\min_S f(T, p, S)$. Such a set is termed an optimal set of hotlinks.
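To make the cost function f(T, p, S) concrete, here is a small sketch that evaluates it as the p-weighted sum of greedy path lengths. The slides do not spell out the greedy user model, so the sketch assumes that at each node the user follows whichever available link (tree edge or hotlink) lands closest to the target leaf. The `parent` dictionary, the `(v, u)` hotlink pairs, and all function names are illustrative assumptions.

```python
def path_from_root(parent, x):
    """Return the tree path from the root to x as a list of nodes."""
    path = [x]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]

def greedy_path_length(parent, hotlinks, x):
    """Number of links a greedy user follows from the root to leaf x."""
    path = path_from_root(parent, x)
    index = {node: i for i, node in enumerate(path)}
    pos, steps = 0, 0
    while pos < len(path) - 1:
        nxt = pos + 1  # default: follow the tree edge toward x
        for v, u in hotlinks:
            # A hotlink is useful only if it starts here and lands on the way to x.
            if v == path[pos] and u in index:
                nxt = max(nxt, index[u])
        pos = nxt
        steps += 1
    return steps

def expected_cost(parent, p, hotlinks):
    """Sum of p(x) times the greedy path length over all leaves x."""
    return sum(w * greedy_path_length(parent, hotlinks, x) for x, w in p.items())

# Hypothetical example: root r, leaves b, d, e with weights summing to 1.
parent = {"a": "r", "b": "r", "c": "a", "d": "a", "e": "c"}
p = {"b": 0.2, "d": 0.3, "e": 0.5}
without_links = expected_cost(parent, p, [])
with_link = expected_cost(parent, p, [("r", "e")])
```

In this example the cost drops from about 2.3 to about 1.3 once the most popular leaf `e` is reachable directly from the root.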
The algorithm uses dynamic programming and the greedy assumption to limit the search operations. The authors also show how to generalize the solution to trees of arbitrary (including unbounded) degree and to hotlink enhancement schemes that allow up to K hotlinks per node. For an input tree T with n nodes, the time and space bounds depend on the depth of T in such a way that the algorithm runs in polynomial time for trees of logarithmic depth.
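For very small trees, an optimal set of hotlinks can be found by plain exhaustive search, which is useful as a reference when testing a faster method. The sketch below is not the dynamic program described above: it is a brute-force baseline for K = 1 whose running time grows exponentially with the number of nodes. It assumes a `parent` dictionary encoding, sortable node labels, and a greedy user who takes a hotlink whenever it lands on the way to the target; all names are hypothetical.

```python
from itertools import product

def root_path(parent, x):
    """Return the list of nodes on the tree path from the root to x."""
    path = [x]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]

def cost(parent, p, link):
    """Expected greedy path length; link maps a node to its single hotlink target."""
    total = 0.0
    for x, weight in p.items():
        path = root_path(parent, x)
        index = {node: i for i, node in enumerate(path)}
        pos = steps = 0
        while pos < len(path) - 1:
            target = link.get(path[pos])
            # Take the hotlink only if it lands on the way to x.
            pos = index[target] if target in index else pos + 1
            steps += 1
        total += weight * steps
    return total

def brute_force_hotlinks(parent, p):
    """Try every assignment of at most one hotlink per node (K = 1)."""
    nodes = sorted(set(parent) | set(parent.values()))
    # Candidate targets for v: no hotlink, or any descendant that is not a child.
    candidates = {v: [None] for v in nodes}
    for u in nodes:
        for v in root_path(parent, u)[:-2]:
            candidates[v].append(u)
    best_cost, best_link = float("inf"), {}
    for choice in product(*(candidates[v] for v in nodes)):
        link = {v: u for v, u in zip(nodes, choice) if u is not None}
        c = cost(parent, p, link)
        if c < best_cost:
            best_cost, best_link = c, link
    return best_cost, best_link

# Hypothetical example: root r, leaves b, d, e.
parent = {"a": "r", "b": "r", "c": "a", "d": "a", "e": "c"}
p = {"b": 0.2, "d": 0.3, "e": 0.5}
best_cost, best_link = brute_force_hotlinks(parent, p)
```

On this six-node tree the search settles on a single hotlink from the root to the heaviest leaf `e`. The number of assignments it enumerates is the product of the candidate counts per node, which is why an approach of this kind does not scale to real directories.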
The motivation is twofold:
Hotlink Assignment to Actual Websites with Zipf's Distribution. A critical aspect of the algorithm is the use of memory for the dynamic programming table. With 23 MB of memory, the algorithm found the optimal solution for 82 of the 84 instances. The 2 hard instances (not solved with 23 MB) were those with the greatest heights. One of the hard instances has 10,484 nodes and height 16, requiring 488 MB. The other instance has 512,484 nodes and height 14, requiring more than 1 GB.
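For readers unfamiliar with Zipf's distribution, the sketch below generates normalized Zipf weights for n ranked items, where the item of rank i receives weight proportional to 1 / i^s. The slides do not say how the weights were assigned to the leaves of each site or which exponent was used, so the exponent `s = 1` and the function name are assumptions made for illustration only.

```python
def zipf_weights(n, s=1.0):
    """Return n probabilities where rank i (1-based) is proportional to 1 / i**s."""
    raw = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(raw)
    # Normalize so the weights can be read as access probabilities p(x).
    return [w / total for w in raw]

# Hypothetical example: access probabilities for five leaves, most popular first.
weights = zipf_weights(5)
```

A list like this could be assigned to the leaves of an index tree to obtain the distribution p used in the cost function, with a few leaves receiving most of the probability mass.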
We have presented new exact algorithms for several variations of the problem of assigning hotlinks to hierarchically arranged data such as in web directories. For most of these variations, we have proved that the proposed algorithms run efficiently from a theoretical point of view. In the case of one of the algorithms, the running time is polynomial if the depth of the tree is logarithmic. We have run some experiments to evaluate both the efficiency and efficacy of the algorithm that solves the problem for known distributions and at most one hotlink leaving each node. These experiments show that significant improvement in the expected number of accesses per search can be achieved in websites using this algorithm. In addition, the proposed algorithm consumed a reasonable amount of computational resources to obtain optimum hotlink assignments.