Dynamic Element Retrieval in a Structured Environment
CONTENTS
Introduction
Retrieval Environment
- The Vector Space Model
- INEX Environment
- Flexible Retrieval System
Method Used for Retrieval
- Document Tree Construction
- Ranking of Elements
- Output
Experiments
Conclusions
RETRIEVAL ENVIRONMENT
- Two factors: the issues that arise when the focus moves from documents to their components, and Salton's Vector Space Model
- Vector Space Model: a term's weight reflects the number of times it occurs in the document
- Fox's Extended Vector Space Model: incorporates objective identifiers
- A document vector consists of subvectors
- Subvectors contain text that is independently indexed, weighted, searched and retrieved
- Term weighting: applied within the subjective vectors
- SMART experimental retrieval system
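
As an illustration of the extended model, here is a minimal Python sketch that scores a document against a query as a weighted sum of per-subvector similarities. The subvector names (`title`, `body`), the coefficients, and the choice of cosine similarity are assumptions made for the example; the slides do not specify them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def extended_similarity(doc, query, coefficients):
    """Weighted sum of subvector similarities (coefficients are hypothetical)."""
    return sum(
        coeff * cosine(doc.get(name, {}), query.get(name, {}))
        for name, coeff in coefficients.items()
    )

# Each subvector is indexed and weighted independently (raw term counts here).
doc = {"title": {"retrieval": 1.0, "xml": 1.0},
       "body": {"retrieval": 2.0, "vector": 1.0}}
query = {"title": {"retrieval": 1.0}, "body": {"vector": 1.0}}
score = extended_similarity(doc, query, {"title": 0.5, "body": 0.5})
```

Because every subvector is scored on its own, a single subvector can also be searched and retrieved in isolation by giving the others a coefficient of zero.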
INEX ENVIRONMENT
- Content Only (CO): ignores document structure; like typical queries, specifies only the content of the search
- Content and Structure (CAS): explicitly refers to structure; exhaustive and specific
- CO query results go directly to the user; CAS queries need additional filtering and a search of the body portion
- CAS returns a rank-ordered list of elements
- INEX-EVAL: uses measures of recall and precision
(see fig.: exhaustivity and specificity are mapped to a single relevance value); results are ranked
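
To make the two evaluation measures concrete, the following Python sketch computes precision and recall over the top k entries of a ranked element list. It assumes binary relevance judgments, which is a simplification: it does not reproduce how INEX-EVAL maps exhaustivity and specificity to a single relevance value. The element identifiers are hypothetical.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top-k entries of a ranked list.

    ranked   -- element identifiers, best first
    relevant -- set of identifiers judged relevant (binary judgments assumed)
    """
    top = ranked[:k]
    hits = sum(1 for element in top if element in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical xpath-style identifiers for illustration only.
ranked = ["d1/sec[1]", "d2/bdy", "d1/sec[1]/p[2]", "d3/sec[4]"]
relevant = {"d1/sec[1]", "d1/sec[1]/p[2]", "d9/sec[2]"}
precision, recall = precision_recall_at_k(ranked, relevant, 2)
```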
METHOD FOR FLEXIBLE RETRIEVAL
- Input: given a query Q and the paragraph index, retrieve a rank-ordered list of terminal nodes
- The n top-ranked paragraphs are selected as input
- The set of paragraphs is used to identify documents; elements are generated and returned as output
- Document tree: needs information about the document structure
- Terminal nodes: pre-order traversal; terminal nodes are found in the paragraph index
CONSTRUCTION OF DOCUMENT TREE
- For query Q, the n top-ranked paragraphs are used to build trees
- Leaf elements (terminal nodes) are paragraph nodes
- Each leaf is represented by a term-frequency weighted vector
- 1st: gather all leaf nodes; the terminal nodes are then done
- 2nd: merge the children's vectors to form the parents' vectors
- The document schema determines the merging
- Parent: holds the unique terms of its children; the term-frequency weighted parent vector has the content of its children
- The process is carried out recursively
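- ltu weighting: N is the collection size, n_k the number of elements containing term k
$$w = \frac{\bigl(1 + \log(\text{term\_freq})\bigr)\,\log(N / n_k)}{(1 - \text{slope}) + \text{slope} \cdot (\text{no\_unique\_terms} / \text{pivot})}$$
- N and n_k are element dependent and should be known through indexing
- As we move up the tree, N is the count of elements of each type
- n_k: from the inverted file entry in the paragraph index, using the (given) mapping between identifiers and xpaths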
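
The bottom-up construction described in this section can be sketched in a few lines of Python. This is an illustrative sketch, not the system's actual code: the `Node` class, the `merge_up` function, and the element names are hypothetical, and the document schema is represented simply by each node's list of children.

```python
from collections import Counter

class Node:
    """A document-tree element; leaves carry paragraph term frequencies."""
    def __init__(self, name, children=None, term_freqs=None):
        self.name = name
        self.children = children or []
        self.term_freqs = Counter(term_freqs or {})

def merge_up(node):
    """Recursively build each parent's term-frequency vector from its children."""
    if node.children:
        merged = Counter()
        for child in node.children:
            merged.update(merge_up(child))  # adds the child's term counts
        node.term_freqs = merged
    return node.term_freqs

# Two paragraph leaves under one section of one article.
p1 = Node("p[1]", term_freqs={"xml": 2, "index": 1})
p2 = Node("p[2]", term_freqs={"xml": 1, "tree": 3})
sec = Node("sec[1]", children=[p1, p2])
article = Node("article", children=[sec])
merge_up(article)
```

After the call, `sec` and `article` each hold the union of their descendants' terms with summed frequencies, so every internal element has a vector that can be weighted and matched against the query.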
OUTPUT OF FLEXIBLE RETRIEVAL
- Select another leaf node, gather its siblings, construct the document tree, calculate Lnu term weights, and match against the ltu-weighted query to produce another rank-ordered list
- After the n top-ranked paragraphs are exhausted and the last list is produced, merge the lists
- Result: a single set of elements, rank ordered by correlation with Q
- Comparison: flexible retrieval vs. the all-element index
Results are identical when the set of n paragraphs input to flexible retrieval contains all paragraphs and the same values are used for Lnu-ltu
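
The final merge step can be sketched as follows in Python. It assumes each tree yields a list of `(element, correlation)` pairs already sorted by descending correlation; the function name and the element identifiers are hypothetical.

```python
import heapq

def merge_ranked_lists(ranked_lists):
    """Merge per-tree lists, each sorted by descending score, into one list."""
    return list(
        heapq.merge(*ranked_lists, key=lambda pair: pair[1], reverse=True)
    )

# One rank-ordered list per document tree (illustrative values).
tree_a = [("a/sec[1]", 0.9), ("a/sec[1]/p[2]", 0.4)]
tree_b = [("b/bdy", 0.7), ("b/bdy/p[1]", 0.1)]
merged = merge_ranked_lists([tree_a, tree_b])
```

`heapq.merge` relies on the inputs already being sorted; if a per-tree list might not be, sorting the concatenated pairs by score would be the safer choice.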
FACTORS OF INTEREST
- Slope and pivot for Lnu-ltu
- Effective structured retrieval
- Can be determined empirically and applied from one collection to another; generic
- n: the number of paragraphs input; sets an upper bound on the number of trees per query
- The actual number of trees depends on how many paragraphs belong to the same group or the same document
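
To show where slope and pivot enter the weighting, here is a small Python sketch of an ltu term weight. It assumes the usual SMART reading of ltu, in which the log term frequency is multiplied by log(N / n_k) and divided by a pivoted normalization over the number of unique terms. The natural logarithm, the function names, and the parameter values are assumptions for illustration.

```python
import math

def pivoted_norm(n_unique_terms, slope, pivot):
    """Pivoted unique-term normalization factor."""
    return (1.0 - slope) + slope * (n_unique_terms / pivot)

def ltu_weight(term_freq, n_total, n_with_term, n_unique_terms, slope, pivot):
    """ltu weight: log tf times idf, divided by the pivoted normalization."""
    tf_part = 1.0 + math.log(term_freq)
    idf_part = math.log(n_total / n_with_term)
    return tf_part * idf_part / pivoted_norm(n_unique_terms, slope, pivot)

# An element with exactly `pivot` unique terms is left unnormalized;
# one with twice as many is penalized in proportion to the slope.
at_pivot = ltu_weight(1, 100, 10, n_unique_terms=5, slope=0.5, pivot=5.0)
longer = ltu_weight(1, 100, 10, n_unique_terms=10, slope=0.5, pivot=5.0)
```

With slope set to 0 the normalization disappears entirely, which is one way to see why these two values have to be tuned empirically.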
EXPERIMENTS DONE
- All-element and dynamic (flexible) retrieval experiments and results
- The correlation between element and query vectors is produced; correlation of body elements only
Table 1
DISCUSSIONS AND CONCLUSIONS
- Flexible retrieval dynamically produces a rank-ordered list of elements from a single index at the level of the basic indexing node (the paragraph)
- Basic functions: SMART; extended vector space model
- Results: flexible capabilities
- Attempt to incorporate other subvectors and internal node weights
- INEX: exhaustivity and specificity; the results are exhaustive; research on specificity is ongoing; the results reflect this
- It is a better way of retrieval than all-element indexing