Structured Environment - Computer and Information Science - Lecture Slides, Slides of Applications of Computer Sciences

These lecture slides on computer and information science cover the following key points: Structured Environment, Dynamic Element Retrieval, Vector Space Model, Flexible Retrieval System, Document Tree, Ranking of Elements, Method Used for Retrieval, Extensible Markup Language, Term Weighting, Salton's Vector

Typology: Slides

2012/2013

Uploaded on 04/24/2013

bandhula

Dynamic Element Retrieval in a Structured Environment

CONTENTS

Introduction

Retrieval Environment

  • The Vector Space Model
  • INEX Environment
  • Flexible Retrieval System

Method Used for Retrieval

  • Document Tree: Construction
  • Ranking of Elements
  • Output

Experiments

Conclusions

RETRIEVAL ENVIRONMENT

  • Two factors: the issues that arise when the focus moves from documents to components, and Salton's Vector Space Model
  • Vector Space Model: term weights reflect the number of times a term occurs in the document
  • Fox's Extended Vector Space Model: incorporates objective identifiers
  • A document vector consists of subvectors
  • Each subvector contains text that is independently indexed, weighted, searched and retrieved
  • Term weighting: weighting within subjective vectors
  • SMART experimental retrieval system
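
A minimal sketch of how subvector matching could work in an extended vector space model, assuming the overall similarity is a weighted sum of per-subvector cosine similarities. The subvector names (`body`, `author`) and the weights in `alphas` are hypothetical and do not come from the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def extended_similarity(doc, query, alphas):
    """Weighted sum of subvector similarities.

    doc and query map a subvector name to a sparse vector; alphas maps a
    subvector name to its (assumed) importance weight.
    """
    return sum(
        alpha * cosine(doc.get(name, {}), query.get(name, {}))
        for name, alpha in alphas.items()
    )

# Hypothetical document and query, each with a content and an author subvector.
doc = {"body": {"xml": 2.0, "retrieval": 1.0}, "author": {"salton": 1.0}}
query = {"body": {"retrieval": 1.0}, "author": {"salton": 1.0}}
alphas = {"body": 0.7, "author": 0.3}
score = extended_similarity(doc, query, alphas)
```

Because each subvector is matched separately, it can be indexed and weighted independently of the others, which is the property the bullets above describe.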

INEX ENVIRONMENT

  • Content Only (CO): ignores document structure; like typical queries, specifies only the content of the search
  • Content and Structure (CAS): explicitly refers to structure; exhaustive and specific
  • CO query results go directly to the user; CAS queries need additional filtering and a search of the body portion
  • CAS returns a rank-ordered list of elements
  • INEX-EVAL: uses measures of recall and precision

(Figure: exhaustivity and specificity are mapped to a single relevance value.) Results are ranked.
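
As an illustration of mapping exhaustivity and specificity to a single relevance value and then measuring recall and precision, here is a small sketch. It assumes graded assessments on a 0 to 3 scale and a strict mapping in which only the top grade on both dimensions counts as relevant. The scale, the mapping and the element identifiers are assumptions for illustration, not details taken from the slides.

```python
def strict_quantization(exhaustivity, specificity, max_grade=3):
    """Map an (exhaustivity, specificity) pair to a single 0/1 relevance value."""
    return 1 if exhaustivity == max_grade and specificity == max_grade else 0

def precision_recall_at(ranked_ids, assessments, cutoff):
    """Precision and recall over the first `cutoff` elements of a ranked list."""
    relevant = {
        element_id
        for element_id, (e, s) in assessments.items()
        if strict_quantization(e, s) == 1
    }
    retrieved = ranked_ids[:cutoff]
    hits = sum(1 for element_id in retrieved if element_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical assessments: element id -> (exhaustivity, specificity)
assessments = {
    "d1/sec[1]": (3, 3),
    "d1/sec[2]": (2, 3),
    "d2/p[4]": (3, 3),
    "d3/p[1]": (0, 0),
}
ranked = ["d1/sec[1]", "d3/p[1]", "d2/p[4]", "d1/sec[2]"]
precision, recall = precision_recall_at(ranked, assessments, cutoff=2)
```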

METHOD FOR FLEXIBLE RETRIEVAL

  • Input: given a query Q and the paragraph index, retrieve a rank-ordered list of terminal nodes
  • The n top-ranked paragraphs are selected as input
  • The set of paragraphs is used to identify documents; elements are generated and returned as output
  • Document Tree: needs information about the document structure

Terminal nodes: reached by pre-order traversal and found in the paragraph index.
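
The input step can be sketched as follows, assuming the paragraph ranking arrives as `(doc_id, xpath, score)` tuples already sorted by score, and that a document tree is a nested `(label, children)` tuple. Both representations are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def top_paragraphs_by_document(ranked_paragraphs, n):
    """Take the n top-ranked paragraphs and group their XPaths by document."""
    groups = defaultdict(list)
    for doc_id, xpath, _score in ranked_paragraphs[:n]:
        groups[doc_id].append(xpath)
    return dict(groups)

def terminal_nodes_preorder(node):
    """Collect the terminal (leaf) nodes of a tree in pre-order."""
    label, children = node
    if not children:
        return [label]
    leaves = []
    for child in children:
        leaves.extend(terminal_nodes_preorder(child))
    return leaves

# Hypothetical ranked paragraph list for a query Q.
ranked = [
    ("doc7", "/article/bdy/sec[1]/p[2]", 0.91),
    ("doc3", "/article/bdy/sec[2]/p[1]", 0.85),
    ("doc7", "/article/bdy/sec[1]/p[1]", 0.80),
    ("doc9", "/article/bdy/sec[4]/p[3]", 0.40),
]
groups = top_paragraphs_by_document(ranked, n=3)

# Hypothetical document structure.
tree = ("article", [("bdy", [
    ("sec[1]", [("p[1]", []), ("p[2]", [])]),
    ("sec[2]", [("p[1]", [])]),
])])
leaves = terminal_nodes_preorder(tree)
```

Each key in `groups` identifies one document for which a tree would be built.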

CONSTRUCTION OF DOCUMENT TREE

  • For query Q, the n top-ranked paragraphs are used to build trees
  • Leaf elements (terminal nodes) are paragraph nodes
  • Each leaf is represented by a term-frequency weighted vector
  • First: gather all leaf nodes, which completes the terminal nodes
  • Second: merge the children's vectors to form the parents' vectors
  • The document schema determines the merging
  • Parent: holds the unique terms of its children, giving a term-frequency weighted parent vector (with the content of its children)
  • The process is carried out recursively
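  • ltu weighting: N is the collection size, n_k is the number of elements containing the term

$$\text{ltu} = \frac{\bigl(1 + \log(\mathit{term\_freq})\bigr)\,\log(N / n_k)}{(1 - \mathit{slope}) + \mathit{slope}\cdot(\mathit{no\_unique\_terms} / \mathit{pivot})}$$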
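
A small sketch of the ltu weight as a function. It multiplies the log term frequency by log(N/n_k), the usual inverse-frequency convention for ltu weights, and it assumes natural logarithms. The slope and pivot values in the example are placeholders, not values from the slides.

```python
import math

def ltu_weight(term_freq, N, n_k, no_unique_terms, slope, pivot):
    """ltu weight of a single term.

    term_freq        occurrences of the term (must be >= 1)
    N                collection size (number of elements of this type)
    n_k              number of those elements that contain the term
    no_unique_terms  number of distinct terms in the element or query
    """
    tf_part = 1.0 + math.log(term_freq)
    idf_part = math.log(N / n_k)
    norm = (1.0 - slope) + slope * (no_unique_terms / pivot)
    return tf_part * idf_part / norm

# Placeholder parameter values, chosen only to exercise the function.
weight = ltu_weight(term_freq=1, N=100, n_k=10, no_unique_terms=5,
                    slope=0.2, pivot=5)
```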

  • N and n_k are element dependent and should be known through indexing
  • As we move up the tree, N is obtained by counting the elements of each type
  • n_k: obtained from the inverted file entries in the paragraph index, using the given mapping between identifiers and XPaths
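
The bottom-up construction described in this section can be sketched as a recursive merge of term-frequency vectors. The tree is assumed to be given as nested `(xpath, children)` tuples that already follow the document schema, and `leaf_tf` holds the paragraph term frequencies. Both structures and all names are hypothetical.

```python
from collections import Counter

def build_vectors(node, leaf_tf, out):
    """Fill `out` with a term-frequency vector for every node of the tree.

    A leaf takes its vector from the paragraph index (`leaf_tf`); a parent
    is the sum of its children's vectors, so it holds their unique terms.
    """
    xpath, children = node
    if not children:
        vec = Counter(leaf_tf[xpath])
    else:
        vec = Counter()
        for child in children:
            vec += build_vectors(child, leaf_tf, out)
    out[xpath] = vec
    return vec

# Hypothetical document with two sections and three paragraphs.
tree = ("/article", [
    ("/article/sec[1]", [
        ("/article/sec[1]/p[1]", []),
        ("/article/sec[1]/p[2]", []),
    ]),
    ("/article/sec[2]", [
        ("/article/sec[2]/p[1]", []),
    ]),
])
leaf_tf = {
    "/article/sec[1]/p[1]": {"xml": 2, "index": 1},
    "/article/sec[1]/p[2]": {"xml": 1, "tree": 1},
    "/article/sec[2]/p[1]": {"tree": 3},
}
vectors = {}
build_vectors(tree, leaf_tf, vectors)
```

After the call, `vectors` holds one raw term-frequency vector per element, to which element-type-specific weighting can then be applied.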

OUTPUT OF FLEXIBLE RETRIEVAL

  • Select another leaf node, gather its siblings, construct the document tree, calculate Lnu term weights, and correlate with the ltu-weighted query to produce another rank-ordered list
  • After the n top-ranked paragraphs are exhausted and the last list is produced, merge the lists
  • Result: a single set of elements, rank ordered by correlation with Q
  • Comparison: flexible retrieval vs. all-element index

Identical: when the set of n paragraphs input to flexible retrieval contains all the paragraphs and the same values are used for Lnu-ltu.
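
A sketch of the ranking and merging steps. The slides do not give the Lnu formula, so the code assumes the common pivoted-normalization definition: (1 + log tf) / (1 + log average tf), divided by (1 - slope) + slope * (unique terms / pivot). The correlation is taken as an inner product with an already weighted query vector. Function names and example values are illustrative only.

```python
import math

def lnu_vector(tf, slope, pivot):
    """Lnu weights for one element; `tf` maps term -> frequency (non-empty)."""
    avg_tf = sum(tf.values()) / len(tf)
    norm = (1.0 - slope) + slope * (len(tf) / pivot)
    return {
        term: ((1.0 + math.log(freq)) / (1.0 + math.log(avg_tf))) / norm
        for term, freq in tf.items()
    }

def correlation(element_vec, query_vec):
    """Inner product of an Lnu element vector and a weighted query vector."""
    return sum(w * query_vec.get(term, 0.0) for term, w in element_vec.items())

def merge_ranked_lists(lists):
    """Merge per-tree (xpath, score) lists into one list sorted by score."""
    merged = [pair for ranked in lists for pair in ranked]
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged

# Placeholder values chosen so the arithmetic is easy to follow.
element = lnu_vector({"xml": 1, "tree": 1}, slope=0.2, pivot=2)
score = correlation(element, {"xml": 2.0})
final = merge_ranked_lists([[("a", 0.9), ("b", 0.2)], [("c", 0.5)]])
```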

FACTORS OF INTEREST

  • Slope and pivot for Lnu-ltu
  • Effective structured retrieval
  • Can be determined empirically and applied from one collection to another; generic
  • n: the number of paragraphs input; sets an upper bound on the number of trees per query
  • The actual number of trees depends on how many paragraphs belong to the same group or the same document
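
To show what slope and pivot control, here is a sketch of the normalization denominator shared by the Lnu and ltu weights. The numeric values are arbitrary and serve only to show the behavior.

```python
def pivoted_norm(no_unique_terms, slope, pivot):
    """Length-normalization factor used as the denominator in Lnu and ltu."""
    return (1.0 - slope) + slope * (no_unique_terms / pivot)

# An element whose length equals the pivot is left unchanged for any slope.
at_pivot = pivoted_norm(40, slope=0.1, pivot=40)

# An element twice as long as the pivot is penalized, more so as slope grows.
mild = pivoted_norm(80, slope=0.1, pivot=40)
strong = pivoted_norm(80, slope=0.3, pivot=40)
```

Elements longer than the pivot get a factor above 1, which lowers their weights; shorter elements get a factor below 1. This is why the two parameters have to be tuned empirically.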

EXPERIMENTS DONE

  • All-element and dynamic/flexible retrieval experiments and results
    • body-only retrieval
  • Correlation is computed between element and query vectors; body elements only

Table 1

DISCUSSIONS AND CONCLUSIONS

  • Flexible retrieval dynamically produces a rank-ordered list of elements from a single index at the level of the basic indexing node (the paragraph)
  • Basic functions: SMART; extended vector model
  • Results: flexible capabilities
  • Attempt to incorporate other subvectors, internal nodes and weights
  • INEX: exhaustivity and specificity; results are exhaustive; research on specificity is ongoing; the results reflect this
  • It is a better way of retrieval than all-element indexing