Inverted File Index
Definition
Index is a mechanism for locating a given term in a text.
Inverted file contains a list of pointers, Inverted because it lists for a term
Modules
- Token analyzer + stop filter
- Word Stemming
- Stop Words
- index
- search tree
- hash table
- distributed index
- Dynamic indexing
- Docs come in / deleted
- Thresholding
- only retrieve the top x documents
Measures
- How fast does it index
- How fast does it search
- Expressiveness of query language
Data retrieval performance
- index space
- response time
Relevance measurement requires 3 elements
- A benchmark document collection
- A benchmark suite of queries
-
A binary assessment of either relevent or irrelevant for each query-doc pair
relevant Irrelevant Retrived \(R_R\) \(I_R\) Not Retrived \(R_N\) \(I_N\) - Precision \(P =R_R/(R_R+I_R)\)
- Recall \(R = R_R/(R_R+R_N)\)