We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods. To achieve compression we represent the treap topology using compact data structures. Further, the treap invariants allow us to elegantly encode differentially both document identifiers and frequencies. Results show that our index uses about 20% less space, and performs queries up to three times faster, than state-of-the-art compact representations.
SIGIR 2013: 193-202
Faster and smaller inverted indices with treaps.
Roberto Konow, Gonzalo Navarro, Charles L. A. Clarke, Alejandro López-Ortiz