Databases
Database engines, storage systems, and query optimization
Repositories
(4)Papers
(11)A contextual-bandit approach to personalized news article recommendation
Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two re...
A Study of Replacement Algorithms for Virtual-Storage Computer
One of the basic limitations of a digital computer is the size of its available memory. 1 In most cases, it is neither feasible nor economical for a user to insist that every problem program fit into memory. The number of words of information in a pr...
A Tutorial on Thompson Sampling
Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may ...
Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs.
We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for additional search str...
Less hashing, same performance: Building a better Bloom filter
Abstract A standard technique from the hashing literature is to use two hash functions h 1 ( x ) and h 2 ( x ) to simulate additional hash functions of the form g i ( x ) = h 1 ( x ) + i h 2 ( x ). We demonstrate that this technique can be usefully a...
New cardinality estimation algorithms for HyperLogLog sketches
This paper presents new methods to estimate the cardinalities of data sets recorded by HyperLogLog sketches. A theoretically motivated extension to the original estimator is presented that eliminates the bias for small and large cardinalities. Based ...
Novel Table Lookup-Based Algorithms for High-Performance CRC Generation
A framework for designing a family of novel fast cyclic redundancy code (CRC) generation algorithms is presented. Our algorithms can ideally read arbitrarily large amounts of data at a time, while optimizing their memory requirement to meet the const...
R-trees: a dynamic index structure for spatial searching
In order to handle spatial data efficiently, as required in computer aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations However, tradit...
Serializable isolation for snapshot databases
Many popular database management systems implement a multiversion concurrency control algorithm called snapshot isolation rather than providing full serializability based on locking. There are well-known anomalies permitted by snapshot isolation that...
The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles.
The R-tree, one of the most popular access methods for rectangles, is based on the heuristic optimization of the area of the enclosing rectangle in each inner node. By running numerous experiments in a standardized testbed under highly varying data, ...