💾

Databases

Database engines, storage systems, and query optimization

Repositories

(11)

apache/cassandra

1 paper

ClickHouse/ClickHouse

11 papers

cockroachdb/cockroach

8 papers

dgraph-io/dgraph

0 papers

duckdb/duckdb

8 papers

facebook/rocksdb

3 papers

mongodb/mongo

3 papers

postgres/postgres

4 papers

redis/redis

2 papers

sqlite/sqlite

2 papers

yugabyte/yugabyte-db

6 papers

Papers

(41)
Showing 20 of 41 papers

A contextual-bandit approach to personalized news article recommendation

Lihong Li, Wei Chu, John Langford, Robert E. Schapire
2010
1 reference

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two re...

A cool and practical alternative to traditional hash tables

Ú. Erlingsson, M. Manasse, Frank McSherry
2006
4 references

Algorithms for geodesics

Charles F. F. Karney
2011
3 references

Algorithms for the computation of geodesics on an ellipsoid of revolution are given. These provide accurate, robust, and fast solutions to the direct and inverse geodesic problems and they allow differential and integral properties of geodesics to be...

A Study of Replacement Algorithms for Virtual-Storage Computer

L. A. Belady
1966
1 reference

One of the basic limitations of a digital computer is the size of its available memory. 1 In most cases, it is neither feasible nor economical for a user to insist that every problem program fit into memory. The number of words of information in a pr...

A Tutorial on Thompson Sampling

Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen
2017
1 reference

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may ...

Cuckoo hashing.

Rasmus Pagh, Flemming Friche Rodler
2004
2 references

Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs.

Yu. A. Malkov, D. A. Yashunin
2020
4 references

We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for additional search str...

Elle: Inferring Isolation Anomalies from Experimental Observations

Kyle Kingsbury, Peter Alvaro
2020
1 reference

Users who care about their data store it in databases, which (at least in principle) guarantee some form of transactional isolation. However, experience shows [Kleppmann 2019, Kingsbury and Patella 2019a] that many databases do not provide the isolat...

Less hashing, same performance: Building a better Bloom filter

Adam Kirsch, Michael Mitzenmacher
2006
2 references

Abstract A standard technique from the hashing literature is to use two hash functions h 1 ( x ) and h 2 ( x ) to simulate additional hash functions of the form g i ( x ) = h 1 ( x ) + i h 2 ( x ). We demonstrate that this technique can be usefully a...

Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time

Kanat Tangwongsan, Martin Hirzel, Scott Schneider
2017
1 reference

Sliding-window aggregation is a widely-used approach for extracting insights from the most recent portion of a data stream. The aggregations of interest can usually be cast as binary operators that are associative, but they are not necessarily commut...

New cardinality estimation algorithms for HyperLogLog sketches

Otmar Ertl
2017
10 citations
5 references

This paper presents new methods to estimate the cardinalities of data sets recorded by HyperLogLog sketches. A theoretically motivated extension to the original estimator is presented that eliminates the bias for small and large cardinalities. Based ...

Novel Table Lookup-Based Algorithms for High-Performance CRC Generation

Michael E. Kounavis, Frank L. Berry
2008
2 references

A framework for designing a family of novel fast cyclic redundancy code (CRC) generation algorithms is presented. Our algorithms can ideally read arbitrarily large amounts of data at a time, while optimizing their memory requirement to meet the const...

On the correct and complete enumeration of the core search space

Guido Moerkotte, Pit Fender, Marius Eich
2013
1 reference

Reordering more than traditional joins (e.g. outerjoins, antijoins) requires some care, since not all reorderings are valid. To prevent invalid plans, two approaches have been described in the literature. We show that both approaches still produce in...

RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search

Jianyang Gao, Cheng Long
2024
1 reference

Searching for approximate nearest neighbors (ANN) in the high-dimensional Euclidean space is a pivotal problem. Recently, with the help of fast SIMD-based implementations, Product Quantization (PQ) and its variants can often efficiently and accuratel...

R-trees: a dynamic index structure for spatial searching

A. Guttman
1984
2 references

In order to handle spatial data efficiently, as required in computer aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations However, tradit...

Serializable isolation for snapshot databases

Michael J. Cahill, Uwe Röhm, Alan Fekete
2009
2 references

Many popular database management systems implement a multiversion concurrency control algorithm called snapshot isolation rather than providing full serializability based on locking. There are well-known anomalies permitted by snapshot isolation that...

The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles.

Norbert Beckmann, Hans‐Peter Kriegel, Ralf Schneider, Bernhard Seeger
1990
2 references

The R-tree, one of the most popular access methods for rectangles, is based on the heuristic optimization of the area of the enclosing rectangle in each inner node. By running numerous experiments in a standardized testbed under highly varying data, ...

The serializability of concurrent database updates

Christos H. Papadimitriou
1979
1 reference

ABSTRACT A sequence of interleaved user transactions in a database system may not be ser:ahzable, t e, equivalent to some sequential execution of the individual transactions Using a simple transaction model, it ~s shown that recognizing the transacti...

Weighted Random Sampling over Data Streams

Pavlos S. Efraimidis
2010
2 references

In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2, 4]), discuss samplin...