💾

Databases

Database engines, storage systems, and query optimization

Repositories

(11)

apache/cassandra

1 paper

ClickHouse/ClickHouse

11 papers

cockroachdb/cockroach

8 papers

dgraph-io/dgraph

0 papers

duckdb/duckdb

8 papers

facebook/rocksdb

3 papers

mongodb/mongo

3 papers

postgres/postgres

4 papers

redis/redis

2 papers

sqlite/sqlite

2 papers

yugabyte/yugabyte-db

6 papers

Papers

(41)
Showing 20 of 41 papers

A Deep Probabilistic Model for Customer Lifetime Value Prediction

Xiaojing Wang, Tianqi Liu, Jingang Miao
2019
1 reference

Accurate predictions of customers' future lifetime value (LTV) given their attributes and past purchase behavior enables a more customer-centric marketing strategy. Marketers can segment customers into various buckets based on the predicted LTV and, ...

A Fast, Minimal Memory, Consistent Hash Algorithm

John Lamping, Eric Veach
2014
2 references

We present jump consistent hash, a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. In comparison to the algorithm of Karger et al., jump consistent hash requires no storage, is faster, and does a better...

A Parallel Space Saving Algorithm For Frequent Items and the Hurwitz zeta distribution

Massimo Cafaro, Marco Pulimeno, Piergiulio Tempesta
2014
4 references

We present a message-passing based parallel version of the Space Saving algorithm designed to solve the $k$--majority problem. The algorithm determines in parallel frequent items, i.e., those whose frequency is greater than a given threshold, and is ...

A simple algorithm for finding frequent elements in streams and bags

R. Karp, S. Shenker, C. Papadimitriou
2003
1 reference

We present a simple, exact algorithm for identifying in a multiset the items with frequency more than a threshold θ. The algorithm requires two passes, linear time, and space 1/θ. The first pass is an on-line algorithm, generalizing a well-known algo...

Computing Extremely Accurate Quantiles Using t-Digests

Ted Dunning, Otmar Ertl
2019
1 reference

We present on-line algorithms for computing approximations of rank-based statistics that give high accuracy, particularly near the tails of a distribution, with very small sketches. Notably, the method allows a quantile $q$ to be computed with an acc...

Design and Analysis of a Logless Dynamic Reconfiguration Protocol

William Schultz, Siyuan Zhou, Ian Dardik, Stavros Tripakis
2021
1 reference

Distributed replication systems based on the replicated state machine model have become ubiquitous as the foundation of modern database systems. To ensure availability in the presence of faults, these systems must be able to dynamically replace faile...

Dynamic programming strikes back

G. Moerkotte, Thomas Neumann
2008
1 reference

Two highly efficient algorithms are known for optimally ordering joins while\navoiding cross products:\nDPccp, which is based on dynamic programming, and Top-Down Partition Search, \nbased on memoization.\nBoth have two severe limitations:\nThey hand...

Efficiency of a Good But Not Linear Set Union Algorithm

Robert E. Tarjan
1972
1 reference

We questioned whether the respiratory muscles of humans contribute to systemic oxidative stress following inspiratory flow-resistive breathing, whether the amount of oxidative stress is influenced by the level of resistive load, and whether the amoun...

Efficient Evaluation of Arbitrarily-Framed Holistic SQL Aggregates and Window Functions

Adrian Vogelsgesang, Thomas Neumann, Viktor Leis, A. Kemper
2022
1 reference

Window functions became part of the SQL standard in SQL:2003 and are widely used for data analytics: Percentiles, rankings, moving averages, running sums and local maxima are all expressed as window functions in SQL. Yet, the features offered by SQL'...

Faster-Than-Native Alternatives for x86 VP2INTERSECT Instructions

2021
2 references

We present faster-than-native alternatives for the full AVX512-VP2INTERSECT instruction subset using basic AVX512F instructions. These alternatives compute only one of the output masks, which is sufficient for the typical case of computing the inters...

Gorilla: A Fast, Scalable, In-Memory Time Series Database

Tuomas Pelkonen, Scott Franklin, Paul Cavallaro, Qi Huang, Justin Meza, J. Teller, K. Veeraraghavan
2015
2 references

Large-scale internet services aim to remain highly available and responsive in the presence of unexpected failures. Providing this service often requires monitoring and analyzing tens of millions of measurements per second across a large number of sy...

Large-Scale Metric Computation in Online Controlled Experiment Platform

Tao Xiong, Yong Wang
2024
5 references

Online controlled experiment (also called A/B test or experiment) is the most important tool for decision-making at a wide range of data-driven companies like Microsoft, Google, Meta, etc. Metric computation is the core procedure for reaching a concl...

Note on a Method for Calculating Corrected Sums of Squares and Products

B. P. Welford
1962
1 reference

"Note on a Method for Calculating Corrected Sums of Squares and Products." Technometrics, 4(3), pp. 419–420

One WITH RECURSIVE is Worth Many GOTOs

Denis Hirn, Torsten Grust
2021
1 reference

PL/SQL integrates an imperative statement-by-statement style of programming with the plan-based evaluation of SQL queries. The disparity of both leads to friction at runtime, slowing PL/SQL execution down significantly. This work describes a compiler...

PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension

Wen Yang, Tao Li, Gai Fang, Hong Wei
2020
1 reference

Similarity search has been widely used in various fields, particularly in the Alibaba ecosystem. The open-source solutions to a similarity search of vectors can only support a query with a single vector, whereas real-life scenarios generally require ...

Practical String Dictionary Compression Using String Dictionary Encoding

S Kanda, Kazuhiro Morita, Masao Fuketa
2017
1 reference

A string dictionary is a data structure for storing a set of strings that maps them to unique IDs. It can manage string data in compact space by encoding them into integers. However, instances have recently emerged in practice where the size of strin...

Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses

Tobias Schmidt, Andreas Kipf, Dominik Horn, Gaurav Saxena, Tim Kraska
2024
2 references

Cloud data warehouses are today's standard for analytical query processing. Multiple cloud vendors offer state-of-the-art systems, such as Amazon Redshift. We have observed that customer workloads experience highly repetitive query patterns, i.e., us...

Random sampling from hash files

Frank Olken, Doron Rotem, Ping Xu
1990
2 references

In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files ...

Reordering columns for smaller indexes.

Daniel Lemire, Owen Kaser
2011
3 references

TinyLFU: A Highly Efficient Cache Admission Policy

Gil Einziger, Roy Friedman, Ben Manes
2015
1 reference

This paper proposes to use a frequency based cache admission policy in order to boost the effectiveness of caches subject to skewed access distributions. Given a newly accessed item and an eviction candidate from the cache, our scheme decides, based ...