💾

Databases

Database engines, storage systems, and query optimization

Repositories

(11)

Papers

(41)

Showing 20 of 41 papers

A Deep Probabilistic Model for Customer Lifetime Value Prediction

Xiaojing Wang, Tianqi Liu, Jingang Miao

2019

1 reference

Accurate predictions of customers' future lifetime value (LTV) given their attributes and past purchase behavior enables a more customer-centric marketing strategy. Marketers can segment customers into various buckets based on the predicted LTV and, ...

View Paper PDF

A Fast, Minimal Memory, Consistent Hash Algorithm

John Lamping, Eric Veach

2014

2 references

We present jump consistent hash, a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. In comparison to the algorithm of Karger et al., jump consistent hash requires no storage, is faster, and does a better...

View Paper PDF

A Parallel Space Saving Algorithm For Frequent Items and the Hurwitz zeta distribution

Massimo Cafaro, Marco Pulimeno, Piergiulio Tempesta

2014

4 references

We present a message-passing based parallel version of the Space Saving algorithm designed to solve the $k$--majority problem. The algorithm determines in parallel frequent items, i.e., those whose frequency is greater than a given threshold, and is ...

View Paper PDF DOI

A simple algorithm for finding frequent elements in streams and bags

R. Karp, S. Shenker, C. Papadimitriou

2003

1 reference

We present a simple, exact algorithm for identifying in a multiset the items with frequency more than a threshold θ. The algorithm requires two passes, linear time, and space 1/θ. The first pass is an on-line algorithm, generalizing a well-known algo...

View Paper DOI

Computing Extremely Accurate Quantiles Using t-Digests

Ted Dunning, Otmar Ertl

2019

1 reference

We present on-line algorithms for computing approximations of rank-based statistics that give high accuracy, particularly near the tails of a distribution, with very small sketches. Notably, the method allows a quantile $q$ to be computed with an acc...

View Paper PDF

Design and Analysis of a Logless Dynamic Reconfiguration Protocol

William Schultz, Siyuan Zhou, Ian Dardik, Stavros Tripakis

2021

1 reference

Distributed replication systems based on the replicated state machine model have become ubiquitous as the foundation of modern database systems. To ensure availability in the presence of faults, these systems must be able to dynamically replace faile...

View Paper PDF

Dynamic programming strikes back

G. Moerkotte, Thomas Neumann

2008

1 reference

Two highly efficient algorithms are known for optimally ordering joins while\navoiding cross products:\nDPccp, which is based on dynamic programming, and Top-Down Partition Search, \nbased on memoization.\nBoth have two severe limitations:\nThey hand...

View Paper PDF DOI

Efficiency of a Good But Not Linear Set Union Algorithm

Robert E. Tarjan

1972

1 reference

We questioned whether the respiratory muscles of humans contribute to systemic oxidative stress following inspiratory flow-resistive breathing, whether the amount of oxidative stress is influenced by the level of resistive load, and whether the amoun...

View Paper PDF DOI

Efficient Evaluation of Arbitrarily-Framed Holistic SQL Aggregates and Window Functions

Adrian Vogelsgesang, Thomas Neumann, Viktor Leis, A. Kemper

2022

1 reference

Window functions became part of the SQL standard in SQL:2003 and are widely used for data analytics: Percentiles, rankings, moving averages, running sums and local maxima are all expressed as window functions in SQL. Yet, the features offered by SQL'...

View Paper PDF DOI

Faster-Than-Native Alternatives for x86 VP2INTERSECT Instructions

2021

2 references

We present faster-than-native alternatives for the full AVX512-VP2INTERSECT instruction subset using basic AVX512F instructions. These alternatives compute only one of the output masks, which is sufficient for the typical case of computing the inters...

View Paper DOI

Gorilla: A Fast, Scalable, In-Memory Time Series Database

Tuomas Pelkonen, Scott Franklin, Paul Cavallaro, Qi Huang, Justin Meza, J. Teller, K. Veeraraghavan

2015

2 references

Large-scale internet services aim to remain highly available and responsive in the presence of unexpected failures. Providing this service often requires monitoring and analyzing tens of millions of measurements per second across a large number of sy...

View Paper DOI

Large-Scale Metric Computation in Online Controlled Experiment Platform

Tao Xiong, Yong Wang

2024

5 references

Online controlled experiment (also called A/B test or experiment) is the most important tool for decision-making at a wide range of data-driven companies like Microsoft, Google, Meta, etc. Metric computation is the core procedure for reaching a concl...

View Paper DOI

Note on a Method for Calculating Corrected Sums of Squares and Products

B. P. Welford

1962

1 reference

"Note on a Method for Calculating Corrected Sums of Squares and Products." Technometrics, 4(3), pp. 419–420

View Paper DOI

One WITH RECURSIVE is Worth Many GOTOs

Denis Hirn, Torsten Grust

2021

1 reference

PL/SQL integrates an imperative statement-by-statement style of programming with the plan-based evaluation of SQL queries. The disparity of both leads to friction at runtime, slowing PL/SQL execution down significantly. This work describes a compiler...

View Paper DOI

PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension

Wen Yang, Tao Li, Gai Fang, Hong Wei

2020

1 reference

Similarity search has been widely used in various fields, particularly in the Alibaba ecosystem. The open-source solutions to a similarity search of vectors can only support a query with a single vector, whereas real-life scenarios generally require ...

View Paper PDF DOI

Practical String Dictionary Compression Using String Dictionary Encoding

S Kanda, Kazuhiro Morita, Masao Fuketa

2017

1 reference

A string dictionary is a data structure for storing a set of strings that maps them to unique IDs. It can manage string data in compact space by encoding them into integers. However, instances have recently emerged in practice where the size of strin...

View Paper DOI

Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses

Tobias Schmidt, Andreas Kipf, Dominik Horn, Gaurav Saxena, Tim Kraska

2024

2 references

Cloud data warehouses are today's standard for analytical query processing. Multiple cloud vendors offer state-of-the-art systems, such as Amazon Redshift. We have observed that customer workloads experience highly repetitive query patterns, i.e., us...

View Paper DOI

Random sampling from hash files

Frank Olken, Doron Rotem, Ping Xu

1990

2 references

In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files ...

View Paper DOI

Reordering columns for smaller indexes.

Daniel Lemire, Owen Kaser

2011

3 references

View Paper PDF DOI

TinyLFU: A Highly Efficient Cache Admission Policy

Gil Einziger, Roy Friedman, Ben Manes

2015

1 reference

This paper proposes to use a frequency based cache admission policy in order to boost the effectiveness of caches subject to skewed access distributions. Given a newly accessed item and an eviction candidate from the cache, our scheme decides, based ...

View Paper PDF

Previous Page 2 of 3 Next

Databases

Repositories

apache/cassandra

ClickHouse/ClickHouse

cockroachdb/cockroach

dgraph-io/dgraph

duckdb/duckdb

facebook/rocksdb

mongodb/mongo

postgres/postgres

redis/redis

sqlite/sqlite

yugabyte/yugabyte-db

Papers