11 papers
19 files
22 references

Papers Referenced in This Repository

Large-Scale Metric Computation in Online Controlled Experiment Platform

Tao Xiong, Yong Wang
2024
5 references

Online controlled experiment (also called A/B test or experiment) is the most important tool for decision-making at a wide range of data-driven companies like Microsoft, Google, Meta, etc. Metric computation is the core procedure for reaching a conclusion during an experiment. With the growth of exp...

A Parallel Space Saving Algorithm For Frequent Items and the Hurwitz zeta distribution

Massimo Cafaro, Marco Pulimeno, Piergiulio Tempesta
2014
4 references

We present a message-passing based parallel version of the Space Saving algorithm designed to solve the $k$-majority problem. The algorithm determines in parallel frequent items, i.e., those whose frequency is greater than a given threshold, and is therefore useful for iceberg queries and many othe...
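
For orientation, here is a minimal sequential sketch of Space Saving in Python. It is not the message-passing parallel version the paper presents, and the function names `space_saving` and `k_majority_candidates` are hypothetical. With k counters every estimate is an upper bound on the true count, and any item occurring more than n/k times is guaranteed to hold a counter, so filtering on the estimate yields a superset of the k-majority items (an exact second pass would remove false positives).

```python
from typing import Dict, Hashable, Iterable, Set


def space_saving(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Sequential Space Saving summary using at most k counters (sketch)."""
    if k <= 0:
        raise ValueError("k must be positive")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Already monitored: increment its counter.
            counters[item] += 1
        elif len(counters) < k:
            # A free counter is available: start monitoring the item.
            counters[item] = 1
        else:
            # Summary is full: evict the item with the smallest count.
            # The newcomer inherits that count plus one, so estimates may
            # over-count but never under-count.
            victim = min(counters, key=counters.__getitem__)
            counters[item] = counters.pop(victim) + 1
    return counters


def k_majority_candidates(stream: Iterable[Hashable], k: int) -> Set[Hashable]:
    """Items whose estimated count exceeds n/k (superset of the k-majority items)."""
    items = list(stream)
    n = len(items)
    counters = space_saving(items, k)
    return {item for item, count in counters.items() if count > n / k}
```

For example, `k_majority_candidates("aaaaaabbbcde", 3)` returns `{'a'}`: with n = 12 the threshold is 4, and only `"a"` (true count 6) clears it, while the evicted-and-reinserted `"e"` carries an inflated estimate of 3.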

Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses

Tobias Schmidt, Andreas Kipf, Dominik Horn, Gaurav Saxena, Tim Kraska
2024
2 references

Cloud data warehouses are today's standard for analytical query processing. Multiple cloud vendors offer state-of-the-art systems, such as Amazon Redshift. We have observed that customer workloads experience highly repetitive query patterns, i.e., users and systems frequently send the same queries. ...

Gorilla: A Fast, Scalable, In-Memory Time Series Database

Tuomas Pelkonen, Scott Franklin, Paul Cavallaro, Qi Huang, Justin Meza, J. Teller, K. Veeraraghavan
2015
2 references

Large-scale internet services aim to remain highly available and responsive in the presence of unexpected failures. Providing this service often requires monitoring and analyzing tens of millions of measurements per second across a large number of systems, and one particularly effective solution is ...

A Fast, Minimal Memory, Consistent Hash Algorithm

John Lamping, Eric Veach
2014
2 references

We present jump consistent hash, a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. In comparison to the algorithm of Karger et al., jump consistent hash requires no storage, is faster, and does a better job of evenly dividing the key space among the bu...
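
The short routine the abstract mentions can be transcribed into Python roughly as follows. This is a sketch that assumes the 64-bit linear congruential constant and jump formula of the paper's C++ listing; because Python integers are unbounded, the original's unsigned 64-bit wraparound is emulated with a mask, and the names `MASK64` and `jump_consistent_hash` are our own.

```python
MASK64 = (1 << 64) - 1  # emulate uint64_t wraparound


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets) (sketch)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step drives the pseudo-random jumps.
        key = (key * 2862933555777941757 + 1) & MASK64
        # Jump ahead to the next bucket this key would move to as the
        # bucket count grows; stop once that exceeds num_buckets.
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b
```

No table is stored: the sequence of jumps depends only on the key, and the answer is the last jump below `num_buckets`. Growing from n to n + 1 buckets therefore either leaves a key where it was or moves it to the new bucket n, which is the minimal-movement property expected of a consistent hash.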

A simple algorithm for finding frequent elements in streams and bags

R. Karp, S. Shenker, C. Papadimitriou
2003
1 reference

We present a simple, exact algorithm for identifying in a multiset the items with frequency more than a threshold θ. The algorithm requires two passes, linear time, and space 1/θ. The first pass is an on-line algorithm, generalizing a well-known algorithm for finding a majority element, for identify...
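
The two passes can be sketched in Python as below, assuming θ is given as a fraction in (0, 1); the function name `frequent_items` is hypothetical. The first pass keeps at most ⌊1/θ⌋ counters and, whenever a new item would exceed that, decrements every counter and drops those that reach zero, which generalizes the classic majority-vote trick. The second pass counts only the surviving candidates exactly, so the result contains no false positives.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Set


def frequent_items(items: Sequence[Hashable], theta: float) -> Set[Hashable]:
    """Items occurring more than theta * len(items) times (two-pass sketch)."""
    if not 0 < theta < 1:
        raise ValueError("theta must be in (0, 1)")
    # Pass 1 (on-line): at most floor(1/theta) counters survive each step.
    counts: Dict[Hashable, int] = {}
    for x in items:
        counts[x] = counts.get(x, 0) + 1
        if len(counts) > 1 / theta:
            # Too many distinct candidates: decrement all, drop the zeros.
            counts = {y: c - 1 for y, c in counts.items() if c > 1}
    # Pass 2: exact counts for the surviving candidates only.
    candidates = set(counts)
    exact = Counter(x for x in items if x in candidates)
    return {x for x, c in exact.items() if c > theta * len(items)}
```

For instance, `frequent_items("aaaabbbcd", 0.3)` returns `{'a', 'b'}`: the threshold is 2.7 occurrences, and `"c"` and `"d"` are eliminated by the decrement step in the first pass.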

Computing Extremely Accurate Quantiles Using t-Digests

Ted Dunning, Otmar Ertl
2019
1 reference

We present on-line algorithms for computing approximations of rank-based statistics that give high accuracy, particularly near the tails of a distribution, with very small sketches. Notably, the method allows a quantile $q$ to be computed with an accuracy relative to $\max(q, 1-q)$ rather than absol...
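
One way to see where the tail accuracy comes from is the scale function that bounds cluster sizes. The sketch below assumes the paper's k1 scale function, k1(q) = δ/(2π) · asin(2q − 1), together with the rule that a single centroid may span at most one unit of k. It only computes the resulting quantile spans and is not a t-digest implementation; the function names are hypothetical.

```python
import math


def k1(q: float, delta: float) -> float:
    """Scale function k1(q) = delta / (2*pi) * asin(2q - 1), q in [0, 1]."""
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def k1_inverse(k: float, delta: float) -> float:
    """Inverse of k1 on its range [-delta/4, delta/4]."""
    return (math.sin(2 * math.pi * k / delta) + 1) / 2


def max_cluster_span(q_left: float, delta: float) -> float:
    """Widest quantile range a centroid starting at q_left may cover,
    under the constraint k1(q_right) - k1(q_left) <= 1."""
    # k1 never exceeds delta/4 (reached at q = 1), so cap there.
    k_right = min(k1(q_left, delta) + 1, delta / 4)
    return k1_inverse(k_right, delta) - q_left
```

With δ = 100 a centroid starting at q = 0 may cover only about 0.001 of the quantile range, versus about 0.03 for one starting at the median, so quantiles near the tails are estimated from much smaller clusters.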

Weighted Random Sampling over Data Streams

Pavlos S. Efraimidis
2010
2 references

In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2, 4]), discuss sampling with and without replacement and show adaptation...
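
As an illustration of one such algorithm, the sketch below implements A-Res, the one-pass reservoir scheme of Efraimidis and Spirakis for weighted sampling without replacement, where at each draw an item's selection probability is proportional to its weight among the items not yet chosen. We assume, since the truncated abstract does not say so, that this is one of the two existing algorithms it cites; the function name `weighted_reservoir_sample` is hypothetical.

```python
import heapq
import random
from typing import Hashable, Iterable, List, Optional, Tuple


def weighted_reservoir_sample(
    stream: Iterable[Tuple[Hashable, float]],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[Hashable]:
    """A-Res sketch: one pass over (item, weight) pairs, sample of size <= k."""
    if k <= 0:
        return []
    rng = rng or random.Random()
    # Min-heap of (key, sequence number, item); the root holds the smallest key.
    heap: List[Tuple[float, int, Hashable]] = []
    for seq, (item, weight) in enumerate(stream):
        if weight <= 0:
            raise ValueError("weights must be positive")
        # Each item draws key u ** (1 / w) with u ~ Uniform[0, 1).
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, seq, item))
        elif key > heap[0][0]:
            # Replace the current smallest key in the reservoir.
            heapq.heapreplace(heap, (key, seq, item))
    # The k items with the largest keys form the sample.
    return [item for _, _, item in heap]
```

Keeping the reservoir in a min-heap makes each update O(log k), and the sequence number breaks ties between equal keys so the items themselves never need to be comparable.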

A Deep Probabilistic Model for Customer Lifetime Value Prediction

Xiaojing Wang, Tianqi Liu, Jingang Miao
2019
1 reference

Accurate predictions of customers' future lifetime value (LTV) given their attributes and past purchase behavior enable a more customer-centric marketing strategy. Marketers can segment customers into various buckets based on the predicted LTV and, in turn, customize marketing messages or advertisi...

Practical String Dictionary Compression Using String Dictionary Encoding

S. Kanda, Kazuhiro Morita, Masao Fuketa
2017
1 reference

A string dictionary is a data structure for storing a set of strings that maps them to unique IDs. It can manage string data in compact space by encoding them into integers. However, instances have recently emerged in practice where the size of string dictionaries has become a critical problem for v...
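
The interface described here, mapping each stored string to a unique integer ID and back, can be sketched as a plain sorted array in Python. This deliberately omits the compression that is the paper's subject. The class name `SortedStringDictionary` and its `locate`/`extract` methods are hypothetical, and IDs are simply positions in the sorted, deduplicated set.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional


class SortedStringDictionary:
    """Uncompressed baseline: ID = position in the sorted, deduplicated set."""

    def __init__(self, strings: Iterable[str]) -> None:
        self._strings: List[str] = sorted(set(strings))

    def locate(self, s: str) -> Optional[int]:
        """String -> ID by binary search, or None if s is not stored."""
        i = bisect_left(self._strings, s)
        if i < len(self._strings) and self._strings[i] == s:
            return i
        return None

    def extract(self, string_id: int) -> str:
        """ID -> string by direct indexing."""
        return self._strings[string_id]

    def __len__(self) -> int:
        return len(self._strings)
```

Sorting the strings makes IDs order-preserving and lets `locate` run in O(log n) comparisons; compressed designs keep this same interface while shrinking the space the strings occupy.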
