duckdb/duckdb
Papers Referenced in This Repository
New cardinality estimation algorithms for HyperLogLog sketches
This paper presents new methods to estimate the cardinalities of data sets recorded by HyperLogLog sketches. A theoretically motivated extension to the original estimator is presented that eliminates the bias for small and large cardinalities. Based on the maximum likelihood principle a second unbia...
The LDBC Social Network Benchmark
The Linked Data Benchmark Council's Social Network Benchmark (LDBC SNB) is an effort intended to test various functionalities of systems used for graph-like data management. For this, LDBC SNB uses the recognizable scenario of operating a social network, characterized by its graph-shaped data. LDBC ...
A Parallel Space Saving Algorithm For Frequent Items and the Hurwitz zeta distribution
We present a message-passing based parallel version of the Space Saving algorithm designed to solve the $k$--majority problem. The algorithm determines in parallel frequent items, i.e., those whose frequency is greater than a given threshold, and is therefore useful for iceberg queries and many othe...
Note on a Method for Calculating Corrected Sums of Squares and Products
"Note on a Method for Calculating Corrected Sums of Squares and Products." Technometrics, 4(3), pp. 419–420
Efficient Evaluation of Arbitrarily-Framed Holistic SQL Aggregates and Window Functions
Window functions became part of the SQL standard in SQL:2003 and are widely used for data analytics: Percentiles, rankings, moving averages, running sums and local maxima are all expressed as window functions in SQL. Yet, the features offered by SQL's window functions lack composability: Framing is ...
Dynamic programming strikes back
Two highly efficient algorithms are known for optimally ordering joins while\navoiding cross products:\nDPccp, which is based on dynamic programming, and Top-Down Partition Search, \nbased on memoization.\nBoth have two severe limitations:\nThey handle only (1) simple (binary) join predicates and (2...