yugabyte/yugabyte-db
Papers Referenced in This Repository
Faster-Than-Native Alternatives for x86 VP2INTERSECT Instructions
We present faster-than-native alternatives for the full AVX512-VP2INTERSECT instruction subset using basic AVX512F instructions. These alternatives compute only one of the output masks, which is sufficient for the typical case of computing the intersection of two sorted lists of integers, or computi...
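The abstract's motivating case is intersecting two sorted integer lists. The sketch below is a plain-Python model of the *semantics* of the first output mask of a VP2INTERSECTD-style comparison: bit i is set when lane i of the first operand equals any lane of the second. It is not AVX512 code and not the paper's AVX512F emulation. The block-wise driver `intersect_sorted_blocks` and the `width=4` lane count are illustrative assumptions, and both lists are assumed strictly increasing.

```python
def vp2intersect_mask_a(a, b):
    """Scalar model of the first output mask: bit i set iff a[i] equals some b[j].

    The real instruction also produces a second mask for b; like the paper's
    alternatives, this model computes only the one mask needed for intersection.
    """
    mask = 0
    for i, x in enumerate(a):
        if any(x == y for y in b):
            mask |= 1 << i
    return mask


def intersect_sorted_blocks(xs, ys, width=4):
    """Intersect two strictly increasing int lists by comparing fixed-width blocks."""
    out = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        a = xs[i:i + width]
        b = ys[j:j + width]
        mask = vp2intersect_mask_a(a, b)
        # Emit the lanes of `a` whose mask bit is set.
        out.extend(x for k, x in enumerate(a) if (mask >> k) & 1)
        # Advance the block(s) whose last element is smallest: nothing in that
        # block can match a later block of the other list.
        if a[-1] <= b[-1]:
            i += width
        if b[-1] <= a[-1]:
            j += width
    return out


print(intersect_sorted_blocks([1, 2, 3, 5, 8, 13, 21], [2, 3, 4, 8, 16, 21, 34]))
```

A SIMD kernel would replace `vp2intersect_mask_a` with a single instruction (or the paper's faster sequence) and use the mask to compress the matching lanes; the block-advance logic stays the same.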
Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs
We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for additional search structures (typically used at the coarse search stage...
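To make "fully graph-based" search concrete, here is a minimal sketch of greedy routing down a layered graph: on each layer, hop to whichever neighbor is closer to the query until none is, then use that node as the entry point one layer below. This simplifies HNSW by using a beam width of 1 on every layer (the paper uses a wider `ef` beam on the bottom layer) and omits graph construction entirely. The toy `points` and `layers` data are hypothetical.

```python
import math


def greedy_layer(points, adj, query, entry):
    """Greedy search on one layer (beam width 1): stop at a local minimum."""
    cur = entry
    cur_d = math.dist(points[cur], query)
    improved = True
    while improved:
        improved = False
        for nb in adj.get(cur, ()):
            d = math.dist(points[nb], query)
            if d < cur_d:
                cur, cur_d = nb, d
                improved = True
    return cur


def layered_greedy_search(points, layers, entry, query):
    """layers[0] is the sparse top layer; layers[-1] is the dense bottom layer."""
    ep = entry
    for adj in layers:
        ep = greedy_layer(points, adj, query, ep)
    return ep


# Hypothetical toy data: 2-D points and a two-layer graph.
points = {0: (0, 0), 1: (10, 0), 2: (10, 10), 3: (0, 10), 4: (9, 9), 5: (1, 1), 6: (5, 5)}
top = {0: [2], 2: [0]}
bottom = {0: [5, 1, 3], 1: [0, 2, 6], 2: [4, 1, 3], 3: [0, 2, 6],
          4: [2, 6], 5: [0, 6], 6: [1, 3, 4, 5]}
print(layered_greedy_search(points, [top, bottom], entry=0, query=(8, 8)))
```

The sparse top layer makes long jumps (node 0 to node 2 here) cheap, and the dense bottom layer refines the result locally.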
Less hashing, same performance: Building a better Bloom filter
A standard technique from the hashing literature is to use two hash functions $h_1(x)$ and $h_2(x)$ to simulate additional hash functions of the form $g_i(x) = h_1(x) + i\,h_2(x)$. We demonstrate that this technique can be usefully applied to Bloom filters and related data structure...
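A minimal Bloom filter sketch using the double-hashing idea above: only two base hashes are computed per item, and the k bit positions are $g_i(x) = h_1(x) + i\,h_2(x)$ reduced modulo the bit-array size m. Deriving $h_1$ and $h_2$ from two slices of one SHA-256 digest is an assumption made for illustration, as are the class name and parameters.

```python
import hashlib


class DoubleHashBloom:
    """Bloom filter whose k probe positions come from two base hashes."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _indexes(self, item: bytes):
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little")
        # g_i(x) = h1(x) + i * h2(x), taken modulo m to get a bit index.
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: bytes):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: bytes):
        # May return a false positive, never a false negative.
        return all((self.bits[idx // 8] >> (idx % 8)) & 1 for idx in self._indexes(item))


bf = DoubleHashBloom(m=1024, k=5)
for word in (b"alpha", b"beta", b"gamma"):
    bf.add(word)
print(b"beta" in bf)
```

Compared with hashing each item k times, this costs two hash evaluations per insert or lookup regardless of k, which is the saving the title refers to.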
Serializable isolation for snapshot databases
Many popular database management systems implement a multiversion concurrency control algorithm called snapshot isolation rather than providing full serializability based on locking. There are well-known anomalies permitted by snapshot isolation that can lead to violations of data consistency by int...
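One of the well-known snapshot-isolation anomalies is write skew. The toy store below (a hypothetical model, not the paper's algorithm) gives each transaction a snapshot and aborts only on write-write conflicts (first committer wins). Two transactions each check the invariant "at least one doctor is on call" in their own snapshot and then update different rows, so both commit and the invariant is broken, an outcome no serial order would allow.

```python
class SnapshotDB:
    """Toy multiversion store: snapshot reads, first-committer-wins on writes."""

    def __init__(self, data):
        # key -> list of (commit_ts, value), appended in commit order
        self.versions = {key: [(0, val)] for key, val in data.items()}
        self.clock = 0

    def begin(self):
        return {"start": self.clock, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:
            return txn["writes"][key]
        # Latest version committed at or before this transaction's snapshot.
        visible = [v for v in self.versions[key] if v[0] <= txn["start"]]
        return max(visible, key=lambda v: v[0])[1]

    def commit(self, txn):
        # Abort only if someone committed a write to one of our keys after our snapshot.
        for key in txn["writes"]:
            if self.versions[key][-1][0] > txn["start"]:
                return False
        self.clock += 1
        for key, val in txn["writes"].items():
            self.versions[key].append((self.clock, val))
        return True

    def latest(self, key):
        return self.versions[key][-1][1]


db = SnapshotDB({"alice_on_call": 1, "bob_on_call": 1})
t1, t2 = db.begin(), db.begin()
if db.read(t1, "alice_on_call") + db.read(t1, "bob_on_call") >= 2:
    t1["writes"]["alice_on_call"] = 0
if db.read(t2, "alice_on_call") + db.read(t2, "bob_on_call") >= 2:
    t2["writes"]["bob_on_call"] = 0
ok1, ok2 = db.commit(t1), db.commit(t2)
print(ok1, ok2, db.latest("alice_on_call") + db.latest("bob_on_call"))
```

Because the write sets are disjoint, the write-write check never fires; catching this pattern requires tracking read-write dependencies between concurrent transactions.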
Novel Table Lookup-Based Algorithms for High-Performance CRC Generation
A framework for designing a family of novel fast cyclic redundancy code (CRC) generation algorithms is presented. Our algorithms can ideally read arbitrarily large amounts of data at a time, while optimizing their memory requirement to meet the constraints of specific computer architectures. In addi...
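The following is a hedged sketch of the table-lookup idea in the style commonly called slicing-by-8: eight 256-entry tables let the loop consume 8 input bytes per step instead of 1. It computes the standard reflected CRC-32 (polynomial `0xEDB88320`, as used by zlib) so the result can be checked against `zlib.crc32`; the function names and the choice of 8 slices are illustrative, and real implementations are written in C.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial


def make_tables(n=8):
    """tables[k][b] = CRC contribution of byte b followed by k zero bytes."""
    t0 = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        t0.append(c)
    tables = [t0]
    for k in range(1, n):
        prev = tables[k - 1]
        tables.append([(prev[i] >> 8) ^ t0[prev[i] & 0xFF] for i in range(256)])
    return tables


T = make_tables(8)


def crc32_slicing8(data: bytes, crc: int = 0) -> int:
    crc ^= 0xFFFFFFFF
    i, n = 0, len(data)
    while n - i >= 8:
        # Fold the current CRC into the first 4 bytes, then look up all 8 bytes.
        one = crc ^ int.from_bytes(data[i:i + 4], "little")
        crc = (T[7][one & 0xFF] ^ T[6][(one >> 8) & 0xFF]
               ^ T[5][(one >> 16) & 0xFF] ^ T[4][one >> 24]
               ^ T[3][data[i + 4]] ^ T[2][data[i + 5]]
               ^ T[1][data[i + 6]] ^ T[0][data[i + 7]])
        i += 8
    for b in data[i:]:  # leftover bytes: classic one-table loop
        crc = T[0][(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


print(hex(crc32_slicing8(b"123456789")))
```

The eight lookups in each step are independent, which is what lets a CPU overlap them; the tables cost 8 KiB in total, the kind of memory trade-off the abstract mentions.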
PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension
Similarity search has been widely used in various fields, particularly in the Alibaba ecosystem. Open-source solutions for vector similarity search can only support queries with a single vector, whereas real-life scenarios generally require processing compound queries. Moreover, existi...