Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses

Tobias Schmidt, Andreas Kipf, Dominik Horn, Gaurav Saxena, Tim Kraska
2024

Abstract

Cloud data warehouses are today's standard for analytical query processing. Multiple cloud vendors offer state-of-the-art systems, such as Amazon Redshift. We have observed that customer workloads experience highly repetitive query patterns, i.e., users and systems frequently send the same queries. In order to improve query performance on these queries, most systems rely on techniques like result caches or materialized views. However, these caches are often stale due to inserts, deletes, or updates that occur between query repetitions. We propose a novel secondary index, predicate caching, to improve query latency for repeating scans and joins. Predicate caching stores ranges of qualifying tuples of base table scans. Such an index can be built on the fly, is lightweight, and can be kept online without recomputation. We implemented a prototype of this idea in the cloud data warehouse Amazon Redshift. Our evaluation shows that predicate caching improves query runtimes by up to 10x on selected queries with negligible build overhead.
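The core idea, caching the ranges of qualifying tuples per scan predicate, can be illustrated with a minimal sketch. This is not the Redshift implementation: the class name `PredicateCache`, the string predicate key, the in-memory list of rows, and the append-only handling of new data are all assumptions made for illustration. Deletes and updates are not modeled.

```python
from typing import Callable, Dict, List, Tuple

Row = dict
Range = Tuple[int, int]  # half-open [start, end) row positions


def to_ranges(positions: List[int]) -> List[Range]:
    """Collapse sorted row positions into half-open ranges."""
    ranges: List[Range] = []
    for pos in positions:
        if ranges and ranges[-1][1] == pos:
            # Position is adjacent to the last range, so extend it.
            ranges[-1] = (ranges[-1][0], pos + 1)
        else:
            ranges.append((pos, pos + 1))
    return ranges


class PredicateCache:
    """Hypothetical cache: (table, predicate) -> ranges of qualifying rows."""

    def __init__(self) -> None:
        # Each entry stores the ranges plus the row count seen at build time.
        self._entries: Dict[Tuple[str, str], Tuple[List[Range], int]] = {}
        self.rows_examined = 0  # bookkeeping to show the saved work

    def scan(
        self,
        table_name: str,
        rows: List[Row],
        pred_key: str,
        pred: Callable[[Row], bool],
    ) -> List[Row]:
        key = (table_name, pred_key)
        # On a miss, nothing is cached and zero rows are covered,
        # so the loop below degenerates to a full table scan.
        cached_ranges, covered = self._entries.get(key, ([], 0))

        # Visit only the cached ranges, plus rows appended since the
        # entry was built (assumption: the table is append-only).
        candidates = [p for start, end in cached_ranges for p in range(start, end)]
        candidates.extend(range(covered, len(rows)))
        self.rows_examined += len(candidates)

        hits = [p for p in candidates if pred(rows[p])]
        # Build or refresh the entry on the fly as a by-product of the scan.
        self._entries[key] = (to_ranges(hits), len(rows))
        return [rows[p] for p in hits]
```

In this sketch the first scan with a given predicate examines every row and records the qualifying ranges. A repeated scan examines only those ranges, and rows appended in between are picked up without rebuilding the entry.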

Code References

ClickHouse/ClickHouse
docs/en/operations/query-condition-cache.md
- [Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses (Schmidt et. al., 2024)](https://doi.org/10.1145/3626246.3653395)
src/Interpreters/Cache/QueryConditionCache.h
/// An implementation of predicate caching a la https://doi.org/10.1145/3626246.3653395