A simple algorithm for finding frequent elements in streams and bags

R. Karp, S. Shenker, C. Papadimitriou
2003
1 reference

Abstract

We present a simple, exact algorithm for identifying in a multiset the items with frequency more than a threshold θ. The algorithm requires two passes, linear time, and space 1/θ. The first pass is an on-line algorithm, generalizing a well-known algorithm for finding a majority element, for identifying a set of at most 1/θ items that includes, possibly among others, all items with frequency greater than θ.

1 repository
1 reference

Code References

ClickHouse/ClickHouse
1 file
docs/en/sql-reference/aggregate-functions/reference/anyheavy.md
1
Selects a frequently occurring value using the [heavy hitters](https://doi.org/10.1145/762471.762473) algorithm. If there is a value that occurs more than in half the cases in each of the query's execution threads, this value is returned. Normally, the result is nondeterministic.
Link copied to clipboard!