An efficient algorithm for sequential random sampling

Jeffrey Scott Vitter

1987

1 reference

Abstract

We examine several methods for drawing a sequential random sample of n records from a file containing N records. Method D is recommended for general use. The algorithm is on-line (so that CPU time can be overlapped with I/O), has a small constant memory requirement, and is easy to program. An improved implementation is detailed in the Appendix.
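For orientation, the sequential sampling problem in the abstract can be illustrated with plain selection sampling (often credited as Knuth's Algorithm S), which makes one random draw per record. This is a minimal baseline sketch and not Method D, which avoids inspecting every record by generating skip lengths instead. The function name `selection_sample` and its parameters are assumptions made for this illustration.

```python
import random


def selection_sample(records, n, total, rng=None):
    """Yield n records, in their original order, from an iterable of `total` records.

    Hypothetical helper for illustration: one uniform draw per record.
    Assumes `records` really contains `total` items; if it contains fewer,
    fewer than n records may be yielded.
    """
    if not 0 <= n <= total:
        raise ValueError("need 0 <= n <= total")
    rng = rng or random.Random()
    needed = n          # records still to be selected
    remaining = total   # records not yet examined
    for record in records:
        if needed == 0:
            break
        # Select the current record with probability needed / remaining.
        if rng.random() * remaining < needed:
            yield record
            needed -= 1
        remaining -= 1
```

For example, `list(selection_sample(range(100), 10, 100))` returns 10 distinct values in increasing order. The sample is produced in one pass with constant extra memory, which is the "sequential" and "on-line" property the abstract refers to.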


🖥️ Operating Systems

1 repository

1 reference

Code References

freebsd/freebsd-src

1 file

contrib/jemalloc/doc_internal/PROFILING_INTERNALS.md

L12

Compared to our fast paths, even a `coinflip(p)` function can be quite expensive. Having to do a random-number generation and some floating point operations would be a sizeable relative cost. However (as pointed out in [[Vitter, 1987](https://dl.acm.org/doi/10.1145/23002.23003)]), if we can orchestrate our algorithm so that many of our `coinflip` calls share their parameter value, we can do better. We can sample from the geometric distribution, and initialize a counter with the result. When the counter hits 0, the `coinflip` function returns true (and reinitializes its internal counter).
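The counter trick described in this excerpt can be sketched as follows. This is an illustrative Python sketch, not jemalloc's actual C implementation: the class name `GeometricCoinflip`, its method names, and the inverse-transform formula used to draw the geometric gap are assumptions chosen for clarity.

```python
import math
import random


class GeometricCoinflip:
    """Hypothetical stand-in for repeated coinflip(p) calls with a fixed p.

    Instead of one random draw per call, it draws the gap until the next
    success from a geometric distribution and counts down to it.
    """

    def __init__(self, p, rng=None):
        if not 0.0 < p <= 1.0:
            raise ValueError("p must be in (0, 1]")
        self.p = p
        self.rng = rng or random.Random()
        self.counter = self._draw_gap()

    def _draw_gap(self):
        # Number of calls up to and including the next success,
        # distributed Geometric(p) on {1, 2, 3, ...}.
        if self.p == 1.0:
            return 1
        u = 1.0 - self.rng.random()  # uniform in (0, 1], so log(u) is defined
        return int(math.log(u) / math.log1p(-self.p)) + 1

    def flip(self):
        # Fast path: a decrement and a comparison, no random-number generation.
        self.counter -= 1
        if self.counter == 0:
            self.counter = self._draw_gap()
            return True
        return False
```

Under these assumptions, `GeometricCoinflip(0.01).flip()` returns true about once per 100 calls on average, while the random-number generation and logarithms run only once per success rather than on every call.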
