Large-Scale Metric Computation in Online Controlled Experiment Platform

Tao Xiong, Yong Wang
2024
5 references

Abstract

Online controlled experiment (also called A/B test or experiment) is the most important tool for decision-making at a wide range of data-driven companies like Microsoft, Google, Meta, etc. Metric computation is the core procedure for reaching a conclusion during an experiment. With the growth of experiments and metrics in an experiment platform, computing metrics efficiently at scale becomes a non-trivial challenge. This work shows how metric computation in WeChat experiment platform can be done efficiently using bit-sliced index (BSI) arithmetic. This approach has been implemented in a real world system and the performance results are presented, showing that the BSI arithmetic approach is very suitable for large-scale metric computation scenarios.

1 repository
5 references

Code References

â–¶ ClickHouse/ClickHouse
5 files
â–¶ CHANGELOG.md
1
* NumericIndexedVector: new vector data-structure backed by bit-sliced, Roaring-bitmap compression, together with more than 20 functions for building, analysing and point-wise arithmetic. Can cut storage and speed up joins, filters and aggregations on sparse data. Implements [#70582](https://github.com/ClickHouse/ClickHouse/issues/70582) and [“Large-Scale Metric Computation in Online Controlled Experiment Platform” paper](https://arxiv.org/abs/2405.08411) by T. Xiong and Y. Wang from VLDB 2024. [#74193](https://github.com/ClickHouse/ClickHouse/pull/74193) ([FriendLey](https://github.com/FriendLey)).
â–¶ docs/changelogs/v25.7.1.3997-stable.md
1
* **NumericIndexedVector**: new vector data-structure backed by bit-sliced, Roaring-bitmap compression, together with more than 20 functions for building, analysing and point-wise arithmetic. Can cut storage and speed up joins, filters and aggregations on sparse data. Implements [#70582](https://github.com/ClickHouse/ClickHouse/issues/70582) and [“Large-Scale Metric Computation in Online Controlled Experiment Platform” paper](https://arxiv.org/abs/2405.08411) by T. Xiong and Y. Wang from VLDB 2024. [#74193](https://github.com/ClickHouse/ClickHouse/pull/74193) ([FriendLey](https://github.com/FriendLey)).
â–¶ docs/en/sql-reference/functions/numeric-indexed-vector-functions.md
1
NumericIndexedVector is an abstract data structure that encapsulates a vector and implements vector aggregating and pointwise operations. Bit-Sliced Index is its storage method. For theoretical basis and usage scenarios, refer to the paper [Large-Scale Metric Computation in Online Controlled Experiment Platform](https://arxiv.org/pdf/2405.08411).
â–¶ docs/_includes/content/changelog.md
1
* NumericIndexedVector: new vector data-structure backed by bit-sliced, Roaring-bitmap compression, together with more than 20 functions for building, analysing and point-wise arithmetic. Can cut storage and speed up joins, filters and aggregations on sparse data. Implements [#70582](https://github.com/ClickHouse/ClickHouse/issues/70582) and [“Large-Scale Metric Computation in Online Controlled Experiment Platform” paper](https://arxiv.org/abs/2405.08411) by T. Xiong and Y. Wang from VLDB 2024. [#74193](https://github.com/ClickHouse/ClickHouse/pull/74193) ([FriendLey](https://github.com/FriendLey)).
â–¶ src/AggregateFunctions/AggregateFunctionGroupNumericIndexedVectorDataBSI.h
1
* This is implementation of https://dl.acm.org/doi/10.14778/3685800.3685823.
Link copied to clipboard!