TY - GEN
T1 - Affirmative Sampling
T2 - 33rd International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms, AofA 2022
AU - Lumbroso, Jérémie
AU - Martínez, Conrado
N1 - Funding Information:
Funding This work has been supported by funds from the MOTION Project (Project PID2020-112581GB-C21) of the Spanish Ministry of Science & Innovation MCIN/AEI/10.13039/501100011033, and by Princeton University, and its Department of Computer Science.
Publisher Copyright:
© Jérémie Lumbroso and Conrado Martínez; licensed under Creative Commons License CC-BY 4.0
PY - 2022/6/1
Y1 - 2022/6/1
N2 - Affirmative Sampling is a practical and efficient novel algorithm to obtain random samples of distinct elements from a data stream. Its most salient feature is that the size S of the sample will, on expectation, grow with the (unknown) number n of distinct elements in the data stream. As any distinct element has the same probability to be sampled, and the sample size is greater when the “diversity” (the number of distinct elements) is greater, the samples that Affirmative Sampling delivers are more representative than those produced by any scheme where the sample size is fixed a priori - hence its name. Our algorithm is straightforward to implement, and several implementations already exist.
AB - Affirmative Sampling is a practical and efficient novel algorithm to obtain random samples of distinct elements from a data stream. Its most salient feature is that the size S of the sample will, on expectation, grow with the (unknown) number n of distinct elements in the data stream. As any distinct element has the same probability to be sampled, and the sample size is greater when the “diversity” (the number of distinct elements) is greater, the samples that Affirmative Sampling delivers are more representative than those produced by any scheme where the sample size is fixed a priori - hence its name. Our algorithm is straightforward to implement, and several implementations already exist.
KW - Analysis of algorithms
KW - Cardinality estimation
KW - Data streams
KW - Distinct sampling
KW - Random sampling
UR - http://www.scopus.com/inward/record.url?scp=85134349370&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134349370&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.AofA.2022.12
DO - 10.4230/LIPIcs.AofA.2022.12
M3 - Conference contribution
AN - SCOPUS:85134349370
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 33rd International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms, AofA 2022
A2 - Ward, Mark Daniel
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Y2 - 20 June 2022 through 24 June 2022
ER -