TY - GEN
T1 - Releasing search queries and clicks privately
AU - Korolova, Aleksandra
AU - Kenthapadi, Krishnaram
AU - Mishra, Nina
AU - Ntoulas, Alexandros
PY - 2009
Y1 - 2009
AB - The question of how to publish an anonymized search log was brought to the forefront by a well-intentioned, but privacy-unaware AOL search log release. Since then a series of ad-hoc techniques have been proposed in the literature, though none are known to be provably private. In this paper, we take a major step towards a solution: we show how queries, clicks and their associated perturbed counts can be published in a manner that rigorously preserves privacy. Our algorithm is decidedly simple to state, but non-trivial to analyze. On the opposite side of privacy is the question of whether the data we can safely publish is of any use. Our findings offer a glimmer of hope: we demonstrate that a non-negligible fraction of queries and clicks can indeed be safely published via a collection of experiments on a real search log. In addition, we select an application, keyword generation, and show that the keyword suggestions generated from the perturbed data resemble those generated from the original data. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
KW - Algorithms
KW - Experimentation
KW - Human factors
KW - Legal aspects
KW - Measurement
KW - Performance
KW - Security
KW - Theory
UR - http://www.scopus.com/inward/record.url?scp=84865663496&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865663496&partnerID=8YFLogxK
U2 - 10.1145/1526709.1526733
DO - 10.1145/1526709.1526733
M3 - Conference contribution
AN - SCOPUS:84865663496
SN - 9781605584874
T3 - WWW'09 - Proceedings of the 18th International World Wide Web Conference
SP - 171
EP - 180
BT - WWW'09 - Proceedings of the 18th International World Wide Web Conference
T2 - 18th International World Wide Web Conference, WWW 2009
Y2 - 20 April 2009 through 24 April 2009
ER -