TY - JOUR
T1 - Structure preserving anonymization of router configuration data
AU - Maltz, David
AU - Zhan, Jibin
AU - Hjálmtýsson, Gísli
AU - Greenberg, Albert
AU - Rexford, Jennifer L.
AU - Xie, Geoffrey
AU - Zhang, Hui
N1 - Funding Information:
Manuscript received 18 April 2008; revised 17 November 2008. This research was sponsored by the NSF under awards ANI-0085920, ANI-0331653, ANI-0114014, and CNS-0721574. Views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of Microsoft, CMU, AT&T, NSF, or the U.S. government. David Maltz and Albert Greenberg are with Microsoft Research. Jibin Zhan is with Conviva Networks. Gísli Hjálmtýsson is with Thule Investments. Jennifer Rexford is with the Dept. of Computer Science, Princeton University. Geoffrey G. Xie is with the Dept. of Computer Science, Naval Postgraduate School. Hui Zhang is with the Dept. of Computer Science, Carnegie Mellon University. Digital Object Identifier 10.1109/JSAC.2009.090410.
PY - 2009/4
Y1 - 2009/4
N2 - A repository of router configuration files from production networks would provide the research community with a treasure trove of data about network topologies, routing designs, and security policies. However, configuration files have been largely unobtainable precisely because they provide detailed information that could be exploited by competitors and attackers. This paper describes a method for anonymizing router configuration files by removing all information that connects the data to the identity of the underlying network, while still preserving the structure of information that makes the data valuable to networking researchers. Anonymizing configuration files has unusual requirements, including preserving relationships between elements of data, anonymizing regular expressions, and robustly coping with more than 200 versions of the configuration language. Conventional tools and techniques are poorly suited to the problem. Our anonymization method has been validated with a major carrier, earning unprivileged researchers access to the configuration files of thousands of routers in hundreds of networks. Through example analysis, we demonstrate that the anonymized data retains the key properties of the network design. The paper sets out techniques that could be used in an attempt to break the anonymization, and it concludes that our anonymization techniques are most applicable to enterprise networks, because the large number of enterprises and the difficulty of probing them from the outside make it hard to recognize an anonymized network based solely on publicly available information about its topology or configuration. When applied to backbone networks, which are few in number and many of whose properties can be publicly measured, the anonymization might be broken by fingerprinting techniques described in this paper.
KW - Data anonymization
KW - Router configuration
UR - http://www.scopus.com/inward/record.url?scp=64249172288&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=64249172288&partnerID=8YFLogxK
U2 - 10.1109/JSAC.2009.090410
DO - 10.1109/JSAC.2009.090410
M3 - Article
AN - SCOPUS:64249172288
SN - 0733-8716
VL - 27
SP - 349
EP - 358
JO - IEEE Journal on Selected Areas in Communications
JF - IEEE Journal on Selected Areas in Communications
IS - 3
M1 - 4808478
ER -