Crypto-PAn (Cryptography-based Prefix-preserving Anonymization[1]) is a cryptographic algorithm for anonymizing IP addresses while preserving their subnet structure. That is, the algorithm encrypts any string of bits to a new string , while ensuring that for any pair of bit-strings which share a common prefix of length , their images also share a common prefix of length . A mapping with this property is called prefix-preserving.[2] In this way, Crypto-PAn is a kind of format-preserving encryption.
The mathematical outline of Crypto-PAn was developed by Jinliang Fan, Jun Xu, Mostafa H. Ammar (all of Georgia Tech) and Sue B. Moon.[3] It was inspired[3] by the IP address anonymization done by Greg Minshall's TCPdpriv program circa 1996.
Algorithm
editIntuitively, Crypto-PAn encrypts a bit-string of length by descending a binary tree of depth , one step for each bit in the string. Each of the binary tree's non-leaf nodes has been given a value of "0" or "1", according to some pseudo-random function seeded by the encryption key. At each step of the descent, the algorithm computes the th bit of the output by XORing the th bit of the input with the value of the current node.
The reference implementation[1] takes a 256-bit key. The first 128 bits of the key material are used to initialize an AES-128 cipher in ECB mode. The second 128 bits of the key material are encrypted with the cipher to produce a 128-bit padding block .
Given a 32-bit IPv4 address , the reference implementation performs the following operation for each bit of the input: Compose a 128-bit input block . Encrypt with the cipher to produce a 128-bit output block . Finally, XOR the th bit of that output block with the th bit of , and append the result — — onto the output bitstring. Once all 32 bits of the output bitstring have been computed, the result is returned as the anonymized output which corresponds to the original input .
The reference implementation does not implement deanonymization; that is, it does not provide a function such that . However, decryption can be implemented almost identically to encryption, just making sure to compose each input block using the plaintext bits of decrypted so far, rather than using the ciphertext bits: .
The reference implementation does not implement encryption of bitstrings of lengths other than 32; for example, it does not support the anonymization of 128-bit IPv6 addresses. In practice, the 32-bit Crypto-PAn algorithm can be used in "ECB mode" itself, so that a 128-bit string might be anonymized as . This approach preserves the prefix structure of the 128-bit string, but does leak information about the lower-order chunks; for example, an anonymized IPv6 address consisting of the same 32-bit ciphertext repeated four times is likely the special address ::
, which thus reveals the encryption of the 32-bit plaintext 0000:0000:0000:0000
.
In principle, the reference implementation's approach (building 128-bit input blocks ) can be extended up to 128 bits. Beyond 128 bits, a different approach would have to be used; but the fundamental algorithm (descending a binary tree whose nodes are marked with a pseudo-random function of the key material) remains valid.
Implementations
editCrypto-PAn's C++ reference implementation was written in 2002 by Jinliang Fan.[1]
In 2005, David Stott of Lucent made some improvements to the C++ reference implementation, including a deanonymization routine.[4] Stott also observed[4] that the algorithm preserves prefix structure while destroying suffix structure; running the Crypto-PAn algorithm on a bit-reversed string will preserve any existing suffix structure while destroying prefix structure. Thus, running the algorithm first on the input string, and then again on the bit-reversed output of the first pass, destroys both prefix and suffix structure. (However, once the suffix structure has been destroyed, destroying the remaining prefix structure can be accomplished far more efficiently by simply feeding the non-reversed output to AES-128 in ECB mode. There is no particular reason to reuse Crypto-PAn in the second pass.)
A Perl implementation was written in 2005 by John Kristoff.[5] Python[6] and Ruby[7] implementations also exist.
Versions of the Crypto-PAn algorithm are used for data anonymization in many applications, including NetSniff[8] and CAIDA's CoralReef library.[9]
References
edit- ^ a b c Jinliang Fan (April 2002). "Crypto-PAn.1.0.tar.gz (reference implementation)". Archived from the original on 2018-09-08. Retrieved 2018-09-07.
- ^ Jun Xu; Jinliang Fan; Mostafa H. Ammar; Sue B. Moon (2001). "On the Design and Performance of Prefix-Preserving IP Traffic Trace Anonymization" (PDF). Proceedings of the First ACM SIGCOMM Workshop on Internet Measurement. Retrieved 2018-09-07.
- ^ a b Jun Xu; Jinliang Fan; Mostafa H. Ammar; Sue B. Moon (2002). "Prefix-preserving IP address anonymization: Measurement-based security evaluation and a new cryptography-based scheme". Computer Networks. 46 (2): 253–272. doi:10.1016/j.comnet.2004.03.033. hdl:1853/6549. ISBN 0769518567. S2CID 6518311.
- ^ a b David Stott (August 2005). "Lucent's Extensions to Cryptopan". Archived from the original on 2018-09-08. Retrieved 2018-09-07.
- ^ John Kristoff (November 2005). "IP-Anonymous 0.04". Retrieved 2018-09-07.
- ^ Michael Bauer (2013-08-26). "pycryptopan 0.01d". Python Package Index. Retrieved 2018-09-07.
- ^ Peter Wood (2016-09-25). "CryptoPAn 0.1.0". rubygems.org.
- ^ "NetSniff / IP Address Anonymisation Modes / Crypto-PAn". 2007-06-26. Retrieved 2018-09-07.
- ^ "Documentation for the C Coral API / IP address anonymization". 29 March 2000. Retrieved 2018-09-07.