ARC4 Twister - rc4twister.pl

By: Chris Grier, Mike Perry

Our cryptosystem is essentially a stream cipher built upon the Mersenne Twister (MT) random number generator. The Mersenne Twister has a period of 2^19937-1 and its state is distributed evenly across 624 dimensions of phase space.
However, the Mersenne Twister is not a magic bullet. It is cryptographically insecure in that its internal state is multiplied by an invertible matrix and then directly returned in succession as the output of the PRNG. So if a user were to have 624*4 bytes of known plaintext, it would be possible for them to completely predict the output of the PRNG for the rest of the session. Though we did not rigorously analyze it, it also seems that key recovery should be possible from this internal state once it is obtained.
To combat this, the authors of the MT suggest running the output through a cryptographically secure hashing function (such as SHA1). We instead opted to do a RC4-like index hopping procedure:
  i
= i + 1; j = j + S[i]; swap(S[i], S[j]); z = S[j] + S[i]; return
S[z]; 
We have also added a scrambling step similar to the RC4 key scheduling phase to permute the initial state array (to make key recovery difficult, even in the event that this ordering of the returned state was determined).
Unfortunately, RC4 is no magic bullet either. It has been determined in [1] that the key scheduling step in combination with the known initialization of i and j to 0 open up RC4 to state determination given known plaintext at regular intervals. We attempted to address this weakness by initializing i and j arbitrarily with entropy from the state array and key.
Furthermore, RC4 has known weak states of the form i = a, j = a + 1, S[a] = 1 which cause the algorithm to sequentially reveal its state array just as the MT did before modification. Normal RC4 can't enter into these states due to the initialization of i and j to 0, however our attempt to harden RC4 by making i and j start at arbitrary indices leaves us vulnerable to this. We address this by checking for these states prior to doing the permutation step above, and simply incrementing i by 2.
To combat further against revealing information, we shuffle both the bytes and bits of the plaintext stream prior to encryption. This is done both to address common digraphs and to add entropy to the higher 4 bits of ASCII, which remain constant for 2/3 of the lowercase alphabet.
We also implement a simple 128-word dictionary compression algorithm, to conceal the 128 most space-consuming words present in the cleartext. This is the only step of the algorithm that ties us to English plaintext ASCII.
[1]. http://citeseer.nj.nec.com/fluhrer01weaknesses.html