John A. Thomas
CompuServe: 75236,3536
101 N.W. Eighth St.
Grand Prairie, TX 75050

SURVEY OF DATA ENCRYPTION

By John A. Thomas

Introduction
------------

The following article is a survey of data encryption. It is intended to provoke discussion among the members of this forum and perhaps lead to a creative exchange of ideas. Although the basics of the subject seem to be known to few programmers, it embraces many interesting and challenging programming problems, ranging from the optimization of machine code for maximum throughput to the integration of encryption routines into editors, communications packages, and perhaps products as yet not invented. Governments have dominated this technology up until the last few years, but now the need for privacy and secrecy in the affairs of a computer-using public has made it essential that programmers understand and apply the fundamentals of data encryption.

Some Cryptographic Basics
-------------------------

A few definitions are appropriate first. We use the term "encryption" to refer to the general process of making plain information secret and making secret information plain. To "encipher" a file is to transform the information in the file so that it is no longer directly intelligible. The file is then said to be in "ciphertext". To "decipher" a file is to transform it so that it is directly intelligible; that is, to recover the "plaintext."

The two general devices of encryption are "ciphers" and "codes". A cipher works on the individual letters of an alphabet, while a code operates on some higher semantic level, such as whole words or phrases. Cipher systems may work by transposition (shuffling the characters in a message into some new order), or by substitution (exchanging each character in the message for a different character according to some rule), or a combination of both. In modern usage, transposition is often called "permutation". A cipher which employs both transposition and substitution is called a "product" cipher. In general, product ciphers are stronger than those using transposition or substitution alone. Shannon referred to substitution as "confusion", because the output is a non-linear function of the input, thus creating confusion as to the set of input characters. He referred to transposition as "diffusion" because it spreads the dependence of the output from a small number of input positions to a larger number.

Every encryption system has two essential parts: an algorithm for enciphering and deciphering, and a "key", which consists of information to be combined with the plaintext according to the dictates of the algorithm. In any modern encryption system, the algorithm is assumed to be known to an opponent, and the security of the system rests entirely in the secrecy of the key. Our goal is to translate the language of the plaintext to a new "language" which cannot convey meaning without the additional information in the key.

Those familiar with the concept of "entropy" in physics may be surprised to learn that it is also useful in information theory and cryptography. Entropy is a measure of the amount of disorder in a physical system, or the relative absence of information in a communication system. A natural language such as English has a low entropy because of its redundancies and statistical regularities. Even if many of the characters in a sentence are missing or garbled, we can usually make a good guess as to its meaning.
Conversely, we want the language of our ciphertext to have as high an entropy as possible; ideally, it should be utterly random. Our guiding principle is that we must increase the uncertainty of the cryptanalyst as much as possible. His uncertainty should be so great that he cannot make any meaningful statement about the plaintext after examining the ciphertext; also, he must be just as uncertain about the key, even if he has the plaintext itself and the corresponding ciphertext. (In practice, it is impossible to keep all plaintext out of his hands.)

A prime consideration in the security of an encryption system is the length of the key. If a short key (i.e., short compared with the length of the plaintext) is used, then the statistical properties of the language will begin to "show through" in the ciphertext as the key is used over and over, and a cryptanalyst will be able to derive the key if he has enough ciphertext to work with. On the other hand, we want a relatively short key, so that it can be easily stored or even remembered by a human. The government or a large corporation may have the means to generate and store long binary keys, but we cannot assume that the personal computer user will be able to do so.

The other important fact about keys is that there must be very many of them. If our system allows only 10,000 different keys, for example, it is not secure, because our opponent could try every possible key in a reasonable amount of time. This introduces the concept of the "work factor" required to break an encryption system. We may not have a system unbreakable in principle, but if we can make the work factor for breaking it so high that it is not practical for our opponent to do so, then it is irrelevant that the system may be less strong than the ideal. What constitutes an adequate work factor depends essentially on the number of uncertainties the cryptanalyst must resolve before he can derive plaintext or a key. In these days of constantly improving computers, that number should probably exceed 2^128. It is easy to quantify the work factor if we are talking about exhaustive key trial, but few modern ciphers are likely to be broken by key trial, since it is too easy to make the key space very large. Most likely they will be broken because of internal periodicities and subtle dependency of output on input which give the cryptanalyst enough information to reduce his uncertainty by orders of magnitude.

A corollary to work factor is the rule that a system need only be strong enough to protect the information for however long it has value. If a system can be broken in a week, but not sooner, then it may be good enough, if the information has no value to an opponent after a week.

Cryptanalysis
-------------

Cryptanalysis is the science of deriving plaintext without the key information. Anyone intending to design an encryption system must acquaint himself to some degree with cryptanalytic methods. The methods of attack may range from sophisticated statistical analysis of ciphertext to breaking into the opponent's office and stealing his keys ("practical cryptanalysis"). There are no rules of fair play. The cryptanalyst is free to use his puzzle-solving ingenuity to the utmost, even to the point of applying the knowledge that your dog's name is "Pascal", and that you might be lazy enough to use that as your key for the day.
The cryptanalyst may have only ciphertext to work with, or he may have both ciphertext and the corresponding plaintext, or he may be able to obtain the encipherment of chosen plaintext. Some cryptographic systems are fairly strong if the analyst is limited to ciphertext, but fail completely if he has corresponding plaintext. Your system should be strong enough to resist attack even if your opponent has both plaintext and ciphertext.

Computer power can greatly aid cryptanalysis, but many systems that appear strong can be broken with pencil-and-paper methods. For example, the Vigenere family of polyalphabetic ciphers was generally believed to be unbreakable up until the late nineteenth century. A polyalphabetic cipher is a substitution cipher in which a different alphabet is used for each character of plaintext. In these systems, the key determines the order of the substitution alphabets, and the cycle repeats with a period equal to the length of the key. This periodicity is a fatal weakness, since fairly often a repeated letter or word of plaintext will be enciphered with the same key letters, giving identical blocks of ciphertext. This exposes the length of the key. Once we have the length of the key, we use the known letter frequencies of the language to gradually build and test hypotheses about the key. Vigenere ciphers can be easily implemented on computers, but they are worthless today. A designer without knowledge of cryptanalysis, however, might be just as ignorant of this fact as his colleagues of the last century. Please see the references at the end of this article for information on cryptanalytic technique.

A Survey of Cryptographic Systems
---------------------------------

We now review some representative encryption schemes, starting with traditional ones and proceeding to the systems which are only feasible to implement on computers.

The infinite-key cipher, also known as the "one-time pad," is simple in concept. We first generate a key which is random and at least the same length as our message. Then, for each character of plaintext, we add the corresponding character of the key to give the ciphertext. By "addition," we mean some reversible operation; the usual choice is the exclusive-OR. A little reflection will show that given a random key at least the size of the plaintext (i.e., "infinite" with respect to the plaintext because it is never repeated), the resulting cipher is unbreakable, even in principle. This scheme is in use today for the most secret government communications, but it presents a serious practical problem with its requirement for a long random key for each message, and with the need to somehow send the lengthy key to the recipient. Thus the ideal infinite-key system is not practical for large volumes of message traffic. It is certainly not practical for file encryption on computers, since where would the key be stored?
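To make the "addition" concrete, here is a minimal sketch in C of the exclusive-OR version (the file names are hypothetical illustrations, and a real system would need a truly random key source, which is exactly the hard part):

#include <stdio.h>

/* XOR each plaintext byte with the corresponding key byte; running
   the program again with the same key recovers the plaintext, since
   (p ^ k) ^ k == p.  The key must be truly random, at least as long
   as the message, and never reused. */
int main(void)
{
    FILE *pt  = fopen("message.txt", "rb");   /* hypothetical names */
    FILE *key = fopen("pad.key", "rb");
    FILE *ct  = fopen("message.ct", "wb");
    int p, k;

    if (!pt || !key || !ct)
        return 1;
    while ((p = getc(pt)) != EOF && (k = getc(key)) != EOF)
        putc(p ^ k, ct);
    return 0;
}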
Be wary of schemes which use software random-number generators to supply the "infinite" key. Typical random-number algorithms use the preceding random number to generate the succeeding number, and can thus be solved if only one number in the sequence is found.

Some ciphers have been built to approximate the infinite-key system by expanding a short key. The Vernam system for telegraph transmission used long paper tapes containing random binary digits (Baudot code, actually) which were exclusive-ORed with the message digits. To achieve a long key stream, Vernam and others used two or more key tapes of relatively prime lengths, giving a composite key period equal to the product of those lengths. The system is still not ideal, since eventually the key stream will repeat, allowing the analyst to derive the length and composition of the keys, given enough ciphertext. There are other ways to approach the infinite-key ideal, some of which are suggested in the author's article (with Joan Thersites) in the August '84 issue of DDJ.

The "rotor" systems take their name from the electromechanical devices of World War II, the best known being perhaps the German ENIGMA. The rotors are wheels with characters inscribed on their edges, and with electrical contacts corresponding to the letters on both sides. A plaintext letter enters on one side of the rotor and is mapped to a different letter on the other side before passing to the next rotor, and so on. All of the rotors (and there may be few or many) are then stepped, so that the next substitution is different. The key is the arrangement and initial setting of the rotor disks. These devices are easy to implement in software and are fairly strong. They can be broken, however; the British solution of the ENIGMA is an interesting story outside the scope of this note. If you implement a rotor system, consider having it operate on bits or nybbles instead of bytes, consider adding permutation stages, and consider how you are going to generate the rotor tables, since you must assume these will become known to an opponent.
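To make the rotor idea concrete, here is a minimal single-rotor sketch in C, operating on nibbles for brevity. The wiring table and the simple stepping rule are illustrative assumptions only; a real system would cascade several rotors with secret tables and an odometer-style stepping schedule:

#include <stdio.h>

#define N 16   /* a nibble alphabet for brevity; real rotors use 256 */

/* the rotor "wiring": a fixed permutation of 0..N-1 (made up here) */
static const unsigned char wiring[N] =
    { 7, 12, 1, 15, 4, 10, 8, 0, 3, 14, 6, 13, 9, 11, 2, 5 };
static unsigned char inverse[N];
static int pos;          /* rotor position; initial value is the key */

static unsigned char encipher(unsigned char x)
{
    unsigned char y = (wiring[(x + pos) % N] + N - pos) % N;
    pos = (pos + 1) % N;                  /* step after each letter */
    return y;
}

static unsigned char decipher(unsigned char y)
{
    unsigned char x = (inverse[(y + pos) % N] + N - pos) % N;
    pos = (pos + 1) % N;
    return x;
}

int main(void)
{
    int i, c;

    for (i = 0; i < N; i++)               /* build the inverse wiring */
        inverse[wiring[i]] = i;
    pos = 3;  c = encipher(6);
    pos = 3;                              /* reset to the same setting */
    printf("%d -> %d -> %d\n", 6, c, decipher(c));  /* 6 -> 11 -> 6 */
    return 0;
}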
In 1977 the National Bureau of Standards promulgated the Data Encryption Standard (DES) as the encryption system to be used by all federal agencies (except for those enciphering data classified under any of the national security acts). The standard is available in a government publication and also in a number of books. The DES was intended to be implemented only in hardware, probably because its designers did not want users to make changes to its internal tables. However, DES has been implemented in software and is available in several microcomputer products (such as Borland's Superkey or IBM's Data Encoder).

The DES is a product cipher using 16 stages of permutation and substitution on blocks of 64 bits each. The permutation tables are fixed, and the substitutions are determined by bits from a 56-bit key and the message block. This short key has caused some experts to question the security of DES. Controversy also exists regarding the involvement of the NSA in parts of the DES design. The issues are interesting, but beyond the scope of this note.

Since DES was intended for hardware implementation, it is relatively slow in software. Software implementations of DES are challenging because of the bit manipulation required in the key scheduling and permutation routines of the algorithm. Some implementations gain speed at the expense of code size by using large pre-computed tables.

The public-key cipher is an interesting new development which shows potential for making other encryption systems obsolete. It takes its name from the fact that the key information is divided into two parts, one of which can be made public. A person with the public key can encipher messages, but only one with the private key can decipher them. All of the public-key systems rely on the existence of certain functions for which the inverse is very difficult to compute without the information in the private key. These schemes do not appear to be practical for microcomputers if their strength is fully exploited, at least for eight-bit machines. One variety of public-key system (the "knapsack") has been broken by solution of its enciphering function, but this is no reflection on other systems, such as the RSA scheme, which use different enciphering functions. All public-key systems proposed to date require heavy computation, such as the exponentiation and division of very large numbers (200 decimal digits for the RSA scheme). On the other hand, a public-key system that worked at only 10 bytes/sec might be useful if all we are sending are the keys for some other system, such as the DES.

Some random thoughts
--------------------

To wrap up this too-lengthy exposition, I append a few questions for the readers:

Must we operate on blocks instead of bytes? Block ciphers seem stronger, since they allow for permutation. On the other hand, they make life difficult when the file size is not an integral multiple of the block size.

Can we make a file encryption system OS-independent? This is related to the question above on blocks vs. bytes. How do we define the end-of-file if the plaintext is ASCII and the ciphertext can be any 8-bit value?

Can we find an efficient way to generate and store a random key for the infinite-key system? Hardware random-number generators are not hard to build, but would they be of any use?

Bit-fiddling is expensive. Can it be avoided and still leave a secure system? What are the relative costs of manipulating bits on the Z80 vs. the 68000, for example?

No file-encryption system can erase a file logically and be considered secure. The information can be recovered until it is overwritten. Overwriting files adds to processing time. I am informed that it is possible to reliably extract information even from sectors that HAVE been overwritten. Is this so? If it is, what is the solution?

How do we integrate encryption systems into different tools? Should a telecommunications program transparently encrypt data if the correspondent is compatible? What about an editor-encryption system wherein plaintext would never exist on the disk, only on the screen? How would we manage to encipher/decipher text as we scroll through it and make changes, and still get acceptable performance?

By their nature, encryption schemes are difficult to test. In practice, we can only have confidence that a system is strong after it has been subjected to repeated attack and remained unbroken. What test might we subject a system to that would increase our confidence in it?

References
----------

Here are a few useful books and articles. This is by no means a complete bibliography of the subject:

Kahn, David. "The Codebreakers". The basic reference for the history of cryptography and cryptanalysis. Use it to learn where others have gone wrong.

Konheim, Alan G. "Cryptography: A Primer". Survey of cryptographic systems from a mathematical perspective. Discusses rotor systems and the DES in great detail.

Sinkov, Abraham. "Elementary Cryptanalysis". Very basic, but very useful, introduction to the mathematical concepts of cryptanalysis.

Foster, Caxton C. "Cryptanalysis for Microcomputers". Covers the cryptanalysis of simple systems, but still a good introduction to cryptanalytic technique. Describes the operation of many traditional systems in detail.

Shannon, Claude. "Communication Theory of Secrecy Systems". Bell System Technical Journal (October 1949): 656-715. Discusses secrecy systems from the viewpoint of information theory. No practical tips, but useful orientation.
Rivest, R., et al. "A Method for Obtaining Digital Signatures and Public Key Cryptosystems". Comm. of the ACM, Vol. 21, No. 2 (February 1978): 120-126. This article describes what has come to be known as the RSA public-key system.

"Data Encryption Standard". Federal Information Processing Standard (FIPS) Publication No. 46, National Bureau of Standards, U.S. Dept. of Commerce, January 1977.

---

To start off, I'll discuss some *good* points of the Data Encryption Standard (DES) promulgated by the federal government. We all know the key is too small - the NSA almost certainly has the means to exhaust it. But it would be a pity not to examine the DES just for that reason. It uses a brilliant cryptographic technique which, once understood, can be applied to families of other crypto-systems with much larger keys. The DES shows us how to use one-way functions in cryptography.

At first sight, a one-way function would seem to be useless - if we encrypt 'A' and get 'R', we have to be able to decrypt 'R' and get back 'A'. However, a one-way function, if it could be used, allows very complex transformations of text that are practically impossible to undo without knowledge of the key. The DES is complicated in its details, though, so for a beginning I'm going to explain how to use a one-way function cryptographically without reference to the DES. If there's enough interest, we can later relate this to the DES.

Perhaps the simplest way to define a one-way function is with a table, such as:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
-----------------------------------------------
14 05 01 14 10 04 08 00 03 02 15 08 09 11 07 15

The top numbers are indexes into the one-way table. Given an index, you can get a value by table lookup, but given a table value there's no guarantee you'll get the index you started with, because both 0 and 3 map to 14, 6 and 11 map to 8, and so on. By the way, the table values were generated by a random process.

Now, let's use this cryptographically. Take an ASCII letter, say 'a', and split it into two nibbles, left and right:

        LEFT  RIGHT
61h ->    6     1

The DES trick is to use one half as the argument of a one-way function to obtain a value with which to encrypt the other half, so let's do it. Using RIGHT as the index into the table, we obtain the value 5. Now, we need a primitive encryption function that is, and must be, invertible. The DES uses XOR, but we will use ADD, and add 5 to LEFT, mod 16, obtaining 11. We have:

        LEFT  RIGHT
         11     1

This encrypts LEFT. Notice that even though we used a one-way function, the encryption is completely reversible, for two reasons. First, RIGHT, the argument of the table lookup, is unchanged, so getting the correct value from the table for decryption is assured. Second, the encryption primitive is invertible; we need only subtract the table value from the encrypted LEFT, mod 16.

But there seems to be a problem. RIGHT isn't encrypted, and if that's how we left matters, we wouldn't have much of a cipher. The answer suggests itself - use the new value of LEFT in the same way to encrypt RIGHT. Let's do it. Using 11 as an index into the table gives us 8, which we add to RIGHT, mod 16, giving:

        LEFT  RIGHT
         11     9

which, as the byte B9h, is a funny graphics character on the IBM PC. Now, is this reversible? Of course it is. The argument into the table that encrypted RIGHT is completely unchanged, and if used we get 8, which we subtract from 9, giving 1. And we have already shown that 11 1 will undo properly to give us 6 1.
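Here is the worked example above as a minimal C sketch (my own illustration of the procedure just described, not part of the original exposition):

#include <stdio.h>

/* the one-way table from the text; note it is not a permutation:
   indexes 0 and 3 both yield 14, 6 and 11 both yield 8 */
static const unsigned char tbl[16] =
    { 14, 5, 1, 14, 10, 4, 8, 0, 3, 2, 15, 8, 9, 11, 7, 15 };

int main(void)
{
    unsigned char left = 6, right = 1;          /* 'a' = 61h, split */

    left  = (left  + tbl[right]) % 16;          /* 6 + 5  -> 11 */
    right = (right + tbl[left])  % 16;          /* 1 + 8  -> 9  */
    printf("enciphered: %X%X\n", left, right);  /* B9           */

    /* decryption repeats the lookups in reverse order, with SUB */
    right = (right + 16 - tbl[left])  % 16;     /* 9 - 8  -> 1  */
    left  = (left  + 16 - tbl[right]) % 16;     /* 11 - 5 -> 6  */
    printf("deciphered: %X%X\n", left, right);  /* 61           */
    return 0;
}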
So far, we have nothing but a curious simple substitution cipher. Repeating the process isn't encouraging either. It clearly must cycle if we continue to alternately encrypt LEFT and RIGHT, using the values from the previous encryption as input. In fact, there are a number of cycles of different lengths, but it's interesting that some cycles don't cycle back to the starting value. For example, starting with 01, we get 01, 55, 99, BB, 33, EE, 55... The reason is that our table is not a permutation of the integers 0 through 15.

To make our sample cipher more than a curiosity, we shall do what the DES does: use another one-way function (that is, a table) in a second *round*, and repeat the process with the new table. This table has 17 entries, indexed 0 through 16:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
---------------------------------------------------
04 13 12 06 13 07 03 15 13 15 06 09 09 09 07 10 07

Applying the second round to our example (LEFT = 11, RIGHT = 9): the table value for index 9 is 15, so LEFT becomes 11 + 15 = 10, mod 16; then the table value for index 10 is 6, so RIGHT becomes 9 + 6 = 15:

        LEFT  RIGHT
         11     9
         10    15

This is already a considerably more complex cipher. It is still reversible, but we must remember to *decrypt* it starting with the last table, not the first. This cipher, like the DES, is sensitive to direction, encrypt or decrypt.

It is still a simple substitution cipher with no strength. You will notice, however, that the length of the second table is one more than that of the first. Obviously, I intend to rotate both tables one position before encrypting the next letter of a message. Since the lengths are relatively prime, I can encipher 16 x 17 = 272 letters before my tables repeat. This is no longer a simple substitution cipher, but one far more complex. Still not good enough by far. But if I had only one message, not too long, to protect, this would be practically unbreakable. It might be good for hundreds of letters before repetitions would enable cryptanalysis.

Of course, it isn't strong enough to protect against an attack based on known plaintext. With enough known plaintext, both tables can be reconstructed. If the DES used only two rounds, it too could be cryptanalyzed. With only two rounds, there are so few possible addends between known input and known output that the addends can be exhausted. Several more rounds are necessary to make exhaustion of possible addends impractical. It is quite important in designing a crypto-system that you design it against *known* plaintext. In practice, it is impossible to prevent known plaintext from falling into the cryptanalyst's hands.

Later, I will develop this sample into a very tough cipher indeed, though I won't make any claims for its ultimate strength (I haven't examined it that well yet). But for now, let's sum up what we have done:

1. We used a one-way function for a crypto-system. To give credit where it is due, this technique was invented by Horst Feistel of IBM, originally for Lucifer. It is in my opinion an absolutely brilliant cryptographic innovation. The technique is fundamental to the DES.

2. We cascaded one-way functions for complexity. The DES uses cascading similarly for strength.

3. Unlike the DES, we have developed the germ of a byte cipher, rather than a block cipher. Very great complexity may be had with a block cipher, but there is still doubt that block ciphers are 'better'.

It is the trick of alternately enciphering 'left' and 'right' that permits the use of a one-way function. In practice, it is easier to swap or exchange right and left after each round. That is, we use RIGHT to encrypt LEFT, then exchange RIGHT and LEFT, and repeat the encryption for the next round. This simplifies code and circuitry, though it may confuse us as we try to follow what happens in the DES. Therefore, to avoid confusion, we will always keep RIGHT right, and LEFT left.
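Before we leave the toy cipher, here is the whole two-round, rotating-table version as a C sketch, following my reading of the procedure above (the helper names are mine, and the rotation direction is a free choice as long as both directions agree):

#include <stdio.h>
#include <string.h>

static unsigned char t1[16] =
    { 14, 5, 1, 14, 10, 4, 8, 0, 3, 2, 15, 8, 9, 11, 7, 15 };
static unsigned char t2[17] =
    { 4, 13, 12, 6, 13, 7, 3, 15, 13, 15, 6, 9, 9, 9, 7, 10, 7 };

/* rotate a table one position for the next letter */
static void rotate(unsigned char *t, int n)
{
    unsigned char first = t[0];
    memmove(t, t + 1, n - 1);
    t[n - 1] = first;
}

static unsigned char encipher(unsigned char c)
{
    unsigned char l = c >> 4, r = c & 15;

    l = (l + t1[r]) % 16;  r = (r + t1[l]) % 16;    /* first round  */
    l = (l + t2[r]) % 16;  r = (r + t2[l]) % 16;    /* second round */
    rotate(t1, 16);  rotate(t2, 17);    /* period 16 x 17 = 272 */
    return (l << 4) | r;
}

/* decryption works backwards: last table first, SUB instead of ADD */
static unsigned char decipher(unsigned char c)
{
    unsigned char l = c >> 4, r = c & 15;

    r = (r + 16 - t2[l]) % 16;  l = (l + 16 - t2[r]) % 16;
    r = (r + 16 - t1[l]) % 16;  l = (l + 16 - t1[r]) % 16;
    rotate(t1, 16);  rotate(t2, 17);
    return (l << 4) | r;
}

int main(void)
{
    unsigned char k1[16], k2[17], c1, c2;

    memcpy(k1, t1, 16);  memcpy(k2, t2, 17);   /* save the "key"   */
    c1 = encipher(0x61);  c2 = encipher(0x62); /* encipher "ab"    */
    memcpy(t1, k1, 16);  memcpy(t2, k2, 17);   /* reset the tables */
    printf("%02X %02X -> %02X %02X\n", c1, c2, decipher(c1), decipher(c2));
    return 0;
}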
---

To begin, we shall consider the DES encipherment a little abstractly, then later in more detail. The heart of the DES one-way function consists of the eight S-boxes, each of which transforms six bits of input into four bits of output. We can understand the function of the S-boxes better if we first consider the transformation they effect as they act in concert upon the entire input block of 48 bits.

Imagine one table consisting of 2^48 numbers of 32 bits each. Each 32-bit number is repeated 65,536 times, in a more or less random order. Also imagine a 'letter', not of one byte, but of 64 bits, divided into a LEFT of 32 bits and a RIGHT of 32 bits. RIGHT is combined with 48 key bits selected out of 56, in a way that will be described later (without, however, changing RIGHT), and this combined value is used as an argument into the huge table to return a pseudo-random 32-bit value. This returned value is XORed with LEFT to encrypt LEFT. The same table lookup is repeated in the next round using LEFT as the argument and encrypting RIGHT, except that a different arrangement of 48 key bits out of the 56 is used. The DES repeats this process 16 times; that is, it encrypts RIGHT eight times and LEFT eight times. There is a reason for eight that we will discuss later. Each iteration is called a 'round'.

It is clear that this huge table is a one-way function, and that the encryption technique is almost exactly what we described for the byte cipher discussed in the previous upload. There is an important difference - we are now using a key in addition to a table. In our byte cipher, the key is the table itself. Also, the DES uses a different permutation of the original key for every round, in a clever way that ensures that every bit of final ciphertext depends complexly on every bit of the key.

This is important. A sure-fire cryptanalytic technique is to suppose a few key bits - not too many possibilities to exhaust - then to test the supposition against sample ciphertext compared with known plaintext. In this way, although a key space may be too large to exhaust by brute force, the key is gradually recovered. A good example is the simple substitution, where the key space is 26! (about 2^88). But by forcing all ciphertext bits to depend on all key bits, this sure-fire attack is inhibited, if not made impossible. You can't solve for a few key bits; you have to solve for all or none.

Why a key? The chief reason is that the one-way table is published. It is no secret, as we suppose our byte cipher's tables are. And, to be a standard, we aren't supposed to change the DES tables. Further, the tables are not random; the inventors state that the DES's particular tables have been worked out for cryptographic strength, and that to their own surprise they discovered that random tables, which they had naively believed to be sufficient, aren't so good. And nobody will say what criterion should be used to design DES tables, and nobody has figured them out (at least, not publicly). Naturally, this has bred suspicion, and the fact that the NSA helped design the tables hasn't helped. Later, I would like to offer my own speculation on what this criterion might be. To summarize, the key serves as the secret ingredient because the tables can't be secret.

Obviously, a table of this size is a practical impossibility.
So, how does the DES achieve a 'virtual' table of this size? Basically, nibble by nibble. It uses the nibbles of RIGHT, in order, as indexes into relatively small matrices to get eight encryption values per round. But the encryption values don't encrypt the corresponding nibbles of LEFT. Oh no, that would be fatally simple. Each result nibble of the one-way lookup encrypts scattered bits of LEFT, so that each bit of the value impacts the most nibbles possible in LEFT. Now, when LEFT becomes the argument, the nibble values returned from the table have dependencies on many bits of RIGHT. Within five rounds, all ciphertext bits become complex dependents of all plaintext bits and all key bits. The dependency is extremely violent. Believe me, a single bit of difference in either plaintext or key gives you an unrecognizably different ciphertext.

We should be ready now to see how a nibble of RIGHT (actually, three nibbles will be involved), and some key bits, are used as table arguments, and how this encrypts four bits of LEFT in a single round. Let's ignore how the different permutations of the key bits are generated for now. Just imagine there are somehow six key bits. The first nibble one-way function is this matrix:

Box S1
      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    ------------------------------------------------
0 | 14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
1 |  0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
2 |  4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
3 | 15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13

Recall that the input block of 32 bits is first expanded to 48 bits by the "selection operation". If you consult this table in the standard, you will see that the first six bits of the result are the first RIGHT nibble, with the last bit of RIGHT prefixed and the first bit of the next nibble appended, to form six bits:

32 1 2 3 4 5

This is one argument into the one-way function. Also, six bits of the key are an argument, and the six key bits and these RIGHT bits are XORed to form the actual table argument. The first and last bits of this XOR result index the row of the S1 box. The middle four bits index the column. The intersection of row and column gives us a four-bit value which, after passing through the permutation operation P, is used to encrypt LEFT bits 9, 17, 23, and 31.

This is surely one-way. You can't determine the row and column indexes from the table value, because there are four row-and-column index pairs that map to the same value. It should also be clear that LEFT bits 9 17 23 31 are decryptable, because RIGHT was never changed. We have only to combine the appropriate key bits with RIGHT's 32 1 2 3 4 5, and we'll get the same value out of the S-box, which, XORed with LEFT, will yield our original 9 17 23 and 31.

Notice the scatter. By encrypting these particular bits, we are encrypting the 3rd, 5th, 6th, and 8th nibbles, which will be immediately used in the next round as arguments. Because of their positions, they will form part of the 2nd, 3rd, 4th, 5th, 6th, and 8th nibble arguments of the future LEFT. And thus, when RIGHT gets encrypted, its old first nibble will be quite spread out. The scatter makes far more bits dependent on very few bits than is at first apparent. The only unaffected nibbles are the 1st and 7th, but this is only one round, and we tracked only one nibble of RIGHT.
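In code, the lookup just described comes to very little. This C sketch shows S1 indexed the DES way; the six-bit argument is assumed to be the XOR of the expanded RIGHT bits with the six key bits:

/* S1 from FIPS 46, stored row-major: 4 rows of 16 nibbles */
static const unsigned char S1[4][16] = {
    { 14,  4, 13,  1,  2, 15, 11,  8,  3, 10,  6, 12,  5,  9,  0,  7 },
    {  0, 15,  7,  4, 14,  2, 13,  1, 10,  6, 12, 11,  9,  5,  3,  8 },
    {  4,  1, 14,  8, 13,  6,  2, 11, 15, 12,  9,  7,  3, 10,  5,  0 },
    { 15, 12,  8,  2,  4,  9,  1,  7,  5, 11,  3, 14, 10,  0,  6, 13 }
};

/* arg is the 6-bit XOR of expanded RIGHT bits and key bits;
   bit 5 here is the first of the six, bit 0 the last */
unsigned char sbox1(unsigned arg)
{
    unsigned row = ((arg >> 4) & 2) | (arg & 1);  /* first and last bits */
    unsigned col = (arg >> 1) & 15;               /* middle four bits    */
    return S1[row][col];   /* four (row,col) pairs share each value */
}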
This is getting lengthy, so let me list the RIGHT bit arguments corresponding to the eight S-boxes, and the LEFT bits that get encrypted:

RIGHT bits              box      LEFT bits
32  1  2  3  4  5 ----> S1 ---->  9 17 23 31
 4  5  6  7  8  9 ----> S2 ----> 13 28  2 18
 8  9 10 11 12 13 ----> S3 ----> 24 16 30  6
12 13 14 15 16 17 ----> S4 ----> 26 20 10  1
16 17 18 19 20 21 ----> S5 ---->  8 14 25  3
20 21 22 23 24 25 ----> S6 ---->  4 29 11 19
24 25 26 27 28 29 ----> S7 ----> 32 12 22  7
28 29 30 31 32  1 ----> S8 ---->  5 27 15 21

In this manner, a nibble-by-nibble encryption is spread out so that a virtual table of 2^48 elements is achieved. Note that we have not really considered the key yet. And note that I have shown the contents of only one box. The boxes are listed in FIPS Pub 46, which you should use if you ever implement the DES, because other sources usually have typos, the slightest of which is fatal.

Also, pay attention to how RIGHT's bits are expanded to 48. The last bit of the previous nibble and the first bit of the next are tacked onto each nibble of the argument, fore and aft. This builds an inter-nibble dependency into the DES. Even more important, one encrypting bit from the table can do a lot of 'damage'. Look at the nibble value coming out of the second S-box; its first bit will encrypt the 13th bit of LEFT. But look where the 13th bit occurs in expanded RIGHT! The result of this encryption is that when LEFT becomes the table argument, the third and fourth table nibbles are dependent on just one bit. As a matter of fact, the value out of any S-box encrypts exactly two nibble boundary bits and two nibble inner bits.

We saw how the DES uses one-way functions, each specific to a nibble, and how the results of the one-way functions are carefully scattered. The purpose of the scattering is to build up complex dependencies of the final result on all bits of message and key as fast as possible. The scatter is achieved by a permutation, called P, which the standard describes as occurring after gathering the eight table values. For software implementation, there is a great savings in speed in replacing all S-box values with 32-bit entries pre-permuted by the inverse of P - that is, by the encrypting bit positions we already listed - and doing away with P entirely. To illustrate, the first value of the first S-box is 14, in binary 1 1 1 0. Therefore, to do away with P, we replace this value with its equivalent 32-bit word, which carries those bits at LEFT positions 9, 17, 23, and 31 (the last bit of 14 is 0, so position 31 stays 0):

0000 0000 1000 0000 1000 0010 0000 0000

The gain in speed can't be overemphasized. There are more implementation tricks like this that hopefully we can get into later. On an 8088 at 4.77 MHz, encryption speeds of 2500 bytes/sec are possible in pure software. On a 68000, we think that software speeds of 6000 bytes/sec, or better, are achievable. Perhaps even as high as 8000. The important lesson is that you don't have to code in the dumb, plodding way the standard implies. It seems that the standard was written to be as uninstructive as possible.
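A sketch of how such pre-permuted entries might be computed at initialization time (prepermute and pbits are my own names; the bit positions are the LEFT-bit list above). Encryption then just XORs eight such 32-bit words into LEFT per round, with no bit-by-bit permutation at all:

/* LEFT bit positions (1-based, per the list above) encrypted by
   each S-box value, i.e. P already folded into the tables */
static const int pbits[8][4] = {
    {  9, 17, 23, 31 }, { 13, 28,  2, 18 }, { 24, 16, 30,  6 },
    { 26, 20, 10,  1 }, {  8, 14, 25,  3 }, {  4, 29, 11, 19 },
    { 32, 12, 22,  7 }, {  5, 27, 15, 21 }
};

/* spread one 4-bit S-box value into a 32-bit pre-permuted word;
   bit position n (counting 1..32 from the left) is bit 32-n */
unsigned long prepermute(int box, unsigned char value)
{
    unsigned long w = 0;
    int i;

    for (i = 0; i < 4; i++)         /* bit 1 of the value is its MSB */
        if (value & (8 >> i))
            w |= 1UL << (32 - pbits[box][i]);
    return w;
}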
We should now discuss the key. We won't go into a bit-by-bit description; for that, see FIPS 46. The main idea is to generate 16 48-bit key schedules out of the original 56 bits that:

1. are simply generated;

2. vary one from the other as much as possible;

3. fit the 'scatter', so as to involve all key bits in all cipher bits.

For this, two (actually, three) permutations are used, called PC1 and PC2 in the standard - awkward names, but we'll use them. This 'key scheduling' is not perfect; it permits weak keys (keys that are their own inverse) and semi-weak keys (keys that result in alternating patterns, so that all encryptions of LEFT are as if encrypted by one invariant key, and those of RIGHT by a different invariant key). The key scheduling just isn't good enough to avoid these weaknesses.

PC1 is a geometric permutation that doesn't seem to have a deep reason. It discards the so-called parity bits, thus reducing 64 bits to 56, and divides the key bits into two halves. The halving *is* important. This is a picture of the PC1 transformation of the original key bits, with their 'parity' bits in the eighth column:

 1  2  3  4  5  6  7 |  8
 9 10 11 12 13 14 15 | 16
17 18 19 20 21 22 23 | 24
25 26 27 28 29 30 31 | 32
33 34 35 36 37 38 39 | 40
41 42 43 44 45 46 47 | 48
49 50 51 52 53 54 55 | 56
57 58 59 60 61 62 63 | 64
                       ^
                       discarded

The C half and the D half are taken out by reading up the columns:

C = 57 49 ... 44 36        D = 63 55 ... 4 12 20 28

Very geometric, which, combined with the geometric shifts used with PC2, causes a larger number of weak keys than would be strictly necessary.

Now PC2, which is actually two permutations - one that operates on circular shifts of C, and one that operates on circular shifts of D - has an odd property; it rearranges 24 bits out of C and 24 bits out of D so that the C bits directly combine only with the first 16 bits of RIGHT (or LEFT), and the D bits only with the last 16 bits of RIGHT (or LEFT). (I'm not considering the nibble boundary bits.) Bit 41, in other words, will never combine with bits 17 through 32 of the one-way argument. Similarly, bit 47 will never combine with bits 1 through 16 of the one-way argument. Indirect combination, due to the scatter, is another story.

My guess is that cutting the key bits into halves is deliberate, to ensure that all key bits encrypt all cipher bits as quickly and simply as possible. If you carefully examine which bits get encrypted by the table values, you will see that the scatter also repositions substitution bits by halves. That is, in our illustration, the first 16 bits of RIGHT, plus the nibble boundary bits, encrypt half of the first 16 bits of LEFT and half of the last 16 bits of LEFT (almost). Also, the last 16 bits of RIGHT cause encryption of the remaining first and second halves of LEFT. I think this completely explains the purpose of the P permutation described (described? just listed!) in the standard. For PC2, see FIPS 46. Let me just mention that since PC2 selects 24 bits out of each 28-bit half, in any of the key schedules there are always 4 bits of each half missing. This makes it harder for the cryptanalyst to work the cipher backwards by supposing and testing assumptions about the key.

Finally, let me add an implementation note. You don't have to go through the key scheduling algorithm for every block as the standard describes. Instead, you program an initialization call that generates the 16 key schedules one time. You can do this because the schedules are invariant from block to block. Such an initialization call makes the difference between encryption rates of 80 bytes/sec and 700 bytes/sec for an otherwise uninspired implementation, or 2500 bytes/sec for what is in my opinion the very finest ever for the IBM PC, the Data Encoder.
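In outline, the one-time generation looks like this C sketch. The shift counts are from FIPS 46; PC1 itself and the PC2 selection are omitted, and make_schedules is a hypothetical name:

/* per-round left-shift counts for C and D, from FIPS 46; they
   total 28, so the halves return to the start after 16 rounds */
static const int shifts[16] =
    { 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1 };

static unsigned long Ckey[16], Dkey[16];   /* one pair per round */

/* 28-bit circular left shift, value kept in the low 28 bits */
static unsigned long rol28(unsigned long x, int n)
{
    return ((x << n) | (x >> (28 - n))) & 0x0FFFFFFFUL;
}

/* called once per key, never per block; c and d are the two
   28-bit halves produced by PC1 (PC1 and the PC2 selection of
   24 bits from each half are omitted from this sketch) */
void make_schedules(unsigned long c, unsigned long d)
{
    int i;

    for (i = 0; i < 16; i++) {
        c = rol28(c, shifts[i]);
        d = rol28(d, shifts[i]);
        Ckey[i] = c;        /* PC2 would now be applied to each */
        Dkey[i] = d;        /* pair to yield the 48-bit schedule */
    }
}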
To make all cipher bits depend on all key bits is no mean accomplishment. The attempt is fraught with booby traps. Most ways you can think of to force this dependency have the unfortunate result of transforming the actual key into a simpler and much smaller equivalent key. This danger the designers of the DES managed to avoid. I mention this because we have seen some weaknesses in the DES's key scheduling, but these weaknesses should not blind us to the DES's good points if we want to learn how to design ciphers.

The DES has another weakness, the complementary property, that effectively halves the key space. This property is caused by the two uses of XOR in the algorithm. Let's use '~' to indicate boolean 'not', 'm' to indicate 'message', and 'k' to indicate 'key'. The complementary property is as follows:

DES(k,m) = ~DES(~k,~m)

Read this as: the DES encryption of a message, m, under a key, k, is IDENTICAL to the complement of the DES encryption of ~m under key ~k. Remember that the key bits are combined with the message bits by an XOR to look up an encrypting value in an S-box. Because of this XOR, m and k, or ~m and ~k, map to the identical value in an S-box, because of the boolean identity:

(m XOR k) = (~m XOR ~k)

Remember also that LEFT is encrypted by XORing it with the looked-up value. Due to another boolean identity:

(LEFT XOR VALUE) = ~(~LEFT XOR VALUE)

we have the complementary property. The result is that for known plaintext, the DES's key space is 2^55, not 2^56. And cryptographers must assume that plaintext is known when analyzing a cipher; that assumption simply isn't unreasonable. This weakness would have been easy to avoid. We have only to *add* the key to RIGHT (or LEFT) mod 2^48, and we would not map to the same S-box values for complements of message and key; and to *add* the looked-up value to LEFT (or RIGHT) mod 2^32.

As a general rule, XOR is not a good primitive for ciphers. True, it is convenient, because XOR is its own inverse, while if we used ADD to encrypt, we would have to use SUB to decrypt. But in cryptography there are more important things than convenience - for example, keeping the actual key space close to the potential. Why this weakness is not corrected is hard to understand.

DES defenders claim that the complementary property is useful for verifying the correctness of implementations. Basically, you code the DES, then test to see whether the complementary property holds for a randomly chosen key and plaintext, and if it does, you are supposed to get a 'warm' feeling. But this argument can't be valid. Errors in key scheduling and in S-box values can't be caught by this test. As a matter of fact, most errors in coding the DES can't be caught by the complementary property, so long as XOR is used to combine RIGHT and key, and to encrypt LEFT. The test only proves that XOR is used in both places, not that things are right!

DES defenders also claim that XOR is necessary to keep decryption like encryption, that is, to avoid sensitivity to direction. However, the DES, whether it uses XOR or not, *is* sensitive to direction. To encrypt, you must start with the *first* key schedule and work your way to the 16th. To decrypt, you must start with the 16th key schedule and work your way backwards to the first. Since the DES is sensitive to direction anyhow, it wouldn't hurt a thing to use ADD one way and SUB the other. My opinion is that XOR is a mistake realized too late, that correction is resisted because too much is now invested in this mistake, and that the defenses of the XOR are rationalizations, not reasons.

And yes, it does matter. The key space of 2^56 isn't enough anyhow. If the NSA can exhaust all 2^56 keys in a day (and on average, it need only exhaust half that, or 2^55), then for known plaintext, which is very common, it can exhaust all possible 2^55 keys in half a day (or on average 2^54 keys in one quarter of a day).
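Both identities are easy to confirm by machine. This C sketch checks them exhaustively for 8-bit quantities, standing in for the 48- and 32-bit quantities in the DES; replace the XORs with ADD and SUB mod 256 and the first test fails, which is exactly the point:

#include <stdio.h>

int main(void)
{
    unsigned m, k, bad = 0;

    for (m = 0; m < 256; m++)
        for (k = 0; k < 256; k++) {
            /* key mixing: complemented message and key collide
               into the very same S-box argument */
            if (((m ^ k) & 0xFF) != ((~m ^ ~k) & 0xFF))
                bad++;
            /* output mixing: complementing LEFT just complements
               the result */
            if (((m ^ k) & 0xFF) != (~(~m ^ k) & 0xFF))
                bad++;
        }
    printf("%u violations\n", bad);   /* prints: 0 violations */
    return 0;
}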
But our interest in the DES is not its defense; it is to learn good ciphering from both its good and bad points. The lesson is clear: avoid XOR for ciphers, because it halves the key space. If you implement the DES and don't need to maintain the standard, I recommend using ADD and SUB instead of XOR.

Software implementation of the DES is frowned on, not because it is slow, but because it permits a knowledgeable person to tamper with the DES, to the NSA's distress. The NSA prefers hardware implementations only (ROM qualifies as 'hardware') and will more readily grant export licenses for hardware than for software.

Here is one suggestion that practically increases the key space to 120 bits. Begin with a seed of 64 bits, the seed being an extension of your 56-bit key. Encrypt the seed with your key, and use the resulting 16 nibbles of the ciphertext to shuffle the nibbles of the first row of the first S-box. Encrypt the ciphertext (not the seed) again with the altered S-box. Use the second ciphertext block to shuffle the nibbles of the next row of the S-box. Repeatedly encrypt blocks and shuffle rows until all 32 rows have been altered. Thereafter, use your 56-bit key and the altered DES tables to encrypt and decrypt your messages. In all probability, the resulting ciphertext won't be as nice cryptographically with these randomized tables as with the contrived ones, but this scheme will thwart a brute-force attack based on exhaustive key trial. A key search on a special-purpose machine is possible over a 55-bit key space (given known plaintext), but it is not possible over a 120-bit key space. We may take up later other pros and cons of modifying the DES tables.

The inventors of the DES state that five rounds are required for all ciphertext bits to become dependent on all key bits and all message bits. Yet the DES uses 16 rounds. How come? Wouldn't it be better to limit the rounds to five for a great increase in throughput? Or would it? In examining a cipher, it is very good not to take anything for granted, to ask oneself 'why this, instead of that?' and attempt to find a coherent reason for it. I haven't been able to answer this question satisfactorily for myself. However, the DES is too carefully crafted for me to lightly believe that it uses 16 rounds just because that is a nice round number in binary arithmetic. Perhaps my current thinking will help others to puzzle it out.

The DES would be breakable, not just by brute force but by cryptanalysis, if it used two rounds instead of 16 and plaintext were known. To see this, let's list knowns and unknowns for two rounds. We need a notation now for LEFT and RIGHT that shows rounds, and we will use L[i] and R[i]. When i=0, LEFT and RIGHT are plaintext; when i=n, LEFT and RIGHT are ciphertext. Also, we will designate key schedules by K[j]. This is the picture when n=1:

L[0] R[0]    known by assuming known plaintext
L[1] R[1]    known by intercepting ciphertext
K[1]         unknown
K[2]         unknown

Remember that encryption means a 32-bit number was picked out of the S-boxes as a function of R[0] and K[1], and this number encrypted L[0] to produce L[1]. Similarly, another 32-bit number was picked out of the S-boxes as a function of L[1] and K[2], and the second number encrypted R[0] to produce R[1].
There is another way to look at this (the poet Wallace Stevens listed thirteen ways of looking at a blackbird). One nibble is picked out of the first S-box as a function of bits 32 1 2 3 4 5 of R[0] and the first six bits of K[1] (bits 10 51 34 60 49 17 of the key), and encrypts bits 9, 17, 23, and 31 of L[0] by XOR. And so on.

Now, if we have known plaintext, it is simply no problem to determine what the S-box values were that encrypted both LEFT and RIGHT. We just take bits 9, 17, 23, and 31 of L[0] and XOR them against the same bits of L[1]. And so on. Let's call these nibbles the encrypting *sums*; because there is only one round per side, the sums are also direct S-box values. For each S-box value, there are four possible 6-bit keys that will map to the S-box value. For all the nibbles that encrypted LEFT, the possible key space used with R[0] is reduced from 2^48 to (2^2)^8, or 2^16. This key space is easily exhausted, and we can now attack K[2]. By the same procedure, the key space for K[2] is also reduced to 2^16; however, K[2] must be consistent with K[1]. That is, the bits of K[2] are practically the same as for K[1], but rearranged. So, in fact, the key space for K[2] is far smaller than 2^16; I think it is only 2^4 (but I haven't counted it yet). This breaks a two-round DES.

Now suppose four rounds - two for LEFT and two for RIGHT - and list the knowns and unknowns again:

L[0] R[0]    known (plaintext)
L[1] R[1]    unknown! (intermediate ciphertext)
K[1] K[2]    unknown
L[2] R[2]    known (ciphertext)
K[3] K[4]    unknown

Now when we derive the encrypting sums, we no longer have the S-box values, but the sum of two S-box *addends*, and for any sum, there must be 16 pairs of addends. We simply don't know what L[1] and R[1], the intermediate ciphertext, are anymore. The possible keys for K[1] increase from 2^16 to 2^32. This begins to look computationally difficult. Instead of four possible 6-bit keys per nibble, we now have 16. However, let us remember the consistency requirement of the key bits. If we suppose a particular key bit is 1, it must be 1 in all the schedules in which it occurs, regardless of position. So the combinatorics aren't quite as daunting as they first appear. It seems that this is still solvable. To work this out accurately, you would have to construct tables of key bits according to the key schedules, then note the bits repeated (they will be in different positions) from round to round.

For six rounds, the number of possible keys used with R[0] jumps to 2^48. With a large machine, it should be possible to solve. But with any more rounds, we would be solving for 2^56 possible keys; that is, we are back to a brute-force attack. With 16 rounds, there are seven intermediate unknown LEFTs and RIGHTs, and the combinatorics become too great to backsolve for the key. I suspect that if the math is worked out, 16 rounds make it a practical certainty that backsolving is infeasible, whether we attack all bits used in one round, or a few bits used in all rounds. It would be nice to know the exact number of rounds that reaches this infeasibility; the math, however, escapes me. We must remember this when we return to our byte cipher (discussed in the file "DES"), because we should determine how many rounds are required to make backsolving with known plaintext infeasible. The number of rounds is important; too many is inefficient, too few is fatal.
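The opening move of the two-round attack can be shown concretely. Building on the sbox1 sketch given earlier, this fragment enumerates the six-bit subkeys consistent with one known-plaintext observation; the extraction of the bits themselves is assumed done:

#include <stdio.h>

extern unsigned char sbox1(unsigned arg);  /* from the earlier sketch */

/* inbits = the six expanded bits of R[0] (known plaintext);
   sum    = the S1 output recovered by XORing bits 9, 17, 23, 31
            of L[0] against the same bits of L[1] */
void candidates(unsigned inbits, unsigned sum)
{
    unsigned k;

    for (k = 0; k < 64; k++)
        if (sbox1(inbits ^ k) == sum)
            printf("possible 6-bit subkey: %u\n", k);
    /* exactly four values print: each S1 output occurs once per
       row, so four arguments map to any given value */
}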
---

Experts have studied the S-boxes. To a casual eye, each row of the S-boxes appears to be a random permutation of the nibbles 0 through 15. But they aren't random. So far, nobody has figured out their ordering or why. I haven't figured out the S-boxes, and it is unlikely I ever will. Nevertheless, I am inclined to believe the inventors when they say there is no trap door, and that the computed boxes really are 'better' cryptographically than random boxes. Keep in mind that many mathematicians, computer scientists, and gifted amateur cryptanalysts who owe nothing to the NSA have pored over these boxes. Also, the inventors' tone in their defense of the DES ("Cryptography: A New Dimension in Computer Data Security"; Carl Meyer and Stephan Matyas, John Wiley and Sons) strikes me as sincere. They seem to me genuinely surprised to have discovered that non-random boxes are better than random.

I can't explain the S-boxes, but I do have an idea why non-random boxes might be 'better'. It turns out that the ENIGMA rotors also were not randomly wired. Rather, they were wired to ensure 'distance'. The idea is that similar plaintext should not encrypt, under a similar key, to a similar ciphertext. I haven't studied the ENIGMA in detail yet, so I don't know what threat 'closeness' poses to the key that the German inventors were trying to avoid. Yet it must have been a threat, because the German cryptographers deliberately avoided random rotor wiring, in spite of the weaknesses 'distance' wiring introduced.

There is the same idea in symbol-table hashing algorithms. Here, one wants similar variables, perhaps differing only by one character, to hash to far different places in a symbol table. The reason is that we want to avoid 'collisions' - different variable names accidentally hashing to the same symbol-table location, causing extra effort to find an unused slot for each variable name. Most hashing schemes have the effect of discarding the high-order character, so without 'distance', collision is practically assured in FORTRAN, for example, because FORTRAN uses the initial letter of a variable to classify the data type, and we tend to name variables systematically.

Might this not be the secret of the S-box design criterion? It seems to me that the message-space mapping of a cipher is not in principle different from symbol-table hashing. And if 'distance' is *not* a criterion of the mapping, maybe it ought to be. What I'm saying, strictly as speculation, is that very similar plaintext, differing perhaps by only a bit, should map to wildly different points in the message space; that it may be impossible to guarantee distance with random table values; and that the S-box values were carefully computed to ensure distance. Exactly why the lack of distance should be weak, I haven't figured out. In support of my guess, the DES does indeed exhibit the property that similar input maps to violently different results. But even if my guess is right, this leaves unanswered the important question of how to design the boxes. Still, it's a start, and it raises very interesting questions in math and cryptography.

The plaintext is not simply divided into LEFT and RIGHT as we described. It is permuted first by a so-called Initial Permutation, IP, that transposes the rows and columns of a bit-byte matrix in an odd way:

 1  2  3  4  5  6  7  8
 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64
-----------------------
    1     2     3     4    take-out order for LEFT
 5     6     7     8       take-out order for RIGHT

In other words, transcribe the bytes in row order, then take bits out in the column order indicated, from bottom to top.
This permutation is bemusing. There is no apparent reason for it, and many cryptanalysts believe IP has no cryptographic significance. The American Cryptogram Association, an organization of amateur cryptanalysts, has considered IP without being able to reach a conclusion about it. This unexplained permutation does something distinctive, however. Let's look at eight consecutive bytes in a different way:

x..x x..x x..x x..x
x..x x..x x..x x..x
x..x x..x x..x x..x
x..x x..x x..x x..x

What we are doing is distinguishing between nibble boundary bits and nibble inner bits. Now, it is a fact that IP rearranges boundary and inner bits like this:

.... .... x..x x..x
.... .... x..x x..x
x..x x..x .... ....
x..x x..x .... ....

In other words, half the normal boundary bits are exchanged with inner bits, and the original boundary bits that remain boundary bits are relocated.

The reason for this is obvious - the boundary bits have quite pronounced statistics in normal natural-language text, especially in EBCDIC. For example, a blank is 0 1 0 0 0 0 0 0. The result is that 20 percent of all boundary bits are 0 simply because blanks make up 20 percent of all text. By continuing to separate the boundary bits by their natural-language statistics, you can derive a frequency table for the patterns 0..0, 0..1, 1..0, and 1..1. Do this as an exercise, and you will see that the frequencies are quite pronounced. Use EBCDIC; remember, the inventors of the DES were thinking IBM, not micros and ASCII. The effect of IP is to even out these frequencies for the boundary bits so that they are more nearly uniform. It is impossible to believe that something as remarkable as this was not intended. This smoothing out of boundary-bit statistics must be the purpose of IP.

How come? Why are the boundary bits so important that they are evened out at the expense of the inner bits? The answer, I suppose, is to compensate for the reduction of the DES key from 128 bits to 56. It doesn't take too long studying the DES to realize that its dimensions are wrong. It is very lopsided. There are eight boxes of four rows and 16 columns each. In short time, you get the feeling that 16 boxes of 16 rows and 16 columns each would have been more natural. The key should have been 128 bits; LEFT and RIGHT should have consisted of 64 bits each. The inter-nibble dependency should have been TWO bits from the previous nibble (not one) concatenated to the current nibble, concatenated to TWO bits of the following nibble. In other words, instead of 32 1 2 3 4 5, it should have been 31 32 1 2 3 4 5 6. Bits 31 32 5 6 should have been the row coordinate into the first S-box, not 32 and 5.

The result of this brutal reduction is that there are far fewer rows per nibble to select values out of than there are columns. Guessing which row is selected, rather than guessing which value, there are (2^2)^8 = 2^16 choices per round, or for one nibble over all rounds. This is not a formidable number. For known input, correctly guessing a row gives you two bits of the key. Because of the key-bit consistency requirement, it also helps determine other key bits. Also, guessing the row helps determine the column as well, since the selfsame message row bits participate as column selection bits. It looks like row selection is the weakest part of the DES, and if a cryptanalytic attack is achievable, it should be the spot to concentrate on. My guess is that a strong bias of nibble boundary bits might help such an attack, and that IP is a stop-gap measure to thwart it. If the rows *can* be determined, 32 key bits are recovered, and the DES falls. The remaining 24 bits could not resist even a microcomputer. Perhaps IP is not a bemusing oddity. Perhaps it is symptomatic of a deep weakness born of the DES's butchering.

By the way, this is exactly how cryptanalysis works - find a weak point and hammer it.
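The boundary-bit exercise suggested above takes only a few lines of C (shown here for an ASCII file, since micro users won't have EBCDIC text at hand; the filename is a hypothetical stand-in):

#include <stdio.h>

/* tally the (first bit, last bit) pattern of every nibble;
   in natural-language text the four counts are far from uniform */
int main(void)
{
    long count[4] = { 0, 0, 0, 0 };
    FILE *f = fopen("sample.txt", "rb");   /* hypothetical name */
    int c;

    if (!f)
        return 1;
    while ((c = getc(f)) != EOF) {
        count[((c >> 6) & 2) | ((c >> 4) & 1)]++;  /* high nibble */
        count[((c >> 2) & 2) | (c & 1)]++;         /* low nibble  */
    }
    printf("0..0 %ld   0..1 %ld   1..0 %ld   1..1 %ld\n",
           count[0], count[1], count[2], count[3]);
    return 0;
}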
---

This note will describe one more implementation trick for very fast software DES, then recap the implementation tricks. You will remember how the 32 bits of RIGHT (or LEFT) must be expanded before they can be combined with the key schedule for a round. The following bits:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

must be expanded to:

32  1  2  3  4  5
 4  5  6  7  8  9
 8  9 10 11 12 13
12 13 14 15 16 17
16 17 18 19 20 21
20 21 22 23 24 25
24 25 26 27 28 29
28 29 30 31 32  1

This expansion is called E in FIPS 46. You will also remember that IP arranges the bits of the plaintext into these 32 bits for RIGHT and 32 bits for LEFT. Now, performing this expansion in code for every round happens to be quite a bit of work, even using a micro's fastest instruction forms. However, IP may be modified to generate LEFT and RIGHT in *expanded* form. Then, if the S-box values are adjusted to compensate for an expanded LEFT and RIGHT, the same encryption is achieved, but without the expense of performing E for every round. Including the S-box adjustment to do away with P, the cost is a 48-bit S-box value for every original 4-bit element in the S-boxes.

To see how this works, remember that *all* values in the first S-box encrypt only bits 9, 17, 23, and 31 of either LEFT or RIGHT, and the first S-box's first value is 1 1 1 0 in binary. We therefore replace this value with the following 48-bit expanded equivalent (bit 9 appears at expanded positions 12 and 14, bit 17 at 24 and 26, and bit 23 at 34; bit 31's value here is 0):

000000 000001 010000 000001 010000 000100 000000 000000

Let's assume the 8088 chip and a clock rate of 4.77 MHz, and recap. Setting the key schedules in a one-time initialization call increases software speed from about 80 bytes per second to close to 800. Getting rid of P by precomputing it into the S-boxes increases the speed from about 800 bytes/second to about 1700. Making E a one-time expense by putting it into IP, and adjusting the S-boxes accordingly, increases the speed from about 1700 to 2500 bytes/second. There is one more trick, if you can live with a c * 64K table, where 'c' is the length of one table entry. You can collapse two S-boxes into a single table, thereby halving the number of S-box lookups. This should bring the speed up to about 5000 bytes/second. However, a table this large isn't practical for some encryption applications.

---

The files permf.s and permg.s are the 68000 versions of the permutation routines described in Z80 code in the article "Designing a File Encryption System" in the Aug '84 issue of DDJ. Permf performs the forward permutation of the bits in a 256-bit block as specified by a table of bytes. Permg performs the inverse permutation. The file cycle.s is the 68000 version of the corresponding Z80 code for the routine which "cycles" the permutation tables to the next position.

For example, if the permutation table has the values:

1 15 115 57 .... 0

then the forward permutation means to put the 1st bit of the block in the 0th place, the 15th bit in the 1st place, the 115th bit in the 2nd place, and so on, until the 0th bit goes in the 255th place. The inverse permutation with the same table means to place the 0th bit of the block in the 1st place, the 1st bit in the 15th place, the 2nd bit in the 115th place, and so on, until the 255th bit goes in the 0th place.

The routines address bits in the block by deriving a bit index from the byte value of the permutation table. The upper five bits of that value index to the particular byte in the block, and the lower three bits then index to the particular bit within that byte.
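For readers who don't follow 68000 code, the forward permutation looks like this in a C sketch (the names are mine; the table convention is as just described):

#include <string.h>

/* forward permutation of a 256-bit block: bit i of the result is
   bit table[i] of the input; the inverse routine simply swaps the
   roles of i and table[i] */
void perm_forward(unsigned char block[32], const unsigned char table[256])
{
    unsigned char work[32];
    int i, src;

    memset(work, 0, sizeof work);
    for (i = 0; i < 256; i++) {
        src = table[i];
        /* upper five bits select the byte, lower three the bit */
        if (block[src >> 3] & (0x80 >> (src & 7)))
            work[i >> 3] |= 0x80 >> (i & 7);
    }
    memcpy(block, work, sizeof work);
}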
The inverse permutation with the same table means to place the 0th bit of the block in the 1st place, the 1st bit in the 15th place, the 2nd bit in the 115th place, and so on, until the 255th bit goes in the 0th place.

The routines address bits in the block by deriving a bit index from the byte value in the permutation table. The upper five bits of that value index the particular byte in the block, and the lower three bits then index the particular bit within that byte.

In the original cryptographic use, the permutation table was assumed to be cycled to its next permutation after the encryption of each block. The idea is to use the same thing in as many different ways as possible, to get a long period before it repeats. Consider the following permutation list of 10 elements:

    element:  7 4 1 3 5 9 8 6 2 0
    position: 0 1 2 3 4 5 6 7 8 9

This is the table permf and permg use. In cyclic notation this list becomes

    (0 7 6 8 2 1 4 5 9) (3)

That is, there are two cycles, one of degree 9 and one singleton. It means that 7 goes to 0, 6 goes to 7, 8 goes to 6, and so on. If we "rotate" the cycles to the second position, we get this list:

    element:  6 5 4 3 9 0 2 8 1 7
    position: 0 1 2 3 4 5 6 7 8 9

Thus from one cycle table we can get 9 different permutation lists. The cycle routine constructs these lists for permf and permg, given a table in cyclic notation as above. It can handle tables of 256 elements, each of which may contain a number of cycles. For example, if we had two tables, the first containing two cycles of degrees 83 and 173, and the second containing cycles of degrees 197 and 59, the composite cycle would not repeat until 83 * 173 * 197 * 59 = 1.669 x 10^8 blocks had been processed. The routines now run about four times faster than the Z80 versions.

*
* PERMF for the 68000.  Permute a 256-bit bit vector, BLOCK,
* by table BITPERM.  On call:
*
*       a0 -> BLOCK
*       a1 -> BITPERM
*
* On return, a0 -> permuted BLOCK.
*
* Register usage:
*
*       a2 -> WORK, a 32-byte temporary work area
*       d0 = byte from BITPERM, shifted to bit index
*       d1 = index to byte of BLOCK
*       d2 = rotated bit for masking and inner loop control
*       d3 = #31, outer loop control
*       d4 = #$07 immediate masking value
*
* 5/19/86.  Execution time 4.5 ms.
*
        .globl  permf,work
        .text
permf:  movem.l d0-d4/a0-a3,-(a7)
        moveq   #7,d0             clear work area
        lea     work,a2           -> work
        move.l  a2,a3             copy ptr for later use
clrloop: clr.l  (a2)+
        dbra    d0,clrloop
        move.l  a3,a2             retrieve work pointer
        move    #$80,d2           masking bit and inner loop control
        moveq   #31,d3            outer loop control
        moveq   #7,d4             masking value
        clr     d0                keep word clear
permloop: move.b (a1)+,d0         get byte from BITPERM
        move    d0,d1             we will need it twice
        lsr     #3,d1             compute byte index in BLOCK
        and     d4,d0             save lower 3 bits for bit index
        eor     d4,d0             reverse bit order for btst
        btst    d0,(a0,d1)        is bit on in BLOCK?
        beq     permf1            no: leave WORK bit clear
        or.b    d2,(a2)           yes: set bit in WORK
permf1: ror.b   #1,d2             shift masking bit
        bcc     permloop          same byte of WORK?
        addq    #1,a2             else, next byte of WORK
        dbra    d3,permloop       do for 32 bytes
movloop: move.l (a3)+,(a0)+       move data from WORK to BLOCK
        dbra    d4,movloop        use #7 already in d4
        movem.l (a7)+,d0-d4/a0-a3
        rts                       all done
        .end

*
* PERMG for the 68000.  Inversely permute a 256-bit bit
* vector by table BITPERM.  On call:
*
*       a0 -> BLOCK
*       a1 -> BITPERM
*
* On return, a0 -> permuted BLOCK.
*
* Register usage:
*
*       a2 -> WORK, a 32-byte temporary work area
*       a3 -> BLOCK
*       d0 = outer loop control
*       d1 = inner loop control
*       d2 = bit counter
*       d3 = longword from BLOCK
*       d4 = byte from BITPERM
*       d5 = temporary
*       d6 = #7 masking value
*
* 5/19/86.  Execution time is 3.3 ms.
*
        .globl  permg,work
        .text
permg:  movem.l d0-d6/a0-a3,-(a7)
        moveq   #7,d0             clear work area
        lea     work,a2
        move.l  a2,a3             copy ptr for later use
clrloop: clr.l  (a2)+
        dbra    d0,clrloop
        move.l  a3,a2             retrieve work pointer
        move.l  a0,a3             save a0 ptr for later
        moveq   #7,d0             outer loop control
        moveq   #0,d2             count of bits
        move    d2,d4             need word clear
        moveq   #7,d6             masking value
permg1: moveq   #31,d1            inner loop control and bit to test
        move.l  (a0)+,d3          get longword from BLOCK
bitloop: btst   d1,d3             check for bit on
        beq     permg2            off: skip it
        move.b  (a1,d2),d4        on: get byte BITPERM[COUNT]
        move    d4,d5             save for reuse
        lsr     #3,d4             index to byte of WORK
        and     d6,d5             compute bit # in that byte
        eor     d6,d5             reverse bit order
        bset    d5,(a2,d4)        set the bit in WORK
permg2: addq    #1,d2             bump bit count
        dbra    d1,bitloop        and do for all bits in this word
        dbra    d0,permg1         do for all words of BLOCK
movloop: move.l (a2)+,(a3)+       move WORK to BLOCK
        dbra    d6,movloop        use #7 already in d6
        movem.l (a7)+,d0-d6/a0-a3
        rts                       all done
        .end

*
* CYCLE: Convert a table of permutation cycles to a
* permutation list at the Nth permutation.  This version
* of cycle can deal with a table having any number of
* cycles (up to word value) of various degrees.  The cyclic
* permutation table is RANDP and the permutation list
* table is BITPERM.  BITPERM may then be used to permute
* a block of elements.  The procedure is:
*
*   consult the cycle table structure to obtain the number
*   and degree of the cycles and the rotation to be applied
*   to each
*
*   k <- 0                          /* index into RANDP */
*   for i = 1 to number_of_cycles do
*       get degree_of_this_cycle from structure
*       cycle_base <- k
*       for j = 1 to degree_of_this_cycle do
*           BITPERM[RANDP[k]] <-
*               RANDP[cycle_base + ((k - cycle_base + rotation)
*                                   mod degree_of_this_cycle)]
*           k <- k + 1
*       end for
*   end for
*
* On call:
*       a0 -> RANDP
*       a1 -> BITPERM
*       a2 -> cycle structure
*
* On return:
*       all registers saved
*       BITPERM is established from RANDP
*
* The cycle structure is built as follows for each set of tables:
*
*       dc.w    xx      number of cycles in this table
*       dc.w    yy      degree of first cycle
*       dc.w    zz      rotation of first cycle
*       .       .       ..and so forth for each cycle
*       .       .
*
* CAUTION: This structure holds words, but this implementation assumes
* that RANDP and BITPERM are tables of BYTES.  See indirect addressing
* inside degloop below.
*
* Version of 4/6/86.
*
        .text
        .globl  cycle
cycle:  movem.l d0-d7/a0-a2,-(a7)
        move    (a2)+,d0          get # of cycles
        subq    #1,d0             adjust for dbra
        clr     d3                clear index into RANDP ("k" above)
        clr     d7                clear scratch register
cycloop: move   (a2)+,d1          get degree of this cycle
        move    (a2)+,d2          get rotation for this cycle
        move    d1,d5             use degree for loop control
        subq    #1,d5             adjust for dbra
        move    d3,d4             set current cycle base = k
degloop: move   d3,d6             first, see if outside cycle
        sub     d4,d6             RANDP index - current cycle base
        add     d2,d6             add in rotation
        cmp     d1,d6             does that put us outside this cycle?
        bcs     degok             branch if not
        sub     d1,d6             else, adjust mod degree
degok:  add     d4,d6             add cycle base back to index + rotation
        move.b  (a0,d6),d6        get RANDP[index + rotation]
        move.b  (a0,d3),d7        get RANDP[index]
        move.b  d6,(a1,d7)        put byte in BITPERM[RANDP[index]]
        addq    #1,d3             bump RANDP index
        dbra    d5,degloop        loop until this cycle done
        dbra    d0,cycloop        loop until all cycles done
        movem.l (a7)+,d0-d7/a0-a2
        rts
        .end
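To check the cycle logic in a high-level language, here is a C sketch of the same computation. The layout follows the description above (the cycles of RANDP laid end to end, each with a degree and a rotation in the structure); the names cycdesc and build_list are mine, not from the original sources. With the ten-element table and rotation 1 it reproduces the first permutation list; rotation 2 gives the second.

    #include <stdio.h>

    struct cycdesc { int degree, rotation; };

    /* BITPERM[RANDP[k]] <- RANDP[base + ((k - base + rotation) mod degree)] */
    void build_list(const unsigned char *randp, unsigned char *bitperm,
                    const struct cycdesc *cyc, int ncycles)
    {
        int i, j, k = 0;

        for (i = 0; i < ncycles; i++) {
            int base = k;
            for (j = 0; j < cyc[i].degree; j++, k++) {
                int idx = base + (k - base + cyc[i].rotation) % cyc[i].degree;
                bitperm[randp[k]] = randp[idx];
            }
        }
    }

    int main(void)
    {
        /* the ten-element example: cycles (0 7 6 8 2 1 4 5 9) (3) */
        unsigned char randp[10] = { 0, 7, 6, 8, 2, 1, 4, 5, 9, 3 };
        struct cycdesc cyc[2] = { { 9, 1 }, { 1, 0 } };
        unsigned char bitperm[10];
        int i;

        build_list(randp, bitperm, cyc, 2);
        for (i = 0; i < 10; i++)          /* prints 7 4 1 3 5 9 8 6 2 0 */
            printf("%d ", bitperm[i]);
        printf("\n");
        return 0;
    }

/*EOF*/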