In the rest of Ph21, you will begin exploring the world of parallel computing! This week's readings gave you some background on what it is all about; this assignment will get you working on a practical project.
You have been reading about the MPI (Message Passing Interface) standard, which is widely used to create portable parallel code. The MPI standard is essentially a collection of very general functions, organized in the form of standard C/C++/Fortran libraries. These functions do not dictate program structure; rather, they provide the means to set up communication between the processors in a parallel computer. Using the MPI standard you can work in several computational models, characterized by the manner in which the parallel CPUs (henceforth, nodes) communicate. Perhaps the simplest such model is Master–Slave, whereby one of the nodes (the Master) has the special role of distributing tasks, collecting results, and performing other control functions, while the rest of the nodes (the Slaves) execute the tasks assigned by the Master. In this assignment, we will experiment with the Master–Slave model, implementing a parallel program that tries to break a cipher.
In many of the activities, interactions, and transactions that have become possible with networked computers, cellphones, and other appliances, security is a primary concern. The scope of security ranges from requiring authentication to access a private resource (for instance, you provide your password to log in to your account on a Unix cluster), to protecting sensitive information in transit (such as your credit card number, as it is transmitted to an online merchant). Cryptography plays an important role in many of these situations. Its primary purpose is to create procedures to transmit a message over a public channel from the rightful sender to the rightful recipient, in such a way that only the latter can read it; but this basic concept can be turned into several other applications, such as cryptographic signatures of messages, which allow the recipient to make sure that the message really originates from the alleged sender.
An important role in cryptography is played by symmetric encryption algorithms. In these, you provide a clear text C and a key k, and apply a function f to generate the cipher Y = f_k(C). Knowledge of the key is necessary to turn the cipher back into the clear text, by applying the inverse function: C = f_k^{-1}(Y).
. So two correspondents in need of secrecy (it is customary to call them Alice and Bob) can share a key over a
secure channel (for instance, when they meet face to face), and then encrypt all their messages with that
key. Since Alice and Bob are the only ones who can read messages so encrypted, they have successfully set
up a secure channel.
A related application is the following: suppose Alice is communicating with somebody who says he is Bob, over a channel (such as an internet chat) where she cannot be sure of the identity of people talking to her. If Alice and Bob have previously shared a secret key, she can send a clear text to (the alleged) Bob, and ask him to encrypt that text and send back the cipher. Then she can check that the cipher decrypts back to the original clear text, making sure that the person on the other side is really Bob, or at least somebody who has the key.
This is how passwords are often implemented in computer systems. When you set your password, it is used as the key to encrypt a standard text (generally, a sequence of eight bytes, all set to zero), producing a cipher that is stored in a password file within the system. Whenever the system needs to check your password (when you want to log in, for instance), you provide your key again, which is used to encrypt the standard text, which in turn is compared to the stored cipher. If you provide the wrong password, the standard text will be encrypted to something different from the stored cipher, and you will be refused access. The payoff to this procedure is that your password can be checked without actually storing it in the system. While the clear text and the cipher are both available to people of ill intentions, it is very hard (if the symmetric encryption scheme is well chosen) to use them to recover your key.
So when hackers want to attack a system, they have to proceed backwards, guessing possible keys (passwords) and applying the encryption function to them to see if they yield the correct cipher. That is why it is bad to use your name or a dictionary word as your password: those are the first things that a hacker will try. If the key is just a random sequence of characters, the only resort is brute force: trying all the possible keys. Ideally, the sheer length of the key should make such an approach unlikely to succeed (at least within the time that a typical hacker is willing to wait), but times are a-changing, and some commonly used ciphers are now coming within the reach of powerful computers, and especially of big parallel machines.
In the last part of this assignment, you are going to use our parallel cluster to mount a brute force attack against a commonly used symmetric encryption scheme, DES. This scheme uses a 56-bit key to encrypt a 64-bit clear text into a 64-bit cipher. We are going to make things easier for you and our cluster, so we will actually consider only 20-bit keys by setting the last 36 bits to zero.
1. Read the handouts provided this week. This is important, because they include instructions on how to use the Octo cluster. Discuss any questions about what you have read with your TA.
2. Write a program that will (asynchronously) print the number of the processor it is running on. Execute it on Octo on one, two, and four processors.
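A minimal program along these lines might look like the following sketch, using the C bindings of MPI (the launch command in the comment is a common convention; the exact command on Octo is described in the handouts):

```cpp
#include <mpi.h>
#include <cstdio>

// Minimal MPI program: each process prints its own rank (node number).
// The output order is unspecified, since the ranks print asynchronously.
// Typically launched with something like: mpirun -np 4 ./hello
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's node number
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of nodes
    std::printf("Hello from node %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```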
3. Modify your program in such a way that it prints these statements in the order of increasing node number. There are two ways to do this: have the first node print out its number, then send a message to node two; node two waits for a message from node one, then writes its number, then sends a message to node three; and so on. Or: have node zero (the master) wait for messages from each node (in the correct order), and write the number of that node when the correct message is received.
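The first scheme can be sketched as a "token" message passed along the ranks: each node waits for the token from its predecessor, prints, and forwards the token. This is one possible shape for the solution, not the only one.

```cpp
#include <mpi.h>
#include <cstdio>

// Ordered printing via token passing: a message travels from rank 0
// upward, forcing the print statements into increasing node order.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size, token = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank > 0)   // wait until the previous node has printed
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::printf("Node %d\n", rank);
    if (rank < size - 1)  // release the next node
        MPI_Send(&token, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```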
4. Breaking codes. Create an MPI program that takes as input a clear text and a cipher, and tries to find the DES key that encrypts the first into the second. The clear text will always be a sequence of eight bytes, each equal to zero (as numbers, not as characters). The ciphers for 1024 different DES keys are given in the file /usr/local/src/DES_cyphers. Each numbered line contains an 8-byte cipher, given as a single hexadecimal number.
We are providing you with a function, int keycheck(char *plain, char *cipher, long key), that will return nonzero if the 20-bit key specified in key encrypts the clear text plain[8] into the cipher cipher[8] (notice that the clear text and the cipher are both 64 bits, or 8 bytes, long). To use this function, you should compile your C++ code together with /usr/local/src/deslib.c (use something like g++ -I/usr/local/src assignment.c /usr/local/src/deslib.c -o assignment). The function keycheck is not especially optimized, so on the computers in our cluster it can check about fifty thousand keys per second (that is why we consider a key space of only about a million keys). Before using keycheck, you should call desinit(0), and when you are done, desdone(). If you cannot convert the hexadecimal ciphers into arrays of bytes, try using (repeatedly) something like scanf("%2x", &byte), where byte is declared as an unsigned int (the %2x conversion requires an unsigned int, not a char), and then assign the result to the corresponding element of cipher. The first two alphanumeric characters of the hex number should go into cipher[0]. If you want to read ciphers from stdin, refer to Reading from stdin.
The MPI root node should act as the Master, subdividing the space of keys (the set of numbers between 0 and 2^20 − 1 = 1,048,575) into sections, and giving them out to the other nodes (the Slaves), which then have the task of checking all those keys (since you are putting several Slaves to work on the keys, it would be a waste of time to check the same ones!). If one of the Slaves finds a key to be the correct one, it will return this result to the Master, which will then proclaim victory and stop the computation. If instead a Slave exhausts all the keys that it was given to check without finding the correct one, it will then wait for the Master to give it more keys to process. You should use the MPI message-passing functions to give out keys to the Slaves, and to report back results.
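A skeleton of this Master–Slave protocol might look like the following. The message layout, tags, and parcel size are our own illustrative choices, not a prescribed design; keycheck, desinit, and desdone are the functions from deslib.c described above, and reading the target cipher from the file is left out for brevity.

```cpp
#include <mpi.h>
#include <cstdio>

// Functions provided by /usr/local/src/deslib.c (declarations assumed here).
extern "C" int keycheck(char* plain, char* cipher, long key);
extern "C" void desinit(int);
extern "C" void desdone(void);

const long KEYSPACE = 1L << 20;  // 2^20 possible keys
const long PARCEL   = 4096;      // keys per parcel handed to a Slave (tune this!)

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    char plain[8] = {0};   // the standard all-zero clear text
    char cipher[8] = {0};  // in a real run, parse this from DES_cyphers
    desinit(0);
    if (rank == 0) {  // --- Master: hand out parcels, collect results ---
        long next = 0, found = -1, result;
        MPI_Status st;
        for (int busy = size - 1; busy > 0; ) {
            // A Slave reports: result >= 0 means it found the key.
            MPI_Recv(&result, 1, MPI_LONG, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
            if (result >= 0) found = result;
            // Reply with the first key of a fresh parcel, or -1 to stop.
            long start = (found < 0 && next < KEYSPACE) ? next : -1;
            if (start >= 0) next += PARCEL; else --busy;
            MPI_Send(&start, 1, MPI_LONG, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
        }
        if (found >= 0) std::printf("key = %ld\n", found);
    } else {          // --- Slave: request work, scan the parcel ---
        long start, result = -1;
        for (;;) {
            MPI_Send(&result, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD);  // report / ask
            MPI_Recv(&start, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (start < 0) break;  // Master says stop
            result = -1;
            for (long k = start; k < start + PARCEL && k < KEYSPACE; ++k)
                if (keycheck(plain, cipher, k)) { result = k; break; }
        }
    }
    desdone();
    MPI_Finalize();
    return 0;
}
```

Note the design choice: the Slaves initiate each exchange, so the Master never needs to know which Slave will finish first, and a Slave that reports the winning key implicitly causes the Master to answer every later request with the stop signal.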
5. Breaking codes, continued. Experiment using 2, 4, and 8 nodes, and use different sizes for the parcels dispatched to the Slaves for processing. Use several of the ciphers provided, and use MPI::Wtime() to measure the time taken to find them. Produce graphs showing how this time scales with the number of nodes and with the parcel size, and comment on the results. For your timing results to be accurate, make sure that you run in the queuing system (and that everybody else does), so that you have all the processors to yourself.
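The usual timing pattern is to bracket the search on the Master with two calls to the wall-clock timer; a bare-bones sketch (using MPI_Wtime, the C binding of MPI::Wtime()):

```cpp
#include <mpi.h>
#include <cstdio>

// Timing pattern for the scaling measurements: record the wall-clock
// time on the Master immediately before and after the key search.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    double t0 = MPI_Wtime();
    // ... run the Master-Slave key search here ...
    double t1 = MPI_Wtime();
    std::printf("search took %g seconds\n", t1 - t0);
    MPI_Finalize();
    return 0;
}
```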
6. Breaking codes, randomly. Lastly, experiment with a different approach, where the Slaves just try random keys (use the random() function from the C standard library) until one of them finds the right one (the Slave will then tell the Master which key is the "winner", and the Master will stop the computation). Each Slave should be started with a different random number seed, or else they will all be computing the same 'random' numbers! Of course, the Slaves now run the risk of checking the same key twice! Can you predict what degradation of performance you should expect? What degradation do you observe?