







The plot of this story rests on a substitution cipher where the ciphertext characters are taken from an alphabet of 'stick men' in various positions. The method ...








c = e_k(m),
where m is the plaintext, e is the encryption function, k is the secret key and c is the ciphertext.
The reverse process is called decryption or decipherment, and we write
m = d_k(c).
Note that the encryption and decryption algorithms e and d are public; the secrecy of m given c depends entirely on the secrecy of k. This process requires that both parties have access to the secret key, which must be known to both sides but kept secret from everyone else. Encryption algorithms with this property are called symmetric cryptosystems or secret key cryptosystems. There is another form of cryptography which uses two different types of key: one is publicly available and used for encryption, whilst the other is private and used for decryption. These latter cryptosystems are called asymmetric cryptosystems or public key cryptosystems, to which we shall return in a later chapter.

Usually in cryptography the communicating parties are denoted by A and B. However, one often uses the more user-friendly names Alice and Bob. You should not assume that the parties are necessarily human; we could be describing a communication between two autonomous machines. The eavesdropper, bad girl, adversary or attacker is usually given the name Eve.

In this chapter we present some historical ciphers which were used in the pre-computer age to encrypt data. We shall show that these ciphers are easy to break as soon as one understands the statistics of the underlying language, in our case English. In Chapter 5 we shall study the relationship between how easy a cipher is to break and the statistical distribution of the underlying plaintext.
k, k, k, k, k, k,...
This key stream is not very random, which makes the shift cipher easy to break. A naive way of breaking the shift cipher is simply to try each of the possible keys in turn until the correct one is found. There are only 26 possible keys, so the time for this exhaustive key search is very small, particularly if it is easy to recognize the underlying plaintext when it is decrypted.

We shall show how to break the shift cipher by using the statistics of the underlying language. Whilst this is not strictly necessary for breaking this cipher, later we shall see a cipher that is made up of a number of shift ciphers applied in turn, and then the following statistical technique will be useful. Using a statistical technique on the shift cipher is also instructive as to how statistics of the underlying plaintext can show up in the resulting ciphertext.

Take the following example ciphertext, which since it is public knowledge we represent in blue.

GB OR, BE ABG GB OR: GUNG VF GUR DHRFGVBA: JURGURE ’GVF ABOYRE VA GUR ZVAQ GB FHSSRE GUR FYVATF NAQ NEEBJF BS BHGENTRBHF SBEGHAR, BE GB GNXR NEZF NTNVAFG N FRN BS GEBHOYRF, NAQ OL BCCBFVAT RAQ GURZ? GB QVR: GB FYRRC; AB ZBER; NAQ OL N FYRRC GB FNL JR RAQ GUR URNEG-NPUR NAQ GUR GUBHFNAQ ANGHENY FUBPXF GUNG SYRFU VF URVE GB, ’GVF N PBAFHZZNGVBA QRIBHGYL GB OR JVFU’Q. GB QVR, GB FYRRC; GB FYRRC: CREPUNAPR GB QERNZ: NL, GURER’F GUR EHO; SBE VA GUNG FYRRC BS QRNGU JUNG QERNZF ZNL PBZR JURA JR UNIR FUHSSYRQ BSS GUVF ZBEGNY PBVY, ZHFG TVIR HF CNHFR: GURER’F GUR ERFCRPG GUNG ZNXRF PNYNZVGL BS FB YBAT YVSR;

One technique for breaking the previous sample ciphertext is to notice that the ciphertext still retains details about the word lengths of the underlying plaintext. For example, the ciphertext letter N appears as a single letter word. Since the only single letter words in English are A and I, we can conclude that the key is either 13, since N is thirteen letters on from A in the alphabet, or 5, since N is five letters on from I in the alphabet. Hence, the moral here is to always remove word breaks from the underlying plaintext before encrypting using the shift
40 3. HISTORICAL CIPHERS
cipher. But even if we ignore this information about the words we can still break this cipher using frequency analysis. We compute the frequencies of the letters in the ciphertext and compare them with the frequencies obtained from English which we saw in Fig. 1. We present the two bar graphs one above the other in Fig. 2 so you can see that one graph looks almost like a shift of the other. The statistics obtained from the sample ciphertext are given in blue, whilst the statistics obtained from the underlying plaintext language are given in red. Note that we do not compute the red statistics from the actual plaintext, since we do not know it yet; we only make use of our knowledge of the underlying language.
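The comparison of the two bar graphs can also be carried out numerically. The following is a minimal sketch, not the method of the text: it scores each of the 26 shifts with a chi-squared statistic against a table of approximate English letter frequencies. The table `ENGLISH_FREQ` and the function names are assumptions introduced for illustration; the values are commonly quoted approximations, not the figures of Fig. 1.

```python
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Approximate English letter frequencies in percent.
# Assumption: commonly quoted values, not the data behind Fig. 1.
ENGLISH_FREQ = {
    "A": 8.2, "B": 1.5, "C": 2.8, "D": 4.3, "E": 12.7, "F": 2.2, "G": 2.0,
    "H": 6.1, "I": 7.0, "J": 0.15, "K": 0.8, "L": 4.0, "M": 2.4, "N": 6.7,
    "O": 7.5, "P": 1.9, "Q": 0.1, "R": 6.0, "S": 6.3, "T": 9.1, "U": 2.8,
    "V": 1.0, "W": 2.4, "X": 0.15, "Y": 2.0, "Z": 0.07,
}

def letter_percentages(text):
    """Percentage of each letter A-Z among the letters of a non-empty text."""
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    counts = Counter(letters)
    return {ch: 100.0 * counts[ch] / len(letters) for ch in ALPHABET}

def chi_squared_for_shift(observed, shift):
    """Score the hypothesis that plaintext letter i was moved to i + shift."""
    score = 0.0
    for i, plain in enumerate(ALPHABET):
        cipher = ALPHABET[(i + shift) % 26]
        expected = ENGLISH_FREQ[plain]
        score += (observed[cipher] - expected) ** 2 / expected
    return score

def guess_shift(ciphertext):
    """Return the shift whose implied plaintext statistics look most English."""
    observed = letter_percentages(ciphertext)
    return min(range(26), key=lambda k: chi_squared_for_shift(observed, k))

# Opening of the sample ciphertext from the text.
sample = ("GB OR, BE ABG GB OR: GUNG VF GUR DHRFGVBA: JURGURE GVF ABOYRE VA "
          "GUR ZVAQ GB FHSSRE GUR FYVATF NAQ NEEBJF BS BHGENTRBHF SBEGHAR, "
          "BE GB GNXR NEZF NTNVAFG N FRN BS GEBHOYRF, NAQ OL BCCBFVAT RAQ GURZ?")
print(guess_shift(sample))
```

A lower score means the shifted ciphertext histogram sits closer to the English one, which is the numerical counterpart of sliding the blue graph under the red graph by eye.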
Figure 2. Comparison of plaintext and ciphertext frequencies for the shift cipher example
By comparing the two bar graphs in Fig. 2 we can see by how much we think the blue graph has been shifted compared with the red graph. By examining where we think the plaintext letter E may have been shifted, one can hazard a guess that it is shifted by one of
2, 9, 13 or 23.
Then by trying to deduce by how much the plaintext letter A has been shifted we can guess that it has been shifted by one of
1, 6, 13 or 17.
The only shift value which is consistent appears to be the value 13, and we conclude that this is the most likely key value. We can now decrypt the ciphertext using this key. This reveals that the underlying plaintext is:

To be, or not to be: that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to, ’tis a consummation
Devoutly to be wish’d. To die, to sleep;
To sleep: perchance to dream: ay, there’s the rub;
For in that sleep of death what dreams may come
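The reasoning above amounts to intersecting the two sets of candidate shifts and decrypting with the value that survives. A small sketch of that step follows; the function name `shift_decrypt` and the variable names are illustrative assumptions, and only upper-case letters are shifted while punctuation and spaces are passed through unchanged.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shift_decrypt(ciphertext, key):
    """Undo a shift cipher: move each upper-case letter back by `key` places."""
    out = []
    for ch in ciphertext:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - key) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)

# Candidate shifts read off the bar graphs in the text.
candidates_from_e = {2, 9, 13, 23}
candidates_from_a = {1, 6, 13, 17}

consistent = candidates_from_e & candidates_from_a
key = min(consistent)

print(consistent)
print(shift_decrypt("GB OR, BE ABG GB OR: GUNG VF GUR DHRFGVBA", key))
```

The surviving value also agrees with the earlier observation that the single letter word N forces the key to be 13 or 5.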
WONOJTL UOKSTAIWUW YVJ GONOLVCIAD TAG WZCCVJXIAD OAXJOCJOAOZJITL WXZGOAXW TAG WXTYY, TAG TIUW XV CLTQ T WIDAIYIKTAX JVLO IA XSO GONOLVCUOAX VY SIDS-XOKSAVLVDQ IAGZWXJQ IA XSO JODIVA. XSO GOCTJXUOAX STW T LTJDO CJVDJTUUO VY JOWOTJKS WZCCVJXOG MQ IAGZWXJQ, XSO OZJVCOTA ZAIVA, TAG ZE DVNOJAUOAX JOWOTJKS OWXTMLIWSUOAXW TAG CZMLIK KVJCVJTXIVAW. T EOQ OLOUOAX VY XSIW IW XSO WXJVAD LIAEW XSTX XSO GOCTJXUOAX STW HIXS XSO KVUCZXOJ, KVUUZAIKTXIVAW, UIKJVOLOKXJVAIKW TAG UOGIT IAGZWXJIOW IA XSO MJIWXVL JODIVA. XSO TKTGOUIK JOWOTJKS CJVDJTUUO IW VJDTAIWOG IAXV WONOA DJVZCW, LTADZTDOW TAG TJKSIXOKXZJO, GIDIXTL UOGIT, UVMILO TAG HOTJTMLO KVUCZXIAD, UTKSIAO LOTJAIAD, RZTAXZU KVUCZXIAD, WQWXOU NOJIYIKTXIVA, TAG KJQCXVDJTCSQ TAG IAYVJUTXIVA WOKZJIXQ.
We can compute the following frequencies for single letters in the above ciphertext:

Letter  Freq      Letter  Freq      Letter  Freq
A       8.6995    B       0.0000    C       3.
D       3.1390    E       0.2690    F       0.
G       3.6771    H       0.6278    I       7.
J       7.0852    K       4.6636    L       3.
M       0.8968    N       1.0762    O       11.
P       0.1793    Q       1.3452    R       0.
S       3.5874    T       8.0717    U       4.
V       7.2645    W       6.6367    X       8.
Y       1.6143    Z       2.
In addition we determine that the most common bigrams in this piece of ciphertext are
TA, AX, IA, VA, WX, XS, AG, OA, JO, JV,
whilst the most common trigrams are
OAX, TAG, IVA, XSO, KVU, TXI, UOA, AXS.
Since the ciphertext letter O occurs with the greatest frequency, namely 11.479, we can guess that the ciphertext letter O corresponds to the plaintext letter E. We now look at what this means for two of the common trigrams found in the ciphertext, OAX and XSO.
We examine common trigrams in English which start or end with the letter E. We find that three common ones are ENT, ETH and THE. Since one of the two trigrams we wish to match starts with the same letter as the other finishes with, we can conclude that it is highly likely that we have the correspondence X = T, S = H, A = N.
Even after this small piece of analysis we find that it is much easier to understand what the underlying plaintext should be. If we focus on the first two sentences of the ciphertext we are trying to break, and we change the letters for which we think we have found the correct mappings, then we obtain:

THE MJIWTVL JEDIVN HTW VNE VY EZJVCE’W LTJDEWT KVNKENTJTTIVNW VY HIDH TEKHNVLVDQ INGZWTJQ. KVUCZTEJW, KVUUZNIKTTIVNW TNG UIKJVELEKTJVNIKW TJE HELL JECJEWENTEG, TLVNDWIGE GIDITTL UEGIT, KVUCZTEJ DTUEW TNG ELEKTJVNIK KVUUEJKE.

Recall that this was after the four substitutions O = E, X = T, S = H, A = N.
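The bigram and trigram counts used in this analysis can be gathered mechanically. The sketch below is one possible way to do so and makes an assumption the text does not state: n-grams are counted only inside words, never across a space or punctuation mark. The function name `ngram_counts` is hypothetical.

```python
from collections import Counter
import re

def ngram_counts(text, n):
    """Count the n-letter sequences that occur inside the words of `text`.

    Assumption: an n-gram never spans a word break or punctuation mark.
    """
    counts = Counter()
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

# A short fragment of the ciphertext, used only to exercise the function.
fragment = "XSO WXJVAD LIAEW XSTX XSO"

print(ngram_counts(fragment, 3).most_common(1))
print(ngram_counts(fragment, 2).most_common(1))
```

Calling `most_common(10)` on the counts for the full ciphertext would give a ranked list to compare with the bigrams and trigrams quoted in the text, although ties may be ordered differently.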
We now cheat and use the fact that we have retained the word sizes in the ciphertext. Since the letter T occurs as a single letter word in the ciphertext, we must have
T = I or T = A.
The ciphertext letter T occurs with a frequency of 8.0717, which is the highest frequency among the letters not yet assigned, hence we are far more likely to have
T = A.
We have already considered the most popular trigram in the ciphertext, so we turn our attention to the next most popular trigram. It is equal to TAG, which we suspect corresponds to the plaintext AN*. Therefore it is highly likely that G = D, since AND is a popular trigram in English. Our partially decrypted ciphertext is now equal to

THE MJIWTVL JEDIVN HAW VNE VY EZJVCE’W LAJDEWT KVNKENTJATIVNW VY HIDH TEKHNVLVDQ INDZWTJQ. KVUCZTEJW, KVUUZNIKATIVNW AND UIKJVELEKTJVNIKW AJE HELL JECJEWENTED, ALVNDWIDE DIDITAL UEDIA, KVUCZTEJ DAUEW AND ELEKTJVNIK KVUUEJKE.

This was after the six substitutions O = E, X = T, S = H, A = N, T = A, G = D.
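Applying a partial key by hand is error-prone, so a small helper is convenient. The sketch below is an illustration under one stated design choice: recovered plaintext letters are written in lower case and unresolved ciphertext letters stay in upper case, which avoids the ambiguity of a letter such as T appearing in both roles. The function name `apply_partial_key` is hypothetical.

```python
def apply_partial_key(ciphertext, mapping):
    """Replace ciphertext letters that have a known plaintext equivalent.

    Known letters are emitted in lower case; everything else is left as is.
    """
    return "".join(
        mapping[ch].lower() if ch in mapping else ch
        for ch in ciphertext
    )

# The six substitutions deduced so far (ciphertext letter -> plaintext letter).
six_substitutions = {"O": "E", "X": "T", "S": "H", "A": "N", "T": "A", "G": "D"}

print(apply_partial_key("XSO GOCTJXUOAX STW T LTJDO CJVDJTUUO", six_substitutions))
```

Each newly guessed pair is added to the dictionary and the text is printed again, so that partially revealed words suggest the next substitution.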
We now look at two-letter words which occur in the ciphertext:
a keystream. Encryption involves adding the plaintext letter to a key letter. Thus if the key is SESAME, encryption works as follows,
THISISATESTMESSAGE
SESAMESESAMESESAME
LLASUWSXWSFQWWKASI
Again we notice that A will encrypt to a different letter depending on where it appears in the message. But the Vigenère cipher is still easy to break using the underlying statistics of English. Once we have found the length of the keyword, breaking the ciphertext is the same as breaking the shift cipher a number of times.
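The SESAME example can be reproduced with a few lines of code. This is a minimal sketch assuming the message and keyword consist only of the upper-case letters A to Z, with A = 0, ..., Z = 25 and addition taken modulo 26; the function names are illustrative.

```python
from itertools import cycle

A = ord("A")

def vigenere_encrypt(plaintext, keyword):
    """Add each plaintext letter to the matching keystream letter, modulo 26."""
    return "".join(
        chr((ord(p) - A + ord(k) - A) % 26 + A)
        for p, k in zip(plaintext, cycle(keyword))
    )

def vigenere_decrypt(ciphertext, keyword):
    """Subtract the keystream letter from each ciphertext letter, modulo 26."""
    return "".join(
        chr((ord(c) - ord(k)) % 26 + A)
        for c, k in zip(ciphertext, cycle(keyword))
    )

ciphertext = vigenere_encrypt("THISISATESTMESSAGE", "SESAME")
print(ciphertext)
print(vigenere_decrypt(ciphertext, "SESAME"))
```

The keyword is repeated with `cycle` to form the keystream, so each position of the message is simply a shift cipher whose key is the keyword letter above it.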
As an example, suppose the ciphertext is given by

UTPDHUG NYH USVKCG MVCE FXL KQIB. WX RKU GI TZN, RLS BBHZLXMSNP KDKS; CEB IH HKEW IBA, YYM SBR PFR SBS, JV UPL O UVADGR HRRWXF. JV ZTVOOV YH ZCQU Y UKWGEB, PL UQFB P FOUKCG, TBF RQ VHCF R KPG, OU KFT ZCQU MAW QKKW ZGSY, FP PGM QKFTK UQFB DER EZRN, MCYE, MG UCTFSVA, WP KFT ZCQU MAW KQIJS. LCOV NTHDNV JPNUJVB IH GGV RWX ONKCGTHKFL XG VKD, ZJM VG CCI MVGD JPNUJ, RLS EWVKJT ASGUCS MVGD; DDK VG NYH PWUV CCHIIY RD DBQN RWTH PFRWBBI VTTK VCGNTGSF FL IAWU XJDUS, HFP VHCF, RR LAWEY QDFS RVMEES FZB CHH JRTT MVGZP UBZN FD ATIIYRTK WP KFT HIVJCI; TBF BLDPWPX RWTH ULAW TG VYCHX KQLJS US DCGCW OPPUPR, VG KFDNUJK GI JIKKC PL KGCJ IAOV KFTR GJFSAW KTZLZES WG RWXWT VWTL WP XPXGG, CJ FPOS VYC BTZCUW XG ZGJQ PMHTRAIBJG WMGFG. JZQ DPB JVYGM ZCLEWXR: CEB IAOV NYH JIKKC TGCWXF UHF JZK. WX VCU LD YITKFTK WPKCGVCWIQT PWVY QEBFKKQ, QNH NZTTW IRFL IAS VFRPE ODJRXGSPTC EKWPTGEES, GMCG TTVVPLTFFJ; YCW WV NYH TZYRWH LOKU MU AWO, KFPM VG BLTP VQN RD DSGG AWKWUKKPL KGCJ, XY OPP KPG ONZTT ICUJCHLSF KFT DBQNJTWUG. DYN MVCK ZT MFWCW HTWF FD JL, OPU YAE CH LQ! PGR UF, YH MWPP RXF CDJCGOSF, XMS UZGJQ JL, SXVPN HBG!

There is a way of finding the length of the keyword, which is repeated to form the keystream, called the Kasiski test. First we need to look for repeated sequences of characters. Recall that English has a large repetition of certain bigrams or trigrams, and over a long enough string of text these are likely to match up to the same two or three letters in the key every so often. By examining the distance between two repeated sequences we can guess the length of the keyword. Each of these distances should be a multiple of the keyword length, hence taking the greatest common divisor of all distances between the repeated sequences should give a good guess as to the keyword length.

Let us examine the above ciphertext and look for the bigram WX.
The gaps between some of the occurrences of this bigram are 9, 21, 66 and 30, some of which may have occurred by chance, whilst some may reveal information about the length of the keyword. We now take the relevant greatest common divisors to find,
gcd(30, 66) = 6, gcd(9, 21) = gcd(9, 66) = gcd(9, 30) = gcd(21, 66) = 3.
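The bookkeeping of the Kasiski test is easy to automate. The following sketch locates a repeated sequence in the letters of a ciphertext (spaces and punctuation removed) and tabulates the pairwise greatest common divisors of a list of gaps. The helper names are assumptions made for illustration; only the opening of the ciphertext is searched here, and the four gaps are the ones quoted in the text.

```python
from itertools import combinations
from math import gcd

def letters_only(text):
    """Drop everything except the letters A-Z."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")

def occurrences(text, ngram):
    """Zero-based start positions of every occurrence of `ngram` in `text`."""
    positions = []
    start = text.find(ngram)
    while start != -1:
        positions.append(start)
        start = text.find(ngram, start + 1)
    return positions

def pairwise_gcds(gaps):
    """Map each pair of gaps to its greatest common divisor."""
    return {(a, b): gcd(a, b) for a, b in combinations(sorted(gaps), 2)}

# Opening of the example ciphertext, up to the second occurrence of WX.
opening = ("UTPDHUG NYH USVKCG MVCE FXL KQIB. WX RKU GI TZN, RLS BBHZLXMSNP "
           "KDKS; CEB IH HKEW IBA, YYM SBR PFR SBS, JV UPL O UVADGR HRRWXF.")

positions = occurrences(letters_only(opening), "WX")
print(positions, positions[1] - positions[0])

# The gaps quoted in the text.
print(pairwise_gcds([9, 21, 66, 30]))
```

The first two occurrences of WX are 66 letters apart, which matches one of the quoted gaps, and 6 appears only as the divisor shared by 30 and 66.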
We are unlikely to have a keyword of length three so we conclude that the gaps of 9 and 21 occurred purely by chance. Hence, our best guess for the keyword is that it is of length 6. Now we take every sixth letter and look at the statistics just as we did for a shift cipher to deduce the first letter of the keyword. We can now see the advantage of using the histograms to break the shift cipher earlier. If we used the naive method and tried each of the 26 keys in turn we
could still not detect which key is correct, since every sixth letter of an English sentence does not produce an English sentence. Using our earlier histogram based method is more efficient in this case.
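Taking every sixth letter splits the ciphertext into six strands, each of which was encrypted with a single keyword letter and can therefore be treated as a shift cipher. A brief sketch of that split follows, together with the conversion from a shift value to a keyword letter; the function names are hypothetical, and the short string used below is only the first three words of the example ciphertext.

```python
from collections import Counter

def columns(text, period):
    """Split the letters of `text` into `period` strands: every period-th
    letter starting with the first, with the second, and so on."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return ["".join(letters[i::period]) for i in range(period)]

def key_letter_for_shift(shift):
    """Keyword letter that corresponds to a shift (A = 0, B = 1, C = 2, ...)."""
    return chr(ord("A") + shift % 26)

strands = columns("UTPDHUG NYH USVKCG", 6)
print(strands)

# Letter counts of the first strand; with the full ciphertext these counts
# are what a histogram for that strand would be drawn from.
print(Counter(strands[0]))
print(key_letter_for_shift(2))
```

Each strand on its own is not readable English, which is why the histogram comparison is needed instead of trying all 26 keys and looking for sensible text.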
Figure 3. Comparison of plaintext and ciphertext frequencies for every sixth letter of the Vigenère example, starting with the first letter
Figure 4. Comparison of plaintext and ciphertext frequencies for every sixth letter of the Vigenère example, starting with the second letter
The relevant bar charts for every sixth letter starting with the first are given in Fig. 3. We look for the possible locations of the three peaks corresponding to the plaintext letters A, E and T. We see that this sequence seems to be shifted by two positions in the blue graph compared with the red graph. Hence we can conclude that the first letter of the keyword is C, since C corresponds to a shift of two. We perform a similar step for every sixth letter, starting with the second one. The resulting bar graphs are given in Fig. 4. Using the same technique we find that the blue graph appears to
abcdefghijklmnopqrstuvwxyz,
to obtain the ciphertext
cadbehfigjmknlorpsqtwuxvyz.
We can then deduce that the permutation looks something like
( 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 ... )
( 2  4  1  3  5  7  9  6  8 10 12 14 11 13 15 ... )
We see that the sequence repeats (modulo 5) after every five steps and so the value of n is probably equal to five. We can recover the key by simply taking the first five columns of the above permutation.
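This chosen plaintext attack can be written out in a few lines. The sketch below reads off where each plaintext letter lands in the ciphertext and then looks for the smallest block length n for which the first n values permute 1, ..., n and the pattern repeats with an offset of n over the complete blocks. The function names are illustrative assumptions, and the method relies on the chosen plaintext having no repeated letters.

```python
def recover_mapping(plaintext, ciphertext):
    """For each plaintext position, the 1-based ciphertext position it moves to.

    Assumes every character of the chosen plaintext is distinct.
    """
    return [ciphertext.index(ch) + 1 for ch in plaintext]

def guess_block_length(sigma):
    """Smallest n such that sigma permutes 1..n and repeats with offset n
    across all complete blocks (a trailing partial block is ignored)."""
    for n in range(1, len(sigma) + 1):
        if sorted(sigma[:n]) != list(range(1, n + 1)):
            continue
        full = len(sigma) - len(sigma) % n
        if all(sigma[i + n] == sigma[i] + n for i in range(full - n)):
            return n
    return None

plaintext = "abcdefghijklmnopqrstuvwxyz"
ciphertext = "cadbehfigjmknlorpsqtwuxvyz"

sigma = recover_mapping(plaintext, ciphertext)
n = guess_block_length(sigma)
print(sigma[:15])
print(n, sigma[:n])
```

The final letter z forms an incomplete block and stays in place, which is why only complete blocks are compared when testing for the repeat.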
The best book on the history of ciphers is that by Kahn. Kahn’s book is a weighty tome so those wishing a more rapid introduction should consult the book by Singh. The book by Churchhouse also gives an overview of a number of historical ciphers.
R. Churchhouse. Codes and Ciphers. Julius Caesar, the Enigma and the Internet. Cambridge University Press, 2001.
D. Kahn. The Codebreakers: The Comprehensive History of Secret Communication from Ancient Times to the Internet. Scribner, 1996.
S. Singh. The Codebook: The Evolution of Secrecy from Mary, Queen of Scots to Quantum Cryptography. Doubleday, 2000.