Cryptography: An Introduction (3rd Edition) Nigel ..., Summaries of Cryptography and System Security

Cryptography: An Introduction

(3rd Edition)

Nigel Smart

CHAPTER 3

Historical Ciphers

Chapter Goals

  • To explain a number of historical ciphers, such as the Caesar cipher and the substitution cipher.
  • To show how these historical ciphers can be broken because they do not hide the underlying statistics of the plaintext.
  • To introduce the concepts of substitution and permutation as basic cipher components.
  • To introduce a number of attack techniques, such as chosen plaintext attacks.

1. Introduction

An encryption algorithm, or cipher, is a means of transforming plaintext into ciphertext under the control of a secret key. This process is called encryption or encipherment. We write

c = e_k(m),

where

  • m is the plaintext,
  • e is the cipher function,
  • k is the secret key,
  • c is the ciphertext.

The reverse process is called decryption or decipherment, and we write

m = d_k(c).

Note that the encryption and decryption algorithms e and d are public; the secrecy of m given c depends totally on the secrecy of k. The above process requires that each party has access to the secret key: it must be known to both sides, but kept secret from everyone else. Encryption algorithms which have this property are called symmetric cryptosystems or secret key cryptosystems. There is a form of cryptography which uses two different types of key: one is publicly available and used for encryption, whilst the other is private and used for decryption. These latter types of cryptosystems are called asymmetric cryptosystems or public key cryptosystems, to which we shall return in a later chapter.

Usually in cryptography the communicating parties are denoted by A and B. However, one often uses the more user-friendly names Alice and Bob. You should not assume that the parties are necessarily human; we could be describing a communication being carried out between two autonomous machines. The eavesdropper, bad girl, adversary or attacker is usually given the name Eve.

In this chapter we shall present some historical ciphers which were used in the pre-computer age to encrypt data. We shall show that these ciphers are easy to break as soon as one understands the statistics of the underlying language, in our case English. In Chapter 5 we shall study this relationship between how easy the cipher is to break and the statistical distribution of the underlying plaintext.

2. Shift Cipher

We first present one of the earliest ciphers, called the shift cipher. Encryption is performed by replacing each letter by the letter a certain number of places further on in the alphabet. So, for example, if the key is three, then the plaintext A is replaced by the ciphertext D, the letter B is replaced by E, and so on. The plaintext word HELLO is encrypted as the ciphertext KHOOR. When this cipher is used with the key three it is often called the Caesar cipher, although in many books the name Caesar cipher is given to the shift cipher with any key. Strictly this is not correct, since we only have evidence that Julius Caesar used the cipher with the key three.

There is a more mathematical explanation of the shift cipher which will be instructive for future discussions. First we need to identify each letter of the alphabet with a number. It is usual to identify the letter A with the number 0, the letter B with the number 1, the letter C with the number 2 and so on, until we identify the letter Z with the number 25. After we convert our plaintext message into a sequence of numbers, the ciphertext in the shift cipher is obtained by adding to each number the secret key k modulo 26, where the key is a number in the range 0 to 25. In this way we can interpret the shift cipher as a stream cipher, with key stream given by the repeating sequence

k, k, k, k, k, k,...

This key stream is not very random, which makes the shift cipher easy to break. A naive way of breaking the shift cipher is simply to try each of the possible keys in turn, until the correct one is found. There are only 26 possible keys, so the time for this exhaustive key search is very small, particularly if it is easy to recognize the underlying plaintext when it is decrypted.

We shall show how to break the shift cipher by using the statistics of the underlying language. Whilst this is not strictly necessary for breaking this cipher, later we shall see a cipher that is made up of a number of shift ciphers applied in turn, and then the following statistical technique will be useful. Using a statistical technique on the shift cipher is also instructive as to how statistics of the underlying plaintext can show up in the resulting ciphertext.

Take the following example ciphertext, which since it is public knowledge we represent in blue.

GB OR, BE ABG GB OR: GUNG VF GUR DHRFGVBA:
JURGURE ’GVF ABOYRE VA GUR ZVAQ GB FHSSRE
GUR FYVATF NAQ NEEBJF BS BHGENTRBHF SBEGHAR,
BE GB GNXR NEZF NTNVAFG N FRN BS GEBHOYRF,
NAQ OL BCCBFVAT RAQ GURZ? GB QVR: GB FYRRC;
AB ZBER; NAQ OL N FYRRC GB FNL JR RAQ
GUR URNEG-NPUR NAQ GUR GUBHFNAQ ANGHENY FUBPXF
GUNG SYRFU VF URVE GB, ’GVF N PBAFHZZNGVBA
QRIBHGYL GB OR JVFU’Q. GB QVR, GB FYRRC;
GB FYRRC: CREPUNAPR GB QERNZ: NL, GURER’F GUR EHO;
SBE VA GUNG FYRRC BS QRNGU JUNG QERNZF ZNL PBZR
JURA JR UNIR FUHSSYRQ BSS GUVF ZBEGNY PBVY,
ZHFG TVIR HF CNHFR: GURER’F GUR ERFCRPG
GUNG ZNXRF PNYNZVGL BS FB YBAT YVSR;

One technique for breaking the previous sample ciphertext is to notice that the ciphertext still retains details about the word lengths of the underlying plaintext. For example, the ciphertext letter N appears as a single-letter word. Since the only single-letter words in English are A and I, we can conclude that the key is either 13, since N is thirteen letters on from A in the alphabet, or 5, since N is five letters on from I in the alphabet. Hence, the moral here is to always remove word breaks from the underlying plaintext before encrypting using the shift cipher. But even if we ignore this information about the words we can still break this cipher using frequency analysis.

We compute the frequencies of the letters in the ciphertext and compare them with the frequencies obtained from English which we saw in Fig. 1. We present the two bar graphs one above the other in Fig. 2 so you can see that one graph looks almost like a shift of the other graph. The statistics obtained from the sample ciphertext are given in blue, whilst the statistics obtained from the underlying plaintext language are given in red. Note that we do not compute the red statistics from the actual plaintext, since we do not know this yet; we only make use of our knowledge of the underlying language.
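The shift cipher and the exhaustive key search described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names `shift_encrypt`, `shift_decrypt` and `exhaustive_search` are hypothetical, and characters other than the letters A to Z are passed through unchanged so that word breaks survive, as they do in the example ciphertext.

```python
import string

ALPHABET = string.ascii_uppercase  # A=0, B=1, ..., Z=25


def shift_encrypt(plaintext, key):
    """Add key (mod 26) to every letter; other characters pass through."""
    out = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)
    return "".join(out)


def shift_decrypt(ciphertext, key):
    """Decryption is encryption with the negated key."""
    return shift_encrypt(ciphertext, -key)


def exhaustive_search(ciphertext):
    """Return all 26 candidate decryptions, indexed by the key tried."""
    return {key: shift_decrypt(ciphertext, key) for key in range(26)}


print(shift_encrypt("HELLO", 3))  # KHOOR

# A human (or a dictionary check) still has to pick the readable candidate.
for key, candidate in exhaustive_search("GB OR, BE ABG GB OR").items():
    print(key, candidate)
```

Only one of the 26 printed lines reads as English, which is why the exhaustive search is practical whenever the plaintext is easy to recognize.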

Figure 2. Comparison of plaintext and ciphertext frequencies for the shift cipher example (horizontal axis: letters A to Z)

By comparing the two bar graphs in Fig. 2 we can see by how much we think the blue graph has been shifted compared with the red graph. By examining where we think the plaintext letter E may have been shifted, one can hazard a guess that it is shifted by one of

2, 9, 13 or 23.

Then by trying to deduce by how much the plaintext letter A has been shifted we can guess that it has been shifted by one of

1, 6, 13 or 17.

The only shift value which is consistent appears to be the value 13, and we conclude that this is the most likely key value. We can now decrypt the ciphertext using this key. This reveals that the underlying plaintext is:

To be, or not to be: that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to, ’tis a consummation
Devoutly to be wish’d. To die, to sleep;
To sleep: perchance to dream: ay, there’s the rub;
For in that sleep of death what dreams may come
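The comparison of the two bar graphs can also be automated. The sketch below does not read peaks by eye as the text does; instead it swaps in a chi-squared score between the ciphertext letter counts and expected English frequencies, and picks the shift with the lowest score. The table `ENGLISH_FREQ` holds commonly quoted approximate percentages and is our assumption, not the data behind Fig. 1; the function names are likewise hypothetical.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

# Approximate English letter frequencies in percent (assumed round figures).
ENGLISH_FREQ = {
    "A": 8.2, "B": 1.5, "C": 2.8, "D": 4.3, "E": 12.7, "F": 2.2, "G": 2.0,
    "H": 6.1, "I": 7.0, "J": 0.15, "K": 0.8, "L": 4.0, "M": 2.4, "N": 6.7,
    "O": 7.5, "P": 1.9, "Q": 0.1, "R": 6.0, "S": 6.3, "T": 9.1, "U": 2.8,
    "V": 1.0, "W": 2.4, "X": 0.15, "Y": 2.0, "Z": 0.07,
}


def letter_counts(text):
    """Count the letters A to Z, ignoring everything else."""
    return Counter(ch for ch in text.upper() if ch in ALPHABET)


def chi_squared(counts, shift):
    """Score the hypothesis 'the key is shift'; lower means a better fit."""
    total = sum(counts.values())
    score = 0.0
    for i, plain in enumerate(ALPHABET):
        cipher = ALPHABET[(i + shift) % 26]  # where this plaintext letter lands
        expected = total * ENGLISH_FREQ[plain] / 100.0
        score += (counts[cipher] - expected) ** 2 / expected
    return score


def guess_shift(ciphertext):
    """Return the shift whose shifted English histogram fits best."""
    counts = letter_counts(ciphertext)
    if not counts:
        raise ValueError("ciphertext contains no letters")
    return min(range(26), key=lambda s: chi_squared(counts, s))
```

Calling `guess_shift` on the opening lines of the sample ciphertext is expected to select the same key as the argument above, although on very short ciphertexts any purely statistical guess can fail.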


WONOJTL UOKSTAIWUW YVJ GONOLVCIAD TAG WZCCVJXIAD OAXJOCJOAOZJITL WXZGOAXW TAG WXTYY, TAG TIUW XV CLTQ T WIDAIYIKTAX JVLO IA XSO GONOLVCUOAX VY SIDS-XOKSAVLVDQ IAGZWXJQ IA XSO JODIVA. XSO GOCTJXUOAX STW T LTJDO CJVDJTUUO VY JOWOTJKS WZCCVJXOG MQ IAGZWXJQ, XSO OZJVCOTA ZAIVA, TAG ZE DVNOJAUOAX JOWOTJKS OWXTMLIWSUOAXW TAG CZMLIK KVJCVJTXIVAW. T EOQ OLOUOAX VY XSIW IW XSO WXJVAD LIAEW XSTX XSO GOCTJXUOAX STW HIXS XSO KVUCZXOJ, KVUUZAIKTXIVAW, UIKJVOLOKXJVAIKW TAG UOGIT IAGZWXJIOW IA XSO MJIWXVL JODIVA. XSO TKTGOUIK JOWOTJKS CJVDJTUUO IW VJDTAIWOG IAXV WONOA DJVZCW, LTADZTDOW TAG TJKSIXOKXZJO, GIDIXTL UOGIT, UVMILO TAG HOTJTMLO KVUCZXIAD, UTKSIAO LOTJAIAD, RZTAXZU KVUCZXIAD, WQWXOU NOJIYIKTXIVA, TAG KJQCXVDJTCSQ TAG IAYVJUTXIVA WOKZJIXQ.

We can compute the following frequencies for single letters in the above ciphertext:

Letter  Freq      Letter  Freq      Letter  Freq
A       8.6995    B       0.0000    C       3.
D       3.1390    E       0.2690    F       0.
G       3.6771    H       0.6278    I       7.
J       7.0852    K       4.6636    L       3.
M       0.8968    N       1.0762    O       11.
P       0.1793    Q       1.3452    R       0.
S       3.5874    T       8.0717    U       4.
V       7.2645    W       6.6367    X       8.
Y       1.6143    Z       2.

In addition we determine that the most common bigrams in this piece of ciphertext are

TA, AX, IA, VA, WX, XS, AG, OA, JO, JV,

whilst the most common trigrams are

OAX, TAG, IVA, XSO, KVU, TXI, UOA, AXS.

Since the ciphertext letter O occurs with the greatest frequency, namely 11.479, we can guess that the ciphertext letter O corresponds to the plaintext letter E. We now look at what this means for two of the common trigrams found in the ciphertext:

  • The ciphertext trigram OAX corresponds to E * *.
  • The ciphertext trigram XSO corresponds to * * E.

We examine common trigrams in English which start or end with the letter E. We find that three common ones are ENT, ETH and THE. Since, of the two trigrams we wish to match, one starts with the same letter as the other finishes with, we can conclude that it is highly likely that we have the correspondence

  • X = T,
  • S = H,
  • A = N.
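The single-letter, bigram and trigram statistics used in this analysis are straightforward to compute. The following is a minimal sketch with hypothetical function names. It assumes that n-grams are counted over the stream of letters with spaces and punctuation removed; the appearance of AXS among the common trigrams suggests the counts in the text span word breaks, but that is our reading rather than something the text states.

```python
from collections import Counter


def letters_only(text):
    """Upper-case the text and keep only the letters A to Z."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")


def letter_frequencies(text):
    """Percentage frequency of each letter that occurs in the text."""
    letters = letters_only(text)
    counts = Counter(letters)
    return {ch: 100.0 * n / len(letters) for ch, n in counts.items()}


def ngram_counts(text, n):
    """Count every run of n consecutive letters (sliding window)."""
    letters = letters_only(text)
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))


sample = "XSO GOCTJXUOAX STW T LTJDO CJVDJTUUO"
print(ngram_counts(sample, 2).most_common(5))
print(ngram_counts(sample, 3).most_common(5))
```

On the full ciphertext, `most_common` gives the ranked lists from which correspondences such as OAX and XSO are read off.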

Even after this small piece of analysis we find that it is much easier to understand what the underlying plaintext should be. If we focus on the first two sentences of the ciphertext we are trying to break, and we change the letters for which we think we have found the correct mappings, then we obtain:

THE MJIWTVL JEDIVN HTW VNE VY EZJVCE’W LTJDEWT KVNKENTJTTIVNW VY HIDH TEKHNVLVDQ INGZWTJQ. KVUCZTEJW, KVUUZNIKTTIVNW TNG UIKJVELEKTJVNIKW TJE HELL JECJEWENTEG, TLVNDWIGE GIDITTL UEGIT, KVUCZTEJ DTUEW TNG ELEKTJVNIK KVUUEJKE.

Recall, this was after the four substitutions

O = E, X = T, S = H, A = N.

We now cheat and use the fact that we have retained the word sizes in the ciphertext. We see that since the letter T occurs as a single ciphertext letter we must have

T = I or T = A.

The ciphertext letter T occurs with a probability of 8.0717, which is the highest probability left, hence we are far more likely to have

T = A.

We have already considered the most popular trigram in the ciphertext, so turning our attention to the next most popular trigram we see that it is TAG, which we suspect corresponds to the plaintext AN*. Therefore it is highly likely that G = D, since AND is a popular trigram in English. Our partially decrypted ciphertext is now equal to

THE MJIWTVL JEDIVN HAW VNE VY EZJVCE’W LAJDEWT KVNKENTJATIVNW VY HIDH TEKHNVLVDQ INDZWTJQ. KVUCZTEJW, KVUUZNIKATIVNW AND UIKJVELEKTJVNIKW AJE HELL JECJEWENTED, ALVNDWIDE DIDITAL UEDIA, KVUCZTEJ DAUEW AND ELEKTJVNIK KVUUEJKE.

This was after the six substitutions

O = E, X = T, S = H, A = N, T = A, G = D.

We now look at two-letter words which occur in the ciphertext:

  • IX This corresponds to the plaintext *T. Therefore the ciphertext letter I must be one of the plaintext letters A or I, since the only two-letter words in English ending in T are AT and IT. We already have worked out what the plaintext character A corresponds to, hence we must have I = I.
  • XV This corresponds to the plaintext T*. Hence, we must have V = O.
  • VY This corresponds to the plaintext O*. Hence, the ciphertext letter Y must correspond to one of F, N or R. We already know the ciphertext letter corresponding to N. In the ciphertext the probability of Y occurring is 1.6, but in English we expect F to occur with probability 2.2 and R to occur with probability 6.0. Hence, it is more likely that Y = F.
  • IW This corresponds to the plaintext I*. Therefore, the ciphertext letter W must correspond to one of the plaintext letters F, N, S or T. We already have F, N and T, hence W = S.

All these deductions leave the partial ciphertext as

THE MJISTOL JEDION HAS ONE OF EZJOCE’S LAJDEST KONKENTJATIONS OF HIDH TEKHNOLODQ INDZSTJQ. KOUCZTEJS, KOUUZNIKATIONS AND UIKJOELEKTJONIKS AJE HELL JECJESENTED, ALONDSIDE DIDITAL UEDIA, KOUCZTEJ DAUES AND ELEKTJONIK KOUUEJKE.

This was after the ten substitutions

O = E, X = T, S = H, A = N, T = A, G = D, I = I, V = O, Y = F, W = S.
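Applying a partially recovered key by hand is error-prone, so it can help to mechanize the step. The sketch below is our own illustration, with a hypothetical function `apply_partial_key`. Because plain text has no colours to separate the two alphabets, it adopts a convention of its own: recovered plaintext letters are printed in lower case, while ciphertext letters that are still unknown stay in upper case.

```python
def apply_partial_key(ciphertext, known):
    """Substitute the ciphertext letters whose plaintext is already known.

    known maps ciphertext letters to plaintext letters. Recovered letters
    are emitted in lower case; everything else is left untouched.
    """
    return "".join(
        known[ch].lower() if ch in known else ch for ch in ciphertext
    )


# The ten substitutions deduced so far (ciphertext letter -> plaintext letter).
KNOWN = {
    "O": "E", "X": "T", "S": "H", "A": "N", "T": "A",
    "G": "D", "I": "I", "V": "O", "Y": "F", "W": "S",
}

print(apply_partial_key("XSO GOCTJXUOAX STW", KNOWN))  # the deCaJtUent has
```

The remaining upper-case letters show at a glance which mappings are still missing.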

a keystream. Encryption involves adding the plaintext letter to a key letter. Thus if the key is SESAME, encryption works as follows,

THISISATESTMESSAGE
SESAMESESAMESESAME
LLASUWSXWSFQWWKASI

Again we notice that A will encrypt to a different letter depending on where it appears in the message. But the Vigenère cipher is still easy to break using the underlying statistics of English. Once we have found the length of the keyword, breaking the ciphertext is the same as breaking the shift cipher a number of times.
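The SESAME example can be reproduced with a short sketch. The function names are hypothetical, and the code assumes that the message and keyword consist only of the upper-case letters A to Z with no spaces, as in the example.

```python
from itertools import cycle


def vigenere_encrypt(plaintext, keyword):
    """Add each plaintext letter to the matching key letter, modulo 26."""
    out = []
    for p, k in zip(plaintext, cycle(keyword)):
        out.append(chr((ord(p) - 65 + ord(k) - 65) % 26 + 65))
    return "".join(out)


def vigenere_decrypt(ciphertext, keyword):
    """Subtract each key letter from the matching ciphertext letter."""
    out = []
    for c, k in zip(ciphertext, cycle(keyword)):
        out.append(chr((ord(c) - ord(k)) % 26 + 65))
    return "".join(out)


print(vigenere_encrypt("THISISATESTMESSAGE", "SESAME"))  # LLASUWSXWSFQWWKASI
```

Here `cycle` repeats the keyword to form the keystream, so letters that are six positions apart are all encrypted with the same shift.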

As an example, suppose the ciphertext is given by

UTPDHUG NYH USVKCG MVCE FXL KQIB. WX RKU GI TZN, RLS BBHZLXMSNP KDKS; CEB IH HKEW IBA, YYM SBR PFR SBS, JV UPL O UVADGR HRRWXF. JV ZTVOOV YH ZCQU Y UKWGEB, PL UQFB P FOUKCG, TBF RQ VHCF R KPG, OU KFT ZCQU MAW QKKW ZGSY, FP PGM QKFTK UQFB DER EZRN, MCYE, MG UCTFSVA, WP KFT ZCQU MAW KQIJS. LCOV NTHDNV JPNUJVB IH GGV RWX ONKCGTHKFL XG VKD, ZJM VG CCI MVGD JPNUJ, RLS EWVKJT ASGUCS MVGD; DDK VG NYH PWUV CCHIIY RD DBQN RWTH PFRWBBI VTTK VCGNTGSF FL IAWU XJDUS, HFP VHCF, RR LAWEY QDFS RVMEES FZB CHH JRTT MVGZP UBZN FD ATIIYRTK WP KFT HIVJCI; TBF BLDPWPX RWTH ULAW TG VYCHX KQLJS US DCGCW OPPUPR, VG KFDNUJK GI JIKKC PL KGCJ IAOV KFTR GJFSAW KTZLZES WG RWXWT VWTL WP XPXGG, CJ FPOS VYC BTZCUW XG ZGJQ PMHTRAIBJG WMGFG. JZQ DPB JVYGM ZCLEWXR: CEB IAOV NYH JIKKC TGCWXF UHF JZK. WX VCU LD YITKFTK WPKCGVCWIQT PWVY QEBFKKQ, QNH NZTTW IRFL IAS VFRPE ODJRXGSPTC EKWPTGEES, GMCG TTVVPLTFFJ; YCW WV NYH TZYRWH LOKU MU AWO, KFPM VG BLTP VQN RD DSGG AWKWUKKPL KGCJ, XY OPP KPG ONZTT ICUJCHLSF KFT DBQNJTWUG. DYN MVCK ZT MFWCW HTWF FD JL, OPU YAE CH LQ! PGR UF, YH MWPP RXF CDJCGOSF, XMS UZGJQ JL, SXVPN HBG!

There is a way of finding the length of the keyword, which is repeated to form the keystream, called the Kasiski test. First we need to look for repeated sequences of characters. Recall that English has a large repetition of certain bigrams or trigrams, and over a long enough string of text these are likely to match up to the same two or three letters in the key every so often. By examining the distance between two repeated sequences we can guess the length of the keyword. Each of these distances should be a multiple of the keyword length, hence taking the greatest common divisor of all distances between the repeated sequences should give a good guess as to the keyword length.

Let us examine the above ciphertext and look for the bigram WX. The gaps between some of the occurrences of this bigram are 9, 21, 66 and 30, some of which may have occurred by chance, whilst some may reveal information about the length of the keyword. We now take the relevant greatest common divisors to find

gcd(30, 66) = 6,
gcd(9, 21) = gcd(9, 66) = gcd(9, 30) = gcd(21, 66) = 3.
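The bookkeeping behind the Kasiski test can be sketched as follows. The helper names `occurrences` and `gaps` are hypothetical, and the sketch assumes that positions are counted over letters only, on the further assumption that the keystream does not advance on spaces or punctuation. The gap values fed to `gcd` are the ones quoted in the text.

```python
from functools import reduce
from math import gcd


def occurrences(text, ngram):
    """Positions of ngram in the letter stream (non-letters are dropped)."""
    letters = "".join(ch for ch in text.upper() if "A" <= ch <= "Z")
    positions = []
    start = letters.find(ngram)
    while start != -1:
        positions.append(start)
        start = letters.find(ngram, start + 1)
    return positions


def gaps(positions):
    """Distances between consecutive occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]


print(gcd(30, 66))                   # 6
print(reduce(gcd, [9, 21, 66, 30]))  # 3
```

The second line of output shows why chance repetitions matter: taking the greatest common divisor over all four gaps gives 3, so the gaps believed to be accidental have to be set aside before a keyword length is proposed.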

We are unlikely to have a keyword of length three, so we conclude that the gaps of 9 and 21 occurred purely by chance. Hence, our best guess is that the keyword has length 6. Now we take every sixth letter and look at the statistics, just as we did for a shift cipher, to deduce the first letter of the keyword. We can now see the advantage of using the histograms to break the shift cipher earlier. If we used the naive method and tried each of the 26 keys in turn we could still not detect which key is correct, since every sixth letter of an English sentence does not produce an English sentence. Using our earlier histogram-based method is more efficient in this case.
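Taking every sixth letter amounts to splitting the ciphertext into six streams, each of which is a plain shift cipher. The sketch below illustrates this on the short SESAME example from earlier in this section rather than on the long ciphertext; the function names are hypothetical, and the per-stream shifts are supplied by hand here, whereas in an attack they would come from the histogram comparison.

```python
def split_columns(ciphertext, n):
    """Stream i holds letters i, i+n, i+2n, ... of the ciphertext."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    return ["".join(letters[i::n]) for i in range(n)]


def shift_decrypt(column, key):
    """Undo a shift cipher with the given key on upper-case letters."""
    return "".join(chr((ord(ch) - 65 - key) % 26 + 65) for ch in column)


def keyword_from_shifts(shifts):
    """Turn one shift per stream into keyword letters (0 -> A, 2 -> C, ...)."""
    return "".join(chr(65 + s) for s in shifts)


columns = split_columns("LLASUWSXWSFQWWKASI", 6)
print(columns)                       # ['LSW', 'LXW', 'AWK', 'SSA', 'UFS', 'WQI']
print(shift_decrypt(columns[0], 18))  # TAE, letters 1, 7 and 13 of the message
print(keyword_from_shifts([18, 4, 18, 0, 12, 4]))  # SESAME
```

The decrypted stream TAE is not an English word, which illustrates why the streams have to be judged by their letter statistics rather than by readability.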

Figure 3. Comparison of plaintext and ciphertext frequencies for every sixth letter of the Vigenère example, starting with the first letter (horizontal axis: letters A to Z)

Figure 4. Comparison of plaintext and ciphertext frequencies for every sixth letter of the Vigenère example, starting with the second letter (horizontal axis: letters A to Z)

The relevant bar charts for every sixth letter starting with the first are given in Fig. 3. We look for the possible locations of the three peaks corresponding to the plaintext letters A, E and T. We see that this sequence seems to be shifted by two positions in the blue graph compared with the red graph. Hence we can conclude that the first letter of the keyword is C, since C corresponds to a shift of two. We perform a similar step for every sixth letter, starting with the second one. The resulting bar graphs are given in Fig. 4. Using the same technique we find that the blue graph appears to


abcdefghijklmnopqrstuvwxyz,

to obtain the ciphertext

cadbehfigjmknlorpsqtwuxvyz.

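We can then deduce that the permutation looks something like

\[
\begin{pmatrix}
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & \cdots \\
2 & 4 & 1 & 3 & 5 & 7 & 9 & 6 & 8 & 10 & 12 & 14 & 11 & 13 & 15 & \cdots
\end{pmatrix}
\]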

We see that the sequence repeats (modulo 5) after every five steps and so the value of n is probably equal to five. We can recover the key by simply taking the first five columns of the above permutation.
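This chosen plaintext attack can be sketched in code. The function names are hypothetical, and the sketch assumes that every character of the probe plaintext is distinct, as with the alphabet used above, so that each character can be located unambiguously in the ciphertext. It also ignores a final incomplete block, which the example appears to leave in place.

```python
def image_positions(plain, cipher):
    """For each plaintext position, the 1-based position it moved to."""
    return [cipher.index(ch) + 1 for ch in plain]


def block_permutation(plain, cipher):
    """Smallest block whose permutation repeats over all complete blocks."""
    images = image_positions(plain, cipher)
    for n in range(1, len(images) + 1):
        first = images[:n]
        # The first n images must permute the positions 1..n among themselves.
        if sorted(first) != list(range(1, n + 1)):
            continue
        # Every later complete block must repeat the same pattern, offset by n.
        complete = (len(images) // n) * n
        if all(images[i + n] == images[i] + n for i in range(complete - n)):
            return first
    return None


plain = "abcdefghijklmnopqrstuvwxyz"
cipher = "cadbehfigjmknlorpsqtwuxvyz"
print(image_positions(plain, cipher)[:10])  # [2, 4, 1, 3, 5, 7, 9, 6, 8, 10]
print(block_permutation(plain, cipher))     # [2, 4, 1, 3, 5]
```

The returned list is the bottom row of the first five columns of the permutation, and its length is the recovered value of n.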

Chapter Summary

  • Many early ciphers can be broken because they do not successfully hide the underlying statistics of the language.
  • Important principles behind early ciphers are those of substitution and permutation.
  • Ciphers can either work on blocks of characters via some keyed algorithm or simply consist of adding some keystream to each plaintext character.
  • Ciphers which aimed to get around these early problems often turned out to be weaker than expected, either due to some design flaw or due to bad key management practices adopted by operators.

Further Reading

The best book on the history of ciphers is that by Kahn. Kahn’s book is a weighty tome so those wishing a more rapid introduction should consult the book by Singh. The book by Churchhouse also gives an overview of a number of historical ciphers.

R. Churchhouse. Codes and Ciphers. Julius Caesar, the Enigma and the Internet. Cambridge University Press, 2001.

D. Kahn. The Codebreakers: The Comprehensive History of Secret Communication from Ancient Times to the Internet. Scribner, 1996.

S. Singh. The Code Book: The Evolution of Secrecy from Mary, Queen of Scots to Quantum Cryptography. Doubleday, 2000.