



These slides explain how computers store and represent numbers using binary code, including integers using the two's complement method and reals as floating point numbers. They also discuss the concept of round-off error and the generation of random numbers using algorithms like the linear congruence algorithm.
Typology: Slides




The computer memory contains millions of transistors which at any time can be in one of two physical states, usually labelled 0 and 1.
Each transistor carries 1 bit of information.
8 bits = 1 byte
$2^{10}$ bytes = 1 K byte ≡ 1024 bytes
$2^{20}$ bytes = 1 Mega byte
$2^{30}$ bytes = 1 Giga byte
$2^{40}$ bytes = 1 Tera byte
How many bits are needed?
(A) Characters
English:
26 lower case: a, b, c, ...
26 upper case: A, B, C, ...
10 numerals: 0, 1, 2, ...
≈ 15 'extras': + – . , ; : * etc.
Total ≈ 77
Given N bits we can create $2^N$ different combinations.
Example: N = 2 gives {0, 0}, {0, 1}, {1, 0}, {1, 1}.
N = 6 gives $2^6 = 64$ combinations, too few for the ≈ 77 CHARACTERS. In practice 1 byte = 8 bits is used for each character.
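The counting argument above can be checked with a short Python sketch. The function names `bit_patterns` and `bits_needed` are made up for this illustration. It lists the combinations for N = 2 and finds the smallest N that covers 77 characters (7 bits would suffice; a full byte is simply the convenient unit).

```python
from itertools import product


def bit_patterns(n_bits):
    """Return all 2**n_bits combinations of n_bits bits as tuples of 0/1."""
    return list(product((0, 1), repeat=n_bits))


def bits_needed(n_symbols):
    """Smallest N with 2**N >= n_symbols, using exact integer arithmetic."""
    return (n_symbols - 1).bit_length()


print(bit_patterns(2))       # the four combinations for N = 2
print(len(bit_patterns(6)))  # 64, fewer than the ~77 characters required
print(bits_needed(77))       # 7
```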
(B) Book
Say 40 lines × 80 characters per page, so 1 page = 3200 bytes ≈ 3 K bytes, and 300 pages ≈ 1 M byte.
(C) Music
To detect a frequency of 20 kHz you need 40,000 values of pressure per second. If each value is given by 16 bits (CD quality) this amounts to 80 K bytes / second per channel. With two stereo channels this is about 10 M bytes / minute, or 650 M bytes for a 65 minute CD: about the capacity of a single disc.
(D) Pictures
Each dot on a computer image is called a PIXEL. A good screen might have 1080 × 780 ≈ 1 M PIXELS. Each pixel has colour and brightness specified by (about) 3 bytes. Therefore a single image requires about 3 M bytes. (A chemical photograph contains approximately 30-40 M bytes of information.)
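The storage estimates for (B), (C) and (D) are simple products, sketched below in Python. The variable names are illustrative, and the factor of two stereo channels in the music estimate is an assumption made here to connect the per-second and per-minute figures.

```python
# (B) Book: one byte per character
page_bytes = 40 * 80                 # 40 lines of 80 characters
book_bytes = 300 * page_bytes        # 300 pages

# (C) Music: 40,000 samples per second, 16 bits = 2 bytes per sample
samples_per_second = 40_000
bytes_per_sample = 2
channels = 2                         # ASSUMPTION: stereo recording
audio_bytes_per_second = samples_per_second * bytes_per_sample * channels
audio_bytes_per_minute = 60 * audio_bytes_per_second
cd_bytes = 65 * audio_bytes_per_minute

# (D) Pictures: about 3 bytes per pixel
image_bytes = 1080 * 780 * 3

print(page_bytes, book_bytes)
print(audio_bytes_per_minute, cd_bytes)
print(image_bytes)
```

The exact products come out a little below the rounded figures quoted in the text, which are order-of-magnitude estimates.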
Integers are usually stored using a whole number of bytes, hence one usually refers to 8-bit (see below), 16-bit, 32-bit (the default on many computers) or 64-bit integers. The number of bits controls the range of integers that can be stored, e.g. 8 bits allow $2^8 = 256$ combinations, and so only 256 different integers can be stored. One method for storing (8-bit) integers on the computer, which leads to a convenient binary 'arithmetic', is known as the two's complement method. According to this method, the left-most bit represents $-2^7 = -128$ (i.e. minus the expected value), so that, for example
$$11010101 = -1 \times 2^7 + 1 \times 2^6 + 1 \times 2^4 + 1 \times 2^2 + 1 \times 2^0 = -43$$
i.e. the left-most bit represents $-2^7$, and the remaining bits represent $2^6, 2^5, \ldots, 2^0$. This allows the range [−128, 127] to be stored.
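As an illustration of this rule, here is a minimal Python sketch that converts between integers and 8-bit two's complement bit strings. The helper names `from_twos_complement` and `to_twos_complement` are invented for this example.

```python
def from_twos_complement(bits):
    """Decode a two's complement bit string; spaces are ignored."""
    bits = bits.replace(" ", "")
    n = len(bits)
    # The left-most bit carries the weight -2**(n-1) ...
    value = -int(bits[0]) * 2 ** (n - 1)
    # ... and the remaining bits carry 2**(n-2), ..., 2**0.
    for k, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2 ** (n - 1 - k)
    return value


def to_twos_complement(value, n_bits=8):
    """Encode an integer in the range [-2**(n_bits-1), 2**(n_bits-1) - 1]."""
    if not -(2 ** (n_bits - 1)) <= value < 2 ** (n_bits - 1):
        raise ValueError("value does not fit in the given number of bits")
    return format(value % 2 ** n_bits, f"0{n_bits}b")


print(from_twos_complement("11010101"))  # -43
print(to_twos_complement(-128))          # 10000000
print(to_twos_complement(127))           # 01111111
```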
How are these numbers added and subtracted?
Addition
Addition proceeds just like ordinary decimal addition, column by column with carries, except that any carry out of the left-most bit is discarded.
Example
1001 1101 = −99
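Since only one line of the worked example survives here, the following Python sketch shows column-by-column binary addition on 8-bit strings. The operands are chosen for this illustration and are not taken from the original slide; `add_8bit` and `signed` are hypothetical helper names.

```python
def add_8bit(x, y):
    """Add two 8-bit strings column by column, discarding the final carry."""
    carry = 0
    out = []
    for bx, by in zip(reversed(x), reversed(y)):
        total = int(bx) + int(by) + carry
        out.append(str(total % 2))  # digit written in this column
        carry = total // 2          # carry into the next column
    return "".join(reversed(out))


def signed(bits):
    """Interpret an 8-bit string as a two's complement integer."""
    value = int(bits, 2)
    return value - 256 if bits[0] == "1" else value


a = "10011101"  # -99
b = "00111000"  # +56
s = add_8bit(a, b)
print(s, signed(s))  # 11010101 -43
```

The same routine handles positive and negative operands without any special cases, which is the convenience that two's complement storage provides.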
Subtraction
For subtraction we exploit the following theorem and then use addition.
Theorem
Suppose f(n) is a function that flips all the bits in n (e.g. f(1001 0111) = 0110 1000). Then

$$-n = f(n) + 1$$
Proof
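Every bit position contains a 1 in exactly one of n and f(n), so

$$f(n) + n = 1111\,1111 = -1 \times 2^7 + 1 \times 2^6 + 1 \times 2^5 + \ldots + 1 \times 2^0 = -128 + (64 + 32 + 16 + 8 + 4 + 2 + 1) = -1$$

as

$$\sum_{i=0}^{m} 2^i = 2^{m+1} - 1$$

Hence $f(n) = -n - 1$, i.e. $-n = f(n) + 1$.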
Therefore a calculation a = b − c can be rewritten as

a = b + (−c) = b + f(c) + 1
This uses only addition and bit flipping, both of which are fast operations. Note, however, that because results outside the range [−128, 127] wrap around, 8-bit two's complement arithmetic gives the unusual results

2 × 64 = −128 and 1 + 127 = −128
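The theorem and the wrap-around results can be tried out with Python integers masked to 8 bits. This is only a sketch: the helper names (`flip`, `negate`, `subtract`, `signed`) are invented here, and real hardware does this in its arithmetic unit rather than in software.

```python
MASK = 0xFF  # keep only the lowest 8 bits


def flip(n):
    """f(n): flip all 8 bits of the bit pattern n."""
    return ~n & MASK


def negate(n):
    """-n = f(n) + 1, truncated to 8 bits."""
    return (flip(n) + 1) & MASK


def subtract(b, c):
    """b - c = b + f(c) + 1, using only addition and bit flipping."""
    return (b + flip(c) + 1) & MASK


def signed(pattern):
    """Interpret an 8-bit pattern as a two's complement integer."""
    return pattern - 256 if pattern & 0x80 else pattern


print(format(flip(0b10010111), "08b"))  # 01101000
print(signed(negate(43)))               # -43
print(signed(subtract(20, 63)))         # -43
print(signed((2 * 64) & MASK))          # -128 (overflow)
print(signed((1 + 127) & MASK))         # -128 (overflow)
```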
Consider the sum 1 + x for some small number x. For 32-bit reals, x can be as small as about $10^{-38}$ ($1.0000000 \times 2^{11111111} = 2^{-128} \approx 10^{-38}$). BUT 1 + x cannot have an exponent of −128. Because $2^0 = 1$, it must have exponent 0. Therefore

$$1 + x = 1.f_1 f_2 f_3 \ldots f_{22} f_{23} \times 2^0$$

The smallest possible value of x is therefore $2^{-23} \approx 10^{-7}$ (giving $f_1 = f_2 = \ldots = f_{22} = 0$, $f_{23} = 1$). Then

$$1 + x = 1.00000000000000000000001 \times 2^0$$

If $|x| < 2^{-23}$ then 1 + x = 1.
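This limit can be observed experimentally. Python's own floats are 64-bit, so the sketch below simulates 32-bit reals by rounding through the standard `struct` module; the helper `f32` is an invention of this example. One caveat: IEEE hardware rounds to the nearest representable value rather than truncating, so values of x a little below $2^{-23}$ may still round 1 + x up to the next number. The test therefore uses $2^{-25}$, which is safely lost.

```python
import struct


def f32(x):
    """Round a Python float to the nearest 32-bit real."""
    return struct.unpack("f", struct.pack("f", x))[0]


# 2**-23 is the spacing between 1 and the next 32-bit real.
print(f32(1.0 + 2.0 ** -23) != 1.0)  # True: the sum differs from 1
print(f32(1.0 + 2.0 ** -25) == 1.0)  # True: x is lost completely

# Halve eps until adding half of it to 1 no longer changes the result.
eps = 1.0
while f32(1.0 + eps / 2) != 1.0:
    eps /= 2
print(eps == 2.0 ** -23)  # True
```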
Suppose we want to calculate the summation

$$\sum_{n=1}^{5000} \frac{1}{n^2}$$
The final 1500 terms in this series are all small ($< 10^{-7}$), so if we add the summation FORWARDS (starting from the first term), they will not contribute to the final answer, as there is a 1 + x = 1 round-off error for every term. However, taken together, the final 1500 terms add up to roughly $10^{-4}$, a significant error!
Solution: One (not entirely foolproof) way to get around this is to add the terms in the summation BACKWARDS, i.e. starting from the smallest term. This will minimise the accumulated round-off error.
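The effect of the summation order can be sketched in Python, assuming the sum in question is $\sum_{n=1}^{5000} 1/n^2$ and simulating 32-bit reals with the `struct` module (Python floats are 64-bit). The names `f32` and `sum_f32` are illustrative only, and `math.fsum` in 64-bit arithmetic serves as the reference value.

```python
import math
import struct


def f32(x):
    """Round a Python float to the nearest 32-bit real."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(terms):
    """Accumulate terms one by one, rounding to 32 bits after every addition."""
    total = 0.0
    for t in terms:
        total = f32(total + f32(t))
    return total


N = 5000
terms = [1.0 / (n * n) for n in range(1, N + 1)]

forward = sum_f32(terms)             # largest terms first
backward = sum_f32(reversed(terms))  # smallest terms first
reference = math.fsum(terms)         # accurate 64-bit reference

print("forward error: ", abs(forward - reference))
print("backward error:", abs(backward - reference))
```

Because of round-to-nearest, some small terms in the forward sum are rounded up rather than dropped, so the measured forward error can differ from the simple estimate in the text; the backward sum should still be markedly closer to the reference.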
Consider a quadratic equation

$$ax^2 + bx + c = 0$$

and assume that b > 0 for what follows (although the argument is easily modified). The roots of this equation are

$$x_1 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}, \quad x_2 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}$$

Suppose we have $b^2 \gg 4ac$. Then $\sqrt{b^2 - 4ac} \approx b(1 - 2ac/b^2)$ and

$$x_1 \approx -\frac{b}{a}, \quad x_2 \approx -\frac{c}{b}$$

Note that $|x_1| \gg |x_2|$ since $x_1/x_2 \approx b^2/(ac) \gg 1$.
A problem with round-off error may arise if $4ac < 10^{-7} b^2$. Then the computer (using 32-bit reals) will calculate $b^2 - 4ac = b^2$, so that $\sqrt{b^2 - 4ac} = b$, and will then calculate $x_2 = 0$!
Solution: A robust quadratic solver proceeds as follows. First calculate

$$x_1 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$$

as normal. Then note that

$$x_1 x_2 = \left(\frac{-b - \sqrt{b^2 - 4ac}}{2a}\right)\left(\frac{-b + \sqrt{b^2 - 4ac}}{2a}\right) = \frac{b^2 - (b^2 - 4ac)}{4a^2} = \frac{4ac}{4a^2} = \frac{c}{a}$$

Now recover the 'problem' root $x_2$ from

$$x_2 = \frac{c}{a x_1}$$

Clearly if $x_1 \approx -b/a$ then $x_2 \approx -c/b$, as it should! We have avoided round-off error.
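A minimal Python sketch of the two approaches follows. It assumes b > 0 and real roots, as in the text; the function names are made up for this example. Python floats are 64-bit, so the cancellation sets in at a relative size of about $10^{-16}$ rather than $10^{-7}$, and the test values are chosen accordingly.

```python
import math


def naive_roots(a, b, c):
    """Apply the textbook formula to both roots."""
    root = math.sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


def robust_roots(a, b, c):
    """Compute x1 from the formula, then x2 from x1 * x2 = c / a."""
    if b <= 0:
        raise ValueError("this sketch assumes b > 0")
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("complex roots are not handled in this sketch")
    x1 = (-b - math.sqrt(disc)) / (2 * a)  # both terms negative: no cancellation
    x2 = c / (a * x1)
    return x1, x2


a, b, c = 1.0, 1.0e9, 1.0   # 4ac is far below the round-off level of b**2
print(naive_roots(a, b, c))   # second root comes out as 0.0
print(robust_roots(a, b, c))  # second root is close to -c/b = -1e-9
```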
A computer is an entirely deterministic device, i.e. it does not have access to any genuinely random process. 'Random' numbers must therefore be generated from a deterministic sequence, ideally one which 'appears' to be random to the casual observer (although of course it is not really). 'Random' numbers generated in this fashion are adequate for most practical purposes.
One popular and relatively simple algorithm is as follows:
$$I_{j+1} = (a I_j + c) \bmod m$$
Example: m = 7, a = 2, c = 3. Then $I_0 = 5$, $I_1 = (2 \times 5 + 3) \bmod 7 = 6$, $I_2 = 1$, $I_3 = 5$, $I_4 = 6$, etc.
GOOD CHOICE : m = 233280, a = 9301, c = 49297 (Numerical Recipes)
BAD CHOICE : m = 8, a = 2, c = 4, which gives $I_0 = 2$, $I_1 = 0$, $I_2 = 4$, $I_3 = 4$, $I_4 = 4$, etc.
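The recurrence is short enough to try directly. The Python sketch below (with invented names `lcg` and `first_values`) reproduces the two small examples and turns the 'good choice' parameters into reals in [0, 1) by dividing by m; the seed 1 used there is arbitrary.

```python
from itertools import islice


def lcg(seed, a, c, m):
    """Yield I_0, I_1, I_2, ... with I_{j+1} = (a * I_j + c) mod m."""
    value = seed
    while True:
        yield value
        value = (a * value + c) % m


def first_values(seed, a, c, m, count):
    """Return the first `count` members of the sequence as a list."""
    return list(islice(lcg(seed, a, c, m), count))


print(first_values(5, a=2, c=3, m=7, count=5))  # [5, 6, 1, 5, 6]
print(first_values(2, a=2, c=4, m=8, count=5))  # [2, 0, 4, 4, 4]

# Uniform reals in [0, 1) from the 'good choice' parameters.
m = 233280
uniform = [i / m for i in first_values(1, a=9301, c=49297, m=m, count=5)]
print(uniform)
```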
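In FORTRAN, random numbers can be generated using the intrinsic subroutine

call random_number(x)

which returns a real x in the range 0 ≤ x < 1. This uses an algorithm chosen by the compiler vendor. Like the linear congruence algorithm, it is deterministic, so with many compilers every run of the program gives the same random numbers.

To change the sequence from run to run, one can use

call random_seed()

although the exact behaviour depends on the compiler.

To obtain a random integer Z between i1 and i2 inclusive, use

call random_number(x)
Z=(i2-i1+1)*x+i1

where Z is declared as an integer, so that the assignment truncates the real value (this relies on i1 ≥ 0).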
Example: Coin Toss
We want a discrete random variable a that takes the values 0 or 1 with equal probability. Here i1 = 0 and i2 = 1, so i2-i1+1 = 2 and

integer :: a
real :: x
call random_number(x)
a = 2*x
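For comparison, the same mapping from a uniform real to an integer can be sketched in Python with the standard `random` module. The function names and the seed value 12345 are arbitrary choices for this illustration. Creating the generator with a fixed seed reproduces the same sequence on every run, while `random.Random()` with no argument seeds it from the operating system.

```python
import random


def random_integer(rng, i1, i2):
    """Return an integer in [i1, i2] from a uniform real x in [0, 1)."""
    x = rng.random()
    # (i2 - i1 + 1) * x lies in [0, i2 - i1 + 1); truncate, then shift by i1.
    return i1 + int((i2 - i1 + 1) * x)


def coin_toss(rng):
    """Return 0 or 1 with equal probability."""
    return random_integer(rng, 0, 1)


rng = random.Random(12345)  # fixed seed: the same sequence on every run
tosses = [coin_toss(rng) for _ in range(10_000)]
print(sum(tosses) / len(tosses))  # fraction of 1s, expected to be near 0.5

dice = [random_integer(rng, 1, 6) for _ in range(10)]
print(dice)
```

Truncating before adding i1 keeps the mapping valid for negative i1 as well.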