

Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
How to do floating point numbers in binary
Typology: Summaries
1 / 3
This page cannot be seen from the preview
Don't miss anything!


Computers can only store 1s and 0s (binary digits). That's fine for whole numbers like 42 or 255 — but what about 6.75 or −0.001? This document explains the trick computers use: floating point representation.
1. Quick Revision — Binary & Powers of Two You already know that binary whole numbers work like this: Binary Calculation Decimal value (^1101) 1×8 + 1×4 + 0×2 + 1×1 13 (^1010) 1×8 + 0×4 + 1×2 + 0×1 10 (^11111111) 128+64+32+16+8+4+2+1 255 The same idea extends to the right of the binary point (like a decimal point, but for binary). Each position to the right is half the previous one: Position Value 1s place (2⁰) 1 ½s place (2 ¹)⁻ 0. ¼s place (2 ²)⁻ 0. ⅛s place (2 ³)⁻ 0. 1/16s place (2 ⁴)⁻ 0. So 101.11 in binary means: 4 + 0 + 1 + 0.5 + 0.25 = 5.75 in decimal. Easy! The challenge is storing this efficiently using a fixed number of bits. 2. The Three Parts of a Floating Point Number The most common floating point standard is IEEE 754 single precision , which uses exactly 32 bits split into three fields: Sign Exponent Mantissa (Fraction) 1 bit 8 bits 23 bits Sign (1 bit): 0 = positive, 1 = negative. Simple! Exponent (8 bits): Controls how big or small the number is — like the power in scientific notation. A bias of 127 is added before storing, so the stored value is always positive.
Mantissa / Fraction (23 bits): Stores the significant digits of the number. There is always a hidden leading 1 that isn't stored (called the implicit leading bit ), so we effectively get 24 bits of precision. 💡 Scientific Notation Analogy Think of it like scientific notation in decimal: 6.022 × 10²³. Here 6.022 is the mantissa and 23 is the exponent. Floating point does exactly the same thing — but in binary: 1.mantissa × 2^exponent.
3. Step-by-Step: Converting 6.75 to IEEE 754 Let's convert 6.75 into its 32-bit IEEE 754 binary representation, one step at a time. 1 Convert the whole part to binary Divide 6 by 2 repeatedly: 6 → 3 → 1 → 0, reading remainders upward: 110 2 Convert the fractional part to binary Multiply 0.75 by 2: 0.75 × 2 = 1 .5 → first bit: 1 Multiply 0.5 by 2: 0.5 × 2 = 1 .0 → second bit: 1 (remainder 0, stop) So 0.75 in binary =. 3 Combine and write in binary scientific notation Full binary: 110.11 → normalise to 1.1011 × 2² (shift point left 2 places) 4 Fill in the three fields Sign: 6.75 is positive → 0 Exponent: Power is 2, add bias 127: 2 + 127 = 129 → 10000001 Mantissa: Digits after the leading 1 in 1.1011 are 1011 , padded to 23 bits: 10110000000000000000000 Putting it all together: Sign Exponent (129) Mantissa 0 10000001 10110000000000000000000 ✓ Check your answer Full 32-bit pattern: 0 10000001 10110000000000000000000. To verify: sign = +, exponent = 129−127 = 2, mantissa = 1.1011 ₂= 1 + ½ + ¼ + 0 + ⅛ ... wait — 1.1011 = 1 + 0.