This section should wrap us up on binary for now.
Floating Point Numbers
Up until this point, we have dealt exclusively with integer representation in binary. Everything we have done has been a whole value, with no fractions.
Floating point numbers are how we represent those fractions. More accurately, a floating point number is one that can have a fractional part (a value between zero and one, for example) rather than being limited to whole values. The floating point value is made of three components: the sign, the exponent, and the mantissa.
The Sign is a single bit, indicating positive or negative. As before, zero is positive, and one is negative.
The Mantissa is the portion that stores our binary value. If we had the decimal value
5.833 * 10^3
the mantissa would be 5.833. The same holds true in binary arithmetic.
The Exponent is the portion that shifts our mantissa up and down the scale. In the decimal example above, the exponent is 3. Note that we do not have to store the “10^” portion of the equation, as that is understood (in binary, the understood base is 2).
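To make the three components concrete, here is a minimal Python sketch that rebuilds a value from a sign, a mantissa, and an exponent. The function name `assemble` is hypothetical, and the base is passed in explicitly because only the sign, mantissa, and exponent are actually stored.

```python
from fractions import Fraction


def assemble(sign: int, mantissa: Fraction, exponent: int, base: int) -> Fraction:
    """Rebuild (-1)^sign * mantissa * base^exponent as an exact fraction."""
    magnitude = mantissa * Fraction(base) ** exponent
    return -magnitude if sign == 1 else magnitude


# The decimal example from the text: 5.833 * 10^3
print(assemble(0, Fraction("5.833"), 3, 10))  # 5833

# The same idea in base 2: 1.1 (binary) is 1.5, scaled by 2^2, with the sign bit set
print(assemble(1, Fraction(3, 2), 2, 2))  # -6
```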
Converting to Float
When we convert from decimal to floating point, we use the following steps:
- Convert the integer portion to binary (as before)
- Convert the fraction portion to binary
- Combine the integer portion and the fraction
- Shift the point until there is a single 1 to the left of it
- Record the shift (exponent) and the value after the point.
As our example, we will convert the value 5.833 * 10^1 into binary. Written out in full, this is 58.33.
Step one is the same as we have done before. This should need no further explanation at this point:
58 -> 111010
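For reference, step one is the familiar repeated-division method. A minimal Python sketch (the name `int_to_binary` is hypothetical; Python's built-in `bin()` gives the same digits with a `0b` prefix):

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, lowest bit first
        n //= 2
    return "".join(reversed(digits))  # remainders come out in reverse order


print(int_to_binary(58))  # 111010
print(bin(58))            # 0b111010
```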
Step two is a bit trickier. In order to convert fractions to binary, we follow these simple steps:
- Multiply the fraction by two
- Record the value left of the decimal
- Repeat steps 1 and 2 on the part to the right of the decimal point, until that part is 0 (or you give up)
Using this process, we can do the following:
.33 * 2 = 0.66
.66 * 2 = 1.32
.32 * 2 = 0.64
.64 * 2 = 1.28
.28 * 2 = 0.56
.56 * 2 = 1.12
.12 * 2 = 0.24
.24 * 2 = 0.48
.48 * 2 = 0.96
.96 * 2 = 1.92
We will stop here. Because .33 is 33/100, and 100 is not a power of two, these digits would actually go on forever; after ten steps we have approximately
.33 ~= .0101010001
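The doubling procedure can be sketched in Python as below. This is a minimal illustration, not a library routine: it uses `fractions.Fraction` so that the doubling is exact (a binary float cannot hold 0.33 exactly), and `max_bits` is a hypothetical parameter standing in for "or you give up".

```python
from fractions import Fraction


def fraction_to_binary(frac: Fraction, max_bits: int) -> str:
    """Convert a fraction in [0, 1) to binary digits by repeated doubling.

    Stops when the fraction becomes exactly 0 or after max_bits digits,
    so the result may be a truncated approximation.
    """
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        if frac >= 1:        # the digit left of the point is 1
            bits.append("1")
            frac -= 1        # keep only the part right of the point
        else:                # the digit left of the point is 0
            bits.append("0")
    return "".join(bits)


print(fraction_to_binary(Fraction("0.33"), 10))   # 0101010001
print(fraction_to_binary(Fraction("0.625"), 10))  # 101 (terminates early)
```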
Step three is easy: we simply combine the two portions we have:
58.33 ~= 111010.0101010001
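Because we stopped after ten fraction bits, the combined value is only close to 58.33. A small Python sketch (`binary_to_value` is a hypothetical helper) evaluates the binary string exactly so we can see how close:

```python
from fractions import Fraction


def binary_to_value(bits: str) -> Fraction:
    """Evaluate a binary string such as '101.01' exactly."""
    int_part, _, frac_part = bits.partition(".")
    value = Fraction(int(int_part, 2) if int_part else 0)
    for position, bit in enumerate(frac_part, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** position)  # each place is worth 2^-position
    return value


approx = binary_to_value("111010.0101010001")
print(float(approx))                      # 58.3291015625
print(float(Fraction("58.33") - approx))  # the error left by stopping at 10 bits
```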
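Now we have to shift the point (step four). Because we have a non-zero integer part, we move the point to the left until only the leading 1 remains in front of it. That takes five places: 111010.0101010001 becomes 1.110100101010001 * 2^5.
This gives us an exponent of 5, which we record as 0101.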
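IMPORTANT: in this scheme, the exponent needs a sign bit to tell us whether we shifted left or right. In this case, the sign bit is 0, because we shifted left, and the remaining bits (101) hold the shift count of 5. If we had shifted right (as we would for a value below 1, such as .0101 becoming 1.01 * 2^-2), the sign bit would be 1. (The IEEE 754 standard encodes the exponent's sign with a bias rather than a separate bit, but the idea is the same.)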
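The shift step and the sign-bit exponent convention can be sketched in Python as follows. The names `normalize` and `encode_exponent` are hypothetical, and the 4-bit exponent width (one sign bit plus three bits of shift count) is an assumption chosen to match the 0101 above.

```python
def normalize(bits: str) -> tuple[int, str]:
    """Move the binary point so exactly one 1 sits to its left.

    Returns (exponent, mantissa_bits). A positive exponent means the point
    moved left, a negative one means it moved right. The leading 1 is dropped.
    """
    int_part, _, frac_part = bits.partition(".")
    int_part = int_part.lstrip("0")
    if int_part:  # non-zero whole part: move the point left
        return len(int_part) - 1, int_part[1:] + frac_part
    first_one = frac_part.find("1")
    if first_one == -1:
        raise ValueError("zero has no leading 1 to normalize")
    return -(first_one + 1), frac_part[first_one + 1:]


def encode_exponent(exponent: int, width: int = 4) -> str:
    """Sign bit (0 = shifted left, 1 = shifted right) plus the shift count in binary."""
    magnitude_bits = width - 1
    if abs(exponent) >= 2 ** magnitude_bits:
        raise ValueError("exponent does not fit")
    sign = "1" if exponent < 0 else "0"
    return sign + format(abs(exponent), f"0{magnitude_bits}b")


exp, mantissa = normalize("111010.0101010001")
print(exp, mantissa)         # 5 110100101010001
print(encode_exponent(exp))  # 0101
```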
Finally (step five), we store the result:
0 0101 110100101010001
And there we have it. In order, we have our sign (0), exponent (0101) and mantissa (110100101010001).
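As a recap, the five steps can be strung together in one minimal Python sketch. The function name `encode_toy`, the 10-bit fraction cutoff, and the 4-bit sign-magnitude exponent are assumptions chosen to reproduce the example above; this is the simplified format from this post, not IEEE 754.

```python
from fractions import Fraction


def encode_toy(text: str, frac_bits: int = 10, exp_width: int = 4) -> str:
    """Encode a decimal string as 'sign exponent mantissa' in this post's toy format."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"
    value = abs(value)

    # Steps 1 and 2: whole part via bin(), fraction part via repeated doubling.
    whole = int(value)
    frac = value - whole
    whole_bits = bin(whole)[2:] if whole else ""
    frac_digits = []
    while frac != 0 and len(frac_digits) < frac_bits:
        frac *= 2
        if frac >= 1:
            frac_digits.append("1")
            frac -= 1
        else:
            frac_digits.append("0")
    frac_str = "".join(frac_digits)

    # Steps 3 and 4: combine, then shift so a single 1 sits left of the point.
    if whole_bits:
        exponent = len(whole_bits) - 1
        mantissa = whole_bits[1:] + frac_str
    else:
        first_one = frac_str.find("1")
        if first_one == -1:
            raise ValueError("cannot normalize zero")
        exponent = -(first_one + 1)
        mantissa = frac_str[first_one + 1:]

    # Step 5: sign bit, sign-magnitude exponent, mantissa with the leading 1 dropped.
    mag_bits = exp_width - 1
    if abs(exponent) >= 2 ** mag_bits:
        raise ValueError("exponent does not fit")
    exp_str = ("1" if exponent < 0 else "0") + format(abs(exponent), f"0{mag_bits}b")
    return f"{sign} {exp_str} {mantissa}"


print(encode_toy("58.33"))   # 0 0101 110100101010001
print(encode_toy("-0.625"))  # 1 1001 01
```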
You may have noticed that we’re missing a number. We did not store the 1 to the left of the point.
Why? Because after shifting, that digit is always 1 (for any non-zero value), so we can assume it’s there. The computer is designed to put the 1 back when it’s time to work. This saves us one bit, which we can spend on an extra bit of precision in the mantissa. Genius, no?
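Putting the 1 back is the first thing a decoder does. Here is a hypothetical Python decoder for the simplified format used in this post (not IEEE 754); it assumes the three fields are separated by single spaces.

```python
from fractions import Fraction


def decode_toy(stored: str) -> Fraction:
    """Decode 'sign exponent mantissa' from the simplified format in this post."""
    sign_bit, exp_bits, mantissa_bits = stored.split(" ")
    exponent = int(exp_bits[1:], 2)  # magnitude of the shift
    if exp_bits[0] == "1":           # exponent sign bit 1 means the point moved right
        exponent = -exponent
    # Restore the hidden leading 1: "1" + mantissa, read as 1.xxxx in binary.
    significand = Fraction(int("1" + mantissa_bits, 2), 2 ** len(mantissa_bits))
    value = significand * Fraction(2) ** exponent
    return -value if sign_bit == "1" else value


print(float(decode_toy("0 0101 110100101010001")))  # 58.3291015625
```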
Single and Double Precision
C-family programmers may have seen the data types float and double used in example code. On virtually all modern platforms, these correspond to the IEEE 754 standard's single and double precision floating point formats, respectively.
Single precision (float) values are 32 bits long, made up of:
1 sign bit
8 exponent bits
23 mantissa bits
For most cases, this is enough precision, as we can store values with 24 significant bits (23 stored, plus the implied leading 1), equivalent to about 7 significant decimal digits, at…enormous sizes. (Go ahead. Calculate 2^128. We’ll wait.)
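To see these three fields in a real single-precision value, a minimal Python sketch can reinterpret the 32 bits with the standard `struct` module (`float32_fields` is a hypothetical name). One difference from the scheme above: IEEE 754 stores the exponent with a bias of 127 instead of a separate sign bit, so 58.33's shift of 5 appears as 132.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, str]:
    """Split x, rounded to IEEE 754 single precision, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits, leading 1 not stored
    return sign, exponent, format(mantissa, "023b")


sign, exponent, mantissa = float32_fields(58.33)
print(sign, exponent, exponent - 127)  # 0 132 5
print(mantissa)  # starts with the 110100101010001 we found by hand
```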
When we need more precision or a larger range, we employ double precision (double) values. These 64-bit values are twice the size of single-precision values, made up of:
1 sign bit
11 exponent bits
52 mantissa bits
As you can see, this vastly increases our precision (53 significant bits, equivalent to ~16 significant digits) and gives us an even larger range of sizes.
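The same field split works for doubles, and rounding a value through single precision shows the lost digits. A minimal Python sketch using `struct` (the helper names are hypothetical; it assumes Python's `float` is an IEEE 754 double, as it is on typical platforms). The double exponent bias is 1023.

```python
import struct


def float64_fields(x: float) -> tuple[int, int, str]:
    """Split an IEEE 754 double into sign, biased exponent, and 52 mantissa bits."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # raw 64-bit pattern
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)  # 52 bits, leading 1 not stored
    return sign, exponent, format(mantissa, "052b")


def round_trip_float32(x: float) -> float:
    """Round x to single precision and back, to expose the lost digits."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(float64_fields(58.33)[:2])  # (0, 1028)
print(round_trip_float32(58.33))  # only about 7 digits survive single precision
print(58.33)                      # the double keeps all the digits we typed
```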
Number of Bits: Shorthand
This section is too short to get its own post, but it still needs explaining. When we talk about the size of registers, RAM, and storage media, we are talking about large quantities of bits. The following is a list of shorthand ways to express how many bits we are talking about.
Byte – 8 bits
Word – some number of bytes (commonly 4 bytes on 32-bit machines and 8 bytes on 64-bit machines)
Kilobyte – One thousand(ish) bytes
Megabyte – One million(ish) bytes
Gigabyte – One billion(ish) bytes
Terabyte – One trillion(ish) bytes
Petabyte – One quadrillion(ish) bytes
Exabyte – One quintillion(ish) bytes
You will notice that I put the “ish” qualifier after every value. That’s because we use an estimating system when we convert between binary and decimal: the prefixes are decimal, but computers count in powers of two.
If you have practiced counting by powers of two as you should, you might have noticed that 2^10 = 1024. If you tried counting much higher, you might also have noticed that 2^20 = 1048576.
Do you see the system yet? If not, let me spell it out:
2^10 ~= 10^3
Two to the tenth is approximately equal to ten to the third
2^20 ~= 10^6
2^30 ~= 10^9
2^(10n) ~= 10^(3n)
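The approximation is handy but not free: the gap widens with each step up. A short Python sketch prints how far each binary power drifts from its decimal nickname (`drift_percent` is a hypothetical helper name).

```python
def drift_percent(n: int) -> float:
    """Percent by which 2^(10n) overshoots its decimal nickname 10^(3n)."""
    binary = 2 ** (10 * n)
    decimal = 10 ** (3 * n)
    return (binary - decimal) / decimal * 100


prefixes = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa"]
for n, prefix in enumerate(prefixes, start=1):
    print(f"{prefix}byte: 2^{10 * n} = {2 ** (10 * n):,} vs 10^{3 * n} "
          f"({drift_percent(n):.1f}% apart)")
```

The gap starts at 2.4% for kilo and grows to roughly 15% by exa, so the “ish” matters more the bigger the unit.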
Keep that in mind. I guarantee it will come in handy in your CS career.