  /
News

# In-depth: IEEE 754 Multiplication And Addition

In this reprinted #altdevblogaday in-depth piece, IGAD student Jasper Bekkers discusses how some floating point operations are implemented, specifically multiplication, addition, and fused-multiple-add. Multiplication The way IEEE 754 multiplication works is identical to how it works for regular scientific notation. Simply multiply the coefficients and add the exponents. However, because this is done in hardware we have some extra constraints, such as overflow and rounding, to take into account. These extra constraints are what make floats appear so 'fuzzy' to some.
1. Check if either of the operands (A and B) are zero (early out)
2. Check for potential exponent overflow and throw corresponding overflow errors
3. Compute sign as Csign = Asign XOR Bsign
4. Compute the exponent Cexponent = Aexponent + Bexponent – 127
5. Compute mantissa Cmantissa = Amantissa * Bmantissa (23-bit integer multiply) and round the result according to the currently set rounding mode.
6. ­­If Cmantissa has overflown, normalize results (Cmantissa <<= 1, Cexponent -= 1)
```f32 float32_mul(f32 a, f32 b) { // extract mantissa, exponent and sign u32 aFrac = a & 0x007FFFFF; u32 bFrac = b & 0x007FFFFF; u32 aExp = (a >> 23) & 0xFF; u32 bExp = (b >> 23) & 0xFF; u32 aSign = a >> 31; u32 bSign = b >> 31; // compute sign bit u32 zSign = aSign ^ bSign; // removed: handle edge conditions where the exponent is about to overflow // see the SoftFloat library for more information // compute exponent u32 zExp = aExp + bExp - 0x7F; // add implicit `1' bit aFrac = (aFrac | 0x00800000) << 7; bFrac = (bFrac | 0x00800000) << 8; u64 zFrac = (u64)aFrac * (u64)bFrac; u32 zFrac0 = zFrac >> 32; u32 zFrac1 = zFrac & 0xFFFFFFFF; // check if we overflowed into more than 23-bits and handle accordingly zFrac0 |= (zFrac1 != 0); if(0 <= (i32)(zFrac0 << 1)) { zFrac0 <<= 1; zExp--; } // reconstruct the float; I've removed the rounding code and just truncate return (zSign << 31) | ((zExp << 23) + (zFrac >> 7)); } Addition Again, the steps for floating point addition are based on calculating with scientific notation. First you align the exponents, then you add the mantissas. The alignment step is the reason for the big inaccuracies with adding small and large numbers together. Align binary pointIf Aexponent > Bexponent Then do Bmantissa >>= 1 until Bmantissa * 2 Bexponent – AexponentIf Bexponent > Aexponent Then do Amantissa >>= 1 until Amantissa * 2 Aexponent – BexponentCompute sum of aligned mantissasAmantissa * 2 Aexponent – Bexponent +BmantissaOr Bmantissa * 2 Bexponent – Aexponent +AmantissaNormalized and round resultsCheck for exponent overflow and throw corresponding overflow errorsIf Cmantissa is zero set the entire float to zero to return a 'correct' 0 float.// implementation only works with a and b of equal sign // if a and b are of different sign, we call float32_sub instead // look at the SoftFloat source-code for specifics. static f32 float32_add(f32 a, f32 b) { int zExp; u32 zFrac; u32 aFrac = a & 0x007FFFFF; u32 bFrac = b & 0x007FFFFF; int aExp = (a >> 23) & 0xFF; int bExp = (b >> 23) & 0xFF; u32 aSign = a >> 31; u32 bSign = b >> 31; u32 zSign = aSign; int expDiff = aExp - bExp; aFrac <<= 6; bFrac <<= 6; // align exponents if needed if(expDiff > 0) { if(bExp == 0) --expDiff; else bFrac |= 0x20000000; bFrac = shift32RightJamming(bFrac, expDiff); zExp = aExp; } else if(expDiff < 0) { if(aExp == 0) ++expDiff; else aFrac |= 0x20000000; aFrac = shift32RightJamming(aFrac, -expDiff); zExp = bExp; } else if(expDiff == 0) { if(aExp == 0) return (zSign << 31) | ((aFrac + bFrac) >> 13); zFrac = 0x40000000 + aFrac + bFrac; zExp = aExp; return (zSign << 31) | ((zExp << 23) + (zFrac >> 7)); } aFrac |= 0x20000000; zFrac = (aFrac + bFrac) << 1; --zExp; if((i32)zFrac < 0) { zFrac = aFrac + bFrac; ++zExp; } // reconstruct the float; I've removed the rounding code and just truncate return (zSign << 31) | ((zExp << 23) + (zFrac >> 7)); } // for reference static u32 shift32RightJamming(int a, int count) { if(count == 0) return a; else if(count < 32) return (a >> count) | ((a << ((-count) & 31)) != 0); else return a != 0; }An overview of floating point addition hardware. The implementation will make a distinction between adding numbers where the exponent differs (the far path) and numbers where the exponent is the same (the close path), much like the implementation above. Fused-multiply-add The multiply-add operation is basically a combination of both of these operations that is as efficient or more efficient to implement in hardware as both operations separately. The primary difference in operation is (as long as it's not a pseudo-fma) is the fact that there is only one rounding operation done at the end of the result, instead of one in the multiply and the add circuits (steps 3 and 4 respectively). Some, if not most, SIMD architectures on current-gen platforms are actually built around just the fused-multiply-add and don't have regular multiply or addition hardware (they'll just insert identity constants into one of the three operands) a simple give-away for this is usually that the cycle count for these operations is identical in each case. Further Reading Single precision floating-point format. (2011, June 19). Retrieved Juli 2011, from Wikipedia: http://en.wikipedia.org/wiki/Single_precision_floating-point_format Quinnell, E. C. (2007, May). Floating-Point Fused Multiply-Add Architectures. Retrieved June 2011, from http://repositories.lib.utexas.edu/bitstream/handle/2152/3082/quinnelle60861.pdf Shaadan, D. M. (2000, Januari). Floating Point Arithmetic Using The IEEE 754 Standard Revisited. Retrieved June 2011, from http://meseec.ce.rit.edu/eecc250-winter99/250-1-27-2000.pdf Hauser, J. (2010, June). SoftFloat Retrieved June 2011, from http://www.jhauser.us/arithmetic/SoftFloat.html Giesen, F. (2011, July). Int-multiply-using-floats trickery Retrieved July 2011, from http://pastebin.com/jyT0gTSS [This piece was reprinted from #AltDevBlogADay, a shared blog initiative started by @mike_acton devoted to giving game developers of all disciplines a place to motivate each other to write regularly about their personal game development passions.]```

### Latest Jobs

#### Manticore Games

San Mateo, California
8.23.22
Senior Software Engineer - Mobile

#### Sony PlayStation

San Diego, California
6.23.22
Sr. Online Programmer

#### The Walt Disney Company

Glendale, California
8.1.22
Associate Marketing Manager - Walt Disney Games

#### Insomniac Games

Burbank, California
8.26.22
Accessibility Design Researcher
More Jobs

#### Game Developer Job Board

Browse open positions across the game industry or recruit new talent for your studio

#### Game Developer Newsletter

Get daily Game Developer top stories every morning straight into your inbox