Will Brown
9 years ago

The math of arbitrary-sized integers
====

Numbers in programming languages are usually a fixed size; an integer is usually
32 bits long, a "long integer" often 64, and so on. Fixed sizes aren't good
enough for lots of applications (especially in science and cryptography), so
most programming languages provide a mechanism for creating arbitrarily large
numbers. Arbitrary-sized integers (also called "big numbers" or "bignums") are
numbers whose size is bounded only by the amount of memory available to your
program. That flexibility comes at the expense of efficiency: arithmetic on
bignums is slower than arithmetic on native integer types, because the processor
can't operate on a bignum directly, so every operation has to be implemented in
software. Bignums are so useful that many programming languages, including
Python and Lisp, use them as the default integer type. In this article, I'll
write about some of the math behind bignums. Feel free to <a
href="mailto:will.h.brown+bignum@gmail.com">email me</a> with any questions you
may have.

Encoding bignums
---

The first thing we have to settle on is how to encode our bignums. Many
introductory courses have students implement them as an array of single decimal
digits, like so:

    8675309 = [8, 6, 7, 5, 3, 0, 9]

This is inefficient, though, because we don't have an easy way to store numbers
less than 10 compactly: the smallest data type at our disposal is usually a
`byte`, which has 8 bits of storage, but a single decimal digit never needs more
than 4. This encoding therefore uses about twice as much storage as it needs;
the four high-order bits of every digit in the array will always be zero.
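
The digit-array encoding is easy to sketch in C. The helper names below (`to_decimal_digits` and `high_nibbles_all_zero`) are hypothetical, and the input is assumed to fit in a native `uint64_t`; the point is only to show the one-digit-per-byte layout and the wasted high nibble, not to build a usable bignum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the decimal digits of n into out[], most
 * significant digit first (as in the [8, 6, 7, 5, 3, 0, 9] example), one
 * digit per byte. cap is the capacity of out; returns the digit count. */
size_t to_decimal_digits(uint64_t n, uint8_t *out, size_t cap) {
    uint8_t tmp[20];  /* a uint64_t has at most 20 decimal digits */
    size_t len = 0;
    do {
        tmp[len++] = (uint8_t)(n % 10);  /* peel off the lowest digit */
        n /= 10;
    } while (n > 0);
    assert(len <= cap);  /* caller must leave room for every digit */
    for (size_t i = 0; i < len; i++) {
        out[i] = tmp[len - 1 - i];  /* reverse into most-significant-first order */
    }
    return len;
}

/* Returns 1 if the four high-order bits of every byte are zero, i.e. each
 * digit only ever occupies the low nibble of its byte; otherwise 0. */
int high_nibbles_all_zero(const uint8_t *digits, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if ((digits[i] & 0xF0) != 0) {
            return 0;
        }
    }
    return 1;
}
```

Encoding 8675309 this way yields the seven bytes 8, 6, 7, 5, 3, 0, 9, and `high_nibbles_all_zero` returns 1 for them, since no decimal digit exceeds 9 (binary `1001`).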

We can extend this idea to be space-efficient, though. Let's stick with the
array of integers, but instead of saying that each element is one decimal digit,
let's say that each is one digit in base 4294967296 (that's $2^{32}$). Each
element in the array can then use all of the storage provided by a 32-bit
unsigned integer (`uint32_t` in C). For future sections, we'll call this list
$c$, and elements in the list will be $c\_i$, where $c\_0$ is the least
significant "digit". If $c$ has $N$ elements, we can find the number it
represents (let's call it $X$) using the following formula:

$$
X = \sum\_{i = 0}^{N - 1} (2^{32})^i c\_i
$$
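
To make the formula concrete, take a small bignum whose digit values are chosen purely for illustration: two elements, $c\_0 = 1$ and $c\_1 = 2$. Plugging them in gives:

$$
X = (2^{32})^0 \cdot 1 + (2^{32})^1 \cdot 2 = 1 + 8589934592 = 8589934593
$$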

The idea here is that, for each element in $c$, we take the power of $2^{32}$
it's associated with and multiply it by the element. Then we add these all up.
It may be clearer why this works if we replace $2^{32}$ with 10. Then, for a
bignum `[2, 4, 3]` (remember that $c\_0$, the least significant digit, comes
first), we get the following:

    10^2 * 3 + 10^1 * 4 + 10^0 * 2 = 300 + 40 + 2 = 342

The formula above does the same thing, but in base $2^{32}$ instead.
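
As a sketch of the formula in C, assuming a result small enough to fit in a native `uint64_t` (a real bignum can't make that assumption, which is the whole point of the encoding), the hypothetical `digits_to_value` below adds up each digit times its power of the base. The base is a parameter, so the same function handles both base 10 and base $2^{32}$.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: evaluate X = sum over i of base^i * c[i], where
 * c[0] is the least significant digit and n is the number of digits.
 * Assumption: the result fits in a uint64_t. Unsigned overflow wraps,
 * so larger inputs give wrong (but well-defined) results. */
uint64_t digits_to_value(const uint32_t *c, size_t n, uint64_t base) {
    assert(base >= 2);
    uint64_t x = 0;
    uint64_t power = 1;  /* base^i for the current digit */
    for (size_t i = 0; i < n; i++) {
        x += power * c[i];  /* add this digit's contribution */
        power *= base;      /* advance to base^(i + 1) */
    }
    return x;
}
```

Passing the digits `{2, 4, 3}` with base 10 returns 342. Because unsigned arithmetic in C wraps on overflow, inputs too large for 64 bits silently produce wrong answers; that limitation is exactly what a bignum encoding removes.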

Getting $c$ from $X$ is a bit harder to express so succinctly; an algorithm to
do it can be expressed in pseudo-C as:

    const uint64_t BASE = 1ULL << 32;  /* 2^32 */
    for (i = 0; X > 0; i++) {
        c[i] = X % BASE;  /* least significant remaining digit */
        X /= BASE;        /* drop that digit */
    }

Here, we find the least significant digit using the modulo operator, store it in
`c`, remove it by dividing by $2^{32}$, and repeat until nothing is left. I'm
assuming that we're using integer division here, where the fractional part is
truncated, so `3 / 4 = 0`.
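
To actually run the loop, $X$ needs a concrete type. Purely as an assumption for illustration, the hypothetical `value_to_digits` below takes a native `uint64_t`, which splits into at most two base-$2^{32}$ digits, and writes $2^{32}$ as a shift because `^` means XOR in C. Like the pseudo-C, it produces no digits at all for $X = 0$; a real implementation might prefer to emit a single zero digit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BASE (UINT64_C(1) << 32)  /* 2^32, written as a shift */

/* Hypothetical helper: split x into base-2^32 digits, least significant
 * first, following the loop above. cap is the capacity of c; returns the
 * number of digits written (at most two for a uint64_t, zero for x == 0). */
size_t value_to_digits(uint64_t x, uint32_t *c, size_t cap) {
    size_t i;
    for (i = 0; x > 0; i++) {
        assert(i < cap);              /* caller must leave room for every digit */
        c[i] = (uint32_t)(x % BASE);  /* least significant remaining digit */
        x /= BASE;                    /* integer division drops that digit */
    }
    return i;
}
```

Splitting 8589934593 this way gives `c[0] == 1` and `c[1] == 2`.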

Alright, now we have a way to encode our bignums. Next, we need to do something
interesting with them. Let's start with...

Addition
---

Now let's get down to business.