Encoding and intro written Will Brown 9 years ago

The math of arbitrary-sized integers
====

Numbers in programming languages are usually a fixed size; an integer is usually
32 bits long, a "long integer" often 64, and so on. That isn't enough for many
applications, especially in science and cryptography, so most programming
languages provide a mechanism for creating arbitrarily large numbers.
Arbitrary-sized integers (also called "big numbers" or "bignums") are numbers
whose size is bounded only by the amount of memory available to your program.
This comes at the expense of efficiency: arithmetic on bignums is slower than
arithmetic on native integral types, because the hardware can't operate on a
bignum directly. Bignums are so useful that several programming languages,
including Python and Lisp, use them as the default integral type. In this
article, I'll write about some of the math behind bignums. Feel free to <a
href="mailto:will.h.brown+bignum@gmail.com">email me</a> with any questions
you may have.

Encoding bignums
---
The first thing we have to settle on is the encoding of our bignums. A lot of
introductory courses have students implement them as an array of single
decimal digits, like so:

    8675309 = [8, 6, 7, 5, 3, 0, 9]

This is inefficient, though, because we have no easy way to store numbers
smaller than 10 compactly. The smallest data type at our disposal is usually a
`byte`, which has 8 bits of storage, but a single decimal digit never needs
more than 4. This encoding therefore uses about twice as much storage as it
needs: the upper four bits of every digit in the array are always zero.
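To make the waste concrete, here is a minimal C sketch of the decimal-digit encoding. It is restricted to numbers that fit in a `uint64_t`, and the name `to_decimal_digits` is hypothetical, chosen only for illustration. Every digit it stores is at most 9 (binary `1001`), so the upper four bits of each `uint8_t` it writes are always zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store each decimal digit of x in its own byte,
 * most significant digit first, so 8675309 becomes
 * [8, 6, 7, 5, 3, 0, 9]. A uint64_t has at most 20 decimal digits,
 * so `out` needs room for 20 bytes. Returns the number of digits. */
size_t to_decimal_digits(uint64_t x, uint8_t out[20])
{
    uint8_t tmp[20];
    size_t n = 0;

    /* Peel digits off the low end: x % 10 is the last decimal digit. */
    do {
        tmp[n++] = (uint8_t)(x % 10);
        x /= 10;
    } while (x > 0);

    /* Reverse so the most significant digit comes first. */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];

    return n;
}
```

For example, `to_decimal_digits(8675309, buf)` returns 7 and fills `buf` with `8, 6, 7, 5, 3, 0, 9`.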

We can make this idea space-efficient, though. Let's stick with the array of
integers, but instead of saying that each element is one decimal digit, let's
say that each is one digit in base 4294967296 (that's $2^{32}$). Each element
in the array can then use all of the storage provided by an unsigned 32-bit
`int`, since a base-$2^{32}$ digit ranges from $0$ to $2^{32} - 1$. In the
following sections, we'll call this list $c$ and its elements $c\_i$, where
$c\_0$ is the least significant "digit" and $N$ is the number of elements.
Given $c$, we can find the number it represents (let's call it $X$) using the
following formula:

$$
X = \sum\_{i = 0}^{N - 1} (2^{32})^i c\_i
$$
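As a sanity check, the formula can be evaluated directly for a bignum small enough to fit in a native 64-bit integer, meaning one with at most two base-$2^{32}$ digits. The C sketch below assumes the digits are stored as `uint32_t` with $c\_0$ first. The name `bignum_value` is hypothetical. Multiplying by $(2^{32})^i$ is the same as shifting left by $32i$ bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: evaluate X = sum of (2^32)^i * c[i] for
 * i = 0 .. n-1, where c[0] is the least significant digit. Only
 * bignums with at most two digits fit in a uint64_t, since
 * (2^32)^2 = 2^64 already overflows it. */
uint64_t bignum_value(const uint32_t *c, size_t n)
{
    assert(n <= 2);

    uint64_t x = 0;
    for (size_t i = 0; i < n; i++)
        x += (uint64_t)c[i] << (32 * i); /* (2^32)^i * c[i] */
    return x;
}
```

For example, the digits `{5, 1}` give $5 + 1 \cdot 2^{32} = 4294967301$.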

The idea behind the formula is that, for each element of $c$, we take the power
of $2^{32}$ it's associated with and multiply it by the element. Then we add
all of these products up. It may be clearer why this works if we replace
$2^{32}$ with 10. Then, for a bignum $c = [2, 4, 3]$ (so $c\_0 = 2$ is the
least significant digit), we get the following:

    10^0 * 2 + 10^1 * 4 + 10^2 * 3 = 2 + 40 + 300 = 342

The formula above does the same thing, but in base $2^{32}$ instead of base 10.
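For a concrete base-$2^{32}$ example, take $X = 4294967301$, a number picked only for illustration. Its digits are $c\_0 = 5$ and $c\_1 = 1$, since:

$$
X = (2^{32})^0 \cdot 5 + (2^{32})^1 \cdot 1 = 5 + 4294967296 = 4294967301
$$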

Getting $c$ from $X$ is a bit harder to express so succinctly; an algorithm to
do it can be written in pseudo-C as:

    for (i = 0; X > 0; i++) {
        c[i] = X % 4294967296;  /* 4294967296 is 2^32; this is the lowest digit */
        X /= 4294967296;        /* integer division drops that digit */
    }

Here, we find the least significant digit using the modulo operator, store it
in `c`, remove that digit by dividing by $2^{32}$, and repeat. I'm assuming
integer division here, where the fractional part is truncated, so `3 / 4 = 0`.
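Here is a runnable C version of that loop. The caveat is that $X$ is restricted to a native `uint64_t`, so it produces at most two digits, rather than being a true bignum. The name `to_base_2_32` is hypothetical. It writes the digits least significant first, matching $c\_0$ in the formula.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split x into base-2^32 digits, least
 * significant first, following the pseudo-C loop. A uint64_t has at
 * most two such digits, so `c` needs room for 2. Returns the number
 * of digits written; like the loop, it writes none for x == 0. */
size_t to_base_2_32(uint64_t x, uint32_t c[2])
{
    const uint64_t base = (uint64_t)1 << 32; /* 2^32 */
    size_t i;

    for (i = 0; x > 0; i++) {
        c[i] = (uint32_t)(x % base); /* lowest base-2^32 digit */
        x /= base;                   /* integer division drops it */
    }
    return i;
}
```

In practice, `x % base` and `x / base` are usually written as `x & 0xFFFFFFFF` and `x >> 32`. For unsigned values, the two forms give the same results.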

Alright, now we have a way to encode our bignums. Next, we need to do something
interesting with them. Let's start with...

Addition
---
Now let's get down to business.