1 // Copyright 2009 The Go Authors. All rights reserved. 2 // Use of this source code is governed by a BSD-style 3 // license that can be found in the LICENSE file. 4 5 /* 6 7 Multi-precision division. Here be dragons. 8 9 Given u and v, where u is n+m digits, and v is n digits (with no leading zeros), 10 the goal is to return quo, rem such that u = quo*v + rem, where 0 ≤ rem < v. 11 That is, quo = ⌊u/v⌋ where ⌊x⌋ denotes the floor (truncation to integer) of x, 12 and rem = u - quo·v. 13 14 15 Long Division 16 17 Division in a computer proceeds the same as long division in elementary school, 18 but computers are not as good as schoolchildren at following vague directions, 19 so we have to be much more precise about the actual steps and what can happen. 20 21 We work from most to least significant digit of the quotient, doing: 22 23 • Guess a digit q, the number of v to subtract from the current 24 section of u to zero out the topmost digit. 25 • Check the guess by multiplying q·v and comparing it against 26 the current section of u, adjusting the guess as needed. 27 • Subtract q·v from the current section of u. 28 • Add q to the corresponding section of the result quo. 29 30 When all digits have been processed, the final remainder is left in u 31 and returned as rem. 32 33 For example, here is a sketch of dividing 5 digits by 3 digits (n=3, m=2). 34 35 q₂ q₁ q₀ 36 _________________ 37 v₂ v₁ v₀ ) u₄ u₃ u₂ u₁ u₀ 38 ↓ ↓ ↓ | | 39 [u₄ u₃ u₂]| | 40 - [ q₂·v ]| | 41 ----------- ↓ | 42 [ rem | u₁]| 43 - [ q₁·v ]| 44 ----------- ↓ 45 [ rem | u₀] 46 - [ q₀·v ] 47 ------------ 48 [ rem ] 49 50 Instead of creating new storage for the remainders and copying digits from u 51 as indicated by the arrows, we use u's storage directly as both the source 52 and destination of the subtractions, so that the remainders overwrite 53 successive overlapping sections of u as the division proceeds, using a slice 54 of u to identify the current section. This avoids all the copying as well as 55 shifting of remainders. 56 57 Division of u with n+m digits by v with n digits (in base B) can in general 58 produce at most m+1 digits, because: 59 60 • u < B^(n+m) [B^(n+m) has n+m+1 digits] 61 • v ≥ B^(n-1) [B^(n-1) is the smallest n-digit number] 62 • u/v < B^(n+m) / B^(n-1) [divide bounds for u, v] 63 • u/v < B^(m+1) [simplify] 64 65 The first step is special: it takes the top n digits of u and divides them by 66 the n digits of v, producing the first quotient digit and an n-digit remainder. 67 In the example, q₂ = ⌊u₄u₃u₂ / v⌋. 68 69 The first step divides n digits by n digits to ensure that it produces only a 70 single digit. 71 72 Each subsequent step appends the next digit from u to the remainder and divides 73 those n+1 digits by the n digits of v, producing another quotient digit and a 74 new n-digit remainder. 75 76 Subsequent steps divide n+1 digits by n digits, an operation that in general 77 might produce two digits. However, as used in the algorithm, that division is 78 guaranteed to produce only a single digit. The dividend is of the form 79 rem·B + d, where rem is a remainder from the previous step and d is a single 80 digit, so: 81 82 • rem ≤ v - 1 [rem is a remainder from dividing by v] 83 • rem·B ≤ v·B - B [multiply by B] 84 • d ≤ B - 1 [d is a single digit] 85 • rem·B + d ≤ v·B - 1 [add] 86 • rem·B + d < v·B [change ≤ to <] 87 • (rem·B + d)/v < B [divide by v] 88 89 90 Guess and Check 91 92 At each step we need to divide n+1 digits by n digits, but this is for the 93 implementation of division by n digits, so we can't just invoke a division 94 routine: we _are_ the division routine. Instead, we guess at the answer and 95 then check it using multiplication. If the guess is wrong, we correct it. 96 97 How can this guessing possibly be efficient? It turns out that the following 98 statement (let's call it the Good Guess Guarantee) is true. 99 100 If 101 102 • q = ⌊u/v⌋ where u is n+1 digits and v is n digits, 103 • q < B, and 104 • the topmost digit of v = vₙ₋₁ ≥ B/2, 105 106 then q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ satisfies q ≤ q̂ ≤ q+2. (Proof below.) 107 108 That is, if we know the answer has only a single digit and we guess an answer 109 by ignoring the bottom n-1 digits of u and v, using a 2-by-1-digit division, 110 then that guess is at least as large as the correct answer. It is also not 111 too much larger: it is off by at most two from the correct answer. 112 113 Note that in the first step of the overall division, which is an n-by-n-digit 114 division, the 2-by-1 guess uses an implicit uₙ = 0. 115 116 Note that using a 2-by-1-digit division here does not mean calling ourselves 117 recursively. Instead, we use an efficient direct hardware implementation of 118 that operation. 119 120 Note that because q is u/v rounded down, q·v must not exceed u: u ≥ q·v. 121 If a guess q̂ is too big, it will not satisfy this test. Viewed a different way, 122 the remainder r̂ for a given q̂ is u - q̂·v, which must be positive. If it is 123 negative, then the guess q̂ is too big. 124 125 This gives us a way to compute q. First compute q̂ with 2-by-1-digit division. 126 Then, while u < q̂·v, decrement q̂; this loop executes at most twice, because 127 q̂ ≤ q+2. 128 129 130 Scaling Inputs 131 132 The Good Guess Guarantee requires that the top digit of v (vₙ₋₁) be at least B/2. 133 For example in base 10, ⌊172/19⌋ = 9, but ⌊18/1⌋ = 18: the guess is wildly off 134 because the first digit 1 is smaller than B/2 = 5. 135 136 We can ensure that v has a large top digit by multiplying both u and v by the 137 right amount. Continuing the example, if we multiply both 172 and 19 by 3, we 138 now have ⌊516/57⌋, the leading digit of v is now ≥ 5, and sure enough 139 ⌊51/5⌋ = 10 is much closer to the correct answer 9. It would be easier here 140 to multiply by 4, because that can be done with a shift. Specifically, we can 141 always count the number of leading zeros i in the first digit of v and then 142 shift both u and v left by i bits. 143 144 Having scaled u and v, the value ⌊u/v⌋ is unchanged, but the remainder will 145 be scaled: 172 mod 19 is 1, but 516 mod 57 is 3. We have to divide the remainder 146 by the scaling factor (shifting right i bits) when we finish. 147 148 Note that these shifts happen before and after the entire division algorithm, 149 not at each step in the per-digit iteration. 150 151 Note the effect of scaling inputs on the size of the possible quotient. 152 In the scaled u/v, u can gain a digit from scaling; v never does, because we 153 pick the scaling factor to make v's top digit larger but without overflowing. 154 If u and v have n+m and n digits after scaling, then: 155 156 • u < B^(n+m) [B^(n+m) has n+m+1 digits] 157 • v ≥ B^n / 2 [vₙ₋₁ ≥ B/2, so vₙ₋₁·B^(n-1) ≥ B^n/2] 158 • u/v < B^(n+m) / (B^n / 2) [divide bounds for u, v] 159 • u/v < 2 B^m [simplify] 160 161 The quotient can still have m+1 significant digits, but if so the top digit 162 must be a 1. This provides a different way to handle the first digit of the 163 result: compare the top n digits of u against v and fill in either a 0 or a 1. 164 165 166 Refining Guesses 167 168 Before we check whether u < q̂·v, we can adjust our guess to change it from 169 q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ into the refined guess ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋. 170 Although not mentioned above, the Good Guess Guarantee also promises that this 171 3-by-2-digit division guess is more precise and at most one away from the real 172 answer q. The improvement from the 2-by-1 to the 3-by-2 guess can also be done 173 without n-digit math. 174 175 If we have a guess q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ and we want to see if it also equal to 176 ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, we can use the same check we would for the full division: 177 if uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂, then the guess is too large and should be reduced. 178 179 Checking uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ < 0, 180 and 181 182 uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ = (uₙuₙ₋₁·B + uₙ₋₂) - q̂·(vₙ₋₁·B + vₙ₋₂) 183 [splitting off the bottom digit] 184 = (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂ 185 [regrouping] 186 187 The expression (uₙuₙ₋₁ - q̂·vₙ₋₁) is the remainder of uₙuₙ₋₁ / vₙ₋₁. 188 If the initial guess returns both q̂ and its remainder r̂, then checking 189 whether uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as checking r̂·B + uₙ₋₂ < q̂·vₙ₋₂. 190 191 If we find that r̂·B + uₙ₋₂ < q̂·vₙ₋₂, then we can adjust the guess by 192 decrementing q̂ and adding vₙ₋₁ to r̂. We repeat until r̂·B + uₙ₋₂ ≥ q̂·vₙ₋₂. 193 (As before, this fixup is only needed at most twice.) 194 195 Now that q̂ = ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, as mentioned above it is at most one 196 away from the correct q, and we've avoided doing any n-digit math. 197 (If we need the new remainder, it can be computed as r̂·B + uₙ₋₂ - q̂·vₙ₋₂.) 198 199 The final check u < q̂·v and the possible fixup must be done at full precision. 200 For random inputs, a fixup at this step is exceedingly rare: the 3-by-2 guess 201 is not often wrong at all. But still we must do the check. Note that since the 202 3-by-2 guess is off by at most 1, it can be convenient to perform the final 203 u < q̂·v as part of the computation of the remainder r = u - q̂·v. If the 204 subtraction underflows, decremeting q̂ and adding one v back to r is enough to 205 arrive at the final q, r. 206 207 That's the entirety of long division: scale the inputs, and then loop over 208 each output position, guessing, checking, and correcting the next output digit. 209 210 For a 2n-digit number divided by an n-digit number (the worst size-n case for 211 division complexity), this algorithm uses n+1 iterations, each of which must do 212 at least the 1-by-n-digit multiplication q̂·v. That's O(n) iterations of 213 O(n) time each, so O(n²) time overall. 214 215 216 Recursive Division 217 218 For very large inputs, it is possible to improve on the O(n²) algorithm. 219 Let's call a group of n/2 real digits a (very) “wide digit”. We can run the 220 standard long division algorithm explained above over the wide digits instead of 221 the actual digits. This will result in many fewer steps, but the math involved in 222 each step is more work. 223 224 Where basic long division uses a 2-by-1-digit division to guess the initial q̂, 225 the new algorithm must use a 2-by-1-wide-digit division, which is of course 226 really an n-by-n/2-digit division. That's OK: if we implement n-digit division 227 in terms of n/2-digit division, the recursion will terminate when the divisor 228 becomes small enough to handle with standard long division or even with the 229 2-by-1 hardware instruction. 230 231 For example, here is a sketch of dividing 10 digits by 4, proceeding with 232 wide digits corresponding to two regular digits. The first step, still special, 233 must leave off a (regular) digit, dividing 5 by 4 and producing a 4-digit 234 remainder less than v. The middle steps divide 6 digits by 4, guaranteed to 235 produce two output digits each (one wide digit) with 4-digit remainders. 236 The final step must use what it has: the 4-digit remainder plus one more, 237 5 digits to divide by 4. 238 239 q₆ q₅ q₄ q₃ q₂ q₁ q₀ 240 _______________________________ 241 v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀ 242 ↓ ↓ ↓ ↓ ↓ | | | | | 243 [u₉ u₈ u₇ u₆ u₅]| | | | | 244 - [ q₆q₅·v ]| | | | | 245 ----------------- ↓ ↓ | | | 246 [ rem |u₄ u₃]| | | 247 - [ q₄q₃·v ]| | | 248 -------------------- ↓ ↓ | 249 [ rem |u₂ u₁]| 250 - [ q₂q₁·v ]| 251 -------------------- ↓ 252 [ rem |u₀] 253 - [ q₀·v ] 254 ------------------ 255 [ rem ] 256 257 An alternative would be to look ahead to how well n/2 divides into n+m and 258 adjust the first step to use fewer digits as needed, making the first step 259 more special to make the last step not special at all. For example, using the 260 same input, we could choose to use only 4 digits in the first step, leaving 261 a full wide digit for the last step: 262 263 q₆ q₅ q₄ q₃ q₂ q₁ q₀ 264 _______________________________ 265 v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀ 266 ↓ ↓ ↓ ↓ | | | | | | 267 [u₉ u₈ u₇ u₆]| | | | | | 268 - [ q₆·v ]| | | | | | 269 -------------- ↓ ↓ | | | | 270 [ rem |u₅ u₄]| | | | 271 - [ q₅q₄·v ]| | | | 272 -------------------- ↓ ↓ | | 273 [ rem |u₃ u₂]| | 274 - [ q₃q₂·v ]| | 275 -------------------- ↓ ↓ 276 [ rem |u₁ u₀] 277 - [ q₁q₀·v ] 278 --------------------- 279 [ rem ] 280 281 Today, the code in divRecursiveStep works like the first example. Perhaps in 282 the future we will make it work like the alternative, to avoid a special case 283 in the final iteration. 284 285 Either way, each step is a 3-by-2-wide-digit division approximated first by 286 a 2-by-1-wide-digit division, just as we did for regular digits in long division. 287 Because the actual answer we want is a 3-by-2-wide-digit division, instead of 288 multiplying q̂·v directly during the fixup, we can use the quick refinement 289 from long division (an n/2-by-n/2 multiply) to correct q to its actual value 290 and also compute the remainder (as mentioned above), and then stop after that, 291 never doing a full n-by-n multiply. 292 293 Instead of using an n-by-n/2-digit division to produce n/2 digits, we can add 294 (not discard) one more real digit, doing an (n+1)-by-(n/2+1)-digit division that 295 produces n/2+1 digits. That single extra digit tightens the Good Guess Guarantee 296 to q ≤ q̂ ≤ q+1 and lets us drop long division's special treatment of the first 297 digit. These benefits are discussed more after the Good Guess Guarantee proof 298 below. 299 300 301 How Fast is Recursive Division? 302 303 For a 2n-by-n-digit division, this algorithm runs a 4-by-2 long division over 304 wide digits, producing two wide digits plus a possible leading regular digit 1, 305 which can be handled without a recursive call. That is, the algorithm uses two 306 full iterations, each using an n-by-n/2-digit division and an n/2-by-n/2-digit 307 multiplication, along with a few n-digit additions and subtractions. The standard 308 n-by-n-digit multiplication algorithm requires O(n²) time, making the overall 309 algorithm require time T(n) where 310 311 T(n) = 2T(n/2) + O(n) + O(n²) 312 313 which, by the Bentley-Haken-Saxe theorem, ends up reducing to T(n) = O(n²). 314 This is not an improvement over regular long division. 315 316 When the number of digits n becomes large enough, Karatsuba's algorithm for 317 multiplication can be used instead, which takes O(n^log₂3) = O(n^1.6) time. 318 (Karatsuba multiplication is implemented in func karatsuba in nat.go.) 319 That makes the overall recursive division algorithm take O(n^1.6) time as well, 320 which is an improvement, but again only for large enough numbers. 321 322 It is not critical to make sure that every recursion does only two recursive 323 calls. While in general the number of recursive calls can change the time 324 analysis, in this case doing three calls does not change the analysis: 325 326 T(n) = 3T(n/2) + O(n) + O(n^log₂3) 327 328 ends up being T(n) = O(n^log₂3). Because the Karatsuba multiplication taking 329 time O(n^log₂3) is itself doing 3 half-sized recursions, doing three for the 330 division does not hurt the asymptotic performance. Of course, it is likely 331 still faster in practice to do two. 332 333 334 Proof of the Good Guess Guarantee 335 336 Given numbers x, y, let us break them into the quotients and remainders when 337 divided by some scaling factor S, with the added constraints that the quotient 338 x/y and the high part of y are both less than some limit T, and that the high 339 part of y is at least half as big as T. 340 341 x₁ = ⌊x/S⌋ y₁ = ⌊y/S⌋ 342 x₀ = x mod S y₀ = y mod S 343 344 x = x₁·S + x₀ 0 ≤ x₀ < S x/y < T 345 y = y₁·S + y₀ 0 ≤ y₀ < S T/2 ≤ y₁ < T 346 347 And consider the two truncated quotients: 348 349 q = ⌊x/y⌋ 350 q̂ = ⌊x₁/y₁⌋ 351 352 We will prove that q ≤ q̂ ≤ q+2. 353 354 The guarantee makes no real demands on the scaling factor S: it is simply the 355 magnitude of the digits cut from both x and y to produce x₁ and y₁. 356 The guarantee makes only limited demands on T: it must be large enough to hold 357 the quotient x/y, and y₁ must have roughly the same size. 358 359 To apply to the earlier discussion of 2-by-1 guesses in long division, 360 we would choose: 361 362 S = Bⁿ⁻¹ 363 T = B 364 x = u 365 x₁ = uₙuₙ₋₁ 366 x₀ = uₙ₋₂...u₀ 367 y = v 368 y₁ = vₙ₋₁ 369 y₀ = vₙ₋₂...u₀ 370 371 These simpler variables avoid repeating those longer expressions in the proof. 372 373 Note also that, by definition, truncating division ⌊x/y⌋ satisfies 374 375 x/y - 1 < ⌊x/y⌋ ≤ x/y. 376 377 This fact will be used a few times in the proofs. 378 379 Proof that q ≤ q̂: 380 381 q̂·y₁ = ⌊x₁/y₁⌋·y₁ [by definition, q̂ = ⌊x₁/y₁⌋] 382 > (x₁/y₁ - 1)·y₁ [x₁/y₁ - 1 < ⌊x₁/y₁⌋] 383 = x₁ - y₁ [distribute y₁] 384 385 So q̂·y₁ > x₁ - y₁. 386 Since q̂·y₁ is an integer, q̂·y₁ ≥ x₁ - y₁ + 1. 387 388 q̂ - q = q̂ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋] 389 ≥ q̂ - x/y [⌊x/y⌋ < x/y] 390 = (1/y)·(q̂·y - x) [factor out 1/y] 391 ≥ (1/y)·(q̂·y₁·S - x) [y = y₁·S + y₀ ≥ y₁·S] 392 ≥ (1/y)·((x₁ - y₁ + 1)·S - x) [above: q̂·y₁ ≥ x₁ - y₁ + 1] 393 = (1/y)·(x₁·S - y₁·S + S - x) [distribute S] 394 = (1/y)·(S - x₀ - y₁·S) [-x = -x₁·S - x₀] 395 > -y₁·S / y [x₀ < S, so S - x₀ < 0; drop it] 396 ≥ -1 [y₁·S ≤ y] 397 398 So q̂ - q > -1. 399 Since q̂ - q is an integer, q̂ - q ≥ 0, or equivalently q ≤ q̂. 400 401 Proof that q̂ ≤ q+2: 402 403 x₁/y₁ - x/y = x₁·S/y₁·S - x/y [multiply left term by S/S] 404 ≤ x/y₁·S - x/y [x₁S ≤ x] 405 = (x/y)·(y/y₁·S - 1) [factor out x/y] 406 = (x/y)·((y - y₁·S)/y₁·S) [move -1 into y/y₁·S fraction] 407 = (x/y)·(y₀/y₁·S) [y - y₁·S = y₀] 408 = (x/y)·(1/y₁)·(y₀/S) [factor out 1/y₁] 409 < (x/y)·(1/y₁) [y₀ < S, so y₀/S < 1] 410 ≤ (x/y)·(2/T) [y₁ ≥ T/2, so 1/y₁ ≤ 2/T] 411 < T·(2/T) [x/y < T] 412 = 2 [T·(2/T) = 2] 413 414 So x₁/y₁ - x/y < 2. 415 416 q̂ - q = ⌊x₁/y₁⌋ - q [by definition, q̂ = ⌊x₁/y₁⌋] 417 = ⌊x₁/y₁⌋ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋] 418 ≤ x₁/y₁ - ⌊x/y⌋ [⌊x₁/y₁⌋ ≤ x₁/y₁] 419 < x₁/y₁ - (x/y - 1) [⌊x/y⌋ > x/y - 1] 420 = (x₁/y₁ - x/y) + 1 [regrouping] 421 < 2 + 1 [above: x₁/y₁ - x/y < 2] 422 = 3 423 424 So q̂ - q < 3. 425 Since q̂ - q is an integer, q̂ - q ≤ 2. 426 427 Note that when x/y < T/2, the bounds tighten to x₁/y₁ - x/y < 1 and therefore 428 q̂ - q ≤ 1. 429 430 Note also that in the general case 2n-by-n division where we don't know that 431 x/y < T, we do know that x/y < 2T, yielding the bound q̂ - q ≤ 4. So we could 432 remove the special case first step of long division as long as we allow the 433 first fixup loop to run up to four times. (Using a simple comparison to decide 434 whether the first digit is 0 or 1 is still more efficient, though.) 435 436 Finally, note that when dividing three leading base-B digits by two (scaled), 437 we have T = B² and x/y < B = T/B, a much tighter bound than x/y < T. 438 This in turn yields the much tighter bound x₁/y₁ - x/y < 2/B. This means that 439 ⌊x₁/y₁⌋ and ⌊x/y⌋ can only differ when x/y is less than 2/B greater than an 440 integer. For random x and y, the chance of this is 2/B, or, for large B, 441 approximately zero. This means that after we produce the 3-by-2 guess in the 442 long division algorithm, the fixup loop essentially never runs. 443 444 In the recursive algorithm, the extra digit in (2·⌊n/2⌋+1)-by-(⌊n/2⌋+1)-digit 445 division has exactly the same effect: the probability of needing a fixup is the 446 same 2/B. Even better, we can allow the general case x/y < 2T and the fixup 447 probability only grows to 4/B, still essentially zero. 448 449 450 References 451 452 There are no great references for implementing long division; thus this comment. 453 Here are some notes about what to expect from the obvious references. 454 455 Knuth Volume 2 (Seminumerical Algorithms) section 4.3.1 is the usual canonical 456 reference for long division, but that entire series is highly compressed, never 457 repeating a necessary fact and leaving important insights to the exercises. 458 For example, no rationale whatsoever is given for the calculation that extends 459 q̂ from a 2-by-1 to a 3-by-2 guess, nor why it reduces the error bound. 460 The proof that the calculation even has the desired effect is left to exercises. 461 The solutions to those exercises provided at the back of the book are entirely 462 calculations, still with no explanation as to what is going on or how you would 463 arrive at the idea of doing those exact calculations. Nowhere is it mentioned 464 that this test extends the 2-by-1 guess into a 3-by-2 guess. The proof of the 465 Good Guess Guarantee is only for the 2-by-1 guess and argues by contradiction, 466 making it difficult to understand how modifications like adding another digit 467 or adjusting the quotient range affects the overall bound. 468 469 All that said, Knuth remains the canonical reference. It is dense but packed 470 full of information and references, and the proofs are simpler than many other 471 presentations. The proofs above are reworkings of Knuth's to remove the 472 arguments by contradiction and add explanations or steps that Knuth omitted. 473 But beware of errors in older printings. Take the published errata with you. 474 475 Brinch Hansen's “Multiple-length Division Revisited: a Tour of the Minefield” 476 starts with a blunt critique of Knuth's presentation (among others) and then 477 presents a more detailed and easier to follow treatment of long division, 478 including an implementation in Pascal. But the algorithm and implementation 479 work entirely in terms of 3-by-2 division, which is much less useful on modern 480 hardware than an algorithm using 2-by-1 division. The proofs are a bit too 481 focused on digit counting and seem needlessly complex, especially compared to 482 the ones given above. 483 484 Burnikel and Ziegler's “Fast Recursive Division” introduced the key insight of 485 implementing division by an n-digit divisor using recursive calls to division 486 by an n/2-digit divisor, relying on Karatsuba multiplication to yield a 487 sub-quadratic run time. However, the presentation decisions are made almost 488 entirely for the purpose of simplifying the run-time analysis, rather than 489 simplifying the presentation. Instead of a single algorithm that loops over 490 quotient digits, the paper presents two mutually-recursive algorithms, for 491 2n-by-n and 3n-by-2n. The paper also does not present any general (n+m)-by-n 492 algorithm. 493 494 The proofs in the paper are remarkably complex, especially considering that 495 the algorithm is at its core just long division on wide digits, so that the 496 usual long division proofs apply essentially unaltered. 497 */ 498 499 package big 500 501 import "math/bits" 502 503 // div returns q, r such that q = ⌊u/v⌋ and r = u%v = u - q·v. 504 // It uses z and z2 as the storage for q and r. 505 func (z nat) div(z2, u, v nat) (q, r nat) { 506 if len(v) == 0 { 507 panic("division by zero") 508 } 509 510 if u.cmp(v) < 0 { 511 q = z[:0] 512 r = z2.set(u) 513 return 514 } 515 516 if len(v) == 1 { 517 // Short division: long optimized for a single-word divisor. 518 // In that case, the 2-by-1 guess is all we need at each step. 519 var r2 Word 520 q, r2 = z.divW(u, v[0]) 521 r = z2.setWord(r2) 522 return 523 } 524 525 q, r = z.divLarge(z2, u, v) 526 return 527 } 528 529 // divW returns q, r such that q = ⌊x/y⌋ and r = x%y = x - q·y. 530 // It uses z as the storage for q. 531 // Note that y is a single digit (Word), not a big number. 532 func (z nat) divW(x nat, y Word) (q nat, r Word) { 533 m := len(x) 534 switch { 535 case y == 0: 536 panic("division by zero") 537 case y == 1: 538 q = z.set(x) // result is x 539 return 540 case m == 0: 541 q = z[:0] // result is 0 542 return 543 } 544 // m > 0 545 z = z.make(m) 546 r = divWVW(z, 0, x, y) 547 q = z.norm() 548 return 549 } 550 551 // modW returns x % d. 552 func (x nat) modW(d Word) (r Word) { 553 // TODO(agl): we don't actually need to store the q value. 554 var q nat 555 q = q.make(len(x)) 556 return divWVW(q, 0, x, d) 557 } 558 559 // divWVW overwrites z with ⌊x/y⌋, returning the remainder r. 560 // The caller must ensure that len(z) = len(x). 561 func divWVW(z []Word, xn Word, x []Word, y Word) (r Word) { 562 r = xn 563 if len(x) == 1 { 564 qq, rr := bits.Div(uint(r), uint(x[0]), uint(y)) 565 z[0] = Word(qq) 566 return Word(rr) 567 } 568 rec := reciprocalWord(y) 569 for i := len(z) - 1; i >= 0; i-- { 570 z[i], r = divWW(r, x[i], y, rec) 571 } 572 return r 573 } 574 575 // div returns q, r such that q = ⌊uIn/vIn⌋ and r = uIn%vIn = uIn - q·vIn. 576 // It uses z and u as the storage for q and r. 577 // The caller must ensure that len(vIn) ≥ 2 (use divW otherwise) 578 // and that len(uIn) ≥ len(vIn) (the answer is 0, uIn otherwise). 579 func (z nat) divLarge(u, uIn, vIn nat) (q, r nat) { 580 n := len(vIn) 581 m := len(uIn) - n 582 583 // Scale the inputs so vIn's top bit is 1 (see “Scaling Inputs” above). 584 // vIn is treated as a read-only input (it may be in use by another 585 // goroutine), so we must make a copy. 586 // uIn is copied to u. 587 shift := nlz(vIn[n-1]) 588 vp := getNat(n) 589 v := *vp 590 shlVU(v, vIn, shift) 591 u = u.make(len(uIn) + 1) 592 u[len(uIn)] = shlVU(u[0:len(uIn)], uIn, shift) 593 594 // The caller should not pass aliased z and u, since those are 595 // the two different outputs, but correct just in case. 596 if alias(z, u) { 597 z = nil 598 } 599 q = z.make(m + 1) 600 601 // Use basic or recursive long division depending on size. 602 if n < divRecursiveThreshold { 603 q.divBasic(u, v) 604 } else { 605 q.divRecursive(u, v) 606 } 607 putNat(vp) 608 609 q = q.norm() 610 611 // Undo scaling of remainder. 612 shrVU(u, u, shift) 613 r = u.norm() 614 615 return q, r 616 } 617 618 // divBasic implements long division as described above. 619 // It overwrites q with ⌊u/v⌋ and overwrites u with the remainder r. 620 // q must be large enough to hold ⌊u/v⌋. 621 func (q nat) divBasic(u, v nat) { 622 n := len(v) 623 m := len(u) - n 624 625 qhatvp := getNat(n + 1) 626 qhatv := *qhatvp 627 628 // Set up for divWW below, precomputing reciprocal argument. 629 vn1 := v[n-1] 630 rec := reciprocalWord(vn1) 631 632 // Compute each digit of quotient. 633 for j := m; j >= 0; j-- { 634 // Compute the 2-by-1 guess q̂. 635 // The first iteration must invent a leading 0 for u. 636 qhat := Word(_M) 637 var ujn Word 638 if j+n < len(u) { 639 ujn = u[j+n] 640 } 641 642 // ujn ≤ vn1, or else q̂ would be more than one digit. 643 // For ujn == vn1, we set q̂ to the max digit M above. 644 // Otherwise, we compute the 2-by-1 guess. 645 if ujn != vn1 { 646 var rhat Word 647 qhat, rhat = divWW(ujn, u[j+n-1], vn1, rec) 648 649 // Refine q̂ to a 3-by-2 guess. See “Refining Guesses” above. 650 vn2 := v[n-2] 651 x1, x2 := mulWW(qhat, vn2) 652 ujn2 := u[j+n-2] 653 for greaterThan(x1, x2, rhat, ujn2) { // x1x2 > r̂ u[j+n-2] 654 qhat-- 655 prevRhat := rhat 656 rhat += vn1 657 // If r̂ overflows, then 658 // r̂ u[j+n-2]v[n-1] is now definitely > x1 x2. 659 if rhat < prevRhat { 660 break 661 } 662 // TODO(rsc): No need for a full mulWW. 663 // x2 += vn2; if x2 overflows, x1++ 664 x1, x2 = mulWW(qhat, vn2) 665 } 666 } 667 668 // Compute q̂·v. 669 qhatv[n] = mulAddVWW(qhatv[0:n], v, qhat, 0) 670 qhl := len(qhatv) 671 if j+qhl > len(u) && qhatv[n] == 0 { 672 qhl-- 673 } 674 675 // Subtract q̂·v from the current section of u. 676 // If it underflows, q̂·v > u, which we fix up 677 // by decrementing q̂ and adding v back. 678 c := subVV(u[j:j+qhl], u[j:], qhatv) 679 if c != 0 { 680 c := addVV(u[j:j+n], u[j:], v) 681 // If n == qhl, the carry from subVV and the carry from addVV 682 // cancel out and don't affect u[j+n]. 683 if n < qhl { 684 u[j+n] += c 685 } 686 qhat-- 687 } 688 689 // Save quotient digit. 690 // Caller may know the top digit is zero and not leave room for it. 691 if j == m && m == len(q) && qhat == 0 { 692 continue 693 } 694 q[j] = qhat 695 } 696 697 putNat(qhatvp) 698 } 699 700 // greaterThan reports whether the two digit numbers x1 x2 > y1 y2. 701 // TODO(rsc): In contradiction to most of this file, x1 is the high 702 // digit and x2 is the low digit. This should be fixed. 703 func greaterThan(x1, x2, y1, y2 Word) bool { 704 return x1 > y1 || x1 == y1 && x2 > y2 705 } 706 707 // divRecursiveThreshold is the number of divisor digits 708 // at which point divRecursive is faster than divBasic. 709 const divRecursiveThreshold = 100 710 711 // divRecursive implements recursive division as described above. 712 // It overwrites z with ⌊u/v⌋ and overwrites u with the remainder r. 713 // z must be large enough to hold ⌊u/v⌋. 714 // This function is just for allocating and freeing temporaries 715 // around divRecursiveStep, the real implementation. 716 func (z nat) divRecursive(u, v nat) { 717 // Recursion depth is (much) less than 2 log₂(len(v)). 718 // Allocate a slice of temporaries to be reused across recursion, 719 // plus one extra temporary not live across the recursion. 720 recDepth := 2 * bits.Len(uint(len(v))) 721 tmp := getNat(3 * len(v)) 722 temps := make([]*nat, recDepth) 723 724 z.clear() 725 z.divRecursiveStep(u, v, 0, tmp, temps) 726 727 // Free temporaries. 728 for _, n := range temps { 729 if n != nil { 730 putNat(n) 731 } 732 } 733 putNat(tmp) 734 } 735 736 // divRecursiveStep is the actual implementation of recursive division. 737 // It adds ⌊u/v⌋ to z and overwrites u with the remainder r. 738 // z must be large enough to hold ⌊u/v⌋. 739 // It uses temps[depth] (allocating if needed) as a temporary live across 740 // the recursive call. It also uses tmp, but not live across the recursion. 741 func (z nat) divRecursiveStep(u, v nat, depth int, tmp *nat, temps []*nat) { 742 // u is a subsection of the original and may have leading zeros. 743 // TODO(rsc): The v = v.norm() is useless and should be removed. 744 // We know (and require) that v's top digit is ≥ B/2. 745 u = u.norm() 746 v = v.norm() 747 if len(u) == 0 { 748 z.clear() 749 return 750 } 751 752 // Fall back to basic division if the problem is now small enough. 753 n := len(v) 754 if n < divRecursiveThreshold { 755 z.divBasic(u, v) 756 return 757 } 758 759 // Nothing to do if u is shorter than v (implies u < v). 760 m := len(u) - n 761 if m < 0 { 762 return 763 } 764 765 // We consider B digits in a row as a single wide digit. 766 // (See “Recursive Division” above.) 767 // 768 // TODO(rsc): rename B to Wide, to avoid confusion with _B, 769 // which is something entirely different. 770 // TODO(rsc): Look into whether using ⌈n/2⌉ is better than ⌊n/2⌋. 771 B := n / 2 772 773 // Allocate a nat for qhat below. 774 if temps[depth] == nil { 775 temps[depth] = getNat(n) // TODO(rsc): Can be just B+1. 776 } else { 777 *temps[depth] = temps[depth].make(B + 1) 778 } 779 780 // Compute each wide digit of the quotient. 781 // 782 // TODO(rsc): Change the loop to be 783 // for j := (m+B-1)/B*B; j > 0; j -= B { 784 // which will make the final step a regular step, letting us 785 // delete what amounts to an extra copy of the loop body below. 786 j := m 787 for j > B { 788 // Divide u[j-B:j+n] (3 wide digits) by v (2 wide digits). 789 // First make the 2-by-1-wide-digit guess using a recursive call. 790 // Then extend the guess to the full 3-by-2 (see “Refining Guesses”). 791 // 792 // For the 2-by-1-wide-digit guess, instead of doing 2B-by-B-digit, 793 // we use a (2B+1)-by-(B+1) digit, which handles the possibility that 794 // the result has an extra leading 1 digit as well as guaranteeing 795 // that the computed q̂ will be off by at most 1 instead of 2. 796 797 // s is the number of digits to drop from the 3B- and 2B-digit chunks. 798 // We drop B-1 to be left with 2B+1 and B+1. 799 s := (B - 1) 800 801 // uu is the up-to-3B-digit section of u we are working on. 802 uu := u[j-B:] 803 804 // Compute the 2-by-1 guess q̂, leaving r̂ in uu[s:B+n]. 805 qhat := *temps[depth] 806 qhat.clear() 807 qhat.divRecursiveStep(uu[s:B+n], v[s:], depth+1, tmp, temps) 808 qhat = qhat.norm() 809 810 // Extend to a 3-by-2 quotient and remainder. 811 // Because divRecursiveStep overwrote the top part of uu with 812 // the remainder r̂, the full uu already contains the equivalent 813 // of r̂·B + uₙ₋₂ from the “Refining Guesses” discussion. 814 // Subtracting q̂·vₙ₋₂ from it will compute the full-length remainder. 815 // If that subtraction underflows, q̂·v > u, which we fix up 816 // by decrementing q̂ and adding v back, same as in long division. 817 818 // TODO(rsc): Instead of subtract and fix-up, this code is computing 819 // q̂·vₙ₋₂ and decrementing q̂ until that product is ≤ u. 820 // But we can do the subtraction directly, as in the comment above 821 // and in long division, because we know that q̂ is wrong by at most one. 822 qhatv := tmp.make(3 * n) 823 qhatv.clear() 824 qhatv = qhatv.mul(qhat, v[:s]) 825 for i := 0; i < 2; i++ { 826 e := qhatv.cmp(uu.norm()) 827 if e <= 0 { 828 break 829 } 830 subVW(qhat, qhat, 1) 831 c := subVV(qhatv[:s], qhatv[:s], v[:s]) 832 if len(qhatv) > s { 833 subVW(qhatv[s:], qhatv[s:], c) 834 } 835 addAt(uu[s:], v[s:], 0) 836 } 837 if qhatv.cmp(uu.norm()) > 0 { 838 panic("impossible") 839 } 840 c := subVV(uu[:len(qhatv)], uu[:len(qhatv)], qhatv) 841 if c > 0 { 842 subVW(uu[len(qhatv):], uu[len(qhatv):], c) 843 } 844 addAt(z, qhat, j-B) 845 j -= B 846 } 847 848 // TODO(rsc): Rewrite loop as described above and delete all this code. 849 850 // Now u < (v<<B), compute lower bits in the same way. 851 // Choose shift = B-1 again. 852 s := B - 1 853 qhat := *temps[depth] 854 qhat.clear() 855 qhat.divRecursiveStep(u[s:].norm(), v[s:], depth+1, tmp, temps) 856 qhat = qhat.norm() 857 qhatv := tmp.make(3 * n) 858 qhatv.clear() 859 qhatv = qhatv.mul(qhat, v[:s]) 860 // Set the correct remainder as before. 861 for i := 0; i < 2; i++ { 862 if e := qhatv.cmp(u.norm()); e > 0 { 863 subVW(qhat, qhat, 1) 864 c := subVV(qhatv[:s], qhatv[:s], v[:s]) 865 if len(qhatv) > s { 866 subVW(qhatv[s:], qhatv[s:], c) 867 } 868 addAt(u[s:], v[s:], 0) 869 } 870 } 871 if qhatv.cmp(u.norm()) > 0 { 872 panic("impossible") 873 } 874 c := subVV(u[0:len(qhatv)], u[0:len(qhatv)], qhatv) 875 if c > 0 { 876 c = subVW(u[len(qhatv):], u[len(qhatv):], c) 877 } 878 if c > 0 { 879 panic("impossible") 880 } 881 882 // Done! 883 addAt(z, qhat.norm(), 0) 884 } 885