...

Source file src/math/big/natdiv.go

Documentation: math/big

		 1  // Copyright 2009 The Go Authors. All rights reserved.
		 2  // Use of this source code is governed by a BSD-style
		 3  // license that can be found in the LICENSE file.
		 4  
		 5  /*
		 6  
		 7  Multi-precision division. Here be dragons.
		 8  
		 9  Given u and v, where u is n+m digits, and v is n digits (with no leading zeros),
		10  the goal is to return quo, rem such that u = quo*v + rem, where 0 ≤ rem < v.
		11  That is, quo = ⌊u/v⌋ where ⌊x⌋ denotes the floor (truncation to integer) of x,
		12  and rem = u - quo·v.
		13  
		14  
		15  Long Division
		16  
		17  Division in a computer proceeds the same as long division in elementary school,
		18  but computers are not as good as schoolchildren at following vague directions,
		19  so we have to be much more precise about the actual steps and what can happen.
		20  
		21  We work from most to least significant digit of the quotient, doing:
		22  
		23   • Guess a digit q, the number of v to subtract from the current
		24  	 section of u to zero out the topmost digit.
		25   • Check the guess by multiplying q·v and comparing it against
		26  	 the current section of u, adjusting the guess as needed.
		27   • Subtract q·v from the current section of u.
		28   • Add q to the corresponding section of the result quo.
		29  
		30  When all digits have been processed, the final remainder is left in u
		31  and returned as rem.
		32  
		33  For example, here is a sketch of dividing 5 digits by 3 digits (n=3, m=2).
		34  
		35  									 q₂ q₁ q₀
		36  					 _________________
		37  	v₂ v₁ v₀ ) u₄ u₃ u₂ u₁ u₀
		38  						 ↓	↓	↓	|	|
		39  						[u₄ u₃ u₂]|	|
		40  					- [	q₂·v	]|	|
		41  					----------- ↓	|
		42  						[	rem	| u₁]|
		43  					- [		q₁·v	 ]|
		44  						 ----------- ↓
		45  							 [	rem	| u₀]
		46  						 - [		q₀·v	 ]
		47  								------------
		48  									[	rem	 ]
		49  
		50  Instead of creating new storage for the remainders and copying digits from u
		51  as indicated by the arrows, we use u's storage directly as both the source
		52  and destination of the subtractions, so that the remainders overwrite
		53  successive overlapping sections of u as the division proceeds, using a slice
		54  of u to identify the current section. This avoids all the copying as well as
		55  shifting of remainders.
		56  
		57  Division of u with n+m digits by v with n digits (in base B) can in general
		58  produce at most m+1 digits, because:
		59  
		60  	• u < B^(n+m)							 [B^(n+m) has n+m+1 digits]
		61  	• v ≥ B^(n-1)							 [B^(n-1) is the smallest n-digit number]
		62  	• u/v < B^(n+m) / B^(n-1)	 [divide bounds for u, v]
		63  	• u/v < B^(m+1)						 [simplify]
		64  
		65  The first step is special: it takes the top n digits of u and divides them by
		66  the n digits of v, producing the first quotient digit and an n-digit remainder.
		67  In the example, q₂ = ⌊u₄u₃u₂ / v⌋.
		68  
		69  The first step divides n digits by n digits to ensure that it produces only a
		70  single digit.
		71  
		72  Each subsequent step appends the next digit from u to the remainder and divides
		73  those n+1 digits by the n digits of v, producing another quotient digit and a
		74  new n-digit remainder.
		75  
		76  Subsequent steps divide n+1 digits by n digits, an operation that in general
		77  might produce two digits. However, as used in the algorithm, that division is
		78  guaranteed to produce only a single digit. The dividend is of the form
		79  rem·B + d, where rem is a remainder from the previous step and d is a single
		80  digit, so:
		81  
		82   • rem ≤ v - 1								 [rem is a remainder from dividing by v]
		83   • rem·B ≤ v·B - B						 [multiply by B]
		84   • d ≤ B - 1									 [d is a single digit]
		85   • rem·B + d ≤ v·B - 1				 [add]
		86   • rem·B + d < v·B						 [change ≤ to <]
		87   • (rem·B + d)/v < B					 [divide by v]
		88  
		89  
		90  Guess and Check
		91  
		92  At each step we need to divide n+1 digits by n digits, but this is for the
		93  implementation of division by n digits, so we can't just invoke a division
		94  routine: we _are_ the division routine. Instead, we guess at the answer and
		95  then check it using multiplication. If the guess is wrong, we correct it.
		96  
		97  How can this guessing possibly be efficient? It turns out that the following
		98  statement (let's call it the Good Guess Guarantee) is true.
		99  
	 100  If
	 101  
	 102   • q = ⌊u/v⌋ where u is n+1 digits and v is n digits,
	 103   • q < B, and
	 104   • the topmost digit of v = vₙ₋₁ ≥ B/2,
	 105  
	 106  then q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ satisfies q ≤ q̂ ≤ q+2. (Proof below.)
	 107  
	 108  That is, if we know the answer has only a single digit and we guess an answer
	 109  by ignoring the bottom n-1 digits of u and v, using a 2-by-1-digit division,
	 110  then that guess is at least as large as the correct answer. It is also not
	 111  too much larger: it is off by at most two from the correct answer.
	 112  
	 113  Note that in the first step of the overall division, which is an n-by-n-digit
	 114  division, the 2-by-1 guess uses an implicit uₙ = 0.
	 115  
	 116  Note that using a 2-by-1-digit division here does not mean calling ourselves
	 117  recursively. Instead, we use an efficient direct hardware implementation of
	 118  that operation.
	 119  
	 120  Note that because q is u/v rounded down, q·v must not exceed u: u ≥ q·v.
	 121  If a guess q̂ is too big, it will not satisfy this test. Viewed a different way,
	 122  the remainder r̂ for a given q̂ is u - q̂·v, which must be positive. If it is
	 123  negative, then the guess q̂ is too big.
	 124  
	 125  This gives us a way to compute q. First compute q̂ with 2-by-1-digit division.
	 126  Then, while u < q̂·v, decrement q̂; this loop executes at most twice, because
	 127  q̂ ≤ q+2.
	 128  
	 129  
	 130  Scaling Inputs
	 131  
	 132  The Good Guess Guarantee requires that the top digit of v (vₙ₋₁) be at least B/2.
	 133  For example in base 10, ⌊172/19⌋ = 9, but ⌊18/1⌋ = 18: the guess is wildly off
	 134  because the first digit 1 is smaller than B/2 = 5.
	 135  
	 136  We can ensure that v has a large top digit by multiplying both u and v by the
	 137  right amount. Continuing the example, if we multiply both 172 and 19 by 3, we
	 138  now have ⌊516/57⌋, the leading digit of v is now ≥ 5, and sure enough
	 139  ⌊51/5⌋ = 10 is much closer to the correct answer 9. It would be easier here
	 140  to multiply by 4, because that can be done with a shift. Specifically, we can
	 141  always count the number of leading zeros i in the first digit of v and then
	 142  shift both u and v left by i bits.
	 143  
	 144  Having scaled u and v, the value ⌊u/v⌋ is unchanged, but the remainder will
	 145  be scaled: 172 mod 19 is 1, but 516 mod 57 is 3. We have to divide the remainder
	 146  by the scaling factor (shifting right i bits) when we finish.
	 147  
	 148  Note that these shifts happen before and after the entire division algorithm,
	 149  not at each step in the per-digit iteration.
	 150  
	 151  Note the effect of scaling inputs on the size of the possible quotient.
	 152  In the scaled u/v, u can gain a digit from scaling; v never does, because we
	 153  pick the scaling factor to make v's top digit larger but without overflowing.
	 154  If u and v have n+m and n digits after scaling, then:
	 155  
	 156  	• u < B^(n+m)							 [B^(n+m) has n+m+1 digits]
	 157  	• v ≥ B^n / 2							 [vₙ₋₁ ≥ B/2, so vₙ₋₁·B^(n-1) ≥ B^n/2]
	 158  	• u/v < B^(n+m) / (B^n / 2) [divide bounds for u, v]
	 159  	• u/v < 2 B^m							 [simplify]
	 160  
	 161  The quotient can still have m+1 significant digits, but if so the top digit
	 162  must be a 1. This provides a different way to handle the first digit of the
	 163  result: compare the top n digits of u against v and fill in either a 0 or a 1.
	 164  
	 165  
	 166  Refining Guesses
	 167  
	 168  Before we check whether u < q̂·v, we can adjust our guess to change it from
	 169  q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ into the refined guess ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋.
	 170  Although not mentioned above, the Good Guess Guarantee also promises that this
	 171  3-by-2-digit division guess is more precise and at most one away from the real
	 172  answer q. The improvement from the 2-by-1 to the 3-by-2 guess can also be done
	 173  without n-digit math.
	 174  
	 175  If we have a guess q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ and we want to see if it also equal to
	 176  ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, we can use the same check we would for the full division:
	 177  if uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂, then the guess is too large and should be reduced.
	 178  
	 179  Checking uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ < 0,
	 180  and
	 181  
	 182  	uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ = (uₙuₙ₋₁·B + uₙ₋₂) - q̂·(vₙ₋₁·B + vₙ₋₂)
	 183  														[splitting off the bottom digit]
	 184  												= (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂
	 185  														[regrouping]
	 186  
	 187  The expression (uₙuₙ₋₁ - q̂·vₙ₋₁) is the remainder of uₙuₙ₋₁ / vₙ₋₁.
	 188  If the initial guess returns both q̂ and its remainder r̂, then checking
	 189  whether uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as checking r̂·B + uₙ₋₂ < q̂·vₙ₋₂.
	 190  
	 191  If we find that r̂·B + uₙ₋₂ < q̂·vₙ₋₂, then we can adjust the guess by
	 192  decrementing q̂ and adding vₙ₋₁ to r̂. We repeat until r̂·B + uₙ₋₂ ≥ q̂·vₙ₋₂.
	 193  (As before, this fixup is only needed at most twice.)
	 194  
	 195  Now that q̂ = ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, as mentioned above it is at most one
	 196  away from the correct q, and we've avoided doing any n-digit math.
	 197  (If we need the new remainder, it can be computed as r̂·B + uₙ₋₂ - q̂·vₙ₋₂.)
	 198  
	 199  The final check u < q̂·v and the possible fixup must be done at full precision.
	 200  For random inputs, a fixup at this step is exceedingly rare: the 3-by-2 guess
	 201  is not often wrong at all. But still we must do the check. Note that since the
	 202  3-by-2 guess is off by at most 1, it can be convenient to perform the final
	 203  u < q̂·v as part of the computation of the remainder r = u - q̂·v. If the
	 204  subtraction underflows, decremeting q̂ and adding one v back to r is enough to
	 205  arrive at the final q, r.
	 206  
	 207  That's the entirety of long division: scale the inputs, and then loop over
	 208  each output position, guessing, checking, and correcting the next output digit.
	 209  
	 210  For a 2n-digit number divided by an n-digit number (the worst size-n case for
	 211  division complexity), this algorithm uses n+1 iterations, each of which must do
	 212  at least the 1-by-n-digit multiplication q̂·v. That's O(n) iterations of
	 213  O(n) time each, so O(n²) time overall.
	 214  
	 215  
	 216  Recursive Division
	 217  
	 218  For very large inputs, it is possible to improve on the O(n²) algorithm.
	 219  Let's call a group of n/2 real digits a (very) “wide digit”. We can run the
	 220  standard long division algorithm explained above over the wide digits instead of
	 221  the actual digits. This will result in many fewer steps, but the math involved in
	 222  each step is more work.
	 223  
	 224  Where basic long division uses a 2-by-1-digit division to guess the initial q̂,
	 225  the new algorithm must use a 2-by-1-wide-digit division, which is of course
	 226  really an n-by-n/2-digit division. That's OK: if we implement n-digit division
	 227  in terms of n/2-digit division, the recursion will terminate when the divisor
	 228  becomes small enough to handle with standard long division or even with the
	 229  2-by-1 hardware instruction.
	 230  
	 231  For example, here is a sketch of dividing 10 digits by 4, proceeding with
	 232  wide digits corresponding to two regular digits. The first step, still special,
	 233  must leave off a (regular) digit, dividing 5 by 4 and producing a 4-digit
	 234  remainder less than v. The middle steps divide 6 digits by 4, guaranteed to
	 235  produce two output digits each (one wide digit) with 4-digit remainders.
	 236  The final step must use what it has: the 4-digit remainder plus one more,
	 237  5 digits to divide by 4.
	 238  
	 239  												 q₆ q₅ q₄ q₃ q₂ q₁ q₀
	 240  							_______________________________
	 241  	v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀
	 242  								↓	↓	↓	↓	↓	|	|	|	|	|
	 243  							 [u₉ u₈ u₇ u₆ u₅]|	|	|	|	|
	 244  						 - [		q₆q₅·v		]|	|	|	|	|
	 245  						 ----------------- ↓	↓	|	|	|
	 246  									[		rem		|u₄ u₃]|	|	|
	 247  								- [		 q₄q₃·v			]|	|	|
	 248  								-------------------- ↓	↓	|
	 249  												[		rem		|u₂ u₁]|
	 250  											- [		 q₂q₁·v			]|
	 251  											-------------------- ↓
	 252  															[		rem		|u₀]
	 253  														- [		 q₀·v		 ]
	 254  														------------------
	 255  																 [		rem		]
	 256  
	 257  An alternative would be to look ahead to how well n/2 divides into n+m and
	 258  adjust the first step to use fewer digits as needed, making the first step
	 259  more special to make the last step not special at all. For example, using the
	 260  same input, we could choose to use only 4 digits in the first step, leaving
	 261  a full wide digit for the last step:
	 262  
	 263  												 q₆ q₅ q₄ q₃ q₂ q₁ q₀
	 264  							_______________________________
	 265  	v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀
	 266  								↓	↓	↓	↓	|	|	|	|	|	|
	 267  							 [u₉ u₈ u₇ u₆]|	|	|	|	|	|
	 268  						 - [		q₆·v	 ]|	|	|	|	|	|
	 269  						 -------------- ↓	↓	|	|	|	|
	 270  							 [		rem		|u₅ u₄]|	|	|	|
	 271  						 - [		 q₅q₄·v			]|	|	|	|
	 272  						 -------------------- ↓	↓	|	|
	 273  										 [		rem		|u₃ u₂]|	|
	 274  									 - [		 q₃q₂·v			]|	|
	 275  									 -------------------- ↓	↓
	 276  													 [		rem		|u₁ u₀]
	 277  												 - [		 q₁q₀·v			]
	 278  												 ---------------------
	 279  																 [		rem		]
	 280  
	 281  Today, the code in divRecursiveStep works like the first example. Perhaps in
	 282  the future we will make it work like the alternative, to avoid a special case
	 283  in the final iteration.
	 284  
	 285  Either way, each step is a 3-by-2-wide-digit division approximated first by
	 286  a 2-by-1-wide-digit division, just as we did for regular digits in long division.
	 287  Because the actual answer we want is a 3-by-2-wide-digit division, instead of
	 288  multiplying q̂·v directly during the fixup, we can use the quick refinement
	 289  from long division (an n/2-by-n/2 multiply) to correct q to its actual value
	 290  and also compute the remainder (as mentioned above), and then stop after that,
	 291  never doing a full n-by-n multiply.
	 292  
	 293  Instead of using an n-by-n/2-digit division to produce n/2 digits, we can add
	 294  (not discard) one more real digit, doing an (n+1)-by-(n/2+1)-digit division that
	 295  produces n/2+1 digits. That single extra digit tightens the Good Guess Guarantee
	 296  to q ≤ q̂ ≤ q+1 and lets us drop long division's special treatment of the first
	 297  digit. These benefits are discussed more after the Good Guess Guarantee proof
	 298  below.
	 299  
	 300  
	 301  How Fast is Recursive Division?
	 302  
	 303  For a 2n-by-n-digit division, this algorithm runs a 4-by-2 long division over
	 304  wide digits, producing two wide digits plus a possible leading regular digit 1,
	 305  which can be handled without a recursive call. That is, the algorithm uses two
	 306  full iterations, each using an n-by-n/2-digit division and an n/2-by-n/2-digit
	 307  multiplication, along with a few n-digit additions and subtractions. The standard
	 308  n-by-n-digit multiplication algorithm requires O(n²) time, making the overall
	 309  algorithm require time T(n) where
	 310  
	 311  	T(n) = 2T(n/2) + O(n) + O(n²)
	 312  
	 313  which, by the Bentley-Haken-Saxe theorem, ends up reducing to T(n) = O(n²).
	 314  This is not an improvement over regular long division.
	 315  
	 316  When the number of digits n becomes large enough, Karatsuba's algorithm for
	 317  multiplication can be used instead, which takes O(n^log₂3) = O(n^1.6) time.
	 318  (Karatsuba multiplication is implemented in func karatsuba in nat.go.)
	 319  That makes the overall recursive division algorithm take O(n^1.6) time as well,
	 320  which is an improvement, but again only for large enough numbers.
	 321  
	 322  It is not critical to make sure that every recursion does only two recursive
	 323  calls. While in general the number of recursive calls can change the time
	 324  analysis, in this case doing three calls does not change the analysis:
	 325  
	 326  	T(n) = 3T(n/2) + O(n) + O(n^log₂3)
	 327  
	 328  ends up being T(n) = O(n^log₂3). Because the Karatsuba multiplication taking
	 329  time O(n^log₂3) is itself doing 3 half-sized recursions, doing three for the
	 330  division does not hurt the asymptotic performance. Of course, it is likely
	 331  still faster in practice to do two.
	 332  
	 333  
	 334  Proof of the Good Guess Guarantee
	 335  
	 336  Given numbers x, y, let us break them into the quotients and remainders when
	 337  divided by some scaling factor S, with the added constraints that the quotient
	 338  x/y and the high part of y are both less than some limit T, and that the high
	 339  part of y is at least half as big as T.
	 340  
	 341  	x₁ = ⌊x/S⌋				y₁ = ⌊y/S⌋
	 342  	x₀ = x mod S			y₀ = y mod S
	 343  
	 344  	x	= x₁·S + x₀		0 ≤ x₀ < S		x/y < T
	 345  	y	= y₁·S + y₀		0 ≤ y₀ < S		T/2 ≤ y₁ < T
	 346  
	 347  And consider the two truncated quotients:
	 348  
	 349  	q = ⌊x/y⌋
	 350  	q̂ = ⌊x₁/y₁⌋
	 351  
	 352  We will prove that q ≤ q̂ ≤ q+2.
	 353  
	 354  The guarantee makes no real demands on the scaling factor S: it is simply the
	 355  magnitude of the digits cut from both x and y to produce x₁ and y₁.
	 356  The guarantee makes only limited demands on T: it must be large enough to hold
	 357  the quotient x/y, and y₁ must have roughly the same size.
	 358  
	 359  To apply to the earlier discussion of 2-by-1 guesses in long division,
	 360  we would choose:
	 361  
	 362  	S	= Bⁿ⁻¹
	 363  	T	= B
	 364  	x	= u
	 365  	x₁ = uₙuₙ₋₁
	 366  	x₀ = uₙ₋₂...u₀
	 367  	y	= v
	 368  	y₁ = vₙ₋₁
	 369  	y₀ = vₙ₋₂...u₀
	 370  
	 371  These simpler variables avoid repeating those longer expressions in the proof.
	 372  
	 373  Note also that, by definition, truncating division ⌊x/y⌋ satisfies
	 374  
	 375  	x/y - 1 < ⌊x/y⌋ ≤ x/y.
	 376  
	 377  This fact will be used a few times in the proofs.
	 378  
	 379  Proof that q ≤ q̂:
	 380  
	 381  	q̂·y₁ = ⌊x₁/y₁⌋·y₁											[by definition, q̂ = ⌊x₁/y₁⌋]
	 382  			 > (x₁/y₁ - 1)·y₁									[x₁/y₁ - 1 < ⌊x₁/y₁⌋]
	 383  			 = x₁ - y₁												 [distribute y₁]
	 384  
	 385  	So q̂·y₁ > x₁ - y₁.
	 386  	Since q̂·y₁ is an integer, q̂·y₁ ≥ x₁ - y₁ + 1.
	 387  
	 388  	q̂ - q = q̂ - ⌊x/y⌋											[by definition, q = ⌊x/y⌋]
	 389  				≥ q̂ - x/y												[⌊x/y⌋ < x/y]
	 390  				= (1/y)·(q̂·y - x)								[factor out 1/y]
	 391  				≥ (1/y)·(q̂·y₁·S - x)						 [y = y₁·S + y₀ ≥ y₁·S]
	 392  				≥ (1/y)·((x₁ - y₁ + 1)·S - x)		[above: q̂·y₁ ≥ x₁ - y₁ + 1]
	 393  				= (1/y)·(x₁·S - y₁·S + S - x)		[distribute S]
	 394  				= (1/y)·(S - x₀ - y₁·S)					[-x = -x₁·S - x₀]
	 395  				> -y₁·S / y											[x₀ < S, so S - x₀ < 0; drop it]
	 396  				≥ -1														 [y₁·S ≤ y]
	 397  
	 398  	So q̂ - q > -1.
	 399  	Since q̂ - q is an integer, q̂ - q ≥ 0, or equivalently q ≤ q̂.
	 400  
	 401  Proof that q̂ ≤ q+2:
	 402  
	 403  	x₁/y₁ - x/y = x₁·S/y₁·S - x/y					[multiply left term by S/S]
	 404  							≤ x/y₁·S - x/y						 [x₁S ≤ x]
	 405  							= (x/y)·(y/y₁·S - 1)			 [factor out x/y]
	 406  							= (x/y)·((y - y₁·S)/y₁·S)	[move -1 into y/y₁·S fraction]
	 407  							= (x/y)·(y₀/y₁·S)					[y - y₁·S = y₀]
	 408  							= (x/y)·(1/y₁)·(y₀/S)			[factor out 1/y₁]
	 409  							< (x/y)·(1/y₁)						 [y₀ < S, so y₀/S < 1]
	 410  							≤ (x/y)·(2/T)							[y₁ ≥ T/2, so 1/y₁ ≤ 2/T]
	 411  							< T·(2/T)									[x/y < T]
	 412  							= 2												[T·(2/T) = 2]
	 413  
	 414  	So x₁/y₁ - x/y < 2.
	 415  
	 416  	q̂ - q = ⌊x₁/y₁⌋ - q										[by definition, q̂ = ⌊x₁/y₁⌋]
	 417  				= ⌊x₁/y₁⌋ - ⌊x/y⌋								[by definition, q = ⌊x/y⌋]
	 418  				≤ x₁/y₁ - ⌊x/y⌋									[⌊x₁/y₁⌋ ≤ x₁/y₁]
	 419  				< x₁/y₁ - (x/y - 1)							[⌊x/y⌋ > x/y - 1]
	 420  				= (x₁/y₁ - x/y) + 1							[regrouping]
	 421  				< 2 + 1													[above: x₁/y₁ - x/y < 2]
	 422  				= 3
	 423  
	 424  	So q̂ - q < 3.
	 425  	Since q̂ - q is an integer, q̂ - q ≤ 2.
	 426  
	 427  Note that when x/y < T/2, the bounds tighten to x₁/y₁ - x/y < 1 and therefore
	 428  q̂ - q ≤ 1.
	 429  
	 430  Note also that in the general case 2n-by-n division where we don't know that
	 431  x/y < T, we do know that x/y < 2T, yielding the bound q̂ - q ≤ 4. So we could
	 432  remove the special case first step of long division as long as we allow the
	 433  first fixup loop to run up to four times. (Using a simple comparison to decide
	 434  whether the first digit is 0 or 1 is still more efficient, though.)
	 435  
	 436  Finally, note that when dividing three leading base-B digits by two (scaled),
	 437  we have T = B² and x/y < B = T/B, a much tighter bound than x/y < T.
	 438  This in turn yields the much tighter bound x₁/y₁ - x/y < 2/B. This means that
	 439  ⌊x₁/y₁⌋ and ⌊x/y⌋ can only differ when x/y is less than 2/B greater than an
	 440  integer. For random x and y, the chance of this is 2/B, or, for large B,
	 441  approximately zero. This means that after we produce the 3-by-2 guess in the
	 442  long division algorithm, the fixup loop essentially never runs.
	 443  
	 444  In the recursive algorithm, the extra digit in (2·⌊n/2⌋+1)-by-(⌊n/2⌋+1)-digit
	 445  division has exactly the same effect: the probability of needing a fixup is the
	 446  same 2/B. Even better, we can allow the general case x/y < 2T and the fixup
	 447  probability only grows to 4/B, still essentially zero.
	 448  
	 449  
	 450  References
	 451  
	 452  There are no great references for implementing long division; thus this comment.
	 453  Here are some notes about what to expect from the obvious references.
	 454  
	 455  Knuth Volume 2 (Seminumerical Algorithms) section 4.3.1 is the usual canonical
	 456  reference for long division, but that entire series is highly compressed, never
	 457  repeating a necessary fact and leaving important insights to the exercises.
	 458  For example, no rationale whatsoever is given for the calculation that extends
	 459  q̂ from a 2-by-1 to a 3-by-2 guess, nor why it reduces the error bound.
	 460  The proof that the calculation even has the desired effect is left to exercises.
	 461  The solutions to those exercises provided at the back of the book are entirely
	 462  calculations, still with no explanation as to what is going on or how you would
	 463  arrive at the idea of doing those exact calculations. Nowhere is it mentioned
	 464  that this test extends the 2-by-1 guess into a 3-by-2 guess. The proof of the
	 465  Good Guess Guarantee is only for the 2-by-1 guess and argues by contradiction,
	 466  making it difficult to understand how modifications like adding another digit
	 467  or adjusting the quotient range affects the overall bound.
	 468  
	 469  All that said, Knuth remains the canonical reference. It is dense but packed
	 470  full of information and references, and the proofs are simpler than many other
	 471  presentations. The proofs above are reworkings of Knuth's to remove the
	 472  arguments by contradiction and add explanations or steps that Knuth omitted.
	 473  But beware of errors in older printings. Take the published errata with you.
	 474  
	 475  Brinch Hansen's “Multiple-length Division Revisited: a Tour of the Minefield”
	 476  starts with a blunt critique of Knuth's presentation (among others) and then
	 477  presents a more detailed and easier to follow treatment of long division,
	 478  including an implementation in Pascal. But the algorithm and implementation
	 479  work entirely in terms of 3-by-2 division, which is much less useful on modern
	 480  hardware than an algorithm using 2-by-1 division. The proofs are a bit too
	 481  focused on digit counting and seem needlessly complex, especially compared to
	 482  the ones given above.
	 483  
	 484  Burnikel and Ziegler's “Fast Recursive Division” introduced the key insight of
	 485  implementing division by an n-digit divisor using recursive calls to division
	 486  by an n/2-digit divisor, relying on Karatsuba multiplication to yield a
	 487  sub-quadratic run time. However, the presentation decisions are made almost
	 488  entirely for the purpose of simplifying the run-time analysis, rather than
	 489  simplifying the presentation. Instead of a single algorithm that loops over
	 490  quotient digits, the paper presents two mutually-recursive algorithms, for
	 491  2n-by-n and 3n-by-2n. The paper also does not present any general (n+m)-by-n
	 492  algorithm.
	 493  
	 494  The proofs in the paper are remarkably complex, especially considering that
	 495  the algorithm is at its core just long division on wide digits, so that the
	 496  usual long division proofs apply essentially unaltered.
	 497  */
	 498  
	 499  package big
	 500  
	 501  import "math/bits"
	 502  
	 503  // div returns q, r such that q = ⌊u/v⌋ and r = u%v = u - q·v.
	 504  // It uses z and z2 as the storage for q and r.
	 505  func (z nat) div(z2, u, v nat) (q, r nat) {
	 506  	if len(v) == 0 {
	 507  		panic("division by zero")
	 508  	}
	 509  
	 510  	if u.cmp(v) < 0 {
	 511  		q = z[:0]
	 512  		r = z2.set(u)
	 513  		return
	 514  	}
	 515  
	 516  	if len(v) == 1 {
	 517  		// Short division: long optimized for a single-word divisor.
	 518  		// In that case, the 2-by-1 guess is all we need at each step.
	 519  		var r2 Word
	 520  		q, r2 = z.divW(u, v[0])
	 521  		r = z2.setWord(r2)
	 522  		return
	 523  	}
	 524  
	 525  	q, r = z.divLarge(z2, u, v)
	 526  	return
	 527  }
	 528  
	 529  // divW returns q, r such that q = ⌊x/y⌋ and r = x%y = x - q·y.
	 530  // It uses z as the storage for q.
	 531  // Note that y is a single digit (Word), not a big number.
	 532  func (z nat) divW(x nat, y Word) (q nat, r Word) {
	 533  	m := len(x)
	 534  	switch {
	 535  	case y == 0:
	 536  		panic("division by zero")
	 537  	case y == 1:
	 538  		q = z.set(x) // result is x
	 539  		return
	 540  	case m == 0:
	 541  		q = z[:0] // result is 0
	 542  		return
	 543  	}
	 544  	// m > 0
	 545  	z = z.make(m)
	 546  	r = divWVW(z, 0, x, y)
	 547  	q = z.norm()
	 548  	return
	 549  }
	 550  
	 551  // modW returns x % d.
	 552  func (x nat) modW(d Word) (r Word) {
	 553  	// TODO(agl): we don't actually need to store the q value.
	 554  	var q nat
	 555  	q = q.make(len(x))
	 556  	return divWVW(q, 0, x, d)
	 557  }
	 558  
	 559  // divWVW overwrites z with ⌊x/y⌋, returning the remainder r.
	 560  // The caller must ensure that len(z) = len(x).
	 561  func divWVW(z []Word, xn Word, x []Word, y Word) (r Word) {
	 562  	r = xn
	 563  	if len(x) == 1 {
	 564  		qq, rr := bits.Div(uint(r), uint(x[0]), uint(y))
	 565  		z[0] = Word(qq)
	 566  		return Word(rr)
	 567  	}
	 568  	rec := reciprocalWord(y)
	 569  	for i := len(z) - 1; i >= 0; i-- {
	 570  		z[i], r = divWW(r, x[i], y, rec)
	 571  	}
	 572  	return r
	 573  }
	 574  
	 575  // div returns q, r such that q = ⌊uIn/vIn⌋ and r = uIn%vIn = uIn - q·vIn.
	 576  // It uses z and u as the storage for q and r.
	 577  // The caller must ensure that len(vIn) ≥ 2 (use divW otherwise)
	 578  // and that len(uIn) ≥ len(vIn) (the answer is 0, uIn otherwise).
	 579  func (z nat) divLarge(u, uIn, vIn nat) (q, r nat) {
	 580  	n := len(vIn)
	 581  	m := len(uIn) - n
	 582  
	 583  	// Scale the inputs so vIn's top bit is 1 (see “Scaling Inputs” above).
	 584  	// vIn is treated as a read-only input (it may be in use by another
	 585  	// goroutine), so we must make a copy.
	 586  	// uIn is copied to u.
	 587  	shift := nlz(vIn[n-1])
	 588  	vp := getNat(n)
	 589  	v := *vp
	 590  	shlVU(v, vIn, shift)
	 591  	u = u.make(len(uIn) + 1)
	 592  	u[len(uIn)] = shlVU(u[0:len(uIn)], uIn, shift)
	 593  
	 594  	// The caller should not pass aliased z and u, since those are
	 595  	// the two different outputs, but correct just in case.
	 596  	if alias(z, u) {
	 597  		z = nil
	 598  	}
	 599  	q = z.make(m + 1)
	 600  
	 601  	// Use basic or recursive long division depending on size.
	 602  	if n < divRecursiveThreshold {
	 603  		q.divBasic(u, v)
	 604  	} else {
	 605  		q.divRecursive(u, v)
	 606  	}
	 607  	putNat(vp)
	 608  
	 609  	q = q.norm()
	 610  
	 611  	// Undo scaling of remainder.
	 612  	shrVU(u, u, shift)
	 613  	r = u.norm()
	 614  
	 615  	return q, r
	 616  }
	 617  
	 618  // divBasic implements long division as described above.
	 619  // It overwrites q with ⌊u/v⌋ and overwrites u with the remainder r.
	 620  // q must be large enough to hold ⌊u/v⌋.
	 621  func (q nat) divBasic(u, v nat) {
	 622  	n := len(v)
	 623  	m := len(u) - n
	 624  
	 625  	qhatvp := getNat(n + 1)
	 626  	qhatv := *qhatvp
	 627  
	 628  	// Set up for divWW below, precomputing reciprocal argument.
	 629  	vn1 := v[n-1]
	 630  	rec := reciprocalWord(vn1)
	 631  
	 632  	// Compute each digit of quotient.
	 633  	for j := m; j >= 0; j-- {
	 634  		// Compute the 2-by-1 guess q̂.
	 635  		// The first iteration must invent a leading 0 for u.
	 636  		qhat := Word(_M)
	 637  		var ujn Word
	 638  		if j+n < len(u) {
	 639  			ujn = u[j+n]
	 640  		}
	 641  
	 642  		// ujn ≤ vn1, or else q̂ would be more than one digit.
	 643  		// For ujn == vn1, we set q̂ to the max digit M above.
	 644  		// Otherwise, we compute the 2-by-1 guess.
	 645  		if ujn != vn1 {
	 646  			var rhat Word
	 647  			qhat, rhat = divWW(ujn, u[j+n-1], vn1, rec)
	 648  
	 649  			// Refine q̂ to a 3-by-2 guess. See “Refining Guesses” above.
	 650  			vn2 := v[n-2]
	 651  			x1, x2 := mulWW(qhat, vn2)
	 652  			ujn2 := u[j+n-2]
	 653  			for greaterThan(x1, x2, rhat, ujn2) { // x1x2 > r̂ u[j+n-2]
	 654  				qhat--
	 655  				prevRhat := rhat
	 656  				rhat += vn1
	 657  				// If r̂	overflows, then
	 658  				// r̂ u[j+n-2]v[n-1] is now definitely > x1 x2.
	 659  				if rhat < prevRhat {
	 660  					break
	 661  				}
	 662  				// TODO(rsc): No need for a full mulWW.
	 663  				// x2 += vn2; if x2 overflows, x1++
	 664  				x1, x2 = mulWW(qhat, vn2)
	 665  			}
	 666  		}
	 667  
	 668  		// Compute q̂·v.
	 669  		qhatv[n] = mulAddVWW(qhatv[0:n], v, qhat, 0)
	 670  		qhl := len(qhatv)
	 671  		if j+qhl > len(u) && qhatv[n] == 0 {
	 672  			qhl--
	 673  		}
	 674  
	 675  		// Subtract q̂·v from the current section of u.
	 676  		// If it underflows, q̂·v > u, which we fix up
	 677  		// by decrementing q̂ and adding v back.
	 678  		c := subVV(u[j:j+qhl], u[j:], qhatv)
	 679  		if c != 0 {
	 680  			c := addVV(u[j:j+n], u[j:], v)
	 681  			// If n == qhl, the carry from subVV and the carry from addVV
	 682  			// cancel out and don't affect u[j+n].
	 683  			if n < qhl {
	 684  				u[j+n] += c
	 685  			}
	 686  			qhat--
	 687  		}
	 688  
	 689  		// Save quotient digit.
	 690  		// Caller may know the top digit is zero and not leave room for it.
	 691  		if j == m && m == len(q) && qhat == 0 {
	 692  			continue
	 693  		}
	 694  		q[j] = qhat
	 695  	}
	 696  
	 697  	putNat(qhatvp)
	 698  }
	 699  
	 700  // greaterThan reports whether the two digit numbers x1 x2 > y1 y2.
	 701  // TODO(rsc): In contradiction to most of this file, x1 is the high
	 702  // digit and x2 is the low digit. This should be fixed.
	 703  func greaterThan(x1, x2, y1, y2 Word) bool {
	 704  	return x1 > y1 || x1 == y1 && x2 > y2
	 705  }
	 706  
	 707  // divRecursiveThreshold is the number of divisor digits
	 708  // at which point divRecursive is faster than divBasic.
	 709  const divRecursiveThreshold = 100
	 710  
	 711  // divRecursive implements recursive division as described above.
	 712  // It overwrites z with ⌊u/v⌋ and overwrites u with the remainder r.
	 713  // z must be large enough to hold ⌊u/v⌋.
	 714  // This function is just for allocating and freeing temporaries
	 715  // around divRecursiveStep, the real implementation.
	 716  func (z nat) divRecursive(u, v nat) {
	 717  	// Recursion depth is (much) less than 2 log₂(len(v)).
	 718  	// Allocate a slice of temporaries to be reused across recursion,
	 719  	// plus one extra temporary not live across the recursion.
	 720  	recDepth := 2 * bits.Len(uint(len(v)))
	 721  	tmp := getNat(3 * len(v))
	 722  	temps := make([]*nat, recDepth)
	 723  
	 724  	z.clear()
	 725  	z.divRecursiveStep(u, v, 0, tmp, temps)
	 726  
	 727  	// Free temporaries.
	 728  	for _, n := range temps {
	 729  		if n != nil {
	 730  			putNat(n)
	 731  		}
	 732  	}
	 733  	putNat(tmp)
	 734  }
	 735  
	 736  // divRecursiveStep is the actual implementation of recursive division.
	 737  // It adds ⌊u/v⌋ to z and overwrites u with the remainder r.
	 738  // z must be large enough to hold ⌊u/v⌋.
	 739  // It uses temps[depth] (allocating if needed) as a temporary live across
	 740  // the recursive call. It also uses tmp, but not live across the recursion.
	 741  func (z nat) divRecursiveStep(u, v nat, depth int, tmp *nat, temps []*nat) {
	 742  	// u is a subsection of the original and may have leading zeros.
	 743  	// TODO(rsc): The v = v.norm() is useless and should be removed.
	 744  	// We know (and require) that v's top digit is ≥ B/2.
	 745  	u = u.norm()
	 746  	v = v.norm()
	 747  	if len(u) == 0 {
	 748  		z.clear()
	 749  		return
	 750  	}
	 751  
	 752  	// Fall back to basic division if the problem is now small enough.
	 753  	n := len(v)
	 754  	if n < divRecursiveThreshold {
	 755  		z.divBasic(u, v)
	 756  		return
	 757  	}
	 758  
	 759  	// Nothing to do if u is shorter than v (implies u < v).
	 760  	m := len(u) - n
	 761  	if m < 0 {
	 762  		return
	 763  	}
	 764  
	 765  	// We consider B digits in a row as a single wide digit.
	 766  	// (See “Recursive Division” above.)
	 767  	//
	 768  	// TODO(rsc): rename B to Wide, to avoid confusion with _B,
	 769  	// which is something entirely different.
	 770  	// TODO(rsc): Look into whether using ⌈n/2⌉ is better than ⌊n/2⌋.
	 771  	B := n / 2
	 772  
	 773  	// Allocate a nat for qhat below.
	 774  	if temps[depth] == nil {
	 775  		temps[depth] = getNat(n) // TODO(rsc): Can be just B+1.
	 776  	} else {
	 777  		*temps[depth] = temps[depth].make(B + 1)
	 778  	}
	 779  
	 780  	// Compute each wide digit of the quotient.
	 781  	//
	 782  	// TODO(rsc): Change the loop to be
	 783  	//	for j := (m+B-1)/B*B; j > 0; j -= B {
	 784  	// which will make the final step a regular step, letting us
	 785  	// delete what amounts to an extra copy of the loop body below.
	 786  	j := m
	 787  	for j > B {
	 788  		// Divide u[j-B:j+n] (3 wide digits) by v (2 wide digits).
	 789  		// First make the 2-by-1-wide-digit guess using a recursive call.
	 790  		// Then extend the guess to the full 3-by-2 (see “Refining Guesses”).
	 791  		//
	 792  		// For the 2-by-1-wide-digit guess, instead of doing 2B-by-B-digit,
	 793  		// we use a (2B+1)-by-(B+1) digit, which handles the possibility that
	 794  		// the result has an extra leading 1 digit as well as guaranteeing
	 795  		// that the computed q̂ will be off by at most 1 instead of 2.
	 796  
	 797  		// s is the number of digits to drop from the 3B- and 2B-digit chunks.
	 798  		// We drop B-1 to be left with 2B+1 and B+1.
	 799  		s := (B - 1)
	 800  
	 801  		// uu is the up-to-3B-digit section of u we are working on.
	 802  		uu := u[j-B:]
	 803  
	 804  		// Compute the 2-by-1 guess q̂, leaving r̂ in uu[s:B+n].
	 805  		qhat := *temps[depth]
	 806  		qhat.clear()
	 807  		qhat.divRecursiveStep(uu[s:B+n], v[s:], depth+1, tmp, temps)
	 808  		qhat = qhat.norm()
	 809  
	 810  		// Extend to a 3-by-2 quotient and remainder.
	 811  		// Because divRecursiveStep overwrote the top part of uu with
	 812  		// the remainder r̂, the full uu already contains the equivalent
	 813  		// of r̂·B + uₙ₋₂ from the “Refining Guesses” discussion.
	 814  		// Subtracting q̂·vₙ₋₂ from it will compute the full-length remainder.
	 815  		// If that subtraction underflows, q̂·v > u, which we fix up
	 816  		// by decrementing q̂ and adding v back, same as in long division.
	 817  
	 818  		// TODO(rsc): Instead of subtract and fix-up, this code is computing
	 819  		// q̂·vₙ₋₂ and decrementing q̂ until that product is ≤ u.
	 820  		// But we can do the subtraction directly, as in the comment above
	 821  		// and in long division, because we know that q̂ is wrong by at most one.
	 822  		qhatv := tmp.make(3 * n)
	 823  		qhatv.clear()
	 824  		qhatv = qhatv.mul(qhat, v[:s])
	 825  		for i := 0; i < 2; i++ {
	 826  			e := qhatv.cmp(uu.norm())
	 827  			if e <= 0 {
	 828  				break
	 829  			}
	 830  			subVW(qhat, qhat, 1)
	 831  			c := subVV(qhatv[:s], qhatv[:s], v[:s])
	 832  			if len(qhatv) > s {
	 833  				subVW(qhatv[s:], qhatv[s:], c)
	 834  			}
	 835  			addAt(uu[s:], v[s:], 0)
	 836  		}
	 837  		if qhatv.cmp(uu.norm()) > 0 {
	 838  			panic("impossible")
	 839  		}
	 840  		c := subVV(uu[:len(qhatv)], uu[:len(qhatv)], qhatv)
	 841  		if c > 0 {
	 842  			subVW(uu[len(qhatv):], uu[len(qhatv):], c)
	 843  		}
	 844  		addAt(z, qhat, j-B)
	 845  		j -= B
	 846  	}
	 847  
	 848  	// TODO(rsc): Rewrite loop as described above and delete all this code.
	 849  
	 850  	// Now u < (v<<B), compute lower bits in the same way.
	 851  	// Choose shift = B-1 again.
	 852  	s := B - 1
	 853  	qhat := *temps[depth]
	 854  	qhat.clear()
	 855  	qhat.divRecursiveStep(u[s:].norm(), v[s:], depth+1, tmp, temps)
	 856  	qhat = qhat.norm()
	 857  	qhatv := tmp.make(3 * n)
	 858  	qhatv.clear()
	 859  	qhatv = qhatv.mul(qhat, v[:s])
	 860  	// Set the correct remainder as before.
	 861  	for i := 0; i < 2; i++ {
	 862  		if e := qhatv.cmp(u.norm()); e > 0 {
	 863  			subVW(qhat, qhat, 1)
	 864  			c := subVV(qhatv[:s], qhatv[:s], v[:s])
	 865  			if len(qhatv) > s {
	 866  				subVW(qhatv[s:], qhatv[s:], c)
	 867  			}
	 868  			addAt(u[s:], v[s:], 0)
	 869  		}
	 870  	}
	 871  	if qhatv.cmp(u.norm()) > 0 {
	 872  		panic("impossible")
	 873  	}
	 874  	c := subVV(u[0:len(qhatv)], u[0:len(qhatv)], qhatv)
	 875  	if c > 0 {
	 876  		c = subVW(u[len(qhatv):], u[len(qhatv):], c)
	 877  	}
	 878  	if c > 0 {
	 879  		panic("impossible")
	 880  	}
	 881  
	 882  	// Done!
	 883  	addAt(z, qhat.norm(), 0)
	 884  }
	 885  

View as plain text