Integer multiplication in time O(n log n) - Archive ouverte HAL
HAL Id: hal-02070778
https://hal.science/hal-02070778v2 (v2, Nov 28 2020)

Integer multiplication in time O(n log n)

David Harvey and Joris van der Hoeven

Abstract. We present an algorithm that computes the product of two n-bit integers in O(n log n) bit operations, thus confirming a conjecture of Schönhage and Strassen from 1971. Our complexity analysis takes place in the multitape Turing machine model, with integers encoded in the usual binary representation. Central to the new algorithm is a novel "Gaussian resampling" technique that enables us to reduce the integer multiplication problem to a collection of multidimensional discrete Fourier transforms over the complex numbers, whose dimensions are all powers of two. These transforms may then be evaluated rapidly by means of Nussbaumer's fast polynomial transforms.

1. Introduction

Let M(n) denote the time required to multiply two n-bit integers. We work in the multitape Turing model, in which the time complexity of an algorithm refers to the number of steps performed by a deterministic Turing machine with a fixed, finite number of linear tapes [35]. The main results of this paper also hold in the Boolean circuit model [41, Sec. 9.3], with essentially the same proofs.

For functions f(n_1, ..., n_k) and g(n_1, ..., n_k), we write f(n) = O(g(n)) to indicate that there exists a constant C > 0 such that f(n) ≤ C g(n) for all tuples n = (n_1, ..., n_k) in the domain of f. Similarly, we write f(n) = Ω(g(n)) to mean that f(n) ≥ C g(n) for all n in the domain of f, and f(n) = Θ(g(n)) to indicate that both f(n) = O(g(n)) and f(n) = Ω(g(n)) hold. From Section 2 onwards we will always explicitly restrict the domain of f to ensure that g(n) > 0 throughout this domain. However, in this Introduction we will slightly abuse this notation: when writing for instance f(n) = O(n log n log log n), we tacitly assume that the domain of f has been restricted to [n_0, ∞) for some sufficiently large threshold n_0.

Schönhage and Strassen conjectured in 1971 that the true complexity of integer multiplication is given by M(n) = Θ(n log n) [40], and in the same paper established their famous upper bound M(n) = O(n log n log log n). In 2007 their result was sharpened by Fürer to M(n) = O(n log n K^(log* n)) [12, 13] for some unspecified constant K > 1, where log* n denotes the iterated logarithm, i.e., log* x := min{k ≥ 0 : log∘...∘log (k times) applied to x is ≤ 1}. Prior to the present work, the record stood at

M(n) = O(n log n · 4^(log* n)) [22].
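The iterated logarithm just defined is easy to make concrete; a minimal sketch (the name `log_star` is ours, and we use base 2, which changes the value only by an additive constant):

```python
import math

def log_star(x: float) -> int:
    """Iterated logarithm: the least k >= 0 such that applying log
    k times to x yields a value <= 1."""
    k = 0
    while x > 1:
        x = math.log2(x)
        k += 1
    return k

# log* grows extraordinarily slowly, which is why a factor such as
# 4^(log* n) is "almost" constant for all practical n.
assert [log_star(v) for v in (1.0, 2.0, 16.0, 2.0**16)] == [0, 1, 3, 4]
print(log_star(10.0**80))  # 5
```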

The main result of this paper is a verification of the upper bound in Schönhage and Strassen's conjecture, thus closing the remaining 4^(log* n) gap:

Theorem 1.1. There is an integer multiplication algorithm achieving M(n) = O(n log n).

Harvey was supported by the Australian Research Council (grant FT160100219).


If the Schönhage–Strassen conjecture is correct, then Theorem 1.1 is asymptotically optimal. Unfortunately, no super-linear lower bound for M(n) is known. Perhaps the best available evidence in favour of the conjecture is the Ω(n log n) lower bound [6, 36] that has been proved for the "on-line" variant of the problem, in which the k-th bit of the product must be written before the (k+1)-th bits of the multiplicands are read. Again, the true complexity of on-line multiplication is not known: currently, the best known upper bound is O(n log n exp(C √(log log n))) for C = √(2 log 2) + o(1) [29].

Theorem 1.1 has many immediate consequences, as many computational problems may be reduced to integer multiplication. For example, the theorem implies that quotients and k-th roots of real numbers may be computed to a precision of n significant bits in time O(n log n), that transcendental functions and constants such as e^x and π may be computed to precision n in time O(n log² n), and that the greatest common divisor of two n-bit integers may be found in time O(n log² n) [5].

Another interesting application is to the problem of computing DFTs (discrete Fourier transforms) over C. Given a transform length m ≥ 2 and a target accuracy of p = O(log m) bits, it was pointed out in [20, 25] that one may use Bluestein's trick [2] followed by Kronecker substitution [14, Corollary 8.27] to reduce a given DFT of length m to an integer multiplication problem of size O(mp). Theorem

1.1 then implies that the DFT may be evaluated in time O(mp log(mp)). This

compares favourably with the traditional FFT (fast Fourier transform) approach, which requires O(m log m) operations in C, and thus time O(m log m M(p)) =

O(mp log m log p) in the Turing model.
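Bluestein's trick mentioned above rewrites a length-m DFT as a convolution, which can then be delegated to any fast multiplication routine. A small numerical sketch (our own illustration: the convolution is done naively here, whereas a real implementation would zero-pad and use an FFT, or reduce to one large integer product via Kronecker substitution); the whole trick is the identity jk = (j² + k² − (k−j)²)/2:

```python
import cmath

def dft_bluestein(x):
    """Length-m DFT X[k] = sum_j x[j]*exp(-2*pi*i*j*k/m) via Bluestein's
    substitution jk = (j^2 + k^2 - (k-j)^2)/2, which turns the DFT into a
    linear convolution of two 'chirp' sequences."""
    m = len(x)
    chirp = lambda t: cmath.exp(-1j * cmath.pi * t * t / m)
    a = [x[j] * chirp(j) for j in range(m)]                    # premultiplied input
    b = {t: chirp(t).conjugate() for t in range(-(m - 1), m)}  # filter exp(+i*pi*t^2/m)
    return [chirp(k) * sum(a[j] * b[k - j] for j in range(m)) for k in range(m)]

# Check against the defining sum for a small case.
x = [1.0, 2.0, 3.0, 4.0]
direct = [sum(xj * cmath.exp(-2j * cmath.pi * j * k / 4) for j, xj in enumerate(x))
          for k in range(4)]
assert all(abs(u - v) < 1e-9 for u, v in zip(dft_bluestein(x), direct))
```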

This faster method for computing DFTs over C leads to various further applications. One such application is the conversion of an n-digit integer from one base to another, for example from binary to decimal, in time O(n log² n / log log n) [30]. Alternatively, if one wishes to multiply two n-digit integers in a fixed base β ≠ 2, then it is possible to adapt the new algorithm to obtain a direct O(n log n)-time multiplication algorithm that works in base β throughout. This is asymptotically faster than reducing the problem to binary via the above-mentioned base conversion algorithms.

All of the algorithms presented in this paper can be made completely explicit, and all implied big-O constants are in principle effectively computable. On the other hand, we make no attempt to minimise these constants or to otherwise exhibit a practical multiplication algorithm. Our aim is to establish the theoretical O(n log n) bound as directly as possible.

We will actually describe two new multiplication algorithms. The first one depends on an unproved hypothesis concerning the least prime in an arithmetic progression. This hypothesis is weaker than standard conjectures in this area, but stronger than the best unconditional results currently available. We give only a brief sketch of this algorithm (see Section 1.2.1); a detailed treatment is given in the companion paper [24], which also presents an analogue of this algorithm for multiplication in F_q[x]. The bulk of the present paper (Sections 2-5) concentrates on working out the details of the second algorithm, which is technically more involved, but has the virtue of reaching the O(n log n) bound unconditionally.

In the remainder of Section 1, we review the literature on integer multiplication (Section 1.1), and give an overview of the new algorithms (Section 1.2).


1.1. Survey of integer multiplication algorithms. The first improvement on the classical M(n) = O(n²) bound was found by Karatsuba in 1962. Significant progress was made during the 1960s by Toom, Cook, Schönhage and Knuth; see [25, Sec. 1.1] for further historical details and references for this period. FFTs were brought into the picture by Schönhage and Strassen [40] soon after the publication of the FFT by Cooley and Tukey [7]; see [28] for more on the history of the FFT. The multiplication algorithms published since [40] may be roughly classified into four families:

(1) Schönhage–Strassen's first algorithm [40] is, in retrospect, the most straightforward FFT-based integer multiplication algorithm imaginable. By splitting the n-bit multiplicands into chunks of size Θ(log n), they reduce to the problem of multiplying polynomials in Z[x] of degree Θ(n/log n) and coefficient size Θ(log n). The product in Z[x] is handled by means of FFTs over C, i.e., evaluating the polynomials at suitable complex roots of unity, multiplying their values pointwise in C, and then interpolating to obtain the product polynomial. Elements of C are represented approximately, with a precision of Θ(log n) bits. Arithmetic operations in C (such as multiplication) are reduced to arithmetic in Z by scaling by a suitable power of two. This leads to the recursive estimate

M(n) = O((n/n') M(n')) + O(n log n),   n' = O(log n),

whose explicit solution is

M(n) = O(K^(log* n) · n log n · log log n ⋯ log^∘(log* n − 1) n)

for some constant K > 0. The algorithm achieves an exponential size reduction at each recursion level, from n to O(log n), and the number of levels is log* n + O(1).

Pollard suggested a similar algorithm at around the same time [37], working over a finite field rather than C. He did not analyse the bit complexity, but with some care one can prove essentially the same complexity bound as for the complex case (some technical difficulties arise due to the cost of finding suitable primes; these may be resolved by techniques similar to those discussed in [25, Sec. 8.2]).

(2) Schönhage–Strassen's second algorithm is the more famous and arguably the more ingenious of the two algorithms presented in [40]. It is probably the most widely used large-integer multiplication algorithm in the world today, due to the highly optimised implementation included in the free GNU Multiple Precision Arithmetic Library (GMP) [17, 15], which underlies the large-integer capabilities of all of the major contemporary computer algebra systems.

The basic recursive problem is taken to be multiplication in Z/(2^n + 1)Z, where n is a power of two. Let n' := 2^⌈(log₂ 2n)/2⌉ = Θ(n^(1/2)) and T := 2n/n' = Θ(n^(1/2)), so that (n')² ∈ {2n, 4n} and T | n'; then by splitting the inputs into chunks of size n'/2, the problem is reduced to multiplication in R[x]/(x^T + 1) where R := Z/(2^n' + 1)Z. The powers of 2 in R are sometimes called "synthetic" roots of unity, as they have been synthesised algebraically, or "fast" roots of unity, as one can multiply an element of R by an arbitrary power of 2 in linear time, i.e., in time O(n'). Consequently, for ω := 2^(n'/T), one may evaluate a polynomial at ω, ω³, ..., ω^(2T−1) (the roots of x^T + 1) via the FFT in time O((n' log n') n') = O(n log n). The original multiplication problem is thus reduced to T pointwise multiplications in R, which are handled recursively. Writing M₁(n) for the cost of a product in Z/(2^n + 1)Z,


one obtains the recurrence

(1.1)   M₁(n) < (2n/n') M₁(n') + O(n log n),   n' = O(n^(1/2)).

Unlike the first Schönhage–Strassen algorithm, this algorithm performs only a geometric size reduction, from n to O(n^(1/2)), at each recursion level, and the number of recursion levels is log₂ log n + O(1) = O(log log n).
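The "fast" roots of unity can be seen concretely with small numbers; a sketch (parameters chosen for illustration only, and the final reduction uses `%` rather than the shift-and-subtract a real implementation would use): in R = Z/(2^n' + 1)Z the element 2 has order 2n', and multiplying by any power of 2 amounts to a sign-adjusted bit shift.

```python
n_prime = 8
q = 2**n_prime + 1          # R = Z/(2^8 + 1)Z = Z/257Z

# 2 is a root of unity of order exactly 2*n_prime = 16 in R:
assert pow(2, 2 * n_prime, q) == 1
assert pow(2, n_prime, q) == q - 1                 # 2^(n') = -1  (mod 2^(n') + 1)
assert all(pow(2, k, q) != 1 for k in range(1, 2 * n_prime))

def mul_by_power_of_two(a, k):
    """a * 2^k mod (2^n' + 1), via shifting and the relation 2^(n') = -1."""
    k %= 2 * n_prime
    sign = 1
    if k >= n_prime:                               # 2^(n' + j) = -2^j
        sign, k = -1, k - n_prime
    return sign * (a << k) % q

for a in range(q):
    for k in range(2 * n_prime):
        assert mul_by_power_of_two(a, k) == a * pow(2, k, q) % q
```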

The constant 2 in (1.1), which arises from zero-padding in the initial splitting stage, plays a crucial role in the complexity analysis: it ensures that at each recursion level, the total cost of the "fast" FFTs remains O(n log n), with the same implied constant at each level. The overall cost is thus M₁(n) = O(n log n log log n).

(3) Fürer's algorithm [12, 13] combines the best features of the two Schönhage–Strassen algorithms: the exponential size reduction from the first algorithm, and the fast roots of unity from the second one. The overall strategy is similar to the first algorithm, but instead of working over C, one uses a bivariate splitting to reduce to a polynomial multiplication problem over R := C[y]/(y^r + 1), where r = Θ(log n) is a power of two. This ring contains a synthetic root of unity y of order 2r, but also inherits higher-order roots of unity from C. Elements of C are represented approximately, with a precision of O(log n) bits; thus an element of R occupies O((log n)²) bits.

Fürer's key insight is to apply the Cooley–Tukey FFT decomposition in radix 2r instead of radix two. He decomposes each "long" transform of length Θ(n/(log n)²) into many "short" transforms of length 2r, with one round of expensive "twiddle factor" multiplications interposed between each layer of short transforms. The short transforms take advantage of the synthetic roots of unity, and the twiddle factor multiplications are handled recursively (via Kronecker substitution). This leads to the recurrence

M(n) = O( (n log n)/(n' log n') · M(n') ) + O(n log n),   n' = O((log n)²),

and then to the explicit bound M(n) = O(n log n K^(log* n)) for some constant K > 1. Fürer did not give a specific value for K, but it is argued in [25, Sec. 7] that careful optimisation of his algorithm leads to the value K = 16.

Several authors have given variants of Fürer's algorithm that also achieve M(n) = O(n log n K^(log* n)), using essentially the same idea but working over different rings. De, Kurur, Saha and Saptharishi [10] replace C by a p-adic ring Q_p; this has the benefit of avoiding numerical analysis over C, but the value of K becomes somewhat larger. Covanov and Thomé give another variant that achieves K = 4, conditional on a conjecture on the distribution of generalised Fermat primes [8].

(4) The Harvey–van der Hoeven–Lecerf algorithm [25] follows Fürer in decomposing a "long" transform into many "short" transforms of exponentially smaller length. However, instead of working over a ring containing fast roots of unity, one works directly over C (as in the first Schönhage–Strassen algorithm), and converts the short transforms back to multiplication problems via Bluestein's trick [2]. These short products are then handled recursively. The first version given in [25] achieved M(n) = O(n log n K^(log* n)) with K = 8. The value of K was improved gradually over a sequence of papers [18, 19, 21], reaching K = 4 in [22]. All of these algorithms perform exponential size reduction, and the number of recursion levels is log* n + O(1).
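The evaluate-pointwise-interpolate strategy shared by all of these FFT-based algorithms can be sketched in a few lines. This toy version (our own illustration, not any specific algorithm above) splits the inputs into small chunks, multiplies the resulting digit polynomials via a hand-rolled radix-2 FFT over C, and recombines; the precision management that is the heart of the real algorithms is ignored, so it is only reliable for modest sizes.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.
    The inverse variant omits the final 1/n scaling (done by the caller)."""
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1j if invert else -1j
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2 * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def multiply(x, y, chunk_bits=8):
    """Multiply nonnegative integers by splitting into base-2^chunk_bits
    digits, multiplying the digit polynomials via FFT over C, and carrying."""
    base = 1 << chunk_bits
    xs, ys = [], []
    while x: xs.append(x % base); x //= base
    while y: ys.append(y % base); y //= base
    m = 1
    while m < len(xs) + len(ys):
        m *= 2
    fx = fft([complex(c) for c in xs + [0] * (m - len(xs))])
    fy = fft([complex(c) for c in ys + [0] * (m - len(ys))])
    prod = fft([u * v for u, v in zip(fx, fy)], invert=True)
    result = 0
    for i in range(m - 1, -1, -1):          # Horner recombination absorbs carries
        result = result * base + round(prod[i].real / m)
    return result

a, b = 123456789123456789, 987654321987654321
assert multiply(a, b) == a * b
```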


An interesting feature of these algorithms (related to the fact that they dispense with the need for fast roots of unity) is that they can be adapted to prove bounds of the form O(n log n K^(log* n)) for the cost of multiplying polynomials in F_q[x] of degree n (for fixed q). This was first established with the constant K = 8 in [26], and improved to K = 4 in [23]. As mentioned previously, the first of the two new algorithms presented in this paper may be adapted to obtain an O(n log n) bound for the F_q[x] case [24], but unfortunately this result is still conditional, and so does not yet supersede the unconditional O(n log n · 4^(log* n)) bound given in [23].

1.2. Overview of new algorithms. Our new algorithms are motivated by the observation that certain multivariate polynomial rings admit particularly efficient multiplication algorithms. Let r be a power of two, and for d ≥ 2 consider the ring

(1.2)   R[x_1, ..., x_(d−1)] / (x_1^(t_1) − 1, ..., x_(d−1)^(t_(d−1)) − 1),   R := C[y]/(y^r + 1),

where t_i | 2r for all i. One may multiply in this ring by first using FFTs to evaluate each x_i at the synthetic t_i-th roots of unity (the powers of y^(2r/t_i)), then multiplying pointwise in R, and finally performing inverse FFTs. Such transforms were studied extensively by Nussbaumer in the late 1970s (see for example [32]), and are sometimes known as fast polynomial transforms. They consist entirely of additions and subtractions in C, and require no multiplications in C whatsoever.

In Sections 1.2.1 and 1.2.2 below, we outline two different ways of fashioning an integer multiplication algorithm from the polynomial multiplication algorithm just described. The key issue is to show how to transport an integer multiplication problem, which is intrinsically one-dimensional, to a ring of the type (1.2).

In both cases, we begin with the following setup. Suppose that we wish to multiply two n-bit integers. We choose a dimension parameter d ≥ 2 and distinct primes s_1, ..., s_d ≈ (n/log n)^(1/d), subject to certain conditions that will be explained in Sections 1.2.1 and 1.2.2. Just as in the first Schönhage–Strassen algorithm, we split the inputs into around n/log n chunks of roughly log n bits, thereby reducing the problem to multiplication in Z[x]/(x^(s_1⋯s_d) − 1). Now, following a technique described by Agarwal and Cooley [1] (which is closely related to the Good–Thomas FFT algorithm [16, 42]), we observe that the Chinese remainder theorem induces an isomorphism

Z[x]/(x^(s_1⋯s_d) − 1) ≅ Z[x_1, ..., x_d]/(x_1^(s_1) − 1, ..., x_d^(s_d) − 1),

so the problem amounts to computing a product in the latter ring. For this, it suffices to show how to efficiently compute a multidimensional complex DFT of size s_1 × ⋯ × s_d, i.e., with respect to the complex s_i-th roots of unity, to an accuracy of O(log n) bits.
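The CRT isomorphism underlying this re-indexing can be checked directly on a toy case (our own illustration): the index map j ↦ (j mod s_1, j mod s_2) turns a cyclic convolution of length s_1 s_2 into an s_1 × s_2 bidimensional cyclic convolution, because the bijection respects addition of indices.

```python
import random

def cyclic_convolve(a, b):
    """Naive length-n cyclic convolution."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return c

def crt_convolve(a, b, s1=3, s2=5):
    """Length s1*s2 cyclic convolution computed as an s1 x s2 bidimensional
    cyclic convolution, via the CRT index bijection j <-> (j mod s1, j mod s2)."""
    n = s1 * s2
    A = [[0] * s2 for _ in range(s1)]
    B = [[0] * s2 for _ in range(s1)]
    for j in range(n):
        A[j % s1][j % s2] = a[j]
        B[j % s1][j % s2] = b[j]
    C = [[0] * s2 for _ in range(s1)]
    for i1 in range(s1):
        for i2 in range(s2):
            for j1 in range(s1):
                for j2 in range(s2):
                    C[(i1 + j1) % s1][(i2 + j2) % s2] += A[i1][i2] * B[j1][j2]
    return [C[j % s1][j % s2] for j in range(n)]

a = [random.randrange(100) for _ in range(15)]
b = [random.randrange(100) for _ in range(15)]
assert crt_convolve(a, b) == cyclic_convolve(a, b)
```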

1.2.1. A conditional algorithm: Rader's trick. Suppose that we are able to choose the primes s_1, ..., s_d so that s_i = 1 (mod r), where r ≥ 2 is a power of two, and where the s_i are not much larger than r. We may then deploy a multidimensional generalisation of Rader's algorithm [38] to reduce the given DFT of size s_1 × ⋯ × s_d to a multiplication problem in the ring

C[x_1, ..., x_d]/(x_1^(s_1−1) − 1, ..., x_d^(s_d−1) − 1)

(together with some lower-dimensional multiplication problems of negligible cost). Crucially, the convolution lengths have been reduced from s_i to s_i − 1. Writing s_i − 1 = q_i r, where the q_i are "small", we may further reduce this product to a collection of complex DFTs of size q_1 × ⋯ × q_d, plus a collection of multiplication problems in C[x_1, ..., x_d]/(x_1^r − 1, ..., x_d^r − 1). After replacing x_d with e^(iπ/r) y, we see that the latter products are exactly of the type (1.2). As discussed previously, we may use synthetic FFTs to reduce such a product to a collection of pointwise


products in R = C[y]/(y^r + 1). These in turn are converted to integer multiplication problems via Kronecker substitution, and then handled recursively.

The main sticking point in the above algorithm is the cost of the auxiliary DFTs of size q_1 × ⋯ × q_d. There are various options available for evaluating these DFTs, but to ensure that this step does not dominate the complexity, the key issue is to keep the size of the q_i under control. What we are able to prove is the following. For positive, relatively prime integers r and a, define

P(a, r) := min{s > 0 : s prime and s = a mod r},

and put P(r) := max_a P(a, r). Linnik's theorem states that there is an absolute constant L > 1 such that P(r) = O(r^L). (In our application, we are interested in bounding P(1, r) when r is a power of two.) The best published value for L is currently L = 5.18 [43], and under the Generalised Riemann Hypothesis one may take L = 2 + ε for any ε > 0 [27].

In the companion paper [24], we present an integer multiplication algorithm following the plan just described, but working over a finite field instead of C. We prove that if Linnik's theorem holds for some L < 1 + 1/303 and if we take d near 10^6, then the cost of the auxiliary DFTs can be controlled and one does in fact obtain an overall M(n) = O(n log n) bound. We expect that the same argument works over C, with a possibly different threshold for L, but we have not worked out the details.

On the other hand, it is widely expected that the bound P(r) = O(r^L) should hold for any L > 1. For this reason, we strongly suspect that the algorithm sketched above does run in time O(n log n), despite us being unable to supply a proof. For further discussion, and examples of even stronger bounds for P(r) that are expected to hold, see [24].

Remark 1.2. The idea of evaluating a multidimensional transform via a combination of Rader's algorithm and polynomial transforms was previously suggested in a different context by Nussbaumer and Quandalle [33, p. 141].
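In one dimension, Rader's reduction from a length-s DFT (s prime) to a cyclic convolution of length s − 1 looks as follows; a sketch with a naive convolution standing in for the fast multiplication, and with the generator g found by brute force:

```python
import cmath

def dft_rader(x):
    """DFT of prime length s via Rader's reduction: for k = g^m nonzero,
    X[g^m] = x[0] + sum_l x[g^(-l)] * w(g^(m-l)), a cyclic convolution of
    length s - 1 of the permuted input with the permuted twiddle factors."""
    s = len(x)
    w = lambda e: cmath.exp(-2j * cmath.pi * e / s)
    # generator of the multiplicative group (Z/sZ)*
    g = next(g for g in range(2, s)
             if {pow(g, l, s) for l in range(s - 1)} == set(range(1, s)))
    ginv = pow(g, s - 2, s)                           # g^(-1) mod s
    a = [x[pow(ginv, l, s)] for l in range(s - 1)]    # permuted input
    b = [w(pow(g, l, s)) for l in range(s - 1)]       # permuted twiddles
    conv = [sum(a[l] * b[(m - l) % (s - 1)] for l in range(s - 1))
            for m in range(s - 1)]
    X = [0j] * s
    X[0] = sum(x)
    for m in range(s - 1):
        X[pow(g, m, s)] = x[0] + conv[m]
    return X

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # s = 5, prime
direct = [sum(xj * cmath.exp(-2j * cmath.pi * j * k / 5) for j, xj in enumerate(x))
          for k in range(5)]
assert all(abs(u - v) < 1e-9 for u, v in zip(dft_rader(x), direct))
```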

1.2.2. An unconditional algorithm: Gaussian resampling. The rest of the paper is devoted to the second method. Here we choose the primes s_1, ..., s_d in such a way that each s_i is slightly smaller than a power of two t_i, and so that t_1⋯t_d = O(s_1⋯s_d). Finding such primes is easily accomplished using Eratosthenes' sieve and the prime number theorem with a suitable error term (see Lemma 5.1). Assume as before that we wish to compute a complex multidimensional DFT of size s_1 × ⋯ × s_d, to an accuracy of O(log n) bits. Our key innovation is to show that this problem may be reduced directly to the problem of computing a complex multidimensional DFT of size t_1 × ⋯ × t_d.

The idea of the reduction is as follows. Suppose that we are given as input an s_1 × ⋯ × s_d array of complex numbers u = (u_(j_1,...,j_d))_(0≤j_i<s_i). We may think of this array as lying inside the d-dimensional unit torus (R/Z)^d: we imagine the coefficient u_(j_1,...,j_d) to be plotted at coordinates (j_1/s_1, ..., j_d/s_d) in the torus (see Figure 1). We construct from u an intermediate t_1 × ⋯ × t_d array v = (v_(k_1,...,k_d))_(0≤k_i<t_i), each of whose entries is a suitable Gaussian-weighted combination of the entries of u, in such a way that the transform v̂ is related to the desired transform û by a linear system of size s_1⋯s_d. Provided that the ratios t_i/s_i are not too close to 1, we will show that this system may be solved in an efficient and numerically stable manner, and that we may therefore recover û from v̂. This procedure forms the core of our "Gaussian resampling" method, and is developed in detail in Section 4. It is closely related to the Dutt–Rokhlin algorithm for non-equispaced FFTs [11]; see Section 4.4.3 for a discussion of the similarities and differences.

We have therefore reduced to the problem of computing v̂ from v, and we are free to do this by any convenient method. Note that this is a DFT of size t_1 × ⋯ × t_d rather than s_1 × ⋯ × s_d. In Section 3 we will show how to use a multivariate generalisation of Bluestein's algorithm [2] to reduce this DFT to a multiplication problem in a ring of the form (1.2). As already pointed out, such a product may be handled efficiently via synthetic FFTs; the details of this step are also discussed in Section 3.

Analysis of this algorithm leads to a recurrence inequality of the form

(1.3)   M(n) ≤ (K/d) · (n log n)/(n' log n') · M(n') + O(n log n),   n' = n^(1/d + o(1)),

where K is an absolute constant, and in particular does not depend on d. (In Section 5 we establish (1.3) with the explicit constant K = 1728, and in Section 5.4 we list some optimisations that improve it to K = 8.) The first term arises from pointwise multiplications in a ring of the type R = C[y]/(y^r + 1), and the second term from the fast FFTs and other auxiliary operations, including computing v from u and recovering û from v̂.

We stress here the similarity with the corresponding bound (1.1) for the second Schönhage–Strassen algorithm; the difference is that we are now free to choose d. In Section 5, we will simply take d := 1729 (any constant larger than K would do), and then it is easy to see that (1.3) implies that M(n) = O(n log n). (A similar


analysis holds for the conditional algorithm sketched in Section 1.2.1, for different values of K and d.)

It is striking that for fixed d, the new algorithm performs only a geometric size reduction at each recursion level, just like the second Schönhage–Strassen algorithm, and unlike the first Schönhage–Strassen algorithm or any of the post-Fürer algorithms. In the new algorithm, the total cost of the FFTs actually decreases by the constant factor d/K > 1 at each subsequent recursion level, unlike in the second Schönhage–Strassen algorithm, where it remains constant at each level, or any of the other algorithms mentioned, where it increases by a constant factor at each level.

Actually, it is possible to allow d to grow with n, so as to achieve size reduction faster than geometric. With some care, this leads to a better constant in the main O(n log n) bound, by shifting more work from the pointwise multiplications into the fast FFTs. We will not carry out the details of this analysis.

Finally, we mention that our reduction from a DFT of size s_1 × ⋯ × s_d to one of size t_1 × ⋯ × t_d is highly non-algebraic, and depends heavily on the archimedean property of R. Consequently, we do not know how to give an analogue of this algorithm for multiplication in F_q[x].

Acknowledgments. The authors would like to thank the anonymous referees, whose thoughtful comments helped to improve the presentation of these results.

2. DFTs, convolutions and fixed-point arithmetic

In the Turing model we cannot compute with elements of C exactly. In this section we introduce a technical framework for systematic discussion of DFTs and convolutions in the setting of fixed-point arithmetic. This framework is loosely based on the presentation in [25, Sec. 3], and will be used throughout the rest of the paper. The impatient reader may skip most of the section and use it as a reference in case of doubt. To this effect, Table 1 contains a summary of the notation introduced in this section.

2.1. Integer arithmetic. Integers are assumed to be stored in the standard binary representation. We briefly recall several well-known results concerning integer arithmetic; see [5, Ch. 1] for further details and literature references.

Let n ≥ 1, and assume that we are given as input x, y ∈ Z such that |x|, |y| ≤ 2^n. We may compute x + y and x − y in time O(n). For multiplication, we will often use the crude estimate M(n) = O(n^(1+δ)), where for the rest of the paper δ denotes a small, fixed positive quantity; for definiteness, we assume that δ < 1/8. If y > 0, then we may compute the quotients ⌊x/y⌋ and ⌈x/y⌉ in time O(n^(1+δ)). More generally, for a fixed positive rational number a/b, and assuming x, y > 0, we may compute ⌊(x/y)^(a/b)⌋ and ⌈(x/y)^(a/b)⌉ in time O(n^(1+δ)).
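These primitives are exact integer operations; for the case a/b = 1/2, for instance, they can be realised with Python's arbitrary-precision integers (a sketch; the function names are ours):

```python
import math

def floor_div(x, y):
    """floor(x / y) for y > 0."""
    return x // y

def ceil_div(x, y):
    """ceil(x / y) for y > 0."""
    return -((-x) // y)

def floor_sqrt_ratio(x, y):
    """floor((x/y)^(1/2)) for x, y > 0, computed exactly:
    sqrt(x/y) = sqrt(x*y)/y, and floor(r/y) = floor(floor(r)/y) for integer y > 0."""
    return math.isqrt(x * y) // y

assert floor_div(7, 2) == 3 and ceil_div(7, 2) == 4
assert floor_sqrt_ratio(50, 2) == 5          # floor(sqrt(25)) = 5
assert floor_sqrt_ratio(10**40 + 1, 7) == math.isqrt((10**40 + 1) * 7) // 7
```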

2.2. Fixed-point coordinate vectors. Fix a precision parameter p ≥ 100. Let

C̄ := {u ∈ C : |u| ≤ 1}

denote the complex unit disc, and set

C̃ := (2^(−p) Z[i]) ∩ C̄ = {2^(−p)(x + iy) : x, y ∈ Z and x² + y² ≤ 2^(2p)}.

In the Turing model, we represent an element z = 2^(−p)(x + iy) ∈ C̃ by the pair of integers (x, y). It occupies O(p) bits of storage, as |x|, |y| ≤ 2^p. The precision p is always known from context and does not need to be stored alongside z.

Table 1. Glossary of notation introduced in Section 2

δ ∈ (0, 1/8): a constant such that M(n) = O(n^(1+δ)) (§2.1)
p ≥ 100: working precision in bits (§2.2)
‖·‖: supremum norm on V (a finite-dimensional vector space over C) with respect to a specified basis B_V (§2.2)
V̄: closed unit ball in V under the supremum norm (§2.2)
Ṽ: set of vectors in V whose coordinates are fixed-point complex numbers with p bits of accuracy (§2.2)
ρ : V̄ → Ṽ: round-towards-zero function (§2.2)
ṽ ∈ Ṽ: a fixed-point approximation to a vector v ∈ V̄ (§2.2)
ε(ṽ): (scaled) error incurred in approximating v by ṽ (§2.2)
F_n : C^n → C^n: complex DFT of length n (§2.4)
F_(n_1,...,n_d): multidimensional complex DFT of size n_1 × ⋯ × n_d (§2.4)
R: the ring C[y]/(y^r + 1), for a power of two r ≥ 2 (§2.3)
G_n : R^n → R^n: synthetic DFT of length n, where n | 2r (§2.4)
G_(n_1,...,n_d): multidimensional synthetic DFT of size n_1 × ⋯ × n_d (§2.4)
u · v: pointwise product of vectors/tensors (§2.4)
u ∗ v: convolution product of vectors/tensors (§2.4)
A: a linear (or bilinear) map between finite-dimensional complex vector spaces (§2.6)
‖A‖: operator norm of A (§2.6)
Ã: a "numerical approximation" for A (assuming ‖A‖ ≤ 1), i.e., a computable function intended to approximate A on fixed-point inputs (§2.6)
ε(Ã): worst-case error incurred in approximating A by Ã (§2.6)
C(Ã): worst-case cost of evaluating Ã on a single vector (§2.6)

We define a round-towards-zero function ρ : C̄ → C̃ as follows. First, define ρ₀ : R → Z by ρ₀(x) := ⌊x⌋ for x ≥ 0, and ρ₀(x) := ⌈x⌉ for x < 0. Then define ρ₀ : C → Z[i] by setting ρ₀(x + iy) := ρ₀(x) + iρ₀(y) for x, y ∈ R. Observe that |ρ₀(u)| ≤ |u| and |ρ₀(u) − u| < √2. For u ∈ C̄ we then set ρ(u) := 2^(−p) ρ₀(2^p u) ∈ C̃.

More generally, let V be a finite-dimensional vector space over C with a distinguished basis B_V = (b_0, ..., b_(m−1)). For the special case V = C^m, we always take the standard basis; in particular, for V = C the basis is simply {1}. We define a norm ‖·‖ : V → [0, ∞) in terms of the basis B_V by setting

‖λ_0 b_0 + ⋯ + λ_(m−1) b_(m−1)‖ := max_j |λ_j|,   λ_j ∈ C.

This norm satisfies ‖u + v‖ ≤ ‖u‖ + ‖v‖ and ‖λu‖ = |λ| ‖u‖ for any u, v ∈ V, λ ∈ C.
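The rounding function and its error bound can be exercised directly; a sketch (our own illustration) with the exact `Fraction` type playing the role of an element of the unit disc, and the pair-of-integers representation described above:

```python
from fractions import Fraction
import math

p = 8  # working precision in bits (the paper takes p >= 100)

def rho0(x: Fraction) -> int:
    """Round-towards-zero on R: floor for x >= 0, ceil for x < 0."""
    return math.floor(x) if x >= 0 else math.ceil(x)

def rho(re: Fraction, im: Fraction):
    """rho(u) = 2^-p * rho0(2^p * u), applied componentwise; returns the
    integer pair (x, y) representing 2^-p * (x + iy)."""
    return rho0(re * 2**p), rho0(im * 2**p)

u = (Fraction(-355, 452), Fraction(113, 355))   # some point of the unit disc
x, y = rho(*u)
err_sq = (u[0] - Fraction(x, 2**p))**2 + (u[1] - Fraction(y, 2**p))**2
assert err_sq < Fraction(2) / 4**p    # |rho(u) - u| < 2^(1/2 - p)
assert x * x + y * y <= 4**p          # |rho(u)| <= |u| <= 1: rho(u) stays in the disc
```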

The unit ball in V is defined to be

V̄ := {u ∈ V : ‖u‖ ≤ 1} = {λ_0 b_0 + ⋯ + λ_(m−1) b_(m−1) : λ_j ∈ C̄},

and we also define

Ṽ := {λ_0 b_0 + ⋯ + λ_(m−1) b_(m−1) : λ_j ∈ C̃}.

We extend ρ to a function ρ : V̄ → Ṽ by acting componentwise, i.e., we put

ρ(λ_0 b_0 + ⋯ + λ_(m−1) b_(m−1)) := Σ_j ρ(λ_j) b_j,   λ_j ∈ C̄.

Then ‖ρ(u)‖ ≤ ‖u‖ and ‖ρ(u) − u‖ < 2^(1/2 − p).

Given as input u, v ∈ Ṽ, in time O(mp) we may compute an approximation w̃ ∈ Ṽ for w := ½(u + v) ∈ V̄ such that ε(w̃) < 1.

Proof. Consider first the case m = 1, i.e., assume that V = C. Let u = 2^(−p) a and v = 2^(−p) b where a, b ∈ Z[i] and |a|, |b| ≤ 2^p. Since the denominators of the real and imaginary parts of 2^p w = ½(a + b) are at most 2, we have |ρ₀(2^p w) − 2^p w| ≤ ((½)² + (½)²)^(1/2) = 1/√2. Define w̃ := ρ(w) = 2^(−p) ρ₀(2^p w). We may clearly compute w̃ in time O(p), and ε(w̃) = 2^p ‖ρ(w) − w‖ ≤ 1/√2 < 1. The general case (m ≥ 1) follows by applying the same argument in each coordinate.

Occasionally we will encounter a situation in which we have computed an approximation ũ ∈ Ṽ for some u ∈ V̄, and we wish to compute an approximation for cu, where c ≥ 1 is a fixed integer scaling factor for which it is known that cu ∈ V̄. A typical example is the final scaling step in an inverse FFT. Unfortunately, the