Double-Speed Barrett Moduli PDF constituents of most public-key

Double-Speed Barrett Moduli

constituents of most public-key cryptosystems. Amongst the numerous As an example of the proposed techniques the Elliptic Curve Digital Signature.

Digital Signature Standard (DSS)

Approved cryptographic algorithms and techniques include those that are ANS X9.31-1998 Digital Signatures Using Reversible Public Key Cryptography for ...

FIPS 186-3 Digital Signature Standard (DSS)

03-Jun-2009 Approved cryptographic algorithms and techniques include those that are ... ANS X9.31-1998 Digital Signatures Using Reversible Public Key ...

Fast Multiparty Threshold ECDSA with Fast Trustless Setup

A threshold signature scheme enables n parties to share the power to issue digital signatures under a single public key. A threshold t is specified such that

Practical Byzantine Fault Tolerance

A Method for. Obtaining Digital Signatures and Public-Key Cryptosystems. Communications of the ACM 21(2)

Modular Multiplication Without Trial Division Author(s): Peter L

SHAMIR & L. ADLEMAN "A method for obtaining digital signatures and public-key cryptosystems

CacheBleed: A Timing Attack on OpenSSL Constant Time RSA

A method for obtaining digital signatures and public-key cryptosystems. CACM 21:120–126

Responses to NISTs proposal

Who Holds t:he Keys? I. NIST's Proposal he U.S. Government agency NIST has recently proposed a public key digital signature standard [

Survey of Computational Assumptions Used in Cryptography Broken

public-key cryptosystem each person gets a pair of keys

Addressing Weaknesses in the Domain Name System Protocol

A Method for Obtaining Digital. Signatures and Public Key Cryptosystems. Communications of the ACM. 21 2 :120 6

Double-Speed Barrett Moduli

Rémi Géraud, Diana Maimuţ, and David Naccache

École normale supérieure

Équipe de cryptographie, 45 rue d"Ulm,f-75230 Pariscedex 05, France given_name.family_name@ens.fr Abstract.Modular multiplication and modular reduction are the atomic constituents of most public-key cryptosystems. Amongst the numerous algorithms for performing these operations, a particularly elegant method was proposed by Barrett. This method builds the operationamodbfrom bit shifts, multiplications and additions inZ. This allows to build modular reduction at very marginal code or silicon costs by leveraging existing hardware or software multipliers. This paper presents a method allowing to double the speed of Barrett"s algorithm by using specific composite moduli. This is particularly useful for lightweight devices where such an optimization can make a difference in terms of power consumption, cost and processing time. The generation of composite moduli with a predetermined portion is a well-known technique and the use of such moduli is considered,in statu scientiae, as safe as using randomly generated composite moduli.

1 Introduction

Modular multiplication and modular reduction are the atomic constituents of most public-key cryptosystems. Amongst the numerous algorithms for performing these operations (e.g.[3,4,9,12]), a particularly elegant method was proposed by Barrett in [ 1 ]. This method assembles the operationamodbfrom bit shifts, multiplications and additions inZ. This allows to build modular reduction at very marginal code or silicon costs by leveraging existing hardware or software multipliers. For a very detailed comparison of the principal modular reduction strategies, we refer the reader to [ 3 This paper presents a method allowing to double the speed of Barrett"s algorithm by using specific composite moduli. This is particularly useful for lightweight devices where such an optimization can make a difference in terms of power consumption, cost and processing time. The generation of composite moduli with a predetermined portion is a well-known technique [ 6 10 17 ] and the use of such moduli is considered,in statu scientiae, as safe as using randomly generated composite moduli.

Related work:

Douguet and Dupaquis [

5 ] describe a modified Barrett modular reduction algorithm whose purpose is the acceleration of this type of operation in certain (elliptic curve) groups of known moduli. Thus, the approach they consider implies moduli with a given form,e.g.the recommended ones from [13]. Estimations of the speed-ups are not provided, but the resistance of various architectures to different physical attacks is discussed. A general form of the Barrett constant and of the quotients (when certain moduli are used) are described. As an example of the proposed techniques, the Elliptic Curve Digital Signature

Algorithm (ECDSA) [

14 ] is taken into account. We stress that no specific modulus generation algorithm is presented in [ 5

The approach of [

5 ] is rather a practical one, whereas our goal is to provide formal mathematical models for moduli with a predetermined portion generation.

KneŽević, Batina and Verbauwhede [

7 ] propose two sets of moduli for which Barrett"s modular reduction algorithm can be implemented by avoiding the pre- computation of the Barrett constant. The types of moduli considered throughout this paper do not fall into those sets.

Structure of the paper:

Section

2 starts b yin troducingnotations and d e- scribing Barrett"s original algorithm. Section 3 recall sbac kgroundconcernin g composite moduli a predetermined portion. Section 4 in troducesour core ide a, that leverages Section 3 to generate Barrett-friend lyRSA mo duli.In Section 5 we apply this idea to other cryptographic primitives, such as DSA [ 14

2 Barrett"s Algorithm

For a givena, let?a?= 1 +?log2a?=?log2(a+ 1)?. That is,?a?will denote the bit-length ofathroughout this paper.a|bwill represent the concatenation of the bit-stringsaandb. x?ywill denote binary shift-to-the-right ofxbyyplacesi.e.: x?y=?x2 y?

Barrett"s algorithm (Algorithm

1 ) approximates the resultc=dmodnby The algorithm uses the pre-computed constantκ=?2L/n?that depends only on nandL. The reader is referred to [1] for a proof and a thorough analysis of this algorithm.

Work Factor:?c1?

=D-N+ 1?D-Nand?κ?=L-Nhence their product requiresw= (D-N)(L-N)elementary operations.?c3?= (D- N ) + (L-N)-(L-N+ 1) =D-N-1?D-N. The productnc3will therefore claimw?= (D-N)Nelementary operations. All in all, work amounts tow+w?= (D-N)(L-N) + (D-N)N= (D-N)L. The goal of this paper is to halve this work factor. Algorithm 1:Barrett"s AlgorithmInput:n <2N,d <2D,κ=? 2Ln

Output:c=dmodn

1c1←d?(N-1);

2c2←c1κ;

3c3←c2?(L-N+ 1);

4c4←d-nc3;

5whilec4≥ndo6c4←c4-n;

7end while

8returnc43 Moduli with a Predetermined PortionRSA [15] moduli with a predetermined portion are used to reduce storage

requirements or computations. As mentioned before, such moduli are presently not known to be cryptographically weaker than randomly chosen ones. The first techniques for generating composite moduli were proposed by Vanstone and

Zuccherato [

17 ofn. Lenstra [10] proposed more advanced techniques for specifying up toN/2 bits. Based on Lenstra"s algorithms, Joye proposed new techniques in [ 6 ]. Further works in the area include, for instance, [ 8 11 16 ]. We will hereafter recall the folklore method described by Joye (Algorithm 2 ), that perfectly fits our purpose1. Folklore method.The purpose of the folklore technique recalled by Joye is to obtain an RSA modulusnwith a predetermined leading partnh. Letting ?nh?=H, we have: n=nh2N-H+n?,for some0< n?<2N-H(1) The algorithm uses the functionNextPrime(x)that returns the prime follow- ingx(ifxis prime thenx=NextPrime(x)). Note that because the gap between xandNextPrime(x)is unpredictable, the algorithm may fail to return annof the formn=nh2N-H+n?and will have to be re-launched. We refer the reader to [ 10 ] for a more formal analysis of this process.

Lemma 1

(Bounding nandω).

Consider the parameters used in Algorithm

2 and letm=q-ω. Then,n < nh2N-H+ (1 +m)(2N-H-1)andω <2H+1+ 1.

Proof.By definition:

ω=?ηp

? ?α < psuch thatω=ηp +αp 1 For the sake of clarity we remove all tests meant to enforce the condition

GCD(e,φ(n)) = 1.

τ01234

?¯nh?503502501500499 success rate85.66%97.96%99.96%100%100% Fig.1.Success rates of Algorithm2 for N= 1024,H= 512and104experiments.

Substituting the value ofη, we get:

ω=nh2N-Hp

+αp ?q=ω+m=nh2N-Hp +αp +m Thus: n=pq=nh2N-H+α+mp < nh2N-H+(1+m)p < nh2N-H+(1+m)(2N-H-1)

And upper boundingωwe get:

ηp + 1 =nh2N-Hp + 1N-H-1.??

It follows directly from Lemma

1 that: Applying the Prime Number Theorem, we find thatm?ln(2H+1+ 1)?

0.7(H+ 1). In other words, thelog2(m+ 1)?log2(0.7H+ 1.7) significant bits ofnhare likely to get polluted. We hence rectify the size ofnh toH-τ-log2Hwhereτ?Nis a parameter allowing to reduce the failure probability of Algorithm 2 at the cost of further shortening nh. For the sake of clarity, we do not integrate these fine-tunings in the description of Algorithm 2 but consider thatnhis composed of a "real" prescribed pattern¯nhof size H-τ- ?log2H?bits right-padded withτ+?log2H?zero bits. Various success rates forN= 1024,H= 512are given in Figure1 . Based on those we recommend to setτ= 0orτ= 1and re-launch the generation process if the algorithm fails. Note:The algorithm"s theoretical analysis could be simplified and the failure rate improved if step (4) of Figure 1 is replaced b y:"If ωis composite then goto
1; elseq←ω". The quality of the generated primes will also become theoretically
uniform because NextPrime favors primespiwhose distance from the previous primepi-1is large. This modification will, however, come at the cost of more computation time. The same note is applicable to Algorithm 3 as w ell.
Output:n=nh2N-H+n?, such that0< n?<2N-H

1Generate a random primep, such that2N-H-1< p <2N-H-1;

2η←nh2N-H;

3ω←?ηp

4q←NextPrime(ω);

5n←pq;

6returnnAlgorithm 3:Barrett-friendly modulus generatorInput:L= 2N= 4U
Output:n, an RSA modulus such that2N-1< n <2N-1+ (0.7U+ 2)(2U-1) whose associatedκis such that2N+1-2U+1(1 + 0.7U)< κ <2N+1
1Generate a random integerrsuch that2U-1< r <2U-1;

2η←2N-1+r;

3Generate a random primepsuch that2U-1< p <2U-1;

4ω←?ηp

5q←NextPrime(ω);

6n←pq;

7returnn4 Barrett-Friendly ModuliWe note that both multiplications in Algorithm1 are m ultiplicationsb yconstants.
Namely bynandκ. It is known (e.g.[2]) that multiplications by constants can be performed faster than multiplications by arbitrary integers. Our goal is to generate1a compositen2whose leading bits do not need to be multiplied and3 whose associatedκalso features a most significant part that does not need to be multiplied. As for the least significant parts ofnandκ, these are constants and can henceindependentlybenefit of speedup techniques such as [2]. The algorithm is given for the very common settingL=D= 2N. For convenience we introduce a bitlength unitUsuch thatL= 2N= 4U.
Example 1.LetN= 100andL= 200:
p=322a28626f0a7ω=28d356763fe4a
κ=1ffffffffffd5cdb3e394fe440

Lemma 2.If0< x <2P/2-1, then?
22P2

P-1+x?
= 2
P+1-4x.

Proof.Observe that:
2 2P2
P-1+x-(2P+1-4x) =4x22

P-1+x.(2)

Furthermore,
4x22 P-1+x<1?4x2-x <2P-1This is a polynomial of degree2, that has one positive and one negative root. We assumedx >0, therefore we only need to consider the positive rootxmax: x max=18
1 +?1 + 2
P+4? >2P/2-1 Therefore, ifx <2P/2-1, then the fraction in eq. (2) is smaller than one. As a consequence, we have ?22P2
P-1+x-(2P+1-4x)?
=?22P2
P-1+x?
-(2P+1-4x) = 0, as2P+1-4xis an integer.??
Lemma 3
(Bounding n,ωandκin Algorithm3 ).
Consider the parameters
used in Algorithm 3 and let m=q-ω. Then,n <2N-1+ (2 +m)(2U-1),
2N+1-2U+1(1 +m)< κ <2N+1andω <2U+ 2.

Proof.By definition:

ω=?ηp
, thus?α < psuch thatω=ηp +αp
Substituting the value ofη, we get:

ω=2N-1+rp
+αp ?q=ω+m=2N-1p +rp +αp +m. Thus: n=pq= 2N-1+r+α+mp n <2N-1+r+(1+m)p <2N-1+2U-1+(1+m)(2U-1)<2N-1+(2+m)(2U-1).
We observe that
2
N-1+r+α+mp≥12

N-1+r+mp.

Boundingκwe obtain:

κ=?2Ln
>2Ln -1≥2L2
N-1+r+mp-1,
Now observe thatr+mp <2N-1, therefore we can write 2 L2
N-1+r+mp=22N2

N-1+r+mp= 2N+111 + 2

1-N(r+mp)= 2N+1∞?

?=0(-2)?(1-N)(r+mp)?This series is convergent, alternating, and the term is strictly decreasing, therefore
its sum is bounded below (resp. above) by the partial sum of odd (resp. even) degreeS?. As a consequence,
κ > S

1-1 = 2N+1?1-21-N(r+pm)?-1 = 2N+1-4(r+pm)-1>2N+1-2U+1(1+m).

We observe that
2
N-1+r+α+mp >2N-1?12

N-1+r+α+mp<12
N-1. Thus: =2L2
N-1+r+α+mp<2L2

N-1<2N+1.

Upper boundingωwe get:
ηp + 1 =2N-1+rp + 1<2N-1+ 2U-12
U-1+ 1 = 2N-1-U+1+ 1 + 1<2U+ 2.
Note that the most significant bit ofpmust be set to 1,i.e.2U-1< p <2U-1.
It follows directly from Lemma
3 that: Letnhdenote the predetermined portion ofn,i.e.nh= 2U-1. Applying the Prime Number Theorem, we obtainm?ln(2U+ 1)?0.7U. Put differently, the log2(m+ 2)?log2(0.7U+ 2)Figure

2 . Based on those we recommend to setτ= 0orτ= 1and re-launch the generation process if the algorithm fails. It is easy to see that multiplication by bothnandκis not costly at all. To be more specific,nandκsatisfy the inequalities: 2 N-1< n <2N-1+(0.7U+2)(2U-1)and2N+1-2U+1(1+0.7U)< κ <2N+1. As a result, this can double the speed of Barrett reduction 2.2 A few more complexity bits can be grabbed if the variant described in the note at the end of section 3 is used.

τ01234

?¯nh?503502501500499 success rate85.16%97.51%99.91%100%100%

Output:P arameters( p,q).

1Choose aQ-bit primeq;

2Choose aP-bit prime moduluspsuch thatp-1is a multiple ofq;

3return(p,q)5 ExtensionsThe parameter generation phase of DL cryptosystems requires the generation

of two primes (e.g.pandq). Computations modulo these two primes represent important steps within the algorithms. Thus, a modular reduction speedup is necessary. It is thus desirable that bothpandqto contain significantly long patterns (i.e.many successive 1s or 0s). We will now propose a Barrett-friendly parameter generation approach to do so. For the sake of clarity, we choose a particular algorithm to describe our method: the Digital Signature Algorithm (DSA).

5.1 Barrett-Friendly DSA Parameters Generation

DSA"s parameter generation is presented in Algorithm 4 . For the complete description of the DSA, we refer the reader to [ 14 We suggest a modified DSA prime generation process leveraging the idea of

Section

4 . The procedure is described in Algorithm 5

Lemma 4

(Structure of κq). Letκqbe theκassociated toq. With the notations of Algorithm 5 , we haveκq= 2Q+1-4ω, assuming thatω <2Q2 -1.

Proof.

Letz=p-1qandω=q-2Q-1. We observe that?z?=P-Qandq=

2Q-1|ω. By definition,κq=

2Lqq , whereLq= 2Q. As we assumedω <2 Q2 -1, using Lemma 2 w eha ve: q=?2LQq =?22Q2

Q-1+ω?

= 2

Q+1-4ω.

Output:P arameters( p,q).

1Generate aQ-bit prime as follows:

2q←NextPrime(2Q-1);

3Construct aP-bit prime moduluspsuch thatp-1is a multiple ofqin the

following way:

4p←4;

5i←1;

6F←2P-Q-1;

7whilepis compositedo8p←2q(F+i) + 1;

9i+ +;

10end while

11return(p,q)The key consequence of Lemma4 is that κqconsists of a long pattern

concatenated to a short different sequence, with a predetermined portion that is the complement ofqh= 2Q-Ω. The computation ofκqis easy.

LetLp= 2P. By definition,κp=?

2Lpp

Lemma 5.

Letm(n) =18

n+⎷n

2+ 2P+3n?. Letxbe a positive integer such

?22P2

P-1+x?

= 2 Proof.The proof consists of writing the fraction as a geometric series:

κ=?22P2

P-1+x?

P+1∞?

n=0(-x)n2n(1-P)? ?2P+1?1-21-Px+ 22-2Px2-23-3Px3+...?? ?2P+1-4x+ 23-Px2-24-2Px3+...? Now,2P+1-4xis always a positive integer, it can therefore be safely taken out of the floor function. None of the remaining terms of the sum is an integer. We have:

κ= 2P+1-4x+?

n=2(-x)n2n(1-P)? The rightmost term is essentially a sum of shifted versions of powers ofx. Ifxis small, then this contribution quickly vanishes. We now provide an exact value for this sum, by rewriting:

κ= 2P+1-4x+?

2-Px22P-12

P-1+x?

= 2

P+1-4x+?4x22

P-1+x?

For any positive integern, we have:

4x22

P-1+x=n?x=18

n+?n

2+ 2P+3n?

.We assumedx >0, thus we only need to consider the positive root. The leftmost fraction is a strictly increasing function ofxas its derivative is>0. Therefore, the rightmost formula strictly increases withn.

Letm(n) =18

n+⎷n we have:

P-1+x< n+ 1

Therefore:

?4x22

P-1+x?

=n. Finally,x <2P-1implies an upper bound on the value ofn, which must therefore be smaller than2P. An illustrative example forP= 1024andQ= 160is given next.

Example 2.

quotesdbs_dbs9.pdfusesText_15

[PDF] Double-Speed Barrett Moduli constituents of most public-key

Double-Speed Barrett Moduli

École normale supérieure

1 Introduction

Related work:

Douguet and Dupaquis [

Algorithm (ECDSA) [

The approach of [

KneŽević, Batina and Verbauwhede [

Structure of the paper:

Section

2 Barrett"s Algorithm

Barrett"s algorithm (Algorithm

Work Factor:?c1?

Output:c=dmodn

1c1←d?(N-1);

2c2←c1κ;

3c3←c2?(L-N+ 1);

4c4←d-nc3;

5whilec4≥ndo6c4←c4-n;

7end while

8returnc43 Moduli with a Predetermined PortionRSA [15] moduli with a predetermined portion are used to reduce storage

Zuccherato [

Lemma 1

Consider the parameters used in Algorithm

Proof.By definition:

ω=?ηp

GCD(e,φ(n)) = 1.

τ01234

Substituting the value ofη, we get:

ω=nh2N-Hp

And upper boundingωwe get:

It follows directly from Lemma

1; elseq←ω". The quality of the generated primes will also become theoretically

Output:n=nh2N-H+n?, such that0< n?<2N-H

1Generate a random primep, such that2N-H-1< p <2N-H-1;

2η←nh2N-H;

3ω←?ηp

4q←NextPrime(ω);

5n←pq;

6returnnAlgorithm 3:Barrett-friendly modulus generatorInput:L= 2N= 4U

1Generate a random integerrsuch that2U-1< r <2U-1;

2η←2N-1+r;

3Generate a random primepsuch that2U-1< p <2U-1;

4ω←?ηp

5q←NextPrime(ω);

6n←pq;

7returnn4 Barrett-Friendly ModuliWe note that both multiplications in Algorithm1 are m ultiplicationsb yconstants.

Example 1.LetN= 100andL= 200:

κ=1ffffffffffd5cdb3e394fe440

Lemma 2.If0< x <2P/2-1, then?

P-1+x?

P+1-4x.

Proof.Observe that:

P-1+x-(2P+1-4x) =4x22

P-1+x.(2)

Furthermore,

1 +?1 + 2

P-1+x-(2P+1-4x)?

P-1+x?

Lemma 3

Consider the parameters

2N+1-2U+1(1 +m)< κ <2N+1andω <2U+ 2.

Proof.By definition:

ω=?ηp

Substituting the value ofη, we get:

ω=2N-1+rp

We observe that

N-1+r+α+mp≥12

N-1+r+mp.

Boundingκwe obtain:

κ=?2Ln

N-1+r+mp-1,

N-1+r+mp=22N2

N-1+r+mp= 2N+111 + 2

1-N(r+mp)= 2N+1∞?

κ > S

1-1 = 2N+1?1-21-N(r+pm)?-1 = 2N+1-4(r+pm)-1>2N+1-2U+1(1+m).

We observe that

N-1+r+α+mp >2N-1?12