

Stat260/CS294: Randomized Algorithms for Matrices and Data

Lecture 2 - 09/09/2013

Lecture 2: Approximating Matrix Multiplication

Lecturer: Michael Mahoney    Scribe: Michael Mahoney

Warning: these notes are still very rough. They provide more details on what we discussed in class,

but there may still be some errors, incomplete/imprecise statements, etc. in them.

2 Approximating Matrix Multiplication

Approximating the product of two matrices with random sampling or random projection methods is a fundamental operation that is of interest in and of itself, as well as because it is used in a critical way as a primitive for many RandNLA algorithms. In this class, we will introduce a basic algorithm; and in this and the next few classes, we will discuss several related methods. Here is the reading for today.

Drineas, Kannan, and Mahoney, "Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication"

2.1 Some notation

Before proceeding, here is some notation that we will use. For a vector $x \in \mathbb{R}^n$ we let $\|x\|_2 = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2}$ denote its Euclidean length. For a matrix $A \in \mathbb{R}^{m \times n}$ we let $A^{(j)}$, $j = 1, \ldots, n$, denote the $j$-th column of $A$ as a column vector, and $A_{(i)}$, $i = 1, \ldots, m$, denote the $i$-th row of $A$ as a row vector. We denote matrix norms by $\|A\|$, using subscripts to distinguish between various norms. Of particular interest will be the Frobenius norm, which is defined by
$$ \|A\|_F = \sqrt{ \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2 }, \qquad (1) $$
and the spectral norm, which is defined by
$$ \|A\|_2 = \sup_{x \in \mathbb{R}^n,\, x \neq 0} \frac{\|Ax\|_2}{\|x\|_2}. \qquad (2) $$
These norms are related to each other as: $\|A\|_2 \le \|A\|_F \le \sqrt{n} \, \|A\|_2$. Both of these norms provide a measure of the "size" of the matrix $A$. There are several situations in which we will be interested in measuring the size of a matrix; e.g., the error that is incurred by a random sampling or random projection process can be viewed as a matrix, and we will be interested in showing that it is small in an appropriate norm.
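To make the norm relation concrete, here is a minimal numerical check in numpy; the matrix and its dimensions are arbitrary choices for illustration, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 20
A = rng.standard_normal((m, n))

fro = np.linalg.norm(A, 'fro')   # ||A||_F, Eqn. (1)
spec = np.linalg.norm(A, 2)      # ||A||_2, the largest singular value, Eqn. (2)

# The relation ||A||_2 <= ||A||_F <= sqrt(n) ||A||_2 should hold.
assert spec <= fro <= np.sqrt(n) * spec
print(f"spectral={spec:.3f}, frobenius={fro:.3f}, sqrt(n)*spectral={np.sqrt(n) * spec:.3f}")
```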

2.2 Approximating matrix multiplication by random sampling

We will start by considering a very simple randomized algorithm to approximate the product of two matrices. Matrix multiplication is a fundamental linear algebraic problem, and this randomized algorithm for it is of interest in its own right. In addition, this algorithm is of interest since matrix multiplication is a primitive that is used, often "under the hood" or within the analysis, for many, many other randomized algorithms for many, many other matrix problems, and thus this algorithm will appear, either explicitly or implicitly (e.g., within the analysis), throughout the course.

The problem is the following: given an arbitrary $m \times n$ matrix $A$ and an arbitrary $n \times p$ matrix $B$, compute, exactly or approximately, the product $AB$. As a starting point, the well-known three-loop algorithm to solve this problem is the following.

Algorithm 1: Vanilla three-loop matrix multiplication algorithm.

Input: An $m \times n$ matrix $A$ and an $n \times p$ matrix $B$
Output: The product $AB$

1: for $i = 1$ to $m$ do
2:   for $j = 1$ to $p$ do
3:     $(AB)_{ij} = 0$
4:     for $k = 1$ to $n$ do
5:       $(AB)_{ij} \mathrel{+}= A_{ik} B_{kj}$
6:     end for
7:   end for
8: end for
9: Return $AB$

The running time of this algorithm is $O(mnp)$ time, which is $O(n^3)$ time if $m = n = p$. Note in particular that this algorithm loops over all pairs of elements in the product matrix and computes each element as a dot product or inner product between a row of $A$ and a column of $B$.

The question of interest here is: can we solve this problem more quickly? There has been a lot of work on Strassen-like algorithms, which say that one can use a recursive procedure to decrease the running time to $o(n^3)$ time. For a range of reasons, these algorithms are rarely used in practice. They will not be our focus; but, since some of our randomized algorithms will call traditional algorithms as black boxes, Strassen-like algorithms can be used to speed up the running times of those black boxes, theoretically at least.

Here, we will consider a different approach: a randomized algorithm that randomly samples columns and rows of the matrices $A$ and $B$. The key insight is that one should not think of matrix multiplication as looping over elements in the product matrix and computing, say, the $(i,k)$-th element of the product matrix as the dot product between the $i$-th row of $A$ and the $k$-th column of $B$, as is common. That is, the usual perspective is that the elements of the product matrix should be viewed as
$$ (AB)_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk} = A_{(i)} B^{(k)}, $$
where each $A_{(i)} B^{(k)} \in \mathbb{R}$ is a number, computed as the inner product of two vectors in $\mathbb{R}^n$. Instead of this, one should think of matrix multiplication as returning a matrix that equals the sum of outer products of columns of $A$ and the corresponding rows of $B$, i.e., as the sum of rank-one matrices.
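As a concrete illustration of these two views, here is a short numpy sketch (matrix sizes are arbitrary): the inner-product view computes each entry $(AB)_{ik}$ separately, while the outer-product view (formalized as Eqn. (3) below) accumulates $AB$ as a sum of $n$ rank-one matrices. Both agree with the built-in product.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 4, 5, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

# Inner-product (three-loop) view: each entry is a dot product.
AB_inner = np.zeros((m, p))
for i in range(m):
    for k in range(p):
        for j in range(n):
            AB_inner[i, k] += A[i, j] * B[j, k]

# Outer-product view: AB is a sum of n rank-one matrices.
AB_outer = np.zeros((m, p))
for j in range(n):
    AB_outer += np.outer(A[:, j], B[j, :])

assert np.allclose(AB_inner, A @ B) and np.allclose(AB_outer, A @ B)
```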

Recall that
$$ AB = \sum_{i=1}^{n} A^{(i)} B_{(i)}, \qquad (3) $$
where each $A^{(i)} B_{(i)} \in \mathbb{R}^{m \times p}$ is an $m \times p$ rank-one matrix, computed as the outer product of the $i$-th column of $A$ and the $i$-th row of $B$.

Viewing matrix multiplication as the sum of outer products suggests, by analogy with the sum of numbers, that we should sample the rank-one components according to their size, in order to minimize the variance. Recall that, if we were summing numbers, we could sample (and rescale; see below) according to any probability distribution, and in particular the uniform distribution, and obtain an unbiased estimator of the sum; but if we want to minimize the variance of the estimator, then we should sample (and rescale) according to the size or magnitude of the numbers. Well, the same is true in the case of matrices. Since the role of these probabilities will be important in what follows, we will leave them unspecified as input to this algorithm, and we will return below to what probabilities should or could be used in this algorithm.

Given that background, here is our BasicMatrixMultiplication algorithm.

Algorithm 2: The BasicMatrixMultiplication algorithm.

Input: An $m \times n$ matrix $A$, an $n \times p$ matrix $B$, a positive integer $c$, and probabilities $\{p_i\}_{i=1}^{n}$.
Output: Matrices $C$ and $R$ such that $CR \approx AB$

1: for $t = 1$ to $c$ do
2:   Pick $i_t \in \{1, \ldots, n\}$ with $\Pr[i_t = k] = p_k$, in i.i.d. trials, with replacement
3:   Set $C^{(t)} = A^{(i_t)} / \sqrt{c p_{i_t}}$ and $R_{(t)} = B_{(i_t)} / \sqrt{c p_{i_t}}$.
4: end for
5: Return $C$ and $R$.

Basically, what we want to show for this algorithm is that
$$ AB = \sum_{i=1}^{n} A^{(i)} B_{(i)} \approx \frac{1}{c} \sum_{t=1}^{c} \frac{1}{p_{i_t}} A^{(i_t)} B_{(i_t)} = CR. $$
In particular, we will want to show that $\|AB - CR\|$ is small, for appropriate matrix norms.
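The following is a minimal numpy sketch of the BasicMatrixMultiplication algorithm; the function name and the choice of uniform probabilities as a default are my own illustration, not part of the notes:

```python
import numpy as np

def basic_matrix_multiplication(A, B, c, p, rng):
    """Sample c column/row indices i.i.d. with replacement according to p,
    rescale by 1/sqrt(c * p_{i_t}), and return (C, R) with C @ R ~ A @ B."""
    n = A.shape[1]
    idx = rng.choice(n, size=c, replace=True, p=p)   # i_1, ..., i_c
    scale = 1.0 / np.sqrt(c * p[idx])                # 1/sqrt(c p_{i_t})
    C = A[:, idx] * scale                            # C^{(t)} = A^{(i_t)} / sqrt(c p_{i_t})
    R = B[idx, :] * scale[:, None]                   # R_{(t)} = B_{(i_t)} / sqrt(c p_{i_t})
    return C, R

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 200))
B = rng.standard_normal((200, 30))
p = np.full(200, 1.0 / 200)                          # uniform probabilities, for illustration
C, R = basic_matrix_multiplication(A, B, c=50, p=p, rng=rng)
err = np.linalg.norm(A @ B - C @ R, 'fro') / np.linalg.norm(A @ B, 'fro')
print(f"relative Frobenius error: {err:.3f}")
```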

Not surprisingly, the extent to which we will be able to do this will depend strongly on the sampling probabilities $\{p_i\}_{i=1}^{n}$ and the number of samples $c$, in ways that we will describe in detail. For much of what follows, it will be convenient to express this and related subsequent algorithms in a standardized matrix notation that we will call the sampling matrix formalism. To do so, let $S \in \mathbb{R}^{n \times c}$ be a matrix such that
$$ S_{ij} = \begin{cases} 1 & \text{if the } i\text{-th column of } A \text{ is chosen in the } j\text{-th independent trial} \\ 0 & \text{otherwise} \end{cases} $$
and let $D \in \mathbb{R}^{c \times c}$ be a diagonal matrix such that
$$ D_{tt} = 1 / \sqrt{c p_{i_t}}. $$
With this notation, we can write the output of the BasicMatrixMultiplication algorithm and what it is doing as $C = ASD$, $R = (SD)^T B$, and
$$ CR = ASD(SD)^T B = A\tilde{S}\tilde{S}^T B \approx AB. $$
Here, $\tilde{S} = SD$ is just a way to absorb the diagonal rescaling matrix into the sampling matrix; we will do this often, and we will often refer to it simply as $S$; since one nearly always rescales in a standard way when one samples, the meaning should be clear from context.
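To see that the formalism matches the algorithm, here is a small numpy check, a sketch with sizes and uniform probabilities chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, c = 200, 50
p = np.full(n, 1.0 / n)
idx = rng.choice(n, size=c, replace=True, p=p)

# Sampling matrix S (n x c) and diagonal rescaling matrix D (c x c).
S = np.zeros((n, c))
S[idx, np.arange(c)] = 1.0
D = np.diag(1.0 / np.sqrt(c * p[idx]))

A = rng.standard_normal((50, n))
B = rng.standard_normal((n, 30))

C = A @ S @ D            # C = ASD, i.e., sampled and rescaled columns of A
R = (S @ D).T @ B        # R = (SD)^T B, i.e., sampled and rescaled rows of B
assert np.allclose(C @ R, A @ S @ D @ D.T @ S.T @ B)   # CR = ASD(SD)^T B
```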

2.3 Sampling probabilities and implementation issues

For approximating the product of two matrices, as well as for all of the other problems we will consider this semester, one can always sample uniformly at random. Unfortunately, as we will show, by doing so one would do very poorly in general, in the sense that it is very easy to construct matrices for which uniform sampling will perform extremely poorly. Thus, a central question in everything we will do will be: how should we construct the random sample?

Since $\{p_i\}_{i=1}^{n}$ are essentially importance sampling probabilities, informally we would like to choose samples that are more important. Quantifying this is a little subtle for some of the other problems we will consider, but for approximating the product of two matrices, it is quite easy. Recall that we are basically trying to approximate the product of two matrices by randomly sampling rank-one components in the sum of Eqn. (3). Thus, by analogy with biasing oneself toward larger terms in a sum of numbers, in order to minimize variance, we would like to bias our random sample toward rank-one components that are larger. The notion of size or magnitude of a rank-one matrix that we will use is the spectral norm of the rank-one components. That is, we will choose columns of $A$ and rows of $B$ according to a probability distribution that is proportional to $\|A^{(i)} B_{(i)}\|_2$. Since this is a rank-one matrix, this spectral norm expression takes a particularly simple form:
$$ \|A^{(i)} B_{(i)}\|_2 = \|A^{(i)}\|_2 \|B_{(i)}\|_2. $$
Note that the norm on the left is the matrix spectral norm, while the two norms on the right are Euclidean vector norms. This equality is a consequence of the following simple lemma.

Claim 1. Let $x$ and $y$ be column vectors in $\mathbb{R}^n$. Then, $\|x y^T\|_2 = \|x\|_2 \|y\|_2$.

Proof: Recall that $\|\Omega\|_2 = \sqrt{\lambda_{\max}(\Omega \Omega^T)}$, for a matrix $\Omega$. Thus,
$$ \|x y^T\|_2 = \sqrt{\lambda_{\max}(x y^T y x^T)} = \sqrt{\|x\|_2^2 \|y\|_2^2} = \|x\|_2 \|y\|_2. $$
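A quick numerical sanity check of Claim 1, with vectors chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(4)
x, y = rng.standard_normal(7), rng.standard_normal(7)
lhs = np.linalg.norm(np.outer(x, y), 2)       # spectral norm of the rank-one matrix
rhs = np.linalg.norm(x) * np.linalg.norm(y)   # product of Euclidean lengths
assert np.isclose(lhs, rhs)
```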

Depending on what one is interested in proving, probabilities that depend on $A$ and $B$ in other ways might be appropriate, and in a few cases we will do so. But probabilities that depend on the spectral norm of the rank-one components have proven to be remarkably useful in the general area of randomized numerical linear algebra.

With respect to a few implementation issues, here are some things to note, when probabilities of different forms are used in the BasicMatrixMultiplication algorithm (a code sketch follows this list):

- Uniform sampling: one can choose which elements to keep before looking at the data, and so one can implement this algorithm in one pass over the data.
- For nonuniform sampling, if one uses $p_i = \frac{\|A^{(i)}\|_2 \|B_{(i)}\|_2}{\sum_{i'=1}^{n} \|A^{(i')}\|_2 \|B_{(i')}\|_2}$, then one pass and $O(n)$ additional space is sufficient to compute the sampling probabilities: in that additional space, keep running totals of $\|A^{(i)}\|_2^2$ and $\|B_{(i)}\|_2^2$, for all $i$, and $O(n+p)$ space in the second pass can be used to choose the sample.
- For nonuniform sampling, if $B = A^T$ and one uses $p_i = \|A^{(i)}\|_2^2 / \|A\|_F^2$ as the sampling probabilities, then by the select lemma one needs only $O(1)$ additional space (multiplied by the number of samples $c$ to be selected) in the first pass to decide which samples to draw (and one still needs $O(m+p)$ space in the second pass to keep the sample).
- Actually, the comments in the last bullet are true even if $B \neq A^T$, i.e., if we sample based on the norms-squared of $A$ and completely ignore the information in $B$. We will see that permitting this flexibility is very helpful in certain situations.
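Here is a minimal sketch of computing the norm-based sampling probabilities (used below in Eqn. (5)) in a single vectorized pass over the matrices; the function name is my own:

```python
import numpy as np

def norm_based_probabilities(A, B):
    """p_i proportional to ||A^{(i)}||_2 * ||B_{(i)}||_2."""
    col_norms = np.linalg.norm(A, axis=0)   # ||A^{(i)}||_2 for each column i of A
    row_norms = np.linalg.norm(B, axis=1)   # ||B_{(i)}||_2 for each row i of B
    w = col_norms * row_norms
    return w / w.sum()

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 200))
B = rng.standard_normal((200, 30))
p = norm_based_probabilities(A, B)
assert np.isclose(p.sum(), 1.0) and np.all(p >= 0)
```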

2.4 Initial results for approximating the product of two matrices

Here is our first result for the quality of approximation of the BasicMatrixMultiplication algorithm. This lemma holds for any set of sampling probabilities $\{p_i\}_{i=1}^{n}$. It states that $CR$ is an unbiased estimator for $AB$, element-wise, and it provides an expression for the variance of that estimator that depends on the probabilities that are used.

Lemma 1. Given matrices $A$ and $B$, construct matrices $C$ and $R$ with the BasicMatrixMultiplication algorithm. Then,
$$ E[(CR)_{ij}] = (AB)_{ij} $$
and
$$ \mathrm{Var}[(CR)_{ij}] = \frac{1}{c} \sum_{k=1}^{n} \frac{A_{ik}^2 B_{kj}^2}{p_k} - \frac{1}{c} (AB)_{ij}^2. $$

Proof: Fix $i, j$. For $t = 1, \ldots, c$, define $X_t = \left( \frac{A^{(i_t)} B_{(i_t)}}{c p_{i_t}} \right)_{ij} = \frac{A_{i i_t} B_{i_t j}}{c p_{i_t}}$. Thus,
$$ E[X_t] = \sum_{k=1}^{n} p_k \frac{A_{ik} B_{kj}}{c p_k} = \frac{1}{c} (AB)_{ij} \quad \text{and} \quad E[X_t^2] = \sum_{k=1}^{n} \frac{A_{ik}^2 B_{kj}^2}{c^2 p_k}. $$
Since by construction $(CR)_{ij} = \sum_{t=1}^{c} X_t$, we have $E[(CR)_{ij}] = \sum_{t=1}^{c} E[X_t] = (AB)_{ij}$. Since $(CR)_{ij}$ is the sum of $c$ independent random variables, $\mathrm{Var}[(CR)_{ij}] = \sum_{t=1}^{c} \mathrm{Var}[X_t]$. Since $\mathrm{Var}[X_t] = E[X_t^2] - E[X_t]^2$, we see that
$$ \mathrm{Var}[X_t] = \sum_{k=1}^{n} \frac{A_{ik}^2 B_{kj}^2}{c^2 p_k} - \frac{1}{c^2} (AB)_{ij}^2, $$
and the lemma follows.
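As an empirical sanity check of Lemma 1, here is a minimal sketch (sizes, probabilities, and trial count are arbitrary choices) that averages $CR$ over many independent runs; by unbiasedness, the elementwise mean should approach $AB$:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, p_dim, c = 10, 40, 8, 20
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p_dim))
prob = np.full(n, 1.0 / n)          # any probabilities work for unbiasedness

trials = 20000
acc = np.zeros((m, p_dim))
for _ in range(trials):
    idx = rng.choice(n, size=c, p=prob)
    scale = 1.0 / np.sqrt(c * prob[idx])
    C = A[:, idx] * scale
    R = B[idx, :] * scale[:, None]
    acc += C @ R

# E[(CR)_ij] = (AB)_ij, so the empirical mean should be close to AB.
print(np.max(np.abs(acc / trials - A @ B)))   # small, and shrinking as trials grows
```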

Given Lemma 1, we can provide an upper bound on $E\left[ \|AB - CR\|_F^2 \right]$ and note how this measure of the error depends on the $p_i$'s.

Lemma 2. Given matrices $A$ and $B$, construct matrices $C$ and $R$ with the BasicMatrixMultiplication algorithm. Then,
$$ E\left[ \|AB - CR\|_F^2 \right] = \sum_{k=1}^{n} \frac{\|A^{(k)}\|_2^2 \|B_{(k)}\|_2^2}{c p_k} - \frac{1}{c} \|AB\|_F^2. \qquad (4) $$
Furthermore, if
$$ p_k = \frac{\|A^{(k)}\|_2 \|B_{(k)}\|_2}{\sum_{k'=1}^{n} \|A^{(k')}\|_2 \|B_{(k')}\|_2}, \qquad (5) $$
then
$$ E\left[ \|AB - CR\|_F^2 \right] = \frac{1}{c} \left( \sum_{k=1}^{n} \|A^{(k)}\|_2 \|B_{(k)}\|_2 \right)^2 - \frac{1}{c} \|AB\|_F^2. \qquad (6) $$

Proof: First, note that
$$ E\left[ \|AB - CR\|_F^2 \right] = \sum_{i=1}^{m} \sum_{j=1}^{p} E\left[ (AB - CR)_{ij}^2 \right] = \sum_{i=1}^{m} \sum_{j=1}^{p} \mathrm{Var}[(CR)_{ij}]. $$
Thus, from Lemma 1 it follows that
$$ E\left[ \|AB - CR\|_F^2 \right] = \frac{1}{c} \sum_{k=1}^{n} \frac{1}{p_k} \left( \sum_{i} A_{ik}^2 \right) \left( \sum_{j} B_{kj}^2 \right) - \frac{1}{c} \|AB\|_F^2, $$
which establishes Eqn. (4).
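To see Lemma 2 in action, here is a small sketch (my own illustration, with arbitrary sizes) comparing the expected squared Frobenius error of Eqn. (4) under uniform probabilities against the norm-based probabilities of Eqn. (5); when columns of $A$ have very unequal norms, the norm-based choice gives a noticeably smaller expected error:

```python
import numpy as np

rng = np.random.default_rng(7)
n, c = 200, 20
A = rng.standard_normal((50, n)) * np.logspace(0, 2, n)   # columns of very unequal size
B = rng.standard_normal((n, 30))

col = np.linalg.norm(A, axis=0)   # ||A^{(k)}||_2
row = np.linalg.norm(B, axis=1)   # ||B_{(k)}||_2

def expected_error(p):
    # Eqn. (4): E||AB - CR||_F^2 = sum_k ||A^(k)||^2 ||B_(k)||^2 / (c p_k) - ||AB||_F^2 / c
    return np.sum(col**2 * row**2 / (c * p)) - np.linalg.norm(A @ B, 'fro')**2 / c

p_uniform = np.full(n, 1.0 / n)
p_optimal = col * row / np.sum(col * row)   # Eqn. (5)
print(expected_error(p_uniform), expected_error(p_optimal))

# The norm-based probabilities minimize Eqn. (4), so their error is never larger.
assert expected_error(p_optimal) <= expected_error(p_uniform) + 1e-9
```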